Nagios vs. Zabbix – What's the difference?

There’s one big difference between Zabbix and Nagios which is an absolute game changer for me…

I have been working in monitoring for over 12 years now, and I have designed and built installations of all sizes for many different companies. Most of these monitoring systems were based on Nagios, Icinga or self-developed software. Nagios was my absolute favorite for a long time, and I focused almost exclusively on solutions based on it. But in the last year one of my customers insisted on building a solution with Zabbix, which I hardly knew at that point and had never used in a real-world scenario. So I had to get the documentation and learn the principles behind this piece of software. At first I was underwhelmed by the user interface and its limited repertoire of checks. Also, many parts of the documentation are not as detailed as they should be. But I had no choice, and therefore I had to deal with Zabbix.

Most comparisons of Nagios and Zabbix describe the differences in setup and configuration in detail, and how to get the same results you are used to getting from Nagios. But the big difference is not in the UI or in how a check is configured; it is the fundamental principle of how the decision to trigger an alert is made.

In Nagios a check plugin contains everything you need for monitoring a single aspect of a service. It gathers a piece of information and decides about the operational status of this service based on given thresholds. Nagios receives a numeric value for the service status and a set of optional performance data for statistics or rendering as a graph. This enables the administrator to extend Nagios with a vast number of independent plugins to monitor every kind of application, written in any programming language you like.

Zabbix, on the other hand, segregates the different steps of data gathering, information processing and alerting into different stages within the Zabbix core. One connects to a service via a so-called item to get a piece of information and store it in the database. Zabbix brings a lot of built-in checks to conveniently connect to standard services or get operational parameters of a running server. Whenever you have to go beyond this standard set, you realize that this is meant to be the exception. As soon as the data from an item arrives in the Zabbix database, you can write a trigger to match it against a threshold. Whenever this condition is met, the trigger fires and the associated action is executed. This can be sending an alert or executing an arbitrary script.

The advantage of Zabbix’s approach to deciding whether a service is OK or not is that it is independent from a single item and a single value. It can use the full range of past events and also of every other item, instead of comparing the last gathered value with a fixed threshold. So it is possible to compare the current value to the one from last week or even last year. You can combine independently determined values or predict how long a resource will last based on historical data. This is much more powerful than the restricted context of an independent all-in-one script.
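As a sketch of what such a trigger could look like, here is a hypothetical expression in the function syntax of recent Zabbix versions (6.0 and later); the host name and item key are invented for illustration:

```
# Fire when the current 1-hour average CPU load is more than twice
# the average from the same window one week ago (time shift :now-7d)
avg(/webnode01/system.cpu.load,1h) > 2 * avg(/webnode01/system.cpu.load,1h:now-7d)
```

The time-shift parameter is what makes the “compare with last week” comparison possible without any external scripting.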

For example, I use this ability to calculate the load across all worker nodes of a cluster and to escalate through different severities, and therefore different notification paths, depending on which fraction of the cluster is affected. In my monitoring every hard disk partition has the same two trigger levels: send an e-mail 18 hours and a text message 4 hours before the partition is predicted to be full. 5 % left on /boot? No problem, because this partition hasn’t grown over the last year. 50 % left on /data? That looks critical, because one hour ago it was 75 %.
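The two disk trigger levels can be expressed with Zabbix’s built-in trend prediction. A minimal sketch, assuming Zabbix ≥ 6.0 syntax and a hypothetical host called myhost; timeleft() estimates, from the last hour of history, the number of seconds until the free space reaches 0:

```
# Warning level: partition predicted to be full within 18 hours -> e-mail
timeleft(/myhost/vfs.fs.size[/data,free],1h,0) < 18h

# High level: partition predicted to be full within 4 hours -> text message
timeleft(/myhost/vfs.fs.size[/data,free],1h,0) < 4h
```

A slowly filling /boot never crosses either threshold, while a rapidly shrinking /data does, regardless of the absolute percentage left.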

Another asset is auto-discovery, which can automatically create new configuration items based on newly detected resources. This enables me to add my hard disk checks to every new partition without a single manual step, or to create items and triggers for any new service. So I have a self-extending monitoring system that needs a lot less attention than any other monitoring system I built before.
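In Zabbix this mechanism is called low-level discovery: a discovery rule returns macros for each detected resource, and item and trigger prototypes are instantiated from them. A rough sketch of the pieces involved for the partition example (host name again invented):

```
Discovery rule:    vfs.fs.discovery              # emits a {#FSNAME} macro per mounted filesystem
Item prototype:    vfs.fs.size[{#FSNAME},free]   # becomes a real item for every discovered partition
Trigger prototype: timeleft(/myhost/vfs.fs.size[{#FSNAME},free],1h,0) < 4h
```

Every new partition then automatically gets its own item and prediction trigger without any manual configuration.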

I still love Nagios for its reliability and flexibility, but in a lot of scenarios I now prefer the abilities of Zabbix.

When do you need a private docker registry?

This posting is meant to clarify what a docker registry is and in which cases it is advantageous to run your own private registry for your container-based workflow.

A docker registry – what was that again?

A registry is a vital building block in the docker ecosystem. The registry acts as a central content delivery and storage service for Docker images. Every docker registry follows the same structure for namespaces that GitHub once made popular:

{Service-URL}/{project_namespace}/{repository_name}:{docker-tag}

That makes sense, because every docker image is based on a file named Dockerfile, which is essentially just source code managed in a version-controlled repository.
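A concrete (invented) example of how this naming scheme maps onto a real image reference:

```
registry.example.com/team-a/webapp:1.4.2
# {Service-URL}        = registry.example.com
# {project_namespace}  = team-a
# {repository_name}    = webapp
# {docker-tag}         = 1.4.2
```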

The best-known registry is the public docker hub. For your private registry a lightweight yet scalable reference implementation exists. It provides a standardized REST API to integrate the registry into your private docker environment.

Why do I need my own private registry?

In an environment which depends on containerized applications, every packaging and deployment workflow consists of building and shipping docker images from development to production. Given the fact that docker enables very short development cycles, you can imagine that a registry does not only contain very volatile data, but also receives lots of inbound traffic while simultaneously serving multiple clients with outbound image transfers. Availability, access control and performance are the three main benefits a private registry provides.

Are there any prior requirements?

Yes, there is at least one big requirement: you need to have a well-defined continuous integration workflow. Without that, your registry will turn into a messy heap of low-quality or even hazardous container images. Chances are good that these images can compromise your systems, leading to a maintenance nightmare.

This shouldn’t be a surprise, because the same is true for every other software project that has not been through quality control. The difference is that containers are such a powerful abstraction compared to conventional software deployment artifacts that much more harm can be done than before. Just think about a basic three-tier web application. Maybe you start it using docker-compose up and it brings up the database, the middleware and the frontend(s) all at once, including inter-container networking and data volume management. This is awesome! But it hides a lot of the inherent complexity of this stack, which makes it harder to “peel the onion” of layers and components when an error occurs.

Let me give you a common workflow example:
In a continuous integration (CI) workflow, a commit on your version control system may trigger a docker build on your CI server, which pushes a new docker image to your registry, which in turn gets consumed by one or more client nodes where the new container gets executed. This may happen only a few seconds after the Dockerfile has been committed to version control.
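Reduced to the essential docker commands, the workflow above might look like this; the registry URL, namespace and tag are invented for illustration:

```
# On the CI server, after a commit triggers the build:
docker build -t registry.example.com/team-a/webapp:git-abc1234 .
docker push registry.example.com/team-a/webapp:git-abc1234

# On a client node, consuming the freshly pushed image:
docker pull registry.example.com/team-a/webapp:git-abc1234
docker run -d registry.example.com/team-a/webapp:git-abc1234
```

Tagging with the commit hash is one common convention to tie a running container back to the exact source revision it was built from.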

Of course everybody could build docker images on their own computers before deploying them or even build the image directly on the target system. But when it comes to CI and automated workflows, you most certainly want a trusted place where images are built following your company’s policies. Now you can enforce automated quality control and integrity checks.

Some examples are:

  • Checking the size of the image
  • Executing automated security checks
  • Running complex regression tests
  • Performing quality assurance
  • Keeping track of build histories and versions

All this should be done before pushing the image to your registry server.

More than just a data store

Besides the functional requirements, your registry also has to meet your company’s requirements for availability, security and trust. Examples are:

  • Service Authentication and Authorization
  • Fine grained access control for groups and/or individuals
  • Proper transport security (TLS/SSL)
  • Logging
  • Service monitoring for availability, performance and integrity

A public registry such as the above-mentioned docker hub provides a foundation of publicly available open-source images. However, for your company’s private images, a private registry is the way to go. The reason should be obvious by now.

Various Solutions

The easiest, but maybe not the best, solution is to run the reference implementation of the docker registry and integrate it with your CI processes. Afterwards you can tick the “done” checkbox on your to-do list.
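Getting the reference implementation running is a one-liner; the commands below are a minimal sketch using the official registry image (version 2) on its default port 5000, without TLS or authentication, so it is only suitable for local experiments:

```
# Start the reference registry locally
docker run -d -p 5000:5000 --restart=always --name registry registry:2

# Tag an existing local image for the new registry and push it
docker tag webapp:latest localhost:5000/team-a/webapp:latest
docker push localhost:5000/team-a/webapp:latest
```

For anything beyond a local experiment you would put transport security and access control in front of it, as described in the requirements above.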

If your needs exceed this basic approach, you should have a look at the various open-source and commercial alternatives for image hosting. All varieties of modern computing are possible: PaaS, on-premise, and everything in between.

On awesome-docker, a curated list of resources for all things related to docker, you can find various options for hosting your private docker registry.

If you feel the need for support to get started, don’t hesitate to contact us at info@sitesitter.de.