Go to file
Adrien Mahieux 0b7ba99b5e [collector/proc.plugin] Add /proc/pagetypeinfo parser (#6843)
* [proc.plugin/proc_pagetypeinfo] Initial commit

* [Fix] Generate graphs for pagetypeinfo

* [Fix] Create node/zone/type graphs

* [Fix] Use directly size and order

* [Add] Configuration handling

* [Imp] Changed SetId to identify NodeNumber

* [Fix] Standard name for chart priority and value

* [Fix] use dynamic pagesize

* [Enh] allow prefix for containerized netdata

* [Fix] global system graph always on, but for explicit no

* [Fix] Add more checks for pageorders_cnt and really use it

* [Enh] Special config value of netdata_zero_metrics_enabled

* [Fix] Check we parsed at least a valid line
2019-10-21 10:51:46 +03:00
.github netdata: Add knatsakis as codeowner, wherever paulkatsoulakis was (#7036) 2019-10-09 15:20:10 +03:00
.travis netdata/ci: nits and fixes around package release workflow (#6914) 2019-09-23 19:23:48 +02:00
backends Don't write an HTTP response 204 to logs (#7035) 2019-10-16 12:45:12 +03:00
build Add ioping plugin (#5725) 2019-04-23 14:20:00 +03:00
collectors [collector/proc.plugin] Add /proc/pagetypeinfo parser (#6843) 2019-10-21 10:51:46 +03:00
contrib Fix upgrade path from v1.17.1 to v1.18.x for deb packages (#7118) 2019-10-18 16:56:20 +03:00
daemon feat(reaper): Add process reaper support (#7059) 2019-10-17 15:35:08 +02:00
database template_foreach_fix: Fix underscore dash (#7069) 2019-10-14 14:08:21 +00:00
diagrams Fix Markdown Lint warnings (#6664) 2019-08-15 13:06:39 +02:00
docs Tutorials to support v1.18 features (#6993) 2019-10-10 16:43:46 +02:00
health Implement hangouts chat notifications (#7013) 2019-10-17 20:00:51 +00:00
libnetdata feat(reaper): Add process reaper support (#7059) 2019-10-17 15:35:08 +02:00
packaging [ci skip] create nightly packages and update changelog 2019-10-20 15:28:56 +00:00
registry Fix Markdown Lint warnings (#6664) 2019-08-15 13:06:39 +02:00
streaming Fix Markdown Lint warnings (#6664) 2019-08-15 13:06:39 +02:00
system Revert "netdata/packaging: Do not deliver edit-config as part of the distribution tarball (#6507)" 2019-07-23 11:02:09 +02:00
tests Common pattern for web and alarms together with two bug fixes (#6783) 2019-09-29 11:10:35 +02:00
web Documenting the structure of the data responses. (#7012) 2019-10-17 10:04:32 +02:00
.clang-format Add clang-format. Update Contribution guidelines. (#6677) 2019-09-20 16:44:20 +02:00
.codacy.yml Codacy js fixes (#5337) 2019-02-06 15:52:49 +01:00
.codeclimate.yml modularized all source code (#4391) 2018-10-15 23:16:42 +03:00
.csslintrc added codeclimate coverage 2017-01-06 18:01:57 +02:00
.eslintignore added codeclimate coverage 2017-01-06 18:01:57 +02:00
.eslintrc added codeclimate coverage 2017-01-06 18:01:57 +02:00
.gitattributes Add a .gitattributes file (#6381) 2019-07-05 11:54:32 +02:00
.gitignore Add CMocka unit tests (#6985) 2019-10-15 15:15:40 +03:00
.gitlab-ci.yml netdata/packaging: remove sudo 2019-08-28 18:19:04 +03:00
.lgtm.yml Split js 2 (#4581) 2018-11-08 11:37:06 +02:00
.remarkignore add CHANGELOG.md to .remarkignore (#6671) 2019-08-15 16:41:08 -07:00
.remarkrc.js NEW: local remark-lint checks and autofix support (#5898) 2019-06-03 14:46:36 +02:00
.squash.yml Squash integration (#5566) 2019-09-16 16:49:31 +02:00
.travis.yml Run the trigger for deb and rpm package build in separate stage (#7105) 2019-10-16 08:41:37 +03:00
CHANGELOG.md [ci skip] create nightly packages and update changelog 2019-10-20 15:28:56 +00:00
CMakeLists.txt [collector/proc.plugin] Add /proc/pagetypeinfo parser (#6843) 2019-10-21 10:51:46 +03:00
CODE_OF_CONDUCT.md Fix Markdown Lint warnings (#6664) 2019-08-15 13:06:39 +02:00
CONTRIBUTING.md Add clang-format. Update Contribution guidelines. (#6677) 2019-09-20 16:44:20 +02:00
CONTRIBUTORS.md Fixing broken links (#7123) 2019-10-17 21:50:29 +02:00
DOCUMENTATION.md Fixing broken links found via linkchecker (#6983) 2019-10-03 09:24:00 -07:00
HISTORICAL_CHANGELOG.md Fix Markdown Lint warnings (#6664) 2019-08-15 13:06:39 +02:00
LICENSE remove license templates; add info about SPDX to main license file 2018-09-08 15:53:07 +02:00
Makefile.am [collector/proc.plugin] Add /proc/pagetypeinfo parser (#6843) 2019-10-21 10:51:46 +03:00
README.md Update news with v1.18.1 2019-10-18 09:22:46 -07:00
REDISTRIBUTED.md Fix Markdown Lint warnings (#6664) 2019-08-15 13:06:39 +02:00
SECURITY.md Fixing broken links found via linkchecker (#6983) 2019-10-03 09:24:00 -07:00
configs.signatures Remove hard cap from page cache size to eliminate deadlocks. (#7006) 2019-10-07 11:08:21 +03:00
configure.ac Fix build when CMocka isn't installed (#7129) 2019-10-18 15:33:05 +03:00
coverity-scan.sh netdata/ci: second batch of fixes for coverity scan script and others (#6804) 2019-09-12 11:49:21 +02:00
cppcheck.sh optimized ses and added des (#4470) 2018-10-24 03:03:57 +03:00
netdata-installer.sh Partial fix for #7039 (#7060) 2019-10-17 15:30:49 +02:00
netdata.cppcheck remove static dir config 2018-09-08 15:45:34 +02:00
netdata.spec.in netdata/packaging: Make spec file more consistent with version dependencies, plus some documentation nits (#6948) 2019-09-29 11:08:53 +02:00
netlify.toml Docs: Overhaul of Getting started guide (#6811) 2019-09-24 16:00:13 +02:00
package-lock.json NPM Packages version update (#6801) 2019-09-17 17:54:09 +02:00
package.json NPM Packages version update (#6801) 2019-09-17 17:54:09 +02:00

README.md

Netdata Build Status CII Best Practices License: GPL v3+ analytics

Code Climate Codacy Badge LGTM C LGTM JS LGTM PYTHON


Netdata is distributed, real-time, performance and health monitoring for systems and applications. It is a highly optimized monitoring agent you install on all your systems and containers.

Netdata provides unparalleled insights, in real-time, of everything happening on the systems it runs (including web servers, databases, applications), using highly interactive web dashboards. It can run autonomously, without any third party components, or it can be integrated to existing monitoring tool chains (Prometheus, Graphite, OpenTSDB, Kafka, Grafana, etc).

Netdata is fast and efficient, designed to permanently run on all systems (physical & virtual servers, containers, IoT devices), without disrupting their core function.

Netdata is free, open-source software and it currently runs on Linux, FreeBSD, and MacOS.

Netdata is not hosted by the CNCF but is the 3rd most starred open-source project in the Cloud Native Computing Foundation (CNCF) landscape.


People get addicted to Netdata.
Once you use it on your systems, there is no going back! You have been warned...

image

Tweet about Netdata!

Contents

  1. How it looks - have a quick look at it
  2. User base - who uses Netdata?
  3. Quick Start - try it now on your systems
  4. Why Netdata - why people love Netdata, how it compares with other solutions
  5. News - latest news about Netdata
  6. How it works - high level diagram of how Netdata works
  7. infographic - everything about Netdata, in a page
  8. Features - what features does it have
  9. Visualization - unique visualization features
  10. What does it monitor - which metrics it collects
  11. Documentation - read the docs
  12. Community - discuss with others and get support
  13. License - check the license of Netdata
  14. Is it any good? - Yes
  15. Is it awesome? - Yes

How it looks

The following animated image, shows the top part of a typical Netdata dashboard.

peek 2018-11-11 02-40

A typical Netdata dashboard, in 1:1 timing. Charts can be panned by dragging them, zoomed in/out with SHIFT + mouse wheel, an area can be selected for zoom-in with SHIFT + mouse selection. Netdata is highly interactive and real-time, optimized to get the work done!

We have a few online demos to experience it live: https://www.netdata.cloud

User base

Netdata is used by hundreds of thousands of users all over the world. Check our GitHub watchers list. You will find people working for Amazon, Atos, Baidu, Cisco Systems, Citrix, Deutsche Telekom, DigitalOcean, Elastic, EPAM Systems, Ericsson, Google, Groupon, Hortonworks, HP, Huawei, IBM, Microsoft, NewRelic, Nvidia, Red Hat, SAP, Selectel, TicketMaster, Vimeo, and many more!

Docker pulls

We provide docker images for the most common architectures. These are statistics reported by docker hub:

netdata/netdata (official) firehol/netdata (deprecated) titpetric/netdata (donated)

Registry

When you install multiple Netdata, they are integrated into one distributed application, via a Netdata registry. This is a web browser feature and it allows us to count the number of unique users and unique Netdata servers installed. The following information comes from the global public Netdata registry we run:

User Base Monitored Servers Sessions Served

in the last 24 hours:
New Users Today New Machines Today Sessions Today

Quick Start

To install Netdata from source on any Linux system (physical, virtual, container, IoT, edge) and keep it up to date with our nightly releases automatically, run the following:

# make sure you run `bash` for your shell
bash

# install Netdata directly from GitHub source
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

To learn more about the pros and cons of using nightly vs. stable releases, see our notice about the two options.

The above command will:

  • Install any required packages on your system (it will ask you to confirm before doing so)
  • Compile it, install it, and start it.

More installation methods and additional options can be found at the installation page.

To try Netdata in a docker container, run this:

docker run -d --name=netdata \
  -p 19999:19999 \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  netdata/netdata

For more information about running Netdata in docker, check the docker installation page.

image

From Netdata v1.12 and above, anonymous usage information is collected by default and sent to Google Analytics. To read more about the information collected and how to opt-out, check the anonymous statistics page.

Why Netdata

Netdata has a quite different approach to monitoring.

Netdata is a monitoring agent you install on all your systems. It is:

  • a metrics collector - for system and application metrics (including web servers, databases, containers, etc)
  • a time-series database - all stored in memory (does not touch the disks while it runs)
  • a metrics visualizer - super fast, interactive, modern, optimized for anomaly detection
  • an alarms notification engine - an advanced watchdog for detecting performance and availability issues

All the above, are packaged together in a very flexible, extremely modular, distributed application.

This is how Netdata compares to other monitoring solutions:

Netdata others (open-source and commercial)
High resolution metrics (1s granularity) Low resolution metrics (10s granularity at best)
Monitors everything, thousands of metrics per node Monitor just a few metrics
UI is super fast, optimized for anomaly detection UI is good for just an abstract view
Meaningful presentation, to help you understand the metrics You have to know the metrics before you start
Install and get results immediately Long preparation is required to get any useful results
Use it for troubleshooting performance problems Use them to get statistics of past performance
Kills the console for tracing performance issues The console is always required for troubleshooting
Requires zero dedicated resources Require large dedicated resources

Netdata is open-source, free, super fast, very easy, completely open, extremely efficient, flexible and integrate-able.

It has been designed by SysAdmins, DevOps and Developers for troubleshooting performance problems, not just visualize metrics.

News

Oct 18th, 2019 - Netdata v1.18.1 released!

Release v1.18.1 contains 17 bug fixes, 5 improvements, and 5 documentation updates.

The patch has several bug fixes, mainly related to FreeBSD and the binary package generation process.

Netdata can now send notifications to Google Hangouts Chat!

On certain systems, the slabinfo plugin introduced in v1.18.0 added thousands of new metrics. We decided the collector's usefulness to most users didn't justify the increase in resource requirements. This release disables the collector by default.

Finally, we added a chart under Netdata Monitoring to present a better view of the RAM used by the database engine (dbengine). The chart doesn't currently take into consideration the RAM used for slave nodes, so we intend to add more related charts in the future.


Oct 10th, 2019 - Netdata v1.18.0 released!

Release v1.18.0 contains 5 new collectors, 16 bug fixes, 27 improvements, and 20 documentation updates.

The database engine is now the default method of storing metrics in Netdata. You immediately get more efficient and configurable long-term metrics storage without any work on your part. By saving recent metrics in RAM and "spilling" historical metrics to disk for long-term storage, the database engine is laying the foundation for many more improvements to distributed metrics.

We even have a tutorial on switching to the database engine and getting the most from it. Or, just read up on how performant the database engine really is.

Both our python.d and go.d plugins now have more intelligent auto-detection by periodically dump a list of active modules to disk. When Netdata starts, such as after a reboot, the plugins use this list of known services to re-establish metrics collection much more reliably. No more worrying if the service or application you need to monitor starts up minutes after Netdata.

Two of our new collectors will help those with Hadoop big data infrastructures. The HDFS and Zookeeper collection modules come with essential alarms requested by our community and Netdata's auto-detection capabilities to keep the required configuration to an absolute minimum. Read up on the process via our HDFS and Zookeeper tutorial.

Speaking of new collectors—we also added the ability to collect metrics from SLAB cache, Gearman, and vCenter Server Appliances.

Before v1.18, if you wanted to create alarms for each dimension in a single chart, you need to write separate entities for each dimension—not very efficient or user-friendly. New dimension templates fix that hassle. Now, a single entity can automatically generate alarms for any number of dimensions in a chart, even those you weren't aware of! Our tutorial on dimension templates has all the details.

v1.18 brings support for installing Netdata on offline or air-gapped systems. To help users comply with strict security policies, our installation scripts can now install Netdata using previously-downloaded tarball and checksums instead of downloading them at runtime. We have guides for installing offline via kickstart.sh or kickstart-static64.sh in our installation documentation. We're excited to bring real-time monitoring to once-inaccessible systems!


Sep 12th, 2019 - Netdata v1.17.1 released!

Release v1.17.1 contains 2 bug fixes, 6 improvements, and 2 documentation updates.

The main reason for the patch release is an essential fix to the repeating alarm notifications we introduced in v1.17.0. If you enabled repeating notifications, Netdata would not then send CLEAR notifications for the selected alarms.

The release also includes a significant improvement to Netdata's auto-detection capabilities, especially after a system restart. Netdata now remembers which python.d plugin jobs were successfully collecting data the last time it was running, and retries to run those jobs for 5 minutes before giving up. As a result, you no longer have to worry if your system starts Netdata before the monitored services have had a chance to start properly. We will complete the same improvement for go.d plugins in v1.18.0.

We also made some improvements to our binary packages and added a neat sample custom dashboard that can show charts from multiple Netdata agents.


Sep 3rd, 2019 - Netdata v1.17.0 released!

Release v1.17.0 contains 38 bug fixes, 33 improvements, and 20 documentation updates.

You can now change the data collection frequency at will, without losing previously collected values. A major improvement to the new database engine allows you not only to store metrics at variable granularity, but also to autoscale the time axis of the charts, depending on the data collection frequencies used during the presented time.

You can also now monitor VM performance from one or more vCenter servers with a new VSphere collector. In addition, the proc plugin now also collects ZRAM device performance metrics and the apps plugin monitors process uptime for the defined process groups.

Continuing our efforts to integrate with as many existing solutions as possible, you can now directly archive metrics from Netdata to MongoDB via a new backend.

Netdata badges now support international (UTF8) characters! We also made our URL parser smarter, not only for international character support, but also for other strange API queries.

We also added .DEB packages to our binary distribution repositories at Packagecloud, a new collector for Linux zram device metrics, and support for plain text email notifications.

This release includes several fixes and improvements to the TLS encryption feature we introduced in v1.16.0. First, encryption slave-to-master streaming connections wasn't working as intended. And second, our community helped us discover cases where HTTP requests were not correctly redirected to HTTPS with TLS enabled. This release mitigates those issues and improves TLS support overall.

Finally, we improved the way Netdata displays charts with no metrics. By default, Netdata displays charts for disks, memory, and networks only when the associated metrics are not zero. Users could enable these charts permanently using the corresponding configuration options, but they would need to change more than 200 options. With this new improvement, users can enable all charts with zero values using a single, global configuration parameter.


Jul 9th, 2019 - Netdata v1.16.0 released!

Release v1.16.0 contains 40 bug fixes, 31 improvements and 20 documentation updates

Binary distributions. To improve the security, speed and reliability of new Netdata installations, we are delivering our own, industry standard installation method, with binary package distributions. The RPM binaries for the most common OSs are already available on packagecloud and well have the DEB ones available very soon. All distributions are considered in Beta and, as always, we depend on our amazing community for feedback on improvements.

Netdata now supports TLS encryption! You can secure the communication to the web server, the streaming connections from slaves to the master and the connection to an openTSDB backend.

This version also brings two long-awaited features to Netdatas health monitoring:

  • The health management API introduced in v1.12 allowed you to easily disable alarms and/or notifications while Netdata was running. However, those changes were not persisted across Netdata restarts. Since part of routine maintenance activities may involve completely restarting a monitoring node, Netdata now saves these configurations to disk, every time you issue a command to change the silencer settings. The new LIST command of the API allows you to view at any time which alarms are currently disabled or silenced.
  • A way for Netdata to repeatedly send alarm notifications for some, or all active alarms, at a frequency of your choosing. As a result, you will no longer have to worry about missing a notification, forgetting about a raised alarm. The default is still to only send a single notification, so that existing users are not surprised by a different behavior.

As always, weve introduced new collectors, 5 of them this time:

  • Of special interest to people with Windows servers in their infrastructure is the WMI collector, though we are fully aware that we need to continue our efforts to do a proper port to Windows.
  • The new perf plugin collects system-wide CPU performance statistics from Performance Monitoring Units (PMU) using the perf_event_open() system call. You can read a wonderful article on why this is useful here.
  • The other three are collectors to monitor Dnsmasq DHCP leases, Riak KV servers and Pihole instances.

Finally, the DB Engine introduced in v1.15.0 now uses much less memory and is more robust than before.


May 21st, 2019 - Netdata v1.15.0 released!

Release v1.15.0 contains 11 bug fixes and 30 improvements.

We are very happy and proud to be able to include two major improvements in this release: The aggregated node view and the new database engine.

Aggregated node view

The No. 1 request from our community has been a better way to view and manage their Netdata installations, via an aggregated view. The node menu with the simple list of hosts on the agent UI just didn't do it for people with hundreds, or thousands of instances. This release introduces the node view, which uses the power of Netdata Cloud to deliver powerful views of a Netdata-based monitoring infrastructure. You can read more about Netdata Cloud and the future of Netdata here.

New database engine

Historically, Netdata has required a lot of memory for long-term metrics storage. To mitigate this we've been building a new DB engine for several months and will continue improving until it can become the default memory mode for new Netdata installations. The version included in release v1.15.0 already permits longer-term storage of compressed data and we'll continue reducing the required memory in following releases.

Other major additions

We have added support for the AWS Kinesis backend and new collectors for OpenVPN, the Tengine web server, ScaleIO (VxFlex OS), ioping-like latency metrics and Energi Core node instances.

We now have a new, "text-only" chart type, cpu limits for v2 cgroups, docker swarm metrics and improved documentation.

We continued improving the Kubernetes helmchart with liveness probes for slaves, persistence options, a fix for a Cannot allocate memory issue and easy configuration for the kubelet, kube-proxy and coredns collectors.

Finally, we built a process to quickly replace any problematic nightly builds and added more automated CI tests to prevent such builds from being published in the first place.


Apr 26th, 2019 - Netdata v1.14.0 released!

Release 1.14 contains 14 bug fixes and 24 improvements.

The release introduces major additions to Kubernetes monitoring, with tens of new charts for Kubelet, kube-proxy and coredns metrics, as well as significant improvements to the Netdata helm chart.

Two new collectors were added, to monitor Docker hub and Docker engine metrics.

Finally, v1.14 adds support for version 2 cgroups, OpenLDAP over TLS, NVIDIA SMI free and per process memory and configurable syslog facilities.


Mar 14th, 2019 - Netdata v1.13.0 released!

Release 1.13.0 contains 14 bug fixes and 8 improvements.

Netdata has taken the first step into the world of Kubernetes, with a beta version of a Helm chart for deployment to a k8s cluster and proper naming of the cgroup containers. We have big plans for Kubernetes, so stay tuned!

A major refactoring of the python.d plugin has resulted in a dramatic decrease of the required memory, making Netdata even more resource efficient.

We also added charts for IPC shared memory segments and total memory used.


Feb 28th, 2019 - Netdata v1.12.2 released!

Patch release 1.12.2 contains 7 bug fixes and 4 improvements.

The main motivation behind a new patch release is the introduction of a stable release channel. A "stable" installation and update channel was always on our roadmap, but it became a necessity when we realized that our users in China could not use the nightly releases published on Google Cloud. The "stable" channel is based on our official GitHub releases and uses assets hosted on GitHub.

We are also introducing a new Oracle DB collector module, implemented in Python.


Feb 21st, 2019 - Netdata v1.12.1 released!

Patch release 1.12.1 contains 22 bug fixes and 8 improvements.


Feb 14th, 2019 - Netdata v1.12.0 released!

Release 1.12 is made out of 211 pull requests and 22 bug fixes. The key improvements are:

  • Introducing netdata.cloud, the free Netdata service for all Netdata users
  • High performance plugins with go.d.plugin (data collection orchestrator written in Go)
  • 7 new data collectors and 11 rewrites of existing data collectors for improved performance
  • A new management API for all Netdata servers
  • Bind different functions of the Netdata APIs to different ports
  • Improved installation and updates

Nov 22nd, 2018 - Netdata v1.11.1 released!

  • Improved internal database to support values above 64bit.
  • New data collection plugins: openldap, tor, nvidia_smi.
  • Improved data collection plugins: Netdata now supports monitoring network interface aliases, smartd_log, cpufreq, sensors.
  • Health monitoring improvements: network interface congestion alarm restored, alerta.io, conntrack_max.
  • my-netdatamenu has been refactored.
  • Packaging: openrc service definition got a few improvements.

Sep 18, 2018 - Netdata has its own organization

Netdata used to be a firehol.org project, accessible as firehol/netdata.

Netdata now has its own github organization netdata, so all github URLs are now netdata/netdata. The old github URLs, repo clones, forks, etc redirect automatically to the new repo.

How it works

Netdata is a highly efficient, highly modular, metrics management engine. Its lockless design makes it ideal for concurrent operations on the metrics.

image

This is how it works:

Function Description Documentation
Collect Multiple independent data collection workers are collecting metrics from their sources using the optimal protocol for each application and push the metrics to the database. Each data collection worker has lockless write access to the metrics it collects. collectors
Store Metrics are stored in RAM in a round robin database (ring buffer), using a custom made floating point number for minimal footprint. database
Check A lockless independent watchdog is evaluating health checks on the collected metrics, triggers alarms, maintains a health transaction log and dispatches alarm notifications. health
Stream An lockless independent worker is streaming metrics, in full detail and in real-time, to remote Netdata servers, as soon as they are collected. streaming
Archive A lockless independent worker is down-sampling the metrics and pushes them to backend time-series databases. backends
Query Multiple independent workers are attached to the internal web server, servicing API requests, including data queries. web/api

The result is a highly efficient, low latency system, supporting multiple readers and one writer on each metric.

Infographic

This is a high level overview of Netdata feature set and architecture. Click it to to interact with it (it has direct links to documentation).

image

Features

finger-video

This is what you should expect from Netdata:

General

  • 1s granularity - the highest possible resolution for all metrics.
  • Unlimited metrics - collects all the available metrics, the more the better.
  • 1% CPU utilization of a single core - it is super fast, unbelievably optimized.
  • A few MB of RAM - by default it uses 25MB RAM. You size it.
  • Zero disk I/O - while it runs, it does not load or save anything (except error and access logs).
  • Zero configuration - auto-detects everything, it can collect up to 10000 metrics per server out of the box.
  • Zero maintenance - You just run it, it does the rest.
  • Zero dependencies - it is even its own web server, for its static web files and its web API (though its plugins may require additional libraries, depending on the applications monitored).
  • Scales to infinity - you can install it on all your servers, containers, VMs and IoTs. Metrics are not centralized by default, so there is no limit.
  • Several operating modes - Autonomous host monitoring (the default), headless data collector, forwarding proxy, store and forward proxy, central multi-host monitoring, in all possible configurations. Each node may have different metrics retention policy and run with or without health monitoring.

Health Monitoring & Alarms

Integrations

  • time-series dbs - can archive its metrics to Graphite, OpenTSDB, Prometheus, AWS Kinesis, MongoDB, JSON document DBs, in the same or lower resolution (lower: to prevent it from congesting these servers due to the amount of data collected). Netdata also supports Prometheus remote write API which allows storing metrics to Elasticsearch, Gnocchi, InfluxDB, Kafka, PostgreSQL/TimescaleDB, Splunk, VictoriaMetrics and a lot of other storage providers.

Visualization

  • Stunning interactive dashboards - mouse, touchpad and touch-screen friendly in 2 themes: slate (dark) and white.
  • Amazingly fast visualization - responds to all queries in less than 1 ms per metric, even on low-end hardware.
  • Visual anomaly detection - the dashboards are optimized for detecting anomalies visually.
  • Embeddable - its charts can be embedded on your web pages, wikis and blogs. You can even use Atlassian's Confluence as a monitoring dashboard.
  • Customizable - custom dashboards can be built using simple HTML (no javascript necessary).

Positive and negative values

To improve clarity on charts, Netdata dashboards present positive values for metrics representing read, input, inbound, received and negative values for metrics representing write, output, outbound, sent.

positive-and-negative-values

Netdata charts showing the bandwidth and packets of a network interface. received is positive and sent is negative.

Autoscaled y-axis

Netdata charts automatically zoom vertically, to visualize the variation of each metric within the visible time-frame.

non-zero-based

A zero based stacked chart, automatically switches to an auto-scaled area chart when a single dimension is selected.

Charts are synchronized

Charts on Netdata dashboards are synchronized to each other. There is no master chart. Any chart can be panned or zoomed at any time, and all other charts will follow.

charts-are-synchronized

Charts are panned by dragging them with the mouse. Charts can be zoomed in/out withSHIFT + mouse wheel while the mouse pointer is over a chart.

The visible time-frame (pan and zoom) is propagated from Netdata server to Netdata server, when navigating via the node menu.

Highlighted time-frame

To improve visual anomaly detection across charts, the user can highlight a time-frame (by pressing ALT + mouse selection) on all charts.

highlighted-timeframe

A highlighted time-frame can be given by pressing ALT + mouse selection on any chart. Netdata will highlight the same range on all charts.

Highlighted ranges are propagated from Netdata server to Netdata server, when navigating via the node menu.

What does it monitor

Netdata data collection is extensible - you can monitor anything you can get a metric for. Its Plugin API supports all programing languages (anything can be a Netdata plugin, BASH, python, perl, node.js, java, Go, ruby, etc).

  • For better performance, most system related plugins (cpu, memory, disks, filesystems, networking, etc) have been written in C.
  • For faster development and easier contributions, most application related plugins (databases, web servers, etc) have been written in python.

APM (Application Performance Monitoring)

  • statsd - Netdata is a fully featured statsd server.
  • Go expvar - collects metrics exposed by applications written in the Go programming language using the expvar package.
  • Spring Boot - monitors running Java Spring Boot applications that expose their metrics with the use of the Spring Boot Actuator included in Spring Boot library.
  • uWSGI - collects performance metrics from uWSGI applications.

System Resources

Memory

  • ram - collects info about RAM usage.
  • swap - collects info about swap memory usage.
  • available memory - collects the amount of RAM available for userspace processes.
  • committed memory - collects the amount of RAM committed to userspace processes.
  • Page Faults - collects the system page faults (major and minor).
  • writeback memory - collects the system dirty memory and writeback activity.
  • huge pages - collects the amount of RAM used for huge pages.
  • KSM - collects info about Kernel Same Merging (memory dedupper).
  • Numa - collects Numa info on systems that support it.
  • slab - collects info about the Linux kernel memory usage.

Disks

Filesystems

  • BTRFS - detailed disk space allocation and usage.
  • Ceph - OSD usage, Pool usage, number of objects, etc.
  • NFS file servers and clients - NFS v2, v3, v4: I/O, cache, read ahead, RPC calls
  • Samba - performance metrics of Samba SMB2 file sharing.
  • ZFS - detailed performance and resource usage.

Networking

  • Network Stack - everything about the networking stack (both IPv4 and IPv6 for all protocols: TCP, UDP, SCTP, UDPLite, ICMP, Multicast, Broadcast, etc), and all network interfaces (per interface: bandwidth, packets, errors, drops).
  • Netfilter - everything about the netfilter connection tracker.
  • SynProxy - collects performance data about the linux SYNPROXY (DDoS).
  • NFacct - collects accounting data from iptables.
  • Network QoS - the only tool that visualizes network tc classes in real-time.
  • FPing - to measure latency and packet loss between any number of hosts.
  • ISC dhcpd - pools utilization, leases, etc.
  • AP - collects Linux access point performance data (hostapd).
  • SNMP - SNMP devices can be monitored too (although you will need to configure these).
  • port_check - checks TCP ports for availability and response time.

Virtual Private Networks

  • OpenVPN - collects status per tunnel.
  • LibreSwan - collects metrics per IPSEC tunnel.
  • Tor - collects Tor traffic statistics.

Processes

  • System Processes - running, blocked, forks, active.
  • Applications - by grouping the process tree and reporting CPU, memory, disk reads, disk writes, swap, threads, pipes, sockets - per process group.
  • systemd - monitors systemd services using CGROUPS.

Users

  • Users and User Groups resource usage - by summarizing the process tree per user and group, reporting: CPU, memory, disk reads, disk writes, swap, threads, pipes, sockets.
  • logind - collects sessions, users and seats connected.

Containers and VMs

  • Containers - collects resource usage for all kinds of containers, using CGROUPS (systemd-nspawn, lxc, lxd, docker, kubernetes, etc).
  • libvirt VMs - collects resource usage for all kinds of VMs, using CGROUPS.
  • dockerd - collects docker health metrics.

Web Servers

  • Apache and lighttpd - mod-status (v2.2, v2.4) and cache log statistics, for multiple servers.
  • IPFS - bandwidth, peers.
  • LiteSpeed - reads the litespeed rtreport files to collect metrics.
  • Nginx - stub-status, for multiple servers.
  • Nginx+ - connects to multiple nginx_plus servers (local or remote) to collect real-time performance metrics.
  • PHP-FPM - multiple instances, each reporting connections, requests, performance, etc.
  • Tomcat - accesses, threads, free memory, volume, etc.
  • web server access.log files - extracting in real-time, web server and proxy performance metrics and applying several health checks, etc.
  • HTTP check - checks one or more web servers for HTTP status code and returned content.

Proxies, Balancers, Accelerators

  • HAproxy - bandwidth, sessions, backends, etc.
  • Squid - multiple servers, each showing: clients bandwidth and requests, servers bandwidth and requests.
  • Traefik - connects to multiple traefik instances (local or remote) to collect API metrics (response status code, response time, average response time and server uptime).
  • Varnish - threads, sessions, hits, objects, backends, etc.
  • IPVS - collects metrics from the Linux IPVS load balancer.

Database Servers

  • CouchDB - reads/writes, request methods, status codes, tasks, replication, per-db, etc.
  • MemCached - multiple servers, each showing: bandwidth, connections, items, etc.
  • MongoDB - operations, clients, transactions, cursors, connections, asserts, locks, etc.
  • MySQL and mariadb - multiple servers, each showing: bandwidth, queries/s, handlers, locks, issues, tmp operations, connections, binlog metrics, threads, innodb metrics, and more.
  • PostgreSQL - multiple servers, each showing: per database statistics (connections, tuples read - written - returned, transactions, locks), backend processes, indexes, tables, write ahead, background writer and more.
  • Proxy SQL - collects Proxy SQL backend and frontend performance metrics.
  • Redis - multiple servers, each showing: operations, hit rate, memory, keys, clients, slaves.
  • RethinkDB - connects to multiple rethinkdb servers (local or remote) to collect real-time metrics.

Message Brokers

Search and Indexing

  • ElasticSearch - search and index performance, latency, timings, cluster statistics, threads statistics, etc.

DNS Servers

  • bind_rndc - parses named.stats dump file to collect real-time performance metrics. All versions of bind after 9.6 are supported.
  • dnsdist - performance and health metrics.
  • ISC Bind (named) - multiple servers, each showing: clients, requests, queries, updates, failures and several per view metrics. All versions of bind after 9.9.10 are supported.
  • NSD - queries, zones, protocols, query types, transfers, etc.
  • PowerDNS - queries, answers, cache, latency, etc.
  • unbound - performance and resource usage metrics.
  • dns_query_time - DNS query time statistics.

Time Servers

  • chrony - uses the chronyc command to collect chrony statistics (Frequency, Last offset, RMS offset, Residual freq, Root delay, Root dispersion, Skew, System time).
  • ntpd - connects to multiple ntpd servers (local or remote) to provide statistics of system variables and optional also peer variables.

Mail Servers

  • Dovecot - POP3/IMAP servers.
  • Exim - message queue (emails queued).
  • Postfix - message queue (entries, size).

Hardware Sensors

  • IPMI - enterprise hardware sensors and events.
  • lm-sensors - temperature, voltage, fans, power, humidity, etc.
  • Nvidia - collects information for Nvidia GPUs.
  • RPi - Raspberry Pi temperature sensors.
  • w1sensor - collects data from connected 1-Wire sensors.

UPSes

  • apcupsd - load, charge, battery voltage, temperature, utility metrics, output metrics.
  • NUT - load, charge, battery voltage, temperature, utility metrics, output metrics.
  • Linux Power Supply - collects metrics reported by power supply drivers on Linux.

Social Sharing Servers

  • RetroShare - connects to multiple retroshare servers (local or remote) to collect real-time performance metrics.

Security

  • Fail2Ban - monitors the fail2ban log file to check all bans for all active jails.

Authentication, Authorization, Accounting (AAA, RADIUS, LDAP) Servers

  • FreeRadius - uses the radclient command to provide freeradius statistics (authentication, accounting, proxy-authentication, proxy-accounting).

Telephony Servers

  • opensips - connects to an opensips server (localhost only) to collect real-time performance metrics.

Household Appliances

  • SMA webbox - connects to multiple remote SMA webboxes to collect real-time performance metrics of the photovoltaic (solar) power generation.
  • Fronius - connects to multiple remote Fronius Symo servers to collect real-time performance metrics of the photovoltaic (solar) power generation.
  • StiebelEltron - collects the temperatures and other metrics from your Stiebel Eltron heating system using their Internet Service Gateway (ISG web).

Game Servers

  • SpigotMC - monitors Spigot Minecraft server ticks per second and number of online players using the Minecraft remote console.

Distributed Computing

  • BOINC - monitors task states for local and remote BOINC client software using the remote GUI RPC interface. Also provides alarms for a handful of error conditions.

Media Streaming Servers

  • IceCast - collects the number of listeners for active sources.

Monitoring Systems

  • Monit - collects metrics about monit targets (filesystems, applications, networks).

Provisioning Systems

  • Puppet - connects to multiple Puppet Server and Puppet DB instances (local or remote) to collect real-time status metrics.

You can easily extend Netdata, by writing plugins that collect data from any source, using any computer language.


Documentation

The Netdata documentation is at https://docs.netdata.cloud. But you can also find it inside the repo, so by just navigating the repo on github you can find all the documentation.

Here is a quick list:

Directory Description
installer Instructions to install Netdata on your systems.
docker Instructions to install Netdata using docker.
daemon Information about the Netdata daemon and its configuration.
collectors Information about data collection plugins.
health How Netdata's health monitoring works, how to create your own alarms and how to configure alarm notification methods.
streaming How to build hierarchies of Netdata servers, by streaming metrics between them.
backends Long term archiving of metrics to industry standard time-series databases, like prometheus, graphite, opentsdb.
web/api Learn how to query the Netdata API and the queries it supports.
web/api/badges Learn how to generate badges (SVG images) from live data.
web/gui/custom Learn how to create custom Netdata dashboards.
web/gui/confluence Learn how to create Netdata dashboards on Atlassian's Confluence.

You can also check all the other directories. Most of them have plenty of documentation.

Community

We welcome contributions. So, feel free to join the team.

To report bugs, or get help, use GitHub Issues.

You can also find Netdata on:

License

Netdata is GPLv3+.

Netdata re-distributes other open-source tools and libraries. Please check the third party licenses.

Is it any good?

Yes.

When people first hear about a new product, they frequently ask if it is any good. A Hacker News user remarked:

Note to self: Starting immediately, all raganwald projects will have a “Is it any good?” section in the readme, and the answer shall be “yes.".

So, we follow the tradition...

Is it awesome?

These people seem to like it.