netdata/collectors/COLLECTORS.md

Supported collectors list

Netdata uses collectors to help you gather metrics from your favorite applications and services and view them in real-time, interactive charts. The following list includes collectors for both external services/applications and internal system metrics.

Learn more about how collectors work, and then learn how to enable or configure any of the below collectors using the same process.

Some collectors have both Go and Python versions as we continue our effort to migrate all collectors to Go. In these cases, Netdata always prioritizes the Go version, and we highly recommend you use the Go versions for the best experience.

If you want to use a Python version of a collector, you need to explicitly disable the Go version, and enable the Python version. Netdata then skips the Go version and attempts to load the Python version and its accompanying configuration file.
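
As a rough sketch of that switch, the script below writes the two orchestrator configuration files into a scratch directory. It assumes the usual layout, in which go.d.conf lists modules under a `modules:` key and python.d.conf lists them at the top level. The `rabbitmq` module name and the `CONF_DIR` and `MODULE` variables are illustrative choices, not something this page prescribes.

```shell
#!/bin/sh
# Sketch: disable the Go version of a collector and enable the Python one.
# CONF_DIR and MODULE are illustrative assumptions; a scratch directory is
# used so the example never touches a live Netdata installation.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
MODULE="${MODULE:-rabbitmq}"

# go.d.conf: modules are listed under the "modules" key.
cat > "$CONF_DIR/go.d.conf" <<EOF
modules:
  $MODULE: no
EOF

# python.d.conf: modules are listed at the top level.
cat > "$CONF_DIR/python.d.conf" <<EOF
$MODULE: yes
EOF

# Show the resulting settings, prefixed with the file they came from.
grep . "$CONF_DIR/go.d.conf" "$CONF_DIR/python.d.conf"
```

On a real node these files normally live in the Netdata configuration directory (often /etc/netdata) and already contain other settings, so edit the relevant lines rather than overwriting the files, then restart Netdata.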

If you don't see the app/service you'd like to monitor in this list:

  • Check out our GitHub issues. Use the search bar to look for previous discussions about that collector—we may be looking for assistance from users such as yourself!
  • If you don't see the collector there, you can make a feature request on GitHub.
  • If you have basic software development skills, you can add your own plugin in Go or Python.

Service and application collectors

The Netdata Agent auto-detects and collects metrics from all of the services and applications below. You can also configure any of these collectors according to your setup and infrastructure.

Generic

  • Prometheus endpoints: Gathers metrics from any number of Prometheus endpoints, with support to autodetect more than 600 services and applications.
  • Pandas: A Python collector that gathers metrics from a pandas dataframe. Pandas is a high level data processing library in Python that can read various formats of data from local files or web endpoints. Custom processing and transformation logic can also be expressed as part of the collector configuration.
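
To illustrate the Prometheus endpoints collector, here is a minimal sketch of a job definition written to a scratch directory. It assumes the collector reads a `jobs` list with `name` and `url` keys from go.d/prometheus.conf. The job name, the URL (a node_exporter on its conventional port), and `CONF_DIR` are hypothetical values chosen for the example.

```shell
#!/bin/sh
# Sketch: define one scrape job for the Prometheus endpoints collector.
# CONF_DIR, the job name, and the URL are assumptions for illustration.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$CONF_DIR/go.d"

cat > "$CONF_DIR/go.d/prometheus.conf" <<'EOF'
jobs:
  - name: node_exporter_local
    url: http://127.0.0.1:9100/metrics
EOF

# Print the file so the result can be reviewed before copying it anywhere.
cat "$CONF_DIR/go.d/prometheus.conf"
```

Each additional endpoint would be another entry in the same `jobs` list.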

APM (application performance monitoring)

  • Go applications: Monitor any Go application that exposes its metrics with the expvar package from the Go standard library.
  • Java Spring Boot 2 applications: Monitor running Java Spring Boot 2 applications that expose their metrics with the use of the Spring Boot Actuator.
  • statsd: Implement a high performance statsd server for Netdata.
  • phpDaemon: Collect worker statistics (total, active, idle), and uptime for web and network applications.
  • uWSGI: Monitor performance metrics exposed by the uWSGI Stats Server.
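
For the statsd collector, applications send plain-text metrics in the common `name:value|type` form. The helper below is a small sketch that only formats such lines. The metric names are made up, and the commented-out send command assumes a statsd listener on UDP port 8125 of the local host and an installed `nc`.

```shell
#!/bin/sh
# statsd_line NAME VALUE TYPE -> prints one statsd payload line.
# The metric names used below are hypothetical.
statsd_line() {
    printf '%s:%s|%s\n' "$1" "$2" "$3"
}

statsd_line myapp.requests 1 c       # counter
statsd_line myapp.queue_size 42 g    # gauge
statsd_line myapp.latency 120 ms     # timer, in milliseconds

# To actually send a metric (assumes nc is installed and a statsd server
# listens on 127.0.0.1:8125/udp):
#   statsd_line myapp.requests 1 c | nc -u -w1 127.0.0.1 8125
```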

Containers and VMs

  • Docker containers: Monitor the health and performance of individual Docker containers using the cgroups collector plugin.
  • DockerD: Collect container health statistics.
  • Docker Engine: Collect runtime statistics from the docker daemon using the metrics-address feature.
  • Docker Hub: Collect statistics about Docker repositories, such as pulls, starts, status, time since last update, and more.
  • Libvirt: Monitor the health and performance of individual Libvirt containers using the cgroups collector plugin.
  • LXC: Monitor the health and performance of individual LXC containers using the cgroups collector plugin.
  • LXD: Monitor the health and performance of individual LXD containers using the cgroups collector plugin.
  • systemd-nspawn: Monitor the health and performance of individual systemd-nspawn containers using the cgroups collector plugin.
  • vCenter Server Appliance: Monitor appliance system, components, and software update health statuses via the Health API.
  • vSphere: Collect host and virtual machine performance metrics.
  • Xen/XCP-ng: Collect XenServer and XCP-ng metrics using libxenstat.

Data stores

  • CockroachDB: Monitor various database components using the _status/vars endpoint.
  • Consul: Capture service and unbound checks status (passing, warning, critical, maintenance).
  • Couchbase: Gather per-bucket metrics from any number of instances of the distributed JSON document database.
  • CouchDB: Monitor database health and performance metrics (reads/writes, HTTP traffic, replication status, etc).
  • MongoDB: Collect server, database, replication, and sharding performance and health metrics.
  • MySQL: Collect database global, replication and per user statistics.
  • OracleDB: Monitor database performance and health metrics.
  • Pika: Gather metrics, such as clients, memory usage, queries, and more, from the Redis interface-compatible database.
  • Postgres: Collect database health and performance metrics.
  • ProxySQL: Monitor database backend and frontend performance metrics.
  • Redis: Monitor status from any number of database instances by reading the server's response to the INFO ALL command.
  • RethinkDB: Collect database server and cluster statistics.
  • Riak KV: Collect database stats from the /stats endpoint.
  • Zookeeper: Monitor application health metrics by reading the server's response to the mntr command.
  • Memcached: Collect memory-caching system performance metrics.

Distributed computing

  • BOINC: Monitor the total number of tasks, open tasks, and task states for the distributed computing client.
  • Gearman: Collect application summary (queued, running) and per-job worker statistics (queued, idle, running).

Email

  • Dovecot: Collect email server performance metrics by reading the server's response to the EXPORT global command.
  • EXIM: Uses the exim tool to monitor the queue length of a mail/message transfer agent (MTA).
  • Postfix: Uses the postqueue tool to monitor the queue length of a mail/message transfer agent (MTA).

Kubernetes

  • Kubelet: Monitor one or more instances of the Kubelet agent and collect metrics on the number of pods/containers running, volume of Docker operations, and more.
  • kube-proxy: Collect metrics, such as syncing proxy rules and REST client requests, from one or more instances of kube-proxy.
  • Service discovery: Find what services are running on a cluster's pods, convert that into configuration files, and export them so they can be monitored by Netdata.

Logs

  • Fluentd: Gather application plugin metrics from the endpoint provided by the in_monitor plugin.
  • Logstash: Monitor JVM threads, memory usage, garbage collection statistics, and more.
  • OpenVPN status logs: Parse server log files and provide summary (client, traffic) metrics.
  • Squid web server logs: Tail Squid access logs to return the volume of requests, types of requests, bandwidth, and much more.
  • Web server logs (Go version for Apache, NGINX): Tail access logs and provide very detailed web server performance statistics. This module is able to parse 200k+ rows in less than half a second.
  • Web server logs (Apache, NGINX): Tail access log file and collect web server/caching proxy metrics.

Messaging

  • ActiveMQ: Collect message broker queues and topics statistics using the ActiveMQ Console API.
  • Beanstalk: Collect server and tube-level statistics, such as CPU usage, jobs rates, commands, and more.
  • Pulsar: Collect summary, namespaces, and topics performance statistics.
  • RabbitMQ (Go): Collect message broker overview, system and per virtual host metrics.
  • RabbitMQ (Python): Collect message broker global and per virtual host metrics.
  • VerneMQ: Monitor MQTT broker health and performance metrics. It collects all available info for both MQTTv3 and v5 communication.

Network

  • Bind 9: Collect nameserver summary performance statistics via a web interface (statistics-channels feature).
  • Chrony: Monitor the precision and statistics of a local chronyd server.
  • CoreDNS: Measure DNS query round trip time.
  • Dnsmasq: Automatically detect all configured Dnsmasq DHCP ranges and monitor their utilization.
  • DNSdist: Collect load-balancer performance and health metrics.
  • Dnsmasq DNS Forwarder: Gather queries, entries, operations, and events for the lightweight DNS forwarder.
  • DNS Query Time: Monitor the round trip time for DNS queries in milliseconds.
  • Freeradius: Collect server authentication and accounting statistics from the status server.
  • Libreswan: Collect bytes-in, bytes-out, and uptime metrics.
  • Icecast: Monitor the number of listeners for active sources.
  • ISC Bind (RNDC): Collect nameserver summary performance statistics using the rndc tool.
  • ISC DHCP: Reads a dhcpd.leases file and collects metrics on total active leases, pool active leases, and pool utilization.
  • OpenLDAP: Provides statistics information from the OpenLDAP (slapd) server.
  • NSD: Monitor nameserver performance metrics using the nsd-control tool.
  • NTP daemon: Monitor the system variables of the local ntpd daemon (optionally including variables of the polled peers) using the NTP Control Message Protocol via a UDP socket.
  • OpenSIPS: Collect server health and performance metrics using the opensipsctl tool.
  • OpenVPN: Gather server summary (client, traffic) and per user metrics (traffic, connection time) stats using management-interface.
  • Pi-hole: Monitor basic (DNS queries, clients, blocklist) and extended (top clients, top permitted, and blocked domains) statistics using the PHP API.
  • PowerDNS Authoritative Server: Monitor one or more instances of the nameserver software to collect questions, events, and latency metrics.
  • PowerDNS Recursor: Gather incoming/outgoing questions, drops, timeouts, and cache usage from any number of DNS recursor instances.
  • RetroShare: Monitor application bandwidth, peers, and DHT metrics.
  • Tor: Capture traffic usage statistics using the Tor control port.
  • Unbound: Collect DNS resolver summary and extended system and per thread metrics via the remote-control interface.

Provisioning

  • Puppet: Monitor the status of Puppet Server and Puppet DB.

Remote devices

  • AM2320: Monitor sensor temperature and humidity.
  • Access point: Monitor client, traffic, and signal metrics using the iw tool.
  • APC UPS: Capture status information using the apcaccess tool.
  • Energi Core: Monitor blockchain indexes, memory usage, network usage, and transactions of wallet instances.
  • UPS/PDU: Read the status of UPS/PDU devices using the upsc tool.
  • SNMP devices: Gather data using the SNMP protocol.
  • 1-Wire sensors: Monitor sensor temperature.

Search
  • Elasticsearch: Collect dozens of metrics on search engine performance from local nodes and local indices. Includes cluster health and statistics.
  • Solr: Collect application search requests, search errors, update requests, and update errors statistics.

Storage

  • Ceph: Monitor the Ceph cluster usage and server data consumption.
  • HDFS: Monitor health and performance metrics for filesystem datanodes and namenodes.
  • IPFS: Collect file system bandwidth, peers, and repo metrics.
  • ScaleIO: Monitor storage system, storage pools, and SDCs health and performance metrics via the VxFlex OS Gateway API.
  • Samba: Collect file sharing metrics using the smbstatus tool.

Web

  • Apache: Collect Apache web server performance metrics via the server-status?auto endpoint.
  • HAProxy: Collect frontend, backend, and health metrics.
  • HTTP endpoints: Monitor any HTTP endpoint's availability and response time.
  • Lighttpd: Collect web server performance metrics using the server-status?auto endpoint.
  • Lighttpd2: Collect web server performance metrics using the server-status?format=plain endpoint.
  • Litespeed: Collect web server data (network, connection, requests, cache) by reading .rtreport* files.
  • Nginx: Monitor web server status information by gathering metrics via ngx_http_stub_status_module.
  • Nginx VTS: Gathers metrics from any Nginx deployment with the virtual host traffic status module enabled, including metrics on uptime, memory usage, cache, and more.
  • PHP-FPM: Collect application summary and processes health metrics by scraping the status page (/status?full).
  • TCP endpoints: Monitor any TCP endpoint's availability and response time.
  • Spigot Minecraft servers: Monitor average tick rate and number of users.
  • Squid: Monitor client and server bandwidth/requests by gathering data from the Cache Manager component.
  • Tengine: Monitor web server statistics using information provided by ngx_http_reqstat_module.
  • Tomcat: Collect web server performance metrics from the Manager App (/manager/status?XML=true).
  • Traefik: Uses Traefik's Health API to provide statistics.
  • Varnish: Provides HTTP accelerator global, backends (VBE), and disks (SMF) statistics using the varnishstat tool.
  • x509 check: Monitor certificate expiration time.
  • Whois domain expiry: Checks the remaining time until a given domain is expired.

System collectors

The Netdata Agent can collect these system- and hardware-level metrics using a variety of collectors, some of which (such as proc.plugin) collect multiple types of metrics simultaneously.

Applications

  • Fail2ban: Parses configuration files to detect all jails, then uses log files to report ban rates and volume of banned IPs.
  • Monit: Monitor statuses of targets (service-checks) using the XML stats interface.
  • WMI (Windows Management Instrumentation) exporter: Collect CPU, memory, network, disk, OS, system, and log-in metrics by scraping wmi_exporter.

Disks and filesystems

  • BCACHE: Monitor BCACHE statistics with the proc.plugin collector.
  • Block devices: Gather metrics about the health and performance of block devices using the proc.plugin collector.
  • Btrfs: Monitors Btrfs filesystems with the proc.plugin collector.
  • Device mapper: Gather metrics about the Linux device mapper with the proc collector.
  • Disk space: Collect disk space usage metrics on Linux mount points.
  • Clock synchronization: Collect the system clock synchronization status on Linux.
  • Files and directories: Gather metrics about the existence, modification time, and size of files or directories.
  • ioping.plugin: Measure disk read/write latency.
  • NFS file servers and clients: Gather operations, utilization, and space usage using the proc.plugin collector.
  • RAID arrays: Collect health, disk status, operation status, and more with the proc.plugin collector.
  • Veritas Volume Manager: Gather metrics about the Veritas Volume Manager (VVM).
  • ZFS: Monitor bandwidth and utilization of ZFS disks/partitions using the proc collector.

eBPF

  • Files: Provides information about how often a system calls kernel functions related to file descriptors using the eBPF collector.
  • Virtual file system (VFS): Monitor IO, errors, deleted objects, and more for kernel virtual file systems (VFS) using the eBPF collector.
  • Processes: Monitor threads, task exits, and errors using the eBPF collector.

Hardware

  • Adaptec RAID: Monitor logical and physical devices health metrics using the arcconf tool.
  • CUPS: Monitor CUPS.
  • FreeIPMI: Uses libipmimonitoring-dev or libipmimonitoring-devel to monitor the number of sensors, temperatures, voltages, currents, and more.
  • Hard drive temperature: Monitor the temperature of storage devices.
  • HP Smart Storage Arrays: Monitor controller, cache module, logical and physical drive state, and temperature using the ssacli tool.
  • MegaRAID controllers: Collect adapter, physical drives, and battery stats using the megacli tool.
  • NVIDIA GPU: Monitor performance metrics (memory usage, fan speed, pcie bandwidth utilization, temperature, and more) using the nvidia-smi tool.
  • Sensors: Reads system sensors information (temperature, voltage, electric current, power, and more) from /sys/devices/.
  • S.M.A.R.T: Reads SMART Disk Monitoring daemon logs.

Memory

  • Available memory: Tracks changes in available RAM using the proc.plugin collector.
  • Committed memory: Monitor committed memory using the proc.plugin collector.
  • Huge pages: Gather metrics about huge pages in Linux and FreeBSD with the proc.plugin collector.
  • KSM: Measure the amount of merging, savings, and effectiveness using the proc.plugin collector.
  • Numa: Gather metrics on the number of non-uniform memory access (NUMA) events every second using the proc.plugin collector.
  • Page faults: Collect the number of memory page faults per second using the proc.plugin collector.
  • RAM: Collect metrics on system RAM, available RAM, and more using the proc.plugin collector.
  • SLAB: Collect kernel SLAB details on Linux systems.
  • swap: Monitor the amount of free and used swap every second using the proc.plugin collector.
  • Writeback memory: Collect how much memory is actively being written to disk every second using the proc.plugin collector.
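
Several of the memory metrics above come from /proc/meminfo on Linux. As a hedged illustration of the kind of data involved (not of how proc.plugin is implemented), the sketch below computes available memory as a percentage of total memory from a meminfo-style file. The function name and the sample values are invented for the example.

```shell
#!/bin/sh
# mem_available_pct [FILE] -> prints MemAvailable as a percentage of MemTotal.
# Defaults to /proc/meminfo; the function name is hypothetical.
mem_available_pct() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END { if (total > 0) printf "%.1f\n", 100 * avail / total }
    ' "${1:-/proc/meminfo}"
}

# Demonstrate on a small sample file with made-up values.
sample=$(mktemp)
cat > "$sample" <<'EOF'
MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB
EOF
mem_available_pct "$sample"
rm -f "$sample"
```

Calling `mem_available_pct` with no argument reads the live /proc/meminfo on a Linux system.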

Networks

  • Access points: Visualizes data related to access points.
  • Ping: Measure network latency, jitter and packet loss between the monitored node and any number of remote network end points.
  • Netfilter: Collect netfilter firewall, connection tracker, and accounting metrics using libmnl and libnetfilter_acct.
  • Network stack: Monitor the networking stack for errors, TCP connection aborts, bandwidth, and more.
  • Network QoS: Collect traffic QoS metrics (tc) of Linux network interfaces.
  • SYNPROXY: Monitor entries used, SYN packets received, TCP cookies, and more.

Operating systems

  • freebsd.plugin: Collect resource usage and performance data on FreeBSD systems.
  • macOS: Collect resource usage and performance data on macOS systems.

Processes

  • Applications: Gather CPU, disk, memory, network, eBPF, and other metrics per application using the apps.plugin collector.
  • systemd: Monitor the CPU and memory usage of systemd services using the cgroups.plugin collector.
  • systemd unit states: See the state (active, inactive, activating, deactivating, failed) of various systemd unit types.
  • System processes: Collect metrics on system load and total processes running using /proc/loadavg and the proc.plugin collector.
  • Uptime: Monitor the uptime of a system using the proc.plugin collector.
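
The /proc/loadavg file mentioned above holds a single line with three load averages, a running/total process pair, and the last PID. The following sketch splits such a line into named fields. It only shows the file's layout, and the function name is an assumption made for this example.

```shell
#!/bin/sh
# parse_loadavg LINE -> prints the fields of a /proc/loadavg-style line.
# Field order: load1 load5 load15 running/total last_pid
parse_loadavg() {
    # Deliberately unquoted so the line is split into positional parameters.
    set -- $1
    procs=$4
    echo "load1=$1 load5=$2 load15=$3 running=${procs%/*} total=${procs#*/}"
}

# Fixed sample line, so the example also works on non-Linux systems.
parse_loadavg "0.20 0.18 0.12 1/80 11206"

# Live values, when the file is available.
if [ -r /proc/loadavg ]; then
    parse_loadavg "$(cat /proc/loadavg)"
fi
```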

Resources

  • CPU frequency: Monitor CPU frequency, as set by the cpufreq kernel module, using the proc.plugin collector.
  • CPU idle: Measure CPU idle every second using the proc.plugin collector.
  • CPU performance: Collect CPU performance metrics using performance monitoring units (PMU).
  • CPU throttling: Gather metrics about thermal throttling using the /proc/stat module and the proc.plugin collector.
  • CPU utilization: Capture CPU utilization, both system-wide and per-core, using the /proc/stat module and the proc.plugin collector.
  • Entropy: Monitor the available entropy on a system using the proc.plugin collector.
  • Interprocess Communication (IPC): Monitor IPC semaphores and shared memory using the proc.plugin collector.
  • Interrupts: Monitor interrupts per second using the proc.plugin collector.
  • IdleJitter: Measure CPU latency and jitter on all operating systems.
  • SoftIRQs: Collect metrics on SoftIRQs, both system-wide and per-core, using the proc.plugin collector.
  • SoftNet: Capture SoftNet events per second, both system-wide and per-core, using the proc.plugin collector.
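
CPU utilization figures derived from /proc/stat are usually computed from the difference between two samples of the aggregate `cpu` line. The sketch below shows one common way to do that arithmetic, treating idle and iowait as idle time. It is an illustration under those assumptions, not a description of how proc.plugin calculates its charts, and the function name is hypothetical.

```shell
#!/bin/sh
# cpu_busy_pct BEFORE AFTER -> prints the percentage of non-idle CPU time
# between two aggregate "cpu" lines taken from /proc/stat.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2-9: user nice system idle iowait irq softirq steal
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            idle = $5 + $6            # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle }
            else         { dt = total - t0; di = idle - i0 }
        }
        END {
            if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
            else print "0.0"
        }'
}

# Two made-up samples: 200 ticks elapsed, 120 of them idle.
cpu_busy_pct "cpu 100 0 50 800 50 0 0 0 0 0" "cpu 160 0 70 920 50 0 0 0 0 0"

# Live usage on Linux:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```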

Users

  • systemd-logind: Monitor active sessions, users, and seats tracked by systemd-logind or elogind.
  • User/group usage: Gather CPU, disk, memory, network, and other metrics per user and user group using the apps.plugin collector.

Netdata collectors

These collectors are recursive in nature, in that they monitor some function of the Netdata Agent itself. Some collectors are described only in code and associated charts in Netdata dashboards.

  • ACLK (code only): View whether a Netdata Agent is connected to Netdata Cloud via the ACLK, the volume of queries, process times, and more.
  • Alarms: This collector creates an Alarms menu with one line plot showing the alarm states of a Netdata Agent over time.
  • Anomalies: This collector uses the Python PyOD library to perform unsupervised anomaly detection on your Netdata charts and/or dimensions.
  • Exporting (code only): Gather metrics on CPU utilization for the exporting engine, and specific metrics for each enabled exporting connector.
  • Global statistics (code only): See metrics on the CPU utilization, network traffic, volume of web clients, API responses, database engine usage, and more.

Orchestrators

Plugin orchestrators organize and run many of the above collectors.

If you're interested in developing a new collector that you'd like to contribute to Netdata, we highly recommend using the go.d.plugin.

  • go.d.plugin: An orchestrator for data collection modules written in Go.
  • python.d.plugin: An orchestrator for data collection modules written in Python v2/v3.
  • charts.d.plugin: An orchestrator for data collection modules written in Bash v4+.

Third-party collectors

These collectors are developed and maintained by third parties and, unlike the other collectors, are not installed by default. To use a third-party collector, visit its GitHub or documentation page and follow its installation procedure.

Typical third-party Python collector installation instructions

In general, the steps below should be sufficient to use a third-party collector.

  1. Download collector code file into folder expected by Netdata.
  2. Download default collector configuration file into folder expected by Netdata.
  3. Edit configuration file from step 2 if required.
  4. Enable collector.
  5. Restart Netdata.

For example, the steps below enable the Python ClickHouse collector.

# download python collector script to /usr/libexec/netdata/python.d/
$ sudo wget https://raw.githubusercontent.com/netdata/community/main/collectors/python.d.plugin/clickhouse/clickhouse.chart.py -O /usr/libexec/netdata/python.d/clickhouse.chart.py

# (optional) download default .conf to /etc/netdata/python.d/
$ sudo wget https://raw.githubusercontent.com/netdata/community/main/collectors/python.d.plugin/clickhouse/clickhouse.conf -O /etc/netdata/python.d/clickhouse.conf

# enable the collector by adding a new line with "clickhouse: yes" to the /etc/netdata/python.d.conf file
# this appends to the file if it already exists, or creates it if not
# (tee runs under sudo; with "sudo echo ... >> file" the redirection would run as the unprivileged user)
$ echo "clickhouse: yes" | sudo tee -a /etc/netdata/python.d.conf

# (optional) edit clickhouse.conf if needed
$ sudo vi /etc/netdata/python.d/clickhouse.conf

# restart netdata 
# see docs for more information: https://learn.netdata.cloud/docs/configure/start-stop-restart
$ sudo systemctl restart netdata

Etc