Costa Tsaousis cb7af25c09
RRD structures managed by dictionaries (#13646)
* rrdset - in progress

* rrdset optimal constructor; rrdset conflict

* rrdset final touches

* re-organization of rrdset object members

* prevent use-after-free

* dictionary dfe supports also counting of iterations

* rrddim managed by dictionary

* rrd.h cleanup

* DICTIONARY_ITEM now is referencing actual dictionary items in the code

* removed rrdset linked list

* Revert "removed rrdset linked list"

This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.

* removed rrdset linked list

* added comments

* Switch chart uuid to static allocation in rrdset
Remove unused functions

* rrdset_archive() and friends...

* always create rrdfamily

* enable ml_free_dimension

* rrddim_foreach done with dfe

* most custom rrddim loops replaced with rrddim_foreach

* removed accesses to rrddim->dimensions

* removed locks that are no longer needed

* rrdsetvar is now managed by the dictionary

* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853

* conflict callback of rrdsetvar now properly checks if it has to reset the variable

* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM

* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe

* dictionary walkthrough callbacks get dictionary acquired items

* dictionary reference counters that can be dupped from zero

* added advanced functions for get and del

* rrdvar managed by dictionaries

* thread safety for rrdsetvar

* faster rrdvar initialization

* rrdvar string lengths should match in all add, del, get functions

* rrdvar internals hidden from the rest of the world

* rrdvar is now acquired throughout netdata

* hide the internal structures of rrdsetvar

* rrdsetvar is now acquired through out netdata

* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata

* better error handling

* dont create variables if not initialized for health

* dont create variables if not initialized for health again

* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items

* type checking on acquired objects

* rrdcalc renaming of functions

* type checking for rrdfamily_acquired

* rrdcalc managed by dictionaries

* rrdcalc double free fix

* host rrdvars is always needed

* attempt to fix deadlock 1

* attempt to fix deadlock 2

* Remove unused variable

* attempt to fix deadlock 3

* snprintfz

* rrdcalc index in rrdset fix

* Stop storing active charts and computing chart hashes

* Remove store active chart function

* Remove compute chart hash function

* Remove sql_store_chart_hash function

* Remove store_active_dimension function

* dictionary delayed destruction

* formatting and cleanup

* zero dictionary base on rrdsetvar

* added internal error to log delayed destructions of dictionaries

* typo in rrddimvar

* added debugging info to dictionary

* debug info

* fix for rrdcalc keys being empty

* remove forgotten unlock

* remove deadlock

* Switch to metadata version 5 and drop
  chart_hash
  chart_hash_map
  chart_active
  dimension_active
  v_chart_hash

* SQL cosmetic changes

* do not busy wait while destroying a referenced dictionary

* remove deadlock

* code cleanup; re-organization;

* fast cleanup and flushing of dictionaries

* number formatting fixes

* do not delete configured alerts when archiving a chart

* rrddim obsolete linked list management outside dictionaries

* removed duplicate contexts call

* fix crash when rrdfamily is not initialized

* dont keep rrddimvar referenced

* properly cleanup rrdvar

* removed some locks

* Do not attempt to cleanup chart_hash / chart_hash_map

* rrdcalctemplate managed by dictionary

* register callbacks on the right dictionary

* removed some more locks

* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread

* when looking up for an alarm look using both chart id and chart name

* host initialization a bit more modular

* init rrdlabels on host update

* preparation for dictionary views

* improved comment

* unused variables without internal checks

* service threads isolation and worker info

* more worker info in service thread

* thread cancelability debugging with internal checks

* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647

* dictionary modularization

* Remove unused SQL statement definition

* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock, in which case all calls automatically become their unsafe versions; all direct calls to the unsafe versions are eliminated

* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops

* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/proc.plugin/proc_net_dev.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* fix memory leak in rrdset cache_dir

* minor dictionary changes

* dont use index locks in single threaded

* obsolete dict option

* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;

* fix jump on uninitialized value in dictionary; remove double free of cache_dir

* addressed codacy findings

* removed debugging code

* use the private refcount on dictionaries

* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;

* more dictionary statistics

* global statistics about dictionary operations, memory, items, callbacks

* dictionary support for views - missing the public API

* removed warning about unused parameter

* chart and context name for cloud

* chart and context name for cloud, again

* dictionary statistics fixed; first implementation of dictionary views - not currently used

* only the master can globally delete an item

* context needs netdata prefix

* fix context and chart it of spins

* fix for host variables when health is not enabled

* run garbage collector on item insert too

* Fix info message; remove extra "using"

* update dict unittest for new placement of garbage collector

* we need RRDHOST->rrdvars for maintaining custom host variables

* Health initialization needs the host->host_uuid

* split STRING to its own files; no code changes other than that

* initialize health unconditionally

* unit tests do not pollute the global scope with their variables

* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-19 23:46:13 +03:00
.devcontainer Support VS Code container devenv (#10723) 2021-03-12 16:43:41 +02:00
.github Temporary fix for command injection vulnerability in GHA workflow. (#13600) 2022-08-31 14:23:59 -04:00
.travis Overhaul build CI. (#11699) 2021-11-15 10:02:57 -05:00
aclk Obsolete RRDSET state (#13635) 2022-09-07 15:28:30 +03:00
build Make atomics a hard-dep. (#12730) 2022-05-02 17:59:40 +03:00
build_external feat: move dirs, logs, and env vars config options to separate sections (#12935) 2022-05-17 17:31:19 +03:00
claim Remove aclk_api.[ch] (#13540) 2022-08-24 10:41:14 +02:00
cli netdata doubles (#13217) 2022-06-28 17:04:37 +03:00
collectors RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
contrib fix(packaging): add CAP_NET_ADMIN for go.d.plugin (#13507) 2022-08-11 18:57:35 +03:00
daemon RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
database RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
diagrams Docs: Removed Google Analytics tags (#12145) 2022-02-17 10:37:46 +00:00
docs Add link to the performance optimization guide (#13595) 2022-08-31 10:04:53 +03:00
exporting RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
health RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
libnetdata RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
ml RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
mqtt_websockets@0ccbce11b8 Fixes vbi parser in mqtt5 implementation (#13277) 2022-06-30 17:41:52 +02:00
packaging bump go.d.plugin v0.40.0 (#13675) 2022-09-19 16:55:51 +03:00
parser Deduplicate all netdata strings (#13570) 2022-09-05 19:31:06 +03:00
registry RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
spawn Trace rwlocks of netdata (#12785) 2022-05-03 00:31:50 +03:00
streaming RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
system netdata.service: Update PIDFile to avoid systemd legacy path warning (#13504) 2022-08-15 08:51:12 -04:00
tests RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
web RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
.clang-format Fine tune clang-format (#7271) 2021-04-15 12:02:36 +03:00
.codacy.yml codacy/lgtm ignore judy sources (#13411) 2022-07-20 11:15:49 -04:00
.codeclimate.yml Remove node.d.plugin and relevant files (#12769) 2022-05-03 09:16:21 +03:00
.csslintrc added codeclimate coverage 2017-01-06 18:01:57 +02:00
.dockerignore Restore a broken symbolic link (#12923) 2022-05-16 18:46:36 +03:00
.eslintignore Bundle the react dashboard code into the agent repo directly. (#11139) 2021-05-14 11:41:16 -04:00
.eslintrc added codeclimate coverage 2017-01-06 18:01:57 +02:00
.gitattributes Add a .gitattributes file (#6381) 2019-07-05 11:54:32 +02:00
.gitignore include Judy into our source tree (#13362) 2022-07-22 16:55:06 +02:00
.gitmodules Anomaly Detection MVP (#11548) 2021-10-27 09:26:21 +03:00
.lgtm.yml codacy/lgtm ignore judy sources (#13411) 2022-07-20 11:15:49 -04:00
.mlc_config.json GitHub action markdown link check update (#10474) 2021-01-11 13:50:16 -05:00
.remarkignore add CHANGELOG.md to .remarkignore (#6671) 2019-08-15 16:41:08 -07:00
.remarkrc.js Change lint standard for lists (#10371) 2021-01-07 08:43:18 -07:00
.squash.yml Squash integration (#5566) 2019-09-16 16:49:31 +02:00
.travis.yml include Judy into our source tree (#13362) 2022-07-22 16:55:06 +02:00
.yamllint.yml Clean up YAML files in the repository. (#11570) 2021-09-27 10:16:39 -04:00
BREAKING_CHANGES.md Docs: Removed Google Analytics tags (#12145) 2022-02-17 10:37:46 +00:00
BUILD.md Docs: Removed Google Analytics tags (#12145) 2022-02-17 10:37:46 +00:00
CHANGELOG.md [ci skip] Update changelog and version for nightly build: v1.36.0-122-nightly. 2022-09-19 00:17:57 +00:00
CMakeLists.txt RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
Dockerfile Remove the confusion around the multiple Dockerfile(s) we have (#8214) 2020-03-10 08:12:26 +10:00
Dockerfile.test Spelling build (#10909) 2021-04-14 12:24:45 +03:00
HISTORICAL_CHANGELOG.md Spelling md (#10508) 2021-01-18 07:43:43 -05:00
LICENSE remove license templates; add info about SPDX to main license file 2018-09-08 15:53:07 +02:00
Makefile.am RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
README.md add discord, youtube, linkedin links to README (#13419) 2022-07-21 19:48:15 +01:00
REDISTRIBUTED.md Metric correlations (#12582) 2022-05-04 13:59:58 +03:00
build-artifacts.sh Fix the netdata-updater.sh to correctly pass REINSTALL_OPTIONS (finally) (#8808) 2020-08-25 15:25:26 +10:00
config.cmake.h.in CMake improvements part 1 (#13575) 2022-09-09 15:09:20 +02:00
configs.signatures Drop dirty dbengine pages if disk cannot keep up (#7777) 2020-02-06 21:58:13 +02:00
configure.ac RRD structures managed by dictionaries (#13646) 2022-09-19 23:46:13 +03:00
coverity-scan.sh Bump Coverity version to latest (2022.06). (#13541) 2022-08-19 08:02:18 -04:00
cppcheck.sh optimized ses and added des (#4470) 2018-10-24 03:03:57 +03:00
netdata-installer.sh Remove extra U from log message (#13514) 2022-08-13 15:22:47 +03:00
netdata.cppcheck remove static dir config 2018-09-08 15:45:34 +02:00
netdata.spec.in fix(packaging): add CAP_NET_ADMIN for go.d.plugin (#13507) 2022-08-11 18:57:35 +03:00

README.md

Netdata

Netdata is high-fidelity infrastructure monitoring and troubleshooting.
Open-source, free, preconfigured, opinionated, and always real-time.



---

Netdata's distributed, real-time monitoring Agent collects thousands of metrics from systems, hardware, containers, and applications with zero configuration. It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices, and is perfectly safe to install on your systems mid-incident without any preparation.

You can install Netdata on most Linux distributions (Ubuntu, Debian, CentOS, and more), container platforms (Kubernetes clusters, Docker), and many other operating systems (FreeBSD, macOS). No sudo required.

Netdata is designed by system administrators, DevOps engineers, and developers to collect everything, help you visualize metrics, troubleshoot complex performance problems, and make data interoperable with the rest of your monitoring stack.

People get addicted to Netdata. Once you use it on your systems, there's no going back! You've been warned...

Users who are addicted to Netdata


Features

Netdata in action

Here's what you can expect from Netdata:

  • 1s granularity: The highest possible resolution for all metrics.
  • Unlimited metrics: Netdata collects all the available metrics—the more, the better.
  • 1% CPU utilization of a single core: It's unbelievably optimized.
  • A few MB of RAM: The highly efficient database engine stores per-second metrics in RAM and then "spills" historical metrics to disk for long-term storage.
  • Minimal disk I/O: While running, Netdata only writes historical metrics and reads error and access logs.
  • Zero configuration: Netdata auto-detects everything, and can collect up to 10,000 metrics per server out of the box.
  • Zero maintenance: You just run it. Netdata does the rest.
  • Stunningly fast, interactive visualizations: The dashboard responds to queries in less than 1ms per metric to synchronize charts as you pan through time, zoom in on anomalies, and more.
  • Visual anomaly detection: Our UI/UX emphasizes the relationships between charts to help you detect the root cause of anomalies.
  • Machine learning (ML) features out of the box: Unsupervised ML-based anomaly detection, every second, every metric, zero-config! Metric correlations to help with short-term change detection. And other additional ML-based features to help make your life easier.
  • Scales to infinity: You can install it on all your servers, containers, VMs, and IoT devices. Metrics are not centralized by default, so there is no limit.
  • Several operating modes: Autonomous host monitoring (the default), headless data collector, forwarding proxy, store and forward proxy, central multi-host monitoring, in all possible configurations. Use different metrics retention policies per node and run with or without health monitoring.

Netdata works with tons of applications, notifications platforms, and other time-series databases:

  • 300+ system, container, and application endpoints: Collectors autodetect metrics from default endpoints and immediately visualize them into meaningful charts designed for troubleshooting. See everything we support.
  • 20+ notification platforms: Netdata's health watchdog sends warning and critical alarms to your favorite platform to inform you of anomalies just seconds after they affect your node.
  • 30+ external time-series databases: Export resampled metrics as they're collected to other local- and Cloud-based databases for best-in-class interoperability.

💡 Want to leverage the monitoring power of Netdata across your entire infrastructure? View metrics from any number of distributed nodes in a single interface and unlock even more features with Netdata Cloud.

Get Netdata


To install Netdata from source on most Linux systems (physical, virtual, container, IoT, edge), run our one-line installation script. This script downloads and builds all dependencies, including those required to connect to Netdata Cloud if you choose, and enables automatic nightly updates and anonymous statistics.

wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh

To view the Netdata dashboard, navigate to http://localhost:19999, or http://NODE:19999.

Docker

You can also try out Netdata's capabilities in a Docker container:

docker run -d --name=netdata \
  -p 19999:19999 \
  -v netdataconfig:/etc/netdata \
  -v netdatalib:/var/lib/netdata \
  -v netdatacache:/var/cache/netdata \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /etc/os-release:/host/etc/os-release:ro \
  --restart unless-stopped \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  netdata/netdata

To view the Netdata dashboard, navigate to http://localhost:19999, or http://NODE:19999.

Other operating systems

See our documentation for additional operating systems, including Kubernetes, .deb/.rpm packages, and more.

Post-installation

When you're finished with installation, check out our single-node or infrastructure monitoring quickstart guides based on your use case.

Or, skip straight to configuring the Netdata Agent.

Read through Netdata's documentation, which is structured based on actions and solutions, to enable features like health monitoring, alarm notifications, long-term metrics storage, exporting to external databases, and more.

Netdata Cloud

Netdata Cloud works with Netdata's free, open-source monitoring agent to help you monitor and troubleshoot every layer of your systems to find weaknesses before they turn into outages. Using both tools can help you turn data into insights immediately.

Get Netdata Cloud now!

How it works

Netdata is a highly efficient, highly modular, metrics management engine. Its lockless design makes it ideal for concurrent operations on the metrics.

Diagram of Netdata's core functionality

The result is a highly efficient, low-latency system, supporting multiple readers and one writer on each metric.
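To make the "multiple readers and one writer" idea concrete, here is a minimal sketch of one common lockless pattern, a sequence counter (seqlock) built on C11 atomics. It is an illustration only and not taken from the Netdata sources: `metric_slot`, `metric_init`, `metric_publish`, and `metric_read` are hypothetical names, and Netdata's actual data structures may work differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical slot holding the latest sample of one metric.
 * These names are illustrative and do not come from the Netdata sources. */
typedef struct {
    atomic_uint seq;            /* even: stable, odd: update in progress */
    _Atomic int64_t timestamp;  /* atomics keep concurrent reads well defined */
    _Atomic int64_t value;
} metric_slot;

static void metric_init(metric_slot *m) {
    atomic_init(&m->seq, 0);
    atomic_init(&m->timestamp, 0);
    atomic_init(&m->value, 0);
}

/* Must be called by the single writer of this metric only. */
static void metric_publish(metric_slot *m, int64_t ts, int64_t v) {
    unsigned s = atomic_load(&m->seq);
    assert((s & 1u) == 0);          /* a second writer would break this */
    atomic_store(&m->seq, s + 1);   /* odd: readers will retry */
    atomic_store(&m->timestamp, ts);
    atomic_store(&m->value, v);
    atomic_store(&m->seq, s + 2);   /* even again: sample is consistent */
}

/* Safe from any number of reader threads; takes no lock. */
static void metric_read(metric_slot *m, int64_t *ts, int64_t *v) {
    unsigned before, after;
    do {
        before = atomic_load(&m->seq);
        *ts = atomic_load(&m->timestamp);
        *v = atomic_load(&m->value);
        after = atomic_load(&m->seq);
    } while (before != after || (before & 1u));
}
```

In this sketch the collector thread would call `metric_publish()` once per sample, while query threads call `metric_read()` whenever they need the latest value. Readers never block the writer; a reader that overlaps an update simply retries until it sees the same even sequence number before and after copying the fields.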

Infographic

This is a high-level overview of Netdata features and architecture. Click on it to view an interactive version, which has links to our documentation.

An infographic of how Netdata works

Documentation

Netdata's documentation is available at Netdata Learn.

This site also hosts a number of guides to help newer users better understand how to collect metrics, troubleshoot via charts, export to external databases, and more.

Community

Netdata is an inclusive open-source project and community. Please read our Code of Conduct.

Find most of the Netdata team in our community forums. It's the best place to ask questions, find resources, and engage with passionate professionals. The team is also available and active on our Discord.

You can also find Netdata on:

Contribute

Contributions are the lifeblood of open-source projects. While we continue to invest in and improve Netdata, we need help to democratize monitoring!

  • Read our Contributing Guide, which contains all the information you need to contribute to Netdata, such as improving our documentation, engaging in the community, and developing new features. We've made it as frictionless as possible, but if you need help, just ping us on our community forums!
  • We have a whole category dedicated to contributing and extending Netdata on our community forums.
  • Found a bug? Open a GitHub issue.
  • View our Security Policy.

Package maintainers should read the guide on building Netdata from source for instructions on building each Netdata component from source and preparing a package.

License

The Netdata Agent is GPLv3+. Netdata re-distributes other open-source tools and libraries. Please check the third party licenses.

Is it any good?

Yes.

When people first hear about a new product, they frequently ask if it is any good. A Hacker News user remarked:

Note to self: Starting immediately, all raganwald projects will have a “Is it any good?” section in the readme, and the answer shall be “yes.”