Change streaming terminology to parent/child in docs (#9312)

* Initial pass through docs

* Dash instead of slash

* To parent/child

* Child nodes

* Change diagrams

* Allowlist

* Fixes for Andrew

* Remove from build_external

* Change in proc
Joel Hans 2020-06-12 09:42:58 -07:00 committed by GitHub
parent 68f1888227
commit 2c64795b7c
18 changed files with 210 additions and 195 deletions

@ -179,8 +179,8 @@ from your Netdata):
of times within each pattern). The patterns are checked against the hostname (the localhost is always checked as
`localhost`), allowing us to filter which hosts will be sent to the backend when this Netdata is a central Netdata
aggregating multiple hosts. A pattern starting with `!` gives a negative match. So to match all hosts named `*db*`
except hosts containing `*slave*`, use `!*slave* *db*` (so, the order is important: the first pattern matching the
hostname will be used - positive or negative).
except hosts containing `*child*`, use `!*child* *db*` (so, the order is important: the first pattern
matching the hostname will be used - positive or negative).
- `send charts matching = *` includes one or more space separated patterns, using `*` as wildcard (any number of times
within each pattern). The patterns are checked against both chart id and chart name. A pattern starting with `!`
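
The first-match-wins behavior described for `send hosts matching` can be illustrated with a short Python sketch. This is not Netdata's implementation; the function name `matches_simple_pattern` is hypothetical, and the sketch assumes only what the text states: patterns are space separated, `*` is the only wildcard, and a leading `!` makes a pattern negative.

```python
import re


def matches_simple_pattern(patterns: str, name: str) -> bool:
    """Return True if `name` is accepted by a space-separated pattern list.

    Patterns are tried left to right and the first one that matches decides
    the result: a plain pattern accepts, a `!` pattern rejects.
    """
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # Only `*` is treated as a wildcard; all other characters are literal.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, name):
            return not negative
    # No pattern matched, so the name is not selected.
    return False


# Order matters: the negative pattern must come first to exclude child hosts.
print(matches_simple_pattern("!*child* *db*", "db1"))        # True
print(matches_simple_pattern("!*child* *db*", "child-db1"))  # False
print(matches_simple_pattern("*db* !*child*", "child-db1"))  # True
```

With the patterns reversed, `*db*` matches `child-db1` before the negative pattern is ever consulted, which is why the documentation stresses the ordering.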

@ -356,7 +356,7 @@ For more information check prometheus documentation.
### Streaming data from upstream hosts
The `format=prometheus` parameter only exports the host's Netdata metrics. If you are using the master/slave
The `format=prometheus` parameter only exports the host's Netdata metrics. If you are using the parent-child
functionality of Netdata this ignores any upstream hosts - so you should consider using the below in your
**prometheus.yml**:

@ -10,10 +10,11 @@ custom_edit_url: https://github.com/netdata/netdata/edit/master/build_external/R
This wraps the build-system in Docker so that the host system and the target system are
decoupled. This allows:
* Cross-compilation (e.g. linux development from macOS)
* Cross-distro (e.g. using CentOS user-land while developing on Debian)
* Multi-host scenarios (e.g. master/slave configurations)
* Bleeding-edge sceneraios (e.g. using the ACLK (**currently for internal-use only**))
- Cross-compilation (e.g. linux development from macOS)
- Cross-distro (e.g. using CentOS user-land while developing on Debian)
- Multi-host scenarios (e.g. parent-child configurations)
- Bleeding-edge scenarios (e.g. using the ACLK (**currently for internal-use only**))
The advantage of these scenarios is that they allow **reproducible** builds and testing
for developers. This is the first iteration of the build-system to allow the team to use
@ -97,19 +98,19 @@ Note: it is possible to run multiple copies of the agent using the `--scale` opt
Distro=debian Version=10 docker-compose -f projects/only-agent/docker-compose.yml up --scale agent=3
```
3. A simple master-slave scenario
3. A simple parent-child scenario
```bash
# Need to call clean-install on the configs used in the master/slave containers
docker-compose -f master-slaves/docker-compose.yml up --scale agent_slave1=2
# Need to call clean-install on the configs used in the parent-child containers
docker-compose -f parent-child/docker-compose.yml up --scale agent_child1=2
```
Note: this is not production ready yet, but it is left in so that we can see how it behaves
and improve it. Currently it produces the following problems:
* Only the base-configuration in the compose without scaling works.
* The containers are hard-coded in the compose.
* There is no way to separate the agent configurations, so running multiple agent slaves
wth the same GUID kills the master which exits with a fatal condition.
* There is no way to separate the agent configurations, so running multiple agent child nodes with the same GUID kills
the parent which exits with a fatal condition.
4. The ACLK

@ -86,8 +86,8 @@ By default, Netdata will enable monitoring metrics only when they are not zero.
Netdata categorizes all block devices in 3 categories:
1. physical disks (i.e. block devices that does not have slaves and are not partitions)
2. virtual disks (i.e. block devices that have slaves - like RAID devices)
1. physical disks (i.e. block devices that do not have child devices and are not partitions)
2. virtual disks (i.e. block devices that have child devices - like RAID devices)
3. disk partitions (i.e. block devices that are part of a physical disk)
Performance metrics are enabled by default for all disk devices, except partitions and not-mounted virtual disks. Of course, you can enable/disable monitoring any block device by editing the Netdata configuration file.
@ -325,7 +325,7 @@ By default Netdata will enable monitoring metrics only when they are not zero. I
There are several alarms defined in `health.d/net.conf`.
The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alarms can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a slave or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alarm with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](/health/REFERENCE.md#alarm-line-families) line in the alarm configuration. For example, if you want to disable the `inbound packets dropped` alarm for `eth0`, set `families: !eth0 *` in the alarm definition for `template: inbound_packets_dropped`.
The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alarms can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a child or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alarm with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](/health/REFERENCE.md#alarm-line-families) line in the alarm configuration. For example, if you want to disable the `inbound packets dropped` alarm for `eth0`, set `families: !eth0 *` in the alarm definition for `template: inbound_packets_dropped`.
#### configuration

@ -82,7 +82,7 @@ Please note that your data history will be lost if you have modified `history` p
| pthread stack size|auto-detected||||
| cleanup obsolete charts after seconds|`3600`|See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also sets the timeout for cleaning up obsolete dimensions|||
| gap when lost iterations above|`1`||||
| cleanup orphan hosts after seconds|`3600`|How long to wait until automatically removing from the DB a remote Netdata host (slave) that is no longer sending data.|||
| cleanup orphan hosts after seconds|`3600`|How long to wait until automatically removing from the DB a remote Netdata host (child) that is no longer sending data.|||
| delete obsolete charts files|`yes`|See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also affects the deletion of files for obsolete dimensions|||
| delete orphan hosts files|`yes`|Set to `no` to disable non-responsive host removal.|||
| enable zero metrics|`no`|Set to `yes` to show charts when all their metrics are zero.|||

@ -46,18 +46,18 @@ The `dbengine disk space` option determines the amount of disk space in **MiB**
metric values and all related metadata describing them.
Use the [**database engine calculator**](https://learn.netdata.cloud/docs/agent/database/calculator) to correctly set
`dbengine disk space` based on your needs. The calculator gives an accurate estimate based on how many slave nodes you
have, how many metrics your Agent collects, and more.
`dbengine disk space` based on your needs. The calculator gives an accurate estimate based on how many child nodes
you have, how many metrics your Agent collects, and more.
### Streaming metrics to the database engine
When streaming metrics, the Agent on the master node creates one instance of the database engine for itself, and another
instance for every slave node it receives metrics from. If you have four streaming nodes, you will have five instances
in total (`1 master + 4 slaves = 5 instances`).
When streaming metrics, the Agent on the parent node creates one instance of the database engine for itself, and another
instance for every child node it receives metrics from. If you have four streaming nodes, you will have five instances
in total (`1 parent + 4 child nodes = 5 instances`).
The Agent allocates resources for each instance separately using the `dbengine disk space` setting. If `dbengine disk
space` is set to the default `256`, each instance is given 256 MiB in disk space, which means the total disk space
required to store all instances is, roughly, `256 MiB * 1 master * 4 slaves = 1280 MiB`.
required to store all instances is, roughly, `256 MiB * (1 parent + 4 child nodes) = 1280 MiB`.
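
The arithmetic above can be written as a small Python helper. This is only a sketch of the rule stated in the text (one database engine instance for the parent plus one per child, each sized by `dbengine disk space`); the function name `total_dbengine_disk_mib` is hypothetical.

```python
def total_dbengine_disk_mib(child_nodes: int, disk_space_mib: int = 256) -> int:
    """Rough total disk space, in MiB, used by all dbengine instances on a parent."""
    # One instance for the parent itself, plus one per streaming child node.
    instances = 1 + child_nodes
    return instances * disk_space_mib


print(total_dbengine_disk_mib(4))  # 1280
```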
See the [database engine calculator](https://learn.netdata.cloud/docs/agent/database/calculator) to help you correctly
set `dbengine disk space` and understand the total disk space required based on your streaming setup.
@ -90,14 +90,14 @@ validate the memory requirements for your particular system(s) and configuration
### File descriptor requirements
The Database Engine may keep a **significant** amount of files open per instance (e.g. per streaming slave or master
server). When configuring your system you should make sure there are at least 50 file descriptors available per
The Database Engine may keep a **significant** amount of files open per instance (e.g. per streaming child or
parent server). When configuring your system you should make sure there are at least 50 file descriptors available per
`dbengine` instance.
Netdata allocates 25% of the available file descriptors to its Database Engine instances. This means that only 25% of
the file descriptors that are available to the Netdata service are accessible by dbengine instances. You should take
that into account when configuring your service or system-wide file descriptor limits. You can roughly estimate that the
Netdata service needs 2048 file descriptors for every 10 streaming slave hosts when streaming is configured to use
Netdata service needs 2048 file descriptors for every 10 streaming child hosts when streaming is configured to use
`memory mode = dbengine`.
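
The two rules of thumb above can be turned into a rough Python estimate. This is a sketch built only on the figures quoted in the text (25% of descriptors go to dbengine, at least 50 descriptors per instance, about 2048 descriptors per 10 streaming child hosts). The function names are hypothetical, and rounding the child count up to whole groups of 10 is an assumption.

```python
import math

MIN_FDS_PER_INSTANCE = 50   # minimum file descriptors per dbengine instance
FDS_PER_10_CHILDREN = 2048  # rough service-wide estimate per 10 child hosts


def max_dbengine_instances(fd_limit: int) -> int:
    """How many dbengine instances fit in the 25% share of `fd_limit`."""
    dbengine_share = fd_limit // 4  # 25% of the service's file descriptors
    return dbengine_share // MIN_FDS_PER_INSTANCE


def estimated_fd_limit(child_hosts: int) -> int:
    """Rough file descriptor limit for a parent with `child_hosts` children."""
    # Assumption: round up to whole groups of 10 child hosts.
    return math.ceil(child_hosts / 10) * FDS_PER_10_CHILDREN


print(max_dbengine_instances(65536))  # 327
print(estimated_fd_limit(25))         # 6144
```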
If for example one wants to allocate 65536 file descriptors to the Netdata service on a systemd system one needs to
@ -173,10 +173,10 @@ traffic so as to create the minimum possible interference with other application
## Evaluation
We have evaluated the performance of the `dbengine` API that the netdata daemon uses internally. This is **not** the
web API of netdata. Our benchmarks ran on a **single** `dbengine` instance, multiple of which can be running in a
netdata master server. We used a server with an AMD Ryzen Threadripper 2950X 16-Core Processor and 2 disk drives, a
Seagate Constellation ES.3 2TB magnetic HDD and a SAMSUNG MZQLB960HAJR-00007 960GB NAND Flash SSD.
We have evaluated the performance of the `dbengine` API that the netdata daemon uses internally. This is **not** the web
API of netdata. Our benchmarks ran on a **single** `dbengine` instance, multiple of which can be running in a Netdata
parent node. We used a server with an AMD Ryzen Threadripper 2950X 16-Core Processor and 2 disk drives, a Seagate
Constellation ES.3 2TB magnetic HDD and a SAMSUNG MZQLB960HAJR-00007 960GB NAND Flash SSD.
For our workload, we defined 32 charts with 128 metrics each, giving us a total of 4096 metrics. We defined 1 worker
thread per chart (32 threads) that generates new data points with a data generation interval of 1 second. The time axis

@ -57,7 +57,7 @@ metrics. The default settings retain about two day's worth of metris on a system
[**See our database engine calculator**](https://learn.netdata.cloud/docs/agent/database/calculator) to help you
correctly set `dbengine disk space` based on your needs. The calculator gives an accurate estimate based on how many
slave nodes you have, how many metrics your Agent collects, and more.
child nodes you have, how many metrics your Agent collects, and more.
With the database engine active, you can back up your `/var/cache/netdata/dbengine/` folder to another location for
redundancy.

@ -96,7 +96,7 @@ al-9866",
If Netdata can't access the `/jmx` endpoint for either a NameNode or DataNode, it will not be able to auto-detect and
collect metrics from your HDFS implementation.
Zookeeper auto-detection relies on an accessible client port and a whitelisted `mntr` command. For more details on
Zookeeper auto-detection relies on an accessible client port and an allow-listed `mntr` command. For more details on
`mntr`, see Zookeeper's documentation on [cluster
options](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_clusterOptions) and [Zookeeper
commands](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands).

@ -53,7 +53,7 @@ every second.
[**See our database engine calculator**](https://learn.netdata.cloud/docs/agent/database/calculator) to help you
correctly set `dbengine disk space` based on your needs. The calculator gives an accurate estimate based on how many
slave nodes you have, how many metrics your Agent collects, and more.
child nodes you have, how many metrics your Agent collects, and more.
```conf
[global]

@ -7,7 +7,7 @@ custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/usin
When you use Netdata to monitor and troubleshoot an entire infrastructure, whether that's dozens or hundreds of systems,
you need sophisticated ways of keeping everything organized. You need alarms that adapt to the system's purpose, or
whether the `master` or `slave` in a streaming setup. You need properly-labeled metrics archiving so you can sort,
whether it's the parent or child in a streaming setup. You need properly-labeled metrics archiving so you can sort,
correlate, and mash-up your data to your heart's content. You need to keep tabs on ephemeral Docker containers in a
Kubernetes cluster.
@ -50,7 +50,7 @@ read the status of your agent. For example, from a VPS system running Debian 10:
{
...
"host_labels": {
"_is_master": "false",
"_is_parent": "false",
"_virt_detection": "systemd-detect-virt",
"_container_detection": "none",
"_container": "unknown",
@ -73,7 +73,7 @@ You may have noticed a handful of labels that begin with an underscore (`_`). Th
When Netdata starts, it captures relevant information about the system and converts them into automatically-generated
host labels. You can use these to logically organize your systems via health entities, exporting metrics,
streaming/master status, and more.
parent-child status, and more.
They capture the following:
@ -82,29 +82,29 @@ They capture the following:
- CPU architecture, system cores, CPU frequency, RAM, and disk space
- Whether Netdata is running inside of a container, and if so, the OS and hardware details about the container's host
- What virtualization layer the system runs on top of, if any
- Whether the system is a streaming master or slave
- Whether the system is a streaming parent or child
If you want to organize your systems without manually creating host tags, try the automatic labels in some of the
features below.
## Host labels in streaming
You may have noticed the `_is_master` and `_is_slave` automatic labels from above. Host labels are also now streamed
from a slave to its master agent, which concentrates an entire infrastructure's OS, hardware, container, and
virtualization information in one place: the master.
You may have noticed the `_is_parent` and `_is_child` automatic labels from above. Host labels are also now
streamed from a child to its parent node, which concentrates an entire infrastructure's OS, hardware, container,
and virtualization information in one place: the parent.
Now, if you'd like to remind yourself of how much RAM a certain slave system has, you can simply access
`http://localhost:19999/host/SLAVE_NAME/api/v1/info` and reference the automatically-generated host labels from the
slave system. It's a vastly simplified way of accessing critical information about your infrastructure.
Now, if you'd like to remind yourself of how much RAM a certain child node has, you can access
`http://localhost:19999/host/CHILD_HOSTNAME/api/v1/info` and reference the automatically-generated host labels from the
child system. It's a vastly simplified way of accessing critical information about your infrastructure.
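
As a minimal sketch of that lookup, the Python below builds the URL from the text and pulls the `host_labels` object out of the response. The helper names are hypothetical, the default base URL assumes the parent listens on `localhost:19999` as in the example, and the code assumes the response is JSON with a top-level `host_labels` key as shown in the sample output earlier in this guide.

```python
import json
from urllib.request import urlopen


def info_url(child_hostname: str, parent: str = "http://localhost:19999") -> str:
    """Build the parent's info endpoint for one of its child nodes."""
    return f"{parent}/host/{child_hostname}/api/v1/info"


def extract_host_labels(info_json: str) -> dict:
    """Return the `host_labels` object from an info response body."""
    return json.loads(info_json).get("host_labels", {})


def fetch_host_labels(child_hostname: str, parent: str = "http://localhost:19999") -> dict:
    """Query the parent and return the child's host labels."""
    with urlopen(info_url(child_hostname, parent), timeout=5) as response:
        return extract_host_labels(response.read().decode("utf-8"))
```

Calling `fetch_host_labels("CHILD_HOSTNAME")` against a running parent would return a dictionary whose keys include the automatic labels, such as `_is_parent`.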
> ⚠️ Because automatic labels for slave nodes are accessible via API calls, and contain sensitive information like
> ⚠️ Because automatic labels for child nodes are accessible via API calls, and contain sensitive information like
> kernel and operating system versions, you should secure streaming connections with SSL. See the [streaming
> documentation](/streaming/README.md#securing-streaming-communications) for details. You may also want to use
> [access lists](/web/server/README.md#access-lists) or [expose the API only to LAN/localhost
> connections](/docs/netdata-security.md#expose-netdata-only-in-a-private-lan).
You can also use `_is_master`, `_is_slave`, and any other host labels in both health entities and metrics exporting.
Speaking of which...
You can also use `_is_parent`, `_is_child`, and any other host labels in both health entities and metrics
exporting. Speaking of which...
## Host labels in health entities
@ -138,11 +138,11 @@ Or, by using one of the automatic labels, for only webserver systems running a s
host labels: _os_name = Debian*
```
In a streaming configuration where a master agent is triggering alarms for its slaves, you could create health entities
that apply only to slaves:
In a streaming configuration where a parent node is triggering alarms for its child nodes, you could create health
entities that apply only to child nodes:
```yaml
host labels: _is_slave = true
host labels: _is_child = true
```
Or when ephemeral Docker nodes are involved:

@ -40,7 +40,10 @@ There are a few cases however that raw source data are only exposed to processes
So, Netdata **plugins**, even those running with escalated capabilities or privileges, perform a **hard coded data collection job**. They do not accept commands from Netdata. The communication is strictly **unidirectional**: from the plugin towards the Netdata daemon. The original application data collected by each plugin do not leave the process they are collected, are not saved and are not transferred to the Netdata daemon. The communication from the plugins to the Netdata daemon includes only chart metadata and processed metric values.
Netdata slaves streaming metrics to upstream Netdata servers, use exactly the same protocol local plugins use. The raw data collected by the plugins of slave Netdata servers are **never leaving the host they are collected**. The only data appearing on the wire are chart metadata and metric values. This communication is also **unidirectional**: slave Netdata servers never accept commands from master Netdata servers.
Child nodes use the same protocol when streaming metrics to their parent nodes. The raw data collected by the plugins of
child Netdata servers **never leave the host where they are collected**. The only data appearing on the wire are chart
metadata and metric values. This communication is also **unidirectional**: child nodes never accept commands from
parent Netdata servers.
## Netdata is read-only
@ -190,7 +193,10 @@ Of course, there are many more methods you could use to protect Netdata:
- If you are always under a static IP, you can use the script given above to allow direct access to your Netdata servers without authentication, from all your static IPs.
- install all your Netdata in **headless data collector** mode, forwarding all metrics in real-time to a master Netdata server, which will be protected with authentication using an nginx server running locally at the master Netdata server. This requires more resources (you will need a bigger master Netdata server), but does not require any firewall changes, since all the slave Netdata servers will not be listening for incoming connections.
- install all your Netdata in **headless data collector** mode, forwarding all metrics in real-time to a parent
Netdata server, which will be protected with authentication using an nginx server running locally at the parent
Netdata server. This requires more resources (you will need a bigger parent Netdata server), but does not require
any firewall changes, since all the child Netdata servers will not be listening for incoming connections.
## Anonymous Statistics

@ -233,8 +233,8 @@ Options:
of times within each pattern). The patterns are checked against the hostname (the localhost is always checked as
`localhost`), allowing us to filter which hosts will be sent to the external database when this Netdata is a central
Netdata aggregating multiple hosts. A pattern starting with `!` gives a negative match. So to match all hosts named
`*db*` except hosts containing `*slave*`, use `!*slave* *db*` (so, the order is important: the first pattern
matching the hostname will be used - positive or negative).
`*db*` except hosts containing `*child*`, use `!*child* *db*` (so, the order is important: the first
pattern matching the hostname will be used - positive or negative).
- `send charts matching = *` includes one or more space separated patterns, using `*` as wildcard (any number of times
within each pattern). The patterns are checked against both chart id and chart name. A pattern starting with `!`

@ -357,7 +357,7 @@ For more information check Prometheus documentation.
### Streaming data from upstream hosts
The `format=prometheus` parameter only exports the host's Netdata metrics. If you are using the master/slave
The `format=prometheus` parameter only exports the host's Netdata metrics. If you are using the parent-child
functionality of Netdata this ignores any upstream hosts - so you should consider using the below in your
**prometheus.yml**:

@ -125,7 +125,9 @@ This is the brand new database engine capability of netdata. It is a mandatory f
#### Encryption Support (HTTPS)
This is Netdata's TLS capability that incorporates encryption on the web server and the APIs between master and slaves. Also a mandatory facility for Netdata, but remains optional for users who are limited or not interested in tight security
This is Netdata's TLS capability that incorporates encryption on the web server and the APIs between parent and child
nodes. It is also a mandatory facility for Netdata, but remains optional for users who are limited or not interested in
tight security.
|make/make install|netdata-installer.sh|kickstart.sh|kickstart-static64.sh|Docker image|RPM packaging|DEB packaging|
|:---------------:|:------------------:|:----------:|:-------------------:|:----------:|:-----------:|:-----------:|

@ -9,7 +9,7 @@ custom_edit_url: https://github.com/netdata/netdata/edit/master/packaging/instal
Netdata is fully compatible with popular cloud providers like Google Cloud Platform (GCP), Amazon Web Services (AWS),
Azure, and others. You can install Netdata on cloud instances to monitor the apps/services running there, or use
multiple instances in a [master/slave streaming](../../../streaming/README.md) configuration.
multiple instances in a [parent-child streaming](/streaming/README.md) configuration.
In some cases, using Netdata on these cloud providers requires unique installation or configuration steps. This page
aims to document some of those steps for popular cloud providers.

@ -9,8 +9,8 @@ custom_edit_url: https://github.com/netdata/netdata/edit/master/packaging/instal
Netdata works on macOS, albeit with some limitations. The number of charts displaying system metrics is limited, but you
can use any of Netdata's [external plugins](../../../collectors/plugins.d/README.md) to monitor any services you might
have installed on your macOS system. You could also use a macOS system as the master node in a [streaming
configuration](../../../streaming/README.md).
have installed on your macOS system. You could also use a macOS system as the parent node in a [streaming
configuration](/streaming/README.md).
We recommend installing Netdata with the community-created and -maintained [**Homebrew
package**](#install-netdata-with-the-homebrew-package).

@ -1,64 +1,70 @@
<!--
---
title: "Streaming and replication"
description: "Replicate and mirror Netdata's metrics through real-time streaming from child to parent nodes. Then combine, correlate, and export."
custom_edit_url: https://github.com/netdata/netdata/edit/master/streaming/README.md
---
-->
# Streaming and replication
Each Netdata is able to replicate/mirror its database to another Netdata, by streaming collected
metrics, in real-time to it. This is quite different to [data archiving to third party time-series
databases](/backends/README.md).
databases](/exporting/README.md).
When Netdata streams metrics to another Netdata, the receiving one is able to perform everything a Netdata instance is capable of:
When Netdata streams metrics to another Netdata, the receiving one is able to perform everything a Netdata instance is
capable of:
- visualize them with a dashboard
- run health checks that trigger alarms and send alarm notifications
- archive metrics to a backend time-series database
- Visualize metrics with a dashboard
- Run health checks that trigger alarms and send alarm notifications
- Export metrics to an external time-series database
The nodes that send metrics are called **child** nodes, and the nodes that receive metrics are called **parent** nodes.
There are also **proxies**, which collect metrics from a child and send them to a parent.
## Supported configurations
### Netdata without a database or web API (headless collector)
Local Netdata (`slave`), **without any database or alarms**, collects metrics and sends them to
another Netdata (`master`).
Local Netdata (child), **without any database or alarms**, collects metrics and sends them to another Netdata
(parent).
The node menu shows a list of all "databases streamed to" the master. Clicking one of those links allows the user to view the full dashboard of the `slave` Netdata. The URL has the form `http://master-host:master-port/host/slave-host/`.
The node menu shows a list of all "databases streamed to" the parent. Clicking one of those links allows the user to
view the full dashboard of the child node. The URL has the form
`http://parent-host:parent-port/host/child-host/`.
Alarms for the `slave` are served by the `master`.
Alarms for the child are served by the parent.
In this mode the `slave` is just a plain data collector. It spawns all external plugins, but instead
of maintaining a local database and accepting dashboard requests, it streams all metrics to the
`master`. The memory footprint is reduced significantly, to between 6 MiB and 40 MiB, depending on the enabled plugins. To reduce the memory usage as much as possible, refer to [running Netdata in embedded devices](/docs/Performance.md#running-netdata-in-embedded-devices).
In this mode the child is just a plain data collector. It spawns all external plugins, but instead of maintaining a
local database and accepting dashboard requests, it streams all metrics to the parent. The memory footprint is reduced
significantly, to between 6 MiB and 40 MiB, depending on the enabled plugins. To reduce the memory usage as much as
possible, refer to [running Netdata in embedded devices](/docs/Performance.md#running-netdata-in-embedded-devices).
The same `master` can collect data for any number of `slaves`.
The same parent can collect data for any number of child nodes.
### Database Replication
Local Netdata (`slave`), **with a local database (and possibly alarms)**, collects metrics and
sends them to another Netdata (`master`).
Local Netdata (child), **with a local database (and possibly alarms)**, collects metrics and
sends them to another Netdata (parent).
The user can use all the functions **at both** `http://slave-ip:slave-port/` and
`http://master-host:master-port/host/slave-host/`.
The user can use all the functions **at both** `http://child-ip:child-port/` and
`http://parent-host:parent-port/host/child-host/`.
The `slave` and the `master` may have different data retention policies for the same metrics.
The child and the parent may have different data retention policies for the same metrics.
Alarms for the `slave` are triggered by **both** the `slave` and the `master` (and actually
Alarms for the child are triggered by **both** the child and the parent (and actually
each can have different alarms configurations or have alarms disabled).
Take a note, that custom chart names, configured on the `slave`, should be in the form `type.name` to work correctly. The `master` will truncate the `type` part and substitute the original chart `type` to store the name in the database.
Note that custom chart names configured on the child should be in the form `type.name` to work correctly. The parent will truncate the `type` part and substitute the original chart `type` to store the name in the database.
### Netdata proxies
Local Netdata (`slave`), with or without a database, collects metrics and sends them to another
Netdata (`proxy`), which may or may not maintain a database, which forwards them to another
Netdata (`master`).
Local Netdata (child), with or without a database, collects metrics and sends them to another
Netdata (**proxy**), which may or may not maintain a database, which forwards them to another
Netdata (parent).
Alarms for the slave can be triggered by any of the involved hosts that maintains a database.
Alarms for the child can be triggered by any of the involved hosts that maintain a database.
Any number of daisy chaining Netdata servers are supported, each with or without a database and
with or without alarms for the `slave` metrics.
with or without alarms for the child metrics.
### mix and match with backends
@ -96,7 +102,9 @@ monitoring (there cannot be health monitoring without a database).
`[web].mode = none` disables the API (Netdata will not listen to any ports).
This also disables the registry (there cannot be a registry without an API).
`accept a streaming request every seconds` can be used to set a limit on how often a master Netdata server will accept streaming requests from the slaves. 0 sets no limit, 1 means maximum once every second. If this is set, you may see error log entries "... too busy to accept new streaming request. Will be allowed in X secs".
`accept a streaming request every seconds` can be used to set a limit on how often a parent node will accept streaming
requests from its child nodes. 0 sets no limit, 1 means maximum once every second. If this is set, you may see error log
entries "... too busy to accept new streaming request. Will be allowed in X secs".
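
The effect of this setting can be illustrated with a small rate limiter in Python. This is a hypothetical sketch of the described behavior, not Netdata's code: the class name `StreamAcceptLimiter` and its methods are invented for illustration, and the only facts taken from the text are that `0` means no limit and that a rejected request is told how many seconds remain.

```python
import time


class StreamAcceptLimiter:
    """Accept at most one streaming request every `every_seconds` seconds."""

    def __init__(self, every_seconds: float, clock=time.monotonic):
        self.every_seconds = every_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self.last_accepted = None   # time of the last accepted request

    def try_accept(self) -> float:
        """Return 0 if the request is accepted, else the seconds left to wait."""
        now = self.clock()
        # A value of 0 disables the limit; the first request is always accepted.
        if self.every_seconds <= 0 or self.last_accepted is None:
            self.last_accepted = now
            return 0
        elapsed = now - self.last_accepted
        if elapsed >= self.every_seconds:
            self.last_accepted = now
            return 0
        return self.every_seconds - elapsed


limiter = StreamAcceptLimiter(every_seconds=1)
wait = limiter.try_accept()  # first request is accepted, so wait is 0
wait = limiter.try_accept()  # an immediate retry is rejected, so wait is > 0
if wait:
    print(f"too busy to accept new streaming request. Will be allowed in {wait:.1f} secs")
```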
```
[backend]
@ -126,7 +134,7 @@ sending-receiving Netdata.
This is the section for the sending Netdata. On the receiving node, `[stream].enabled` can be `no`.
If it is `yes`, the receiving node will also stream the metrics to another node (i.e. it will be
a `proxy`).
a proxy).
```
[stream]
@ -144,7 +152,7 @@ This is an overview of how these options can be combined:
| proxy with db|not `none`|not `none`|`yes`|possible|possible|yes|
| central netdata|not `none`|not `none`|`no`|possible|possible|yes|
For the options to encrypt the data stream between the slave and the master, refer to [securing the communication](#securing-streaming-communications)
For the options to encrypt the data stream between the child and the parent, refer to [securing the communication](#securing-streaming-communications)
##### options for the receiving node
@ -166,7 +174,7 @@ all hosts pushed with this API key.
You can also add sections like this:
```sh
# replace MACHINE_GUID with the slave /var/lib/netdata/registry/netdata.public.unique.id
# replace MACHINE_GUID with the child /var/lib/netdata/registry/netdata.public.unique.id
[MACHINE_GUID]
enabled = yes
history = 3600
@ -175,7 +183,7 @@ You can also add sections like this:
allow from = *
```
The above is the receiver configuration of a single host, at the receiver end. `MACHINE_GUID` is
The above is the parent configuration of a single host, at the parent end. `MACHINE_GUID` is
the unique id of the Netdata generating the metrics (i.e. the Netdata that originally collects
them; see `/var/lib/netdata/registry/netdata.public.unique.id`). So, metrics for Netdata `A` that pass through
any number of other Netdata nodes will have the same `MACHINE_GUID`.
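As a minimal sketch of how a per-host section could be prepared, the helper below reads a child's GUID file and prints a matching section header. `guid_section` is a hypothetical name, not a Netdata tool, and the default path is an assumption taken from the path quoted in this document; the file is read on the child, while the printed section belongs in the receiving node's `stream.conf`.

```bash
# Print a per-host receiver section for stream.conf, given the file that
# holds the child's machine GUID (hypothetical helper, not part of Netdata).
guid_section() {
    guid_file="${1:-/var/lib/netdata/registry/netdata.public.unique.id}"
    # Strip whitespace and newlines so the GUID fits inside the brackets.
    guid="$(tr -d '[:space:]' < "$guid_file")" || return 1
    [ -n "$guid" ] || return 1
    printf '[%s]\n    enabled = yes\n' "$guid"
}
```

The remaining options (`history`, `allow from`, and so on) can then be added by hand under the printed header.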
@ -195,7 +203,7 @@ important: left to right, the first positive or negative match is used.
##### tracing
When a `slave` is trying to push metrics to a `master` or `proxy`, it logs entries like these:
When a child is trying to push metrics to a parent or proxy, it logs entries like these:
```
2017-02-25 01:57:44: netdata: ERROR: Failed to connect to '10.11.12.1', port '19999' (errno 111, Connection refused)
@ -207,7 +215,7 @@ When a `slave` is trying to push metrics to a `master` or `proxy`, it logs entri
2017-02-25 01:58:14: netdata: INFO : STREAM costa-pc [send]: ready - sending metrics...
```
The receiving end (`proxy` or `master`) logs entries like these:
The receiving end (proxy or parent) logs entries like these:
```
2017-02-25 01:58:04: netdata: INFO : STREAM [receive from [10.11.12.11]:33554]: new client connection.
@ -221,14 +229,14 @@ For Netdata v1.9+, streaming can also be monitored via `access.log`.
### Securing streaming communications
Netdata does not activate TLS encryption by default. To encrypt streaming connections, you first need to [enable TLS support](/web/server/README.md#enabling-tls-support) on the master. With encryption enabled on the receiving side, you need to instruct the slave to use TLS/SSL as well. On the slave's `stream.conf`, configure the destination as follows:
Netdata does not activate TLS encryption by default. To encrypt streaming connections, you first need to [enable TLS support](/web/server/README.md#enabling-tls-support) on the parent. With encryption enabled on the receiving side, you need to instruct the child to use TLS/SSL as well. On the child's `stream.conf`, configure the destination as follows:
```
[stream]
destination = host:port:SSL
```
The word `SSL` appended to the end of the destination tells the slave that connections must be encrypted.
The word `SSL` appended to the end of the destination tells the child that connections must be encrypted.
> While Netdata uses Transport Layer Security (TLS) 1.2 to encrypt communications rather than the obsolete SSL protocol,
> it's still common practice to refer to encrypted web connections as `SSL`. Many vendors, like Nginx and even Netdata
@ -237,7 +245,7 @@ The word `SSL` appended to the end of the destination tells the slave that conne
#### Certificate verification
When TLS/SSL is enabled on the slave, the default behavior will be to not connect with the master unless the server's certificate can be verified via the default chain. In case you want to avoid this check, add the following to the slave's `stream.conf` file:
When TLS/SSL is enabled on the child, the default behavior will be to not connect with the parent unless the server's certificate can be verified via the default chain. In case you want to avoid this check, add the following to the child's `stream.conf` file:
```
[stream]
@ -252,15 +260,15 @@ Given these known issues, you have two options. If you trust your certificate, y
For more details about these options, you can read about [verify locations](https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_load_verify_locations.html).
Before you change your streaming configuration, you need to copy your trusted certificate to your slave system and add the certificate to OpenSSL's list.
Before you change your streaming configuration, you need to copy your trusted certificate to your child system and add the certificate to OpenSSL's list.
On most Linux distributions, the `update-ca-certificates` command searches inside the `/usr/share/ca-certificates` directory for certificates. You should double-check by reading the `update-ca-certificates` manual (`man update-ca-certificates`), and then change the directory in the below commands if needed.
If you have `sudo` configured on your slave system, you can use that to run the following commands. If not, you'll have to log in as `root` to complete them.
If you have `sudo` configured on your child system, you can use that to run the following commands. If not, you'll have to log in as `root` to complete them.
```
# mkdir /usr/share/ca-certificates/netdata
# cp master_cert.pem /usr/share/ca-certificates/netdata/master_cert.crt
# cp parent_cert.pem /usr/share/ca-certificates/netdata/parent_cert.crt
# chown -R netdata:netdata /usr/share/ca-certificates/netdata/
```
@ -269,7 +277,7 @@ First, you create a new directory to store your certificates for Netdata. Next,
Next, edit the file `/etc/ca-certificates.conf` and add the following line:
```
netdata/master_cert.crt
netdata/parent_cert.crt
```
Now update the list of certificates by running the following, again either with `sudo` or as `root`:
@ -281,32 +289,32 @@ Now you update the list of certificates running the following, again either as `
> Some Linux distributions have different methods of updating the certificate list. For more details, please read this
> guide on [adding trusted root certificates](https://github.com/Busindre/How-to-Add-trusted-root-certificates).
Once you update your certificate list, you can set the stream parameters for Netdata to trust the master certificate. Open `stream.conf` for editing and change the following lines:
Once you update your certificate list, you can set the stream parameters for Netdata to trust the parent certificate. Open `stream.conf` for editing and change the following lines:
```
[stream]
CApath = /etc/ssl/certs/
CAfile = /etc/ssl/certs/master_cert.pem
CAfile = /etc/ssl/certs/parent_cert.pem
```
With this configuration, the `CApath` option tells Netdata to search for trusted certificates inside `/etc/ssl/certs`. The `CAfile` option specifies the Netdata master certificate is located at `/etc/ssl/certs/master_cert.pem`. With this configuration, you can skip using the system's entire list of certificates and use Netdata's master certificate instead.
With this configuration, the `CApath` option tells Netdata to search for trusted certificates inside `/etc/ssl/certs`. The `CAfile` option specifies the Netdata parent certificate is located at `/etc/ssl/certs/parent_cert.pem`. With this configuration, you can skip using the system's entire list of certificates and use Netdata's parent certificate instead.
#### Expected behaviors
With the introduction of TLS/SSL, the master-slave communication behaves as shown in the table below, depending on the following configurations:
With the introduction of TLS/SSL, the parent-child communication behaves as shown in the table below, depending on the following configurations:
- **Master TLS (Yes/No)**: Whether the `[web]` section in `netdata.conf` has `ssl key` and `ssl certificate`.
- **Master port TLS (-/force/optional)**: Depends on whether the `[web]` section `bind to` contains a `^SSL=force` or `^SSL=optional` directive on the port(s) used for streaming.
- **Slave TLS (Yes/No)**: Whether the destination in the slave's `stream.conf` has `:SSL` at the end.
- **Slave TLS Verification (yes/no)**: Value of the slave's `stream.conf` `ssl skip certificate verification` parameter (default is no).
- **Parent TLS (Yes/No)**: Whether the `[web]` section in `netdata.conf` has `ssl key` and `ssl certificate`.
- **Parent port TLS (-/force/optional)**: Depends on whether the `[web]` section `bind to` contains a `^SSL=force` or `^SSL=optional` directive on the port(s) used for streaming.
- **Child TLS (Yes/No)**: Whether the destination in the child's `stream.conf` has `:SSL` at the end.
- **Child TLS Verification (yes/no)**: Value of the child's `stream.conf` `ssl skip certificate verification` parameter (default is no).
| Master TLS enabled|Master port SSL|Slave TLS|Slave SSL Ver.|Behavior|
| Parent TLS enabled|Parent port SSL|Child TLS|Child SSL Ver.|Behavior|
|:----------------:|:-------------:|:-------:|:------------:|:-------|
| No|-|No|no|Legacy behavior. The master-slave stream is unencrypted.|
| Yes|force|No|no|The master rejects the slave connection.|
| Yes|-/optional|No|no|The master-slave stream is unencrypted (expected situation for legacy slaves and newer masters)|
| Yes|-/force/optional|Yes|no|The master-slave stream is encrypted, provided that the master has a valid TLS/SSL certificate. Otherwise, the slave refuses to connect.|
| Yes|-/force/optional|Yes|yes|The master-slave stream is encrypted.|
| No|-|No|no|Legacy behavior. The parent-child stream is unencrypted.|
| Yes|force|No|no|The parent rejects the child connection.|
| Yes|-/optional|No|no|The parent-child stream is unencrypted (expected situation for legacy child nodes and newer parent nodes)|
| Yes|-/force/optional|Yes|no|The parent-child stream is encrypted, provided that the parent has a valid TLS/SSL certificate. Otherwise, the child refuses to connect.|
| Yes|-/force/optional|Yes|yes|The parent-child stream is encrypted.|
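The table can also be read as a small decision function. The sketch below only restates the rows above in shell; `stream_outcome` is a hypothetical helper name, not part of Netdata, and combinations the table does not list are reported as such rather than guessed.

```bash
# stream_outcome PARENT_TLS PORT_SSL CHILD_TLS SKIP_VERIFY
#   PARENT_TLS   yes|no            parent has `ssl key` and `ssl certificate`
#   PORT_SSL     -|force|optional  directive on the parent's streaming port
#   CHILD_TLS    yes|no            child destination ends with :SSL
#   SKIP_VERIFY  yes|no            child's `ssl skip certificate verification`
stream_outcome() {
    case "$1,$2,$3,$4" in
        no,-,no,no)
            echo "unencrypted (legacy behavior)" ;;
        yes,force,no,no)
            echo "rejected by parent" ;;
        yes,-,no,no|yes,optional,no,no)
            echo "unencrypted" ;;
        yes,-,yes,no|yes,force,yes,no|yes,optional,yes,no)
            echo "encrypted if the parent certificate verifies, otherwise the child refuses to connect" ;;
        yes,-,yes,yes|yes,force,yes,yes|yes,optional,yes,yes)
            echo "encrypted" ;;
        *)
            # Anything else is outside the rows of the table.
            echo "not covered by the table"
            return 1 ;;
    esac
}
```

For example, `stream_outcome yes force no no` prints `rejected by parent`, matching the second row.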
## Viewing remote host dashboards, using mirrored databases
@ -323,9 +331,7 @@ Auto-scaling is probably the most trendy service deployment strategy these days.
Auto-scaling detects the need for additional resources and boots VMs on demand, based on a template. Soon after they start running the applications, a load balancer starts distributing traffic to them, allowing the service to grow horizontally to the scale needed to handle the load. When demand falls, auto-scaling starts shutting down VMs that are no longer needed.
<p align="center">
<img src="https://cloud.githubusercontent.com/assets/2662304/23627426/65a9074a-02b9-11e7-9664-cd8f258a00af.png"/>
</p>
![Monitoring ephemeral nodes with Netdata](https://cloud.githubusercontent.com/assets/2662304/23627426/65a9074a-02b9-11e7-9664-cd8f258a00af.png)
What a fantastic feature for controlling infrastructure costs! Pay only for what you need for the time you need it!
@ -348,84 +354,83 @@ Following the Netdata way of monitoring, we wanted:
All monitoring solutions, including Netdata, work like this:
1. `collect metrics`, from the system and the running applications
2. `store metrics`, in a time-series database
3. `examine metrics` periodically, for triggering alarms and sending alarm notifications
4. `visualize metrics`, so that users can see what exactly is happening
1. Collect metrics from the system and the running applications
2. Store metrics in a time-series database
3. Examine metrics periodically, for triggering alarms and sending alarm notifications
4. Visualize metrics so that users can see what exactly is happening
Netdata used to be self-contained, so that all these functions were handled entirely by each server. The changes we made allow each Netdata to be configured independently for each function. So, each Netdata can now act as:
- a `self contained system`, much like it used to be.
- a `data collector`, that collects metrics from a host and pushes them to another Netdata (with or without a local database and alarms).
- a `proxy`, that receives metrics from other hosts and pushes them immediately to other Netdata servers. Netdata proxies can also be `store and forward proxies` meaning that they are able to maintain a local database for all metrics passing through them (with or without alarms).
- a `time-series database` node, where data are kept, alarms are run and queries are served to visualise the metrics.
- A self-contained system, much like it used to be.
- A data collector that collects metrics from a host and pushes them to another Netdata (with or without a local database and alarms).
- A proxy, which receives metrics from other hosts and pushes them immediately to other Netdata servers. Netdata proxies can also be `store and forward proxies` meaning that they are able to maintain a local database for all metrics passing through them (with or without alarms).
- A time-series database node, where data are kept, alarms are run and queries are served to visualise the metrics.
### Configuring an auto-scaling setup
<p align="center">
<img src="https://cloud.githubusercontent.com/assets/2662304/23627468/96daf7ba-02b9-11e7-95ac-1f767dd8dab8.png"/>
</p>
![A diagram of an auto-scaling setup with Netdata](https://user-images.githubusercontent.com/1153921/84290043-0c1c1600-aaf8-11ea-9757-dd8dd8a8ec6c.png)
You need a Netdata `master`. This node should not be ephemeral. It will be the node where all ephemeral nodes (let's call them `slaves`) will be sending their metrics.
You need a Netdata parent. This node should not be ephemeral. It will be the node where all ephemeral child
nodes will send their metrics.
The master will need to authorize the slaves for accepting their metrics. This is done with an API key.
The parent will need to authorize child nodes to receive their metrics. This is done with an API key.
#### API keys
API keys are just random GUIDs. Use the Linux command `uuidgen` to generate one. You can use the same API key for all your `slaves`, or you can configure one API key for each of them. This is entirely your decision.
API keys are just random GUIDs. Use the Linux command `uuidgen` to generate one. You can use the same API key for all your child nodes, or you can configure one API key for each of them. This is entirely your decision.
We suggest using the same API key for each ephemeral node template you have, so that all replicas of the same ephemeral node will have exactly the same configuration.
I will use this API_KEY: `11111111-2222-3333-4444-555555555555`. Replace it with your own.
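A minimal sketch of generating such a key from a script. `new_api_key` is a hypothetical helper name; it prefers `uuidgen` and, as an assumption for systems where that command is not installed, falls back to the Linux kernel's UUID source.

```bash
# Print one random GUID suitable for use as an API key.
new_api_key() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    elif [ -r /proc/sys/kernel/random/uuid ]; then
        # Linux-only fallback: the kernel returns a fresh UUID on each read.
        cat /proc/sys/kernel/random/uuid
    else
        echo "no GUID source found" >&2
        return 1
    fi
}
```

Running `API_KEY="$(new_api_key)"` once per ephemeral node template keeps all replicas of that template on the same key.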
#### Configuring the `master`
#### Configuring the parent
On the master, edit `/etc/netdata/stream.conf` (to edit it on your system run `/etc/netdata/edit-config stream.conf`) and set these:
On the parent, edit `/etc/netdata/stream.conf` (to edit it on your system run `/etc/netdata/edit-config stream.conf`) and set these:
```bash
[11111111-2222-3333-4444-555555555555]
# enable/disable this API key
enabled = yes
# one hour of data for each of the slaves
# one hour of data for each of the child nodes
default history = 3600
# do not save slave metrics on disk
# do not save child metrics on disk
default memory = ram
# alarms checks, only while the slave is connected
# alarms checks, only while the child is connected
health enabled by default = auto
```
_`stream.conf` on master, to enable receiving metrics from slaves using the API key._
_`stream.conf` on the parent, to enable receiving metrics from its child nodes using the API key._
If you used many API keys, you can add one such section for each API key.
When done, restart Netdata on the `master` node. It is now ready to receive metrics.
When done, restart Netdata on the parent node. It is now ready to receive metrics.
Note that `health enabled by default = auto` will still trigger `last_collected` alarms, if a connected slave does not exit gracefully. If the `netdata` process running on the slave is
stopped, it will close the connection to the master, ensuring that no `last_collected` alarms are triggered. For example, a proper container restart would first terminate
the `netdata` process, but a system power issue would leave the connection open on the master side. In the second case, you will still receive alarms.
Note that `health enabled by default = auto` will still trigger `last_collected` alarms, if a connected child does not exit gracefully. If the `netdata` process running on the child is
stopped, it will close the connection to the parent, ensuring that no `last_collected` alarms are triggered. For example, a proper container restart would first terminate
the `netdata` process, but a system power issue would leave the connection open on the parent side. In the second case, you will still receive alarms.
#### Configuring the `slaves`
#### Configuring the child nodes
On each of the slaves, edit `/etc/netdata/stream.conf` (to edit it on your system run `/etc/netdata/edit-config stream.conf`) and set these:
On each of the child nodes, edit `/etc/netdata/stream.conf` (to edit it on your system run `/etc/netdata/edit-config stream.conf`) and set these:
```bash
[stream]
# stream metrics to another Netdata
enabled = yes
# the IP and PORT of the master
# the IP and PORT of the parent
destination = 10.11.12.13:19999
# the API key to use
api key = 11111111-2222-3333-4444-555555555555
```
_`stream.conf` on slaves, to enable pushing metrics to master at `10.11.12.13:19999`._
_`stream.conf` on child nodes, to enable pushing metrics to their parent at `10.11.12.13:19999`._
Using just the above configuration, the `slaves` will be pushing their metrics to the `master` Netdata, but they will still maintain a local database of the metrics and run health checks. To disable them, edit `/etc/netdata/netdata.conf` and set:
Using just the above configuration, the child nodes will be pushing their metrics to the parent Netdata, but they will still maintain a local database of the metrics and run health checks. To disable them, edit `/etc/netdata/netdata.conf` and set:
```bash
[global]
@ -437,9 +442,9 @@ Using just the above configuration, the `slaves` will be pushing their metrics t
enabled = no
```
_`netdata.conf` configuration on slaves, to disable the local database and health checks._
_`netdata.conf` configuration on child nodes, to disable the local database and health checks._
Keep in mind that setting `memory mode = none` will also force `[health].enabled = no` (health checks require access to a local database). But you can keep the database and disable health checks if you need to. You are however sending all the metrics to the master server, which can handle the health checking (`[health].enabled = yes`)
Keep in mind that setting `memory mode = none` will also force `[health].enabled = no` (health checks require access to a local database). But you can keep the database and disable health checks if you need to. You are however sending all the metrics to the parent node, which can handle the health checking (`[health].enabled = yes`)
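Because every replica of an ephemeral node template shares the same settings, the child's `[stream]` section can be rendered from two values. The sketch below is one way to do that; `child_stream_conf` is a hypothetical helper name, and it prints only the three options shown in this section.

```bash
# Print a [stream] section for a child node.
# $1: parent destination as IP:PORT   $2: API key
child_stream_conf() {
    cat <<EOF
[stream]
    enabled = yes
    destination = $1
    api key = $2
EOF
}
```

For example, `child_stream_conf 10.11.12.13:19999 11111111-2222-3333-4444-555555555555` prints the section used above; review the output before writing it to `/etc/netdata/stream.conf`, since redirecting into that file replaces its contents.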
#### Netdata unique id
@ -449,15 +454,15 @@ The file `/var/lib/netdata/registry/netdata.public.unique.id` contains a random
#### Troubleshooting metrics streaming
Both the sender and the receiver of metrics log information at `/var/log/netdata/error.log`.
Both parent and child nodes log information at `/var/log/netdata/error.log`.
On both master and slave do this:
Run the following on both the parent and child nodes:
```
tail -f /var/log/netdata/error.log | grep STREAM
```
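The command above follows the log live. To review entries that are already in the log, a non-following variant can help; this is a small sketch, with `recent_stream_lines` as a hypothetical helper name and the default path taken from this section.

```bash
# Show the most recent STREAM entries from a Netdata error log.
# $1: log file (defaults to the path used in this section)
# $2: how many lines to show (defaults to 20)
recent_stream_lines() {
    log_file="${1:-/var/log/netdata/error.log}"
    count="${2:-20}"
    grep 'STREAM' "$log_file" | tail -n "$count"
}
```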
If the slave manages to connect to the master you will see something like (on the master):
If the child manages to connect to the parent you will see something like (on the parent):
```
2017-03-09 09:38:52: netdata: INFO : STREAM [receive from [10.11.12.86]:38564]: new client connection.
@ -467,7 +472,7 @@ If the slave manages to connect to the master you will see something like (on th
2017-03-09 09:38:52: netdata: INFO : STREAM xxx [receive from [10.11.12.86]:38564]: receiving metrics...
```
and something like this on the slave:
and something like this on the child:
```
2017-03-09 09:38:28: netdata: INFO : STREAM xxx [send to box:19999]: connecting...
@ -478,7 +483,8 @@ and something like this on the slave:
### Archiving to a time-series database
The `master` Netdata node can also archive metrics, for all `slaves`, to a time-series database. At the time of this writing, Netdata supports:
The parent Netdata node can also archive metrics, for all its child nodes, to a time-series database. At the time of
this writing, Netdata supports:
- graphite
- opentsdb
@ -486,13 +492,12 @@ The `master` Netdata node can also archive metrics, for all `slaves`, to a time-
- json document DBs
- all the compatibles to the above (e.g. kairosdb, influxdb, etc)
Check the Netdata [backends documentation](/backends/README.md) for configuring this.
Check the Netdata [exporting documentation](/docs/export/README.md) for configuring this.
This is how such a solution will work:
<p align="center">
<img src="https://cloud.githubusercontent.com/assets/2662304/23627295/e3569adc-02b8-11e7-9d55-4014bf98c1b3.png"/>
</p>
![Diagram showing an example configuration for archiving to a time-series
database](https://user-images.githubusercontent.com/1153921/84291308-c2ccc600-aaf9-11ea-98a9-89ccbf3a62dd.png)
### An advanced setup
@ -522,93 +527,93 @@ For a practical example see [Monitoring ephemeral nodes](#monitoring-ephemeral-n
## Troubleshooting streaming connections
This section describes the most common issues you might encounter when connecting slave and master Netdata agents.
This section describes the most common issues you might encounter when connecting parent and child nodes.
### Slow connections between slave and master
### Slow connections between parent and child
When you have a slow connection between master and slave, Netdata raises a few different errors. Most of the errors will
appear in the slave's `error.log`.
When you have a slow connection between parent and child, Netdata raises a few different errors. Most of the
errors will appear in the child's `error.log`.
```
netdata ERROR : STREAM_SENDER[SLAVE HOSTNAME] : STREAM SLAVE HOSTNAME [send to MASTER IP:MASTER PORT]: too many data pending - buffer is X bytes long,
```bash
netdata ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM CHILD HOSTNAME [send to PARENT IP:PARENT PORT]: too many data pending - buffer is X bytes long,
Y unsent - we have sent Z bytes in total, W on this connection. Closing connection to flush the data.
```
On the master side, you may see various error messages, most commonly the following:
On the parent side, you may see various error messages, most commonly the following:
```
netdata ERROR : STREAM_RECEIVER[SLAVE HOSTNAME,[SLAVE IP]:SLAVE PORT] : read failed: end of file
netdata ERROR : STREAM_RECEIVER[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : read failed: end of file
```
Another common problem in slow connections is the slave sending a partial message to the master. In this case, the
master will write the following in its `error.log`:
Another common problem in slow connections is the child sending a partial message to the parent. In this case,
the parent will write the following in its `error.log`:
```
ERROR : STREAM_RECEIVER[SLAVE HOSTNAME,[SLAVE IP]:SLAVE PORT] : sent command 'B' which is not known by netdata, for host 'HOSTNAME'. Disabling it.
ERROR : STREAM_RECEIVER[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : sent command 'B' which is not known by netdata, for host 'HOSTNAME'. Disabling it.
```
In this example, `B` was part of a `BEGIN` message that was cut due to connection problems.
Slow connections can also cause problems when the master misses a message and then receives a command related to the
missed message. For example, a master might miss a message containing the slave's charts, and then doesn't know what to
do with the `SET` message that follows. When that happens, the master will show a message like this:
Slow connections can also cause problems when the parent misses a message and then receives a command related to the
missed message. For example, a parent might miss a message containing the child's charts, and then doesn't know
what to do with the `SET` message that follows. When that happens, the parent will show a message like this:
```
ERROR : STREAM_RECEIVER[SLAVE HOSTNAME,[SLAVE IP]:SLAVE PORT] : requested a SET on chart 'CHART NAME' of host 'HOSTNAME', without a dimension. Disabling it.
ERROR : STREAM_RECEIVER[CHILD HOSTNAME,[CHILD IP]:CHILD PORT] : requested a SET on chart 'CHART NAME' of host 'HOSTNAME', without a dimension. Disabling it.
```
### Slave cannot connect to master
### Child cannot connect to parent
When the slave can't connect to a master for any reason (misconfiguration, networking, firewalls, master down), you will
see the following in the slave's `error.log`.
When the child can't connect to a parent for any reason (misconfiguration, networking, firewalls, parent
down), you will see the following in the child's `error.log`.
```
ERROR : STREAM_SENDER[HOSTNAME] : Failed to connect to 'MASTER IP', port 'MASTER PORT' (errno 113, No route to host)
ERROR : STREAM_SENDER[HOSTNAME] : Failed to connect to 'PARENT IP', port 'PARENT PORT' (errno 113, No route to host)
```
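When a child keeps retrying, it can help to see which destinations fail and why. The sketch below counts `Failed to connect` entries per destination and reason; `connect_failures` is a hypothetical helper name, and the pattern assumes the message layout shown in this document.

```bash
# Count connection failures per destination and reason.
# $1: path to the child's error.log
connect_failures() {
    # Keep only "IP:PORT reason" from each matching line, then count duplicates.
    sed -n "s/.*Failed to connect to '\([^']*\)', port '\([^']*\)' (errno [0-9]*, \(.*\))\$/\1:\2 \3/p" "$1" \
        | sort | uniq -c
}
```

Each output line holds a count followed by the destination and the reason text from the log, for example `Connection refused` or `No route to host`.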
### 'Is this a Netdata?'
This question can appear when Netdata starts the stream and receives an unexpected response. This error can appear when
the master is using SSL and the slave tries to connect using plain text. You will also see this message when Netdata
connects to another server that isn't Netdata. The complete error message will look like this:
the parent is using SSL and the child tries to connect using plain text. You will also see this message when
Netdata connects to another server that isn't Netdata. The complete error message will look like this:
```
ERROR : STREAM_SENDER[SLAVE HOSTNAME] : STREAM SLAVE HOSTNAME [send to MASTER HOSTNAME:MASTER PORT]: server is not replying properly (is it a netdata?).
ERROR : STREAM_SENDER[CHILD HOSTNAME] : STREAM CHILD HOSTNAME [send to PARENT HOSTNAME:PARENT PORT]: server is not replying properly (is it a netdata?).
```
### Stream charts wrong
Chart data needs to be consistent between slave and master agents. If there are differences between chart data on a
master and a slave, such as gaps in metrics collection, it most often means your slave's `memory mode` does not match
the master's. To learn more about the different ways Netdata can store metrics, and thus keep chart data consistent,
read our [memory mode documentation](/database/README.md).
Chart data needs to be consistent between child and parent nodes. If there are differences between chart data on
a parent and a child, such as gaps in metrics collection, it most often means your child's `memory mode`
does not match the parent's. To learn more about the different ways Netdata can store metrics, and thus keep chart
data consistent, read our [memory mode documentation](/database/README.md).
### Forbidding access
You may see errors about "forbidding access" for a number of reasons. It could be because of a slow connection between
the master and slave nodes, but it could also be due to other failures. Look in your master's `error.log` for errors
the parent and child nodes, but it could also be due to other failures. Look in your parent's `error.log` for errors
that look like this:
```
STREAM [receive from [SLAVE HOSTNAME]:SLAVE IP]: `MESSAGE`. Forbidding access."
STREAM [receive from [CHILD HOSTNAME]:CHILD IP]: `MESSAGE`. Forbidding access."
```
`MESSAGE` will have one of the following patterns:
- `request without KEY`: The message received is incomplete. `KEY` can be the API key, hostname, or machine GUID.
- `API key 'VALUE' is not valid GUID`: The UUID received from slave does not have the format defined in [RFC 4122](https://tools.ietf.org/html/rfc4122)
- `API key 'VALUE' is not valid GUID`: The UUID received from child does not have the format defined in [RFC 4122](https://tools.ietf.org/html/rfc4122)
- `machine GUID 'VALUE' is not GUID.`: This error with machine GUID is like the previous one.
- `API key 'VALUE' is not allowed`: This stream has a wrong API key.
- `API key 'VALUE' is not permitted from this IP`: The IP is not allowed to use STREAM with this master.
- `API key 'VALUE' is not permitted from this IP`: The IP is not allowed to use STREAM with this parent.
- `machine GUID 'VALUE' is not allowed.`: The GUID that is trying to send stream is not allowed.
- `Machine GUID 'VALUE' is not permitted from this IP. `: The IP does not match the pattern or IP allowed to connect
to use stream.
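To see which of the patterns above a parent is logging, the reason text can be pulled out of each "Forbidding access" entry. This is a sketch under the assumption that entries follow the template shown in this section; `forbidden_reasons` is a hypothetical helper name.

```bash
# Print the MESSAGE part of every "Forbidding access" entry.
# $1: path to the parent's error.log
forbidden_reasons() {
    # Capture the text between the "[receive from ...]: " prefix and the
    # trailing "Forbidding access" marker.
    sed -n 's/.*STREAM \[receive from .*]: \(.*\) Forbidding access.*/\1/p' "$1"
}
```

Piping the output through `sort | uniq -c` gives a count per reason, which helps separate a single misconfigured child from a wider problem.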
### Netdata could not create a stream
The connection between master and slave is a stream. When the master can't convert the initial connection into a stream,
it will write the following message inside `error.log`:
The connection between parent and child is a stream. When the parent can't convert the initial connection into
a stream, it will write the following message inside `error.log`:
```
file descriptor given is not a valid stream

@ -48,7 +48,7 @@ Using the above, Netdata will bind to:
- IPv4 127.0.0.1 at port 19999 (port was used from `default port`). Only the UI (dashboard) and the read API will be accessible on this port. Both HTTP and HTTPS requests will be accepted.
- IPv4 10.1.1.1 at port 19998. The management API and `netdata.conf` will be accessible on this port.
- All the IPs `hostname` resolves to (both IPv4 and IPv6 depending on the resolved IPs) at port 19997. Only badges will be accessible on this port.
- All IPv6 IPs at port 19996. Only metric streaming requests from other Netdata agents will be accepted on this port. Only encrypted streams will be allowed (i.e. slaves also need to be [configured for TLS](/streaming/README.md)).
- All IPv6 IPs at port 19996. Only metric streaming requests from other Netdata agents will be accepted on this port. Only encrypted streams will be allowed (i.e. child nodes also need to be [configured for TLS](/streaming/README.md)).
- All the IPs `localhost` resolves to (both IPv4 and IPv6 depending on the resolved IPs) at port 19996. This port will only accept registry API requests.
- All IPv4 and IPv6 IPs at port `http` as set in `/etc/services`. Only the UI (dashboard) and the read API will be accessible on this port.
- Unix domain socket `/run/netdata/netdata.sock`. All requests are serviceable on this socket. Note that in some OSs like Fedora, every service sees a different `/tmp`, so don't create a Unix socket under `/tmp`. `/run` or `/var/run` is suggested.
@ -67,7 +67,8 @@ The API requests are serviced as follows:
### Enabling TLS support
Since v1.16.0, Netdata supports encrypted HTTP connections to the web server, plus encryption of streaming data between a slave and its master, via the TLS protocol.
Since v1.16.0, Netdata supports encrypted HTTP connections to the web server, plus encryption of streaming data to a
parent from its child nodes, via the TLS protocol.
Inbound unix socket connections are unaffected, regardless of the TLS settings.
@ -84,7 +85,7 @@ To enable TLS, provide the path to your certificate and private key in the `[web
ssl certificate = /etc/netdata/ssl/cert.pem
```
Both files must be readable by the `netdata` user. If either of these files do not exist or are unreadable, Netdata will fall back to HTTP. For a master/slave connection, only the master needs these settings.
Both files must be readable by the `netdata` user. If either of these files do not exist or are unreadable, Netdata will fall back to HTTP. For a parent-child connection, only the parent needs these settings.
For test purposes, you can generate self-signed certificates with the following command:
@ -119,7 +120,7 @@ While Netdata accepts all the TLS version as arguments (`1` or `1.0`, `1.1`, `1.
When the certificates are defined and unless any other options are provided, a Netdata server will:
- Redirect all incoming HTTP web server requests to HTTPS. Applies to the dashboard, the API, `netdata.conf` and badges.
- Allow incoming slave connections to use both unencrypted and encrypted communications for streaming.
- Allow incoming child connections to use both unencrypted and encrypted communications for streaming.
To change this behavior, you need to modify the `bind to` setting in the `[web]` section of `netdata.conf`. At the end of each port definition, you can append `^SSL=force` or `^SSL=optional`. What happens with these settings differs, depending on whether the port is used for HTTP/S requests, or for streaming.
@ -136,7 +137,7 @@ Example:
bind to = *=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force
```
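As a rough illustration of how a port definition like the one above is structured, the following Python sketch splits an entry into its address, its list of allowed request types, and its optional `^SSL=` suffix. It is not Netdata's parser. The names `BindEntry` and `parse_bind_entry` are hypothetical, and the sketch assumes the entry contains at most one `=` before the `^SSL=` suffix.

```python
from typing import List, NamedTuple, Optional


class BindEntry(NamedTuple):
    address: str          # everything before the first '=', e.g. "*"
    acls: List[str]       # request types allowed on this port
    ssl: Optional[str]    # "force", "optional", or None when no suffix is given


def parse_bind_entry(entry: str) -> BindEntry:
    """Split one `bind to` port definition into address, ACLs, and SSL mode."""
    ssl = None
    if "^SSL=" in entry:
        # The SSL suffix is appended at the end of the port definition.
        entry, ssl = entry.rsplit("^SSL=", 1)
        if ssl not in ("force", "optional"):
            raise ValueError(f"unknown SSL mode: {ssl!r}")
    # Request types, when present, follow '=' and are separated by '|'.
    address, sep, acl_part = entry.partition("=")
    acls = acl_part.split("|") if sep else []
    return BindEntry(address, acls, ssl)


example = "*=dashboard|registry|badges|management|streaming|netdata.conf^SSL=force"
parsed = parse_bind_entry(example)
print(parsed.address, parsed.acls, parsed.ssl)
```

Seeing the entry decomposed this way may help when checking which request types a given port serves and whether TLS is forced or optional on it.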
For information how to configure the slaves to use TLS, check [securing the communication](/streaming/README.md#securing-streaming-communications) in the streaming documentation. There you will find additional details on the expected behavior for client and server nodes, when their respective TLS options are enabled.
For information on how to configure child nodes to use TLS, check [securing the communication](/streaming/README.md#securing-streaming-communications) in the streaming documentation. There you will find additional details on the expected behavior for client and server nodes, when their respective TLS options are enabled.
When we define the use of SSL in a Netdata agent for different ports, Netdata will apply the behavior specified on each port. For example, using the configuration line below:
@ -148,7 +149,7 @@ When we define the use of SSL in a Netdata agent for different ports, Netdata w
Netdata will:
- Force all HTTP requests to the default port to be redirected to HTTPS (same port).
- Refuse unencrypted streaming connections from slaves on the default port.
- Refuse unencrypted streaming connections from child nodes on the default port.
- Allow both HTTP and HTTPS requests to port 20000 for `netdata.conf`.
- Force HTTP requests to port 20001 to be redirected to HTTPS (same port). Only allow requests for the dashboard, the read API and the registry on port 20001.
@ -185,7 +186,7 @@ Netdata supports access lists in `netdata.conf`:
- `allow badges from` checks if the API request is for a badge. Badges are not matched by `allow dashboard from`.
- `allow streaming from` checks if the slave willing to stream metrics to this Netdata is allowed.
- `allow streaming from` checks whether a child node that wants to stream metrics to this Netdata is allowed to do so.
This can be controlled per API KEY and MACHINE GUID in `stream.conf`.
The setting in `netdata.conf` is checked before the ones in `stream.conf`.
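The order of checks described above can be sketched in Python. This is only an approximation under stated assumptions: it assumes the access lists are space-separated patterns with `*` as a wildcard and `!` for a negative match, where the first matching pattern wins. The function names are hypothetical, and `stream_conf_acl` stands in for whichever `stream.conf` list applies to the connecting API key or machine GUID.

```python
import re


def _matches(pattern: str, value: str) -> bool:
    # Only '*' is treated as a wildcard; everything else is matched literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def simple_pattern_allows(patterns: str, value: str) -> bool:
    """First matching pattern wins; a leading '!' makes the match negative."""
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if _matches(pattern[1:] if negative else pattern, value):
            return not negative
    return False  # nothing matched, so the client is not allowed


def streaming_allowed(client_ip: str, netdata_conf_acl: str, stream_conf_acl: str) -> bool:
    # netdata.conf is consulted first; stream.conf can only narrow the result.
    if not simple_pattern_allows(netdata_conf_acl, client_ip):
        return False
    return simple_pattern_allows(stream_conf_acl, client_ip)
```

In this model, a child rejected by `allow streaming from` in `netdata.conf` never reaches the `stream.conf` check at all.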
@ -225,7 +226,7 @@ present that may match DNS FQDNs.
|web files group|`netdata`|If this is set, Netdata will check if the file is owned by this group and refuse to serve the file if it's not.|
|disconnect idle clients after seconds|`60`|The time in seconds to disconnect web clients after being totally idle.|
|timeout for first request|`60`|How long to wait for a client to send a request before closing the socket. Prevents slow request attacks.|
|accept a streaming request every seconds|`0`|Can be used to set a limit on how often a master Netdata server will accept streaming requests from the slaves in a [streaming and replication setup](/streaming/README.md)|
|accept a streaming request every seconds|`0`|Can be used to set a limit on how often a parent node will accept streaming requests from child nodes in a [streaming and replication setup](/streaming/README.md).|
|respect do not track policy|`no`|If set to `yes`, will respect the client's browser preferences on storing cookies.|
|x-frame-options response header||[Avoid clickjacking attacks, by ensuring that the content is not embedded into other sites](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options).|
|enable gzip compression|`yes`|When set to `yes`, Netdata web responses will be GZIP compressed, if the web client accepts such responses.|