Commit Graph

75 Commits

Author SHA1 Message Date
vkalintiris 115d074a6c
Create a top-level directory to contain source code. (#16896)
* Move ML under src

* Move spawn under src

* Move cli/ under src/

* move registry/ under src/

* move streaming/ under src/

* Move claim under src. Update docs

* Move database/ under src/

* Move libnetdata/ under src/

* Update references to libnetdata

* Fix logsmanagement includes

* Update generated script path.
2024-02-01 13:41:44 +02:00
Costa Tsaousis 841d9f125a
add the CLOEXEC flag to all sockets and files (#16881)
* add the CLOEXEC flag to all sockets and files

* add network-viewer to apps.plugin; min update frequency 5 seconds
2024-01-31 12:47:20 +02:00
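
The close-on-exec change in #16881 above can be illustrated with a minimal sketch. It assumes a POSIX system; the helper names (`open_cloexec`, `set_cloexec`, `has_cloexec`, `socket_cloexec`) are hypothetical and are not taken from the netdata source. The idea is that a descriptor carrying `FD_CLOEXEC` is closed automatically when the process calls `exec*()`, so spawned plugins do not inherit it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a file read-only with close-on-exec set atomically at open time. */
int open_cloexec(const char *path) {
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Set FD_CLOEXEC on an already-open descriptor; 0 on success, -1 on error. */
int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Report whether FD_CLOEXEC is set: 1 if set, 0 if not, -1 on error. */
int has_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Create a TCP socket and mark it close-on-exec in a second step. */
int socket_cloexec(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    if (set_cloexec(fd) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Setting the flag at creation time (as `open_cloexec` does) avoids the window between creating the descriptor and calling `fcntl()` in which another thread could fork and exec; the two-step form is shown only because it works for any descriptor.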
Costa Tsaousis 84474006d4
New Permissions System (#16837)
* wip of migrating to bitmap permissions

* replace role with bitmapped permissions

* formatting permissions using macros

* accept view and edit permissions for all dynamic configuration

* work on older compilers

* parse the header in hex

* agreed permissions updates

* map permissions to old roles

* new permissions management

* fix function rename

* build libdatachannel when enabled - currently for code maintenance

* dyncfg now keeps 2 sets of statuses, to keep track of what happens to dyncfg and what actually happens with the plugin

* complete the additions of jobs and solve unittests

* fix renumbering of ACL bits

* processes function shows the cmdline based on permissions and the presence of the sensitive data permission

* now the agent returns 412 when authorization is missing, 403 when authorization exists but permissions are not enough, 451 when access control list prevents the user from accessing the dashboard

* enable cmdline on processes with the HTTP_ACCESS_VIEW_AGENT_CONFIG permission

* by default functions require anonymous-data access

* fix compilation on debian

* fix left-over renamed define

* updated schema for alerts

* updated permissions

* require a name when loading json payloads, if the name is not provided by dyncfg
2024-01-29 09:18:01 +02:00
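
The status-code rule described in #16837 above (412 when authorization is missing, 403 when authorization exists but permissions are not enough, 451 when the access control list blocks the user) can be sketched as a small decision function over bitmapped permissions. Everything named here is hypothetical: the `EXAMPLE_ACCESS_*` bits, the `example_access_status` function, and the order of the checks are assumptions made for illustration, not the agent's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits; names and values are illustrative only. */
typedef uint32_t example_access_t;
#define EXAMPLE_ACCESS_NONE           ((example_access_t)0)
#define EXAMPLE_ACCESS_ANONYMOUS_DATA ((example_access_t)1 << 0)
#define EXAMPLE_ACCESS_SENSITIVE_DATA ((example_access_t)1 << 1)

/*
 * Pick an HTTP status for a request:
 *   451 - the access control list rejects the caller
 *   200 - every required permission bit is granted
 *   412 - permissions are insufficient and no authorization was presented
 *   403 - authorization was presented, but permissions are still insufficient
 */
int example_access_status(bool acl_allows, bool has_authorization,
                          example_access_t granted, example_access_t required) {
    if (!acl_allows)
        return 451;
    if ((granted & required) == required)
        return 200;
    if (!has_authorization)
        return 412;
    return 403;
}
```

With a bitmap, "are all required permissions present" is a single mask comparison, which is the usual motivation for replacing role names with permission bits.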
vkalintiris 92842d8422
CMake build system. (#15996)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: Tasos Katsoulas <12612986+tkatsoulas@users.noreply.github.com>
Co-authored-by: Emmanuel Vasilakis <mrzammler@mm.st>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: netdatabot <bot@netdata.cloud>
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
2023-12-13 16:41:20 +02:00
Ilya Mashchenko 3ab4c07118
convert some error messages to info (#16508) 2023-11-30 15:56:51 +02:00
Costa Tsaousis 3e508c8f95
New logging layer (#16357)
* cleanup of logging - wip

* first working iteration

* add errno annotator

* replace old logging functions with netdata_logger()

* cleanup

* update error_limit

* fix remaining error_limit references

* work on fatal()

* started working on structured logs

* full cleanup

* default logging to files; fix all plugins initialization

* fix formatting of numbers

* cleanup and reorg

* fix coverity issues

* cleanup obsolete code

* fix formatting of numbers

* fix log rotation

* fix for older systems

* add detection of systemd journal via stderr

* finished on access.log

* remove left-over transport

* do not add empty fields to the logs

* journal gets compact UUIDs; X-Transaction-ID header is added in web responses

* allow compiling on systems without memfd sealing

* added libnetdata/uuid directory

* move datetime formatters to libnetdata

* add missing files

* link the makefiles in libnetdata

* added uuid_parse_flexi() to parse UUIDs with and without hyphens; the web server now reads X-Transaction-ID and uses it for functions and web responses

* added stream receiver, sender, proc plugin and pluginsd log stack

* iso8601 advanced usage; line_splitter module in libnetdata; code cleanup

* add message ids to streaming inbound and outbound connections

* cleanup line_splitter between lines to avoid logging garbage; when killing children, kill them with SIGABRT if internal checks is enabled

* send SIGABRT to external plugins only if we are not shutting down

* fix cross cleanup in pluginsd parser

* fatal when there is a stack error in logs

* compile netdata with -fexceptions

* do not kill external plugins with SIGABRT

* metasync info logs to debug level

* added severity to logs

* added json output; added options per log output; added documentation; fixed issues mentioned

* allow memfd only on linux

* moved journal low level functions to journal.c/h

* move health logs to daemon.log with proper priorities

* fixed a couple of bugs; health log in journal

* updated docs

* systemd-cat-native command to push structured logs to journal from the command line

* fix makefiles

* restored NETDATA_LOG_SEVERITY_LEVEL

* fix makefiles

* systemd-cat-native can also work as the logger of Netdata scripts

* do not require a socket to systemd-journal to log-as-netdata

* alarm notify logs in native format

* properly compare log ids

* fatals log alerts; alarm-notify.sh working

* fix overflow warning

* alarm-notify.sh now logs the request (command line)

* annotate external plugin logs with the function cmd they run

* added context, component and type to alarm-notify.sh; shell sanitization removes control character and characters that may be expanded by bash

* reformatted alarm-notify logs

* unify cgroup-network-helper.sh

* added quotes around params

* charts.d.plugin switched logging to journal native

* quotes for logfmt

* unify the status codes of streaming receivers and senders

* alarm-notify: dont log anything, if there is nothing to do

* all external plugins log to stderr when running outside netdata; alarm-notify now shows an error when notification methods are needed but are not available

* migrate cgroup-name.sh to new logging

* systemd-cat-native now supports messages with newlines

* socket.c logs use priority

* cleanup log field types

* inherit the systemd set INVOCATION_ID if found

* allow systemd-cat-native to send messages to a systemd-journal-remote URL

* log2journal command that can convert structured logs to journal export format

* various fixes and documentation of log2journal

* updated log2journal docs

* updated log2journal docs

* updated documentation of fields

* allow compiling without libcurl

* do not use socket as format string

* added version information to newly added tools

* updated documentation and help messages

* fix the namespace socket path

* print errno with error

* do not timeout

* updated docs

* updated docs

* updated docs

* log2journal updated docs and params

* when talking to a remote journal, systemd-cat-native batches the messages

* enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote

* Revert "enable lz4 compression for systemd-cat-native when sending messages to a systemd-journal-remote"

This reverts commit b079d53c11.

* note about uncompressed traffic

* log2journal: code reorg and cleanup to make modular

* finished rewriting log2journal

* more comments

* rewriting rules support

* increased limits

* updated docs

* updated docs

* fix old log call

* use journal only when stderr is connected to journal

* update netdata.spec for libcurl, libpcre2 and log2journal

* pcre2-devel

* do not require pcre2 in centos < 8, amazonlinux < 2023, open suse

* log2journal only on systems pcre2 is available

* ignore log2journal in .gitignore

* avoid log2journal on centos 7, amazonlinux 2 and opensuse

* add pcre2-8 to static build

* undo last commit

* Bundle to static

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add build deps for deb packages

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Add dependencies; build from source

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Test build for amazon linux and centos expect to fail for suse

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* fix minor oversight

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

* Reorg code

* Add the install from source (deps) as a TODO
* Do not enable the build on suse ecosystem

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>

---------

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-11-22 10:27:25 +02:00
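
The logging commit above (#16357) mentions `uuid_parse_flexi()`, which parses UUIDs with and without hyphens. A minimal sketch of that idea follows. It is not the netdata implementation: the function name `example_uuid_parse_flexible`, its return convention, and the choice to accept only the 32-digit compact form or the standard 8-4-4-4-12 form are all assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int example_hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Parse a UUID given either as 32 hex digits ("compact") or as the
 * 36-character 8-4-4-4-12 form. Returns 0 and fills out[16] on success,
 * -1 on malformed input (out is left untouched in that case).
 */
int example_uuid_parse_flexible(const char *s, unsigned char out[16]) {
    size_t len = strlen(s);
    if (len != 32 && len != 36) return -1;

    int hyphenated = (len == 36);
    unsigned char tmp[16] = {0};
    size_t nibbles = 0;

    for (size_t i = 0; i < len; i++) {
        /* In the hyphenated form, hyphens must sit at these exact offsets. */
        if (hyphenated && (i == 8 || i == 13 || i == 18 || i == 23)) {
            if (s[i] != '-') return -1;
            continue;
        }
        int v = example_hex_value(s[i]);
        if (v < 0) return -1;
        if (nibbles % 2 == 0)
            tmp[nibbles / 2] = (unsigned char)(v << 4);
        else
            tmp[nibbles / 2] |= (unsigned char)v;
        nibbles++;
    }

    memcpy(out, tmp, sizeof tmp);
    return 0;
}
```

Accepting both spellings lets a value such as a transaction ID round-trip between systems that print UUIDs compactly (as the journal does, per the commit) and clients that send the hyphenated form.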
Costa Tsaousis badb017579
registry action hello should always work (#16212)
* registry action hello should always work

* update docs
2023-10-16 11:39:21 +03:00
Costa Tsaousis b04cf59304
return 412 instead of 403 when a bearer token is required (#15808) 2023-08-14 23:18:39 +03:00
Costa Tsaousis ce75313de0
systemd-journal plugin (#15363) 2023-08-03 15:42:11 +03:00
vkalintiris 0e230a260e
Revert "Refactor RRD code. (#15423)" (#15723)
This reverts commit 440bd51e08.

dbengine was still being used for non-zero tiers
even on non-dbengine modes.
2023-08-03 13:13:36 +03:00
vkalintiris 440bd51e08
Refactor RRD code. (#15423)
* Storage engine.

* Host indexes to rrdb

* Move globals to rrdb

* Move storage_tiers_backfill to rrdb

* default_rrd_update_every to rrdb

* default_rrd_history_entries to rrdb

* gap_when_lost_iterations_above to rrdb

* rrdset_free_obsolete_time_s to rrdb

* libuv_worker_threads to rrdb

* ieee754_doubles to rrdb

* rrdhost_free_orphan_time_s to rrdb

* rrd_rwlock to rrdb

* localhost to rrdb

* rm extern from func decls

* mv rrd macro under rrd.h

* default_rrdeng_page_cache_mb to rrdb

* default_rrdeng_extent_cache_mb to rrdb

* db_engine_journal_check to rrdb

* default_rrdeng_disk_quota_mb to rrdb

* default_multidb_disk_quota_mb to rrdb

* multidb_ctx to rrdb

* page_type_size to rrdb

* tier_page_size to rrdb

* No storage_engine_id in rrdim functions

* storage_engine_id is provided by st

* Update to fix merge conflict.

* Update field name

* Remove unnecessary macros from rrd.h

* Rm unused type decls

* Rm duplicate func decls

* make internal function static

* Make the rest of public dbengine funcs accept a storage_instance.

* No more rrdengine_instance :)

* rm rrdset_debug from rrd.h

* Use rrdb to access globals in ML and ACLK

Missed due to not having the submodules in the
worktree.

* rm total_number

* rm RRDVAR_TYPE_TOTAL

* rm unused inline

* Rm names from typedef'd enums

* rm unused header include

* Move include

* Rm unused header include

* s/rrdhost_find_or_create/rrdhost_get_or_create/g

* s/find_host_by_node_id/rrdhost_find_by_node_id/

Also, remove duplicate definition in rrdcontext.c

* rm macro used only once

* rm macro used only once

* Reduce rrd.h api by moving funcs into a collector specific utils header

* Remove unused func

* Move parser specific function out of rrd.h

* return storage_number instead of void pointer

* move code related to rrd initialization out of rrdhost.c

* Remove tier_grouping from rrdim_tier

Saves 8 * storage_tiers bytes per dimension.

* Fix rebase

* s/rrd_update_every/update_every/

* Mark functions as static and constify args

* Add license notes and file to build systems.

* Remove remaining non-log/config mentions of memory mode

* Move rrdlabels api to separate file.

Also, move localhost functions that loads
labels outside of database/ and into daemon/

* Remove function decl in rrd.h

* merge rrdhost_cache_dir_for_rrdset_alloc into rrdset_cache_dir

* Do not expose internal function from rrd.h

* Rm NETDATA_RRD_INTERNALS

Only one function decl is covered. We have more
database internal functions that we currently
expose for no good reason. These will be placed
in a separate internal header in follow up PRs.

* Add license note

* Include libnetdata.h instead of aral.h

* Use rrdb to access localhost

* Fix builds without dbengine

* Add header to build system files

* Add rrdlabels.h to build systems

* Move func def from rrd.h to rrdhost.c

* Fix macos build

* Rm non-existing function

* Rebase master

* Define buffer length macro in ad_charts.

* Fix FreeBSD builds.

* Mark functions static

* Rm func decls without definitions

* Rebase master

* Rebase master

* Properly initialize value of storage tiers.

* Fix build after rebase.
2023-07-26 15:30:49 +03:00
Costa Tsaousis c35a969dc8
added cloud status in registry?action=hello (#15530)
Co-authored-by: ilyam8 <ilya@netdata.cloud>
2023-07-25 23:52:38 +03:00
Costa Tsaousis 762701f905
fix unlocked registry access and add hostname to search response (#15426) 2023-07-17 22:48:59 +03:00
Costa Tsaousis a05a8f4074
Pre release fixes (#15405) 2023-07-14 21:04:42 +03:00
Costa Tsaousis c5b6c59928
dont add all nodes to registry action hello (#15390) 2023-07-13 22:52:51 +03:00
Costa Tsaousis 7519f01891
Revert "dont add all nodes to registry action hello" (#15389)
Revert "dont add all nodes to registry action hello (#15388)"

This reverts commit 76d10ab250.
2023-07-13 22:48:29 +03:00
Costa Tsaousis 76d10ab250
dont add all nodes to registry action hello (#15388) 2023-07-13 22:47:10 +03:00
Costa Tsaousis b61ddad5e6
agent alert notifications redirect (#15350)
* agent alert notifications redirect

* set the same cookies with SameSite: Strict

* registry search now requires only "for" parameter

* registry responses are not cacheable

* fix typo and add more error checking

* registry memory when mmap is used

* fix free with aral
2023-07-12 21:08:44 +03:00
thiagoftsm f672f4a955
Rename log Macros (debug) (#15322) 2023-07-11 14:45:16 +00:00
Costa Tsaousis 62acce9151
bearer protection - additions (#15349)
add bearer protection status flag in registry hello response
2023-07-11 13:13:46 +03:00
Stelios Fragkakis 75ecb70175
Fix coverity issues (#15345)
* sensor_name always has a value -- CID 395493: Control flow issues  (DEADCODE)

* Check fopen failure -- CID 395492:  Null pointer dereferences  (NULL_RETURNS)

* CID 395491:  Control flow issues  (DEADCODE)

* sel_count will be >=0 in this case -- CID 395490:  Control flow issues  (DEADCODE)

* Memory leak -- CID 395487:  Resource leaks  (RESOURCE_LEAK)
2023-07-11 12:32:06 +03:00
Costa Tsaousis 77076d8764
bearer improvements (#15342) 2023-07-11 02:28:06 +03:00
Costa Tsaousis 5943203a66
bearer authorization API (#15321)
* bearer authorization API - untested

* add machine guid to bearer token response

* removed REGISTRY_URL and replaced it with STRING

* eliminate url pointer from registry_machine_url

* remove registry_url counters from registry

* Revert "eliminate url pointer from registry_machine_url"

This reverts commit 79eff56f77.

* registry machine urls are now a double linked list

* registry machine urls are now using aral

* all registry objects now use aral

* strings now have 64 partitions and use R/W spinlock

* string to 128 partitions

* fix macro without internal checks

* registry now uses the bearer token when the cookie is not there

* api/v1/registry sends back all nodes on each host

* registry option to use mmap; optimization of registry structures

* do not index the terminator byte in strings; use 256 string partitions

* registry loading optimization

* convert person urls to double linked list to save memory

* re-organize items loading and make sure person urls are always available as machine urls too

* disable registry mmap by default

* keep track of all machine guids and their URLs, even if the cookie cannot be set

* fix bearer parsing
2023-07-10 18:02:02 +03:00
thiagoftsm e0f388c43f
Rename generic `error` function (#15296) 2023-07-06 15:46:48 +00:00
Costa Tsaousis c74bf56ee2
Code reorg and cleanup - enrichment of /api/v2 (#15294)
* claim script now accepts the same params as the kickstart

* rewrote buildinfo to unify all methods

* added cloud unavailable in cloud status

* added all exporters

* renamed httpd to h2o

* rename ENABLE_COMPRESSION to ENABLE_LZ4

* rename global variable

* rename ENABLE_HTTPS to ENABLE_OPENSSL

* fix coverity-scan for openssl

* add lz4 to coverity-scan

* added all plugins and most of the features

* added all plugins and most of the features

* generalize bitmap code so that we can have any size of bitmaps

* cleanup

* fix compilation without protobuf

* fix compilation with other allocators

* fix bitmap

* comprehensive bitmaps unit test

* bitmap as macros

* added developer mode

* added system info to build info

* cloud available/unavailable

* added /api/v2/info

* added units and ni to transitions

* when showing instances and transitions, show only the instances that have transitions

* cleanup

* add missing quotes

* add anchor to transitions

* added more to build info

* calculate retention per tier and expose it to /api/v2/info

* added currently collected metrics

* do not show space and retention when no numbers are available

* fix impossible overflow

* Add function for transitions and execute callback

* In case of error, reset and try next dictionary entry

* Fix error message

* simpler logic to maintain retention per tier

* /api/v2/alert_transitions

* Handle case of recipient null
Convert after and before to usec

* Add classification, type and component

* working /api/v2/alert_transitions

* Fix query to properly handle context and alert name

* cleanup

* Add search with transition

* accept transition in /api/v2/alert_transitions

* totally dynamic facets

* fixed debug info

* restructured facets

* cleanup; removal of options=transitions

* updated alert entries flags

* method to exec

* Return also exec run timestamp
Temp table cleanup only when we don't execute with a transition

* cleanup obsolete anchor parameter

* Add sql_get_alert_configuration function

* added options=config to alert_transitions

* added /api/v2/alert_config

* preliminary work for /api/v2/claim

* initialize variables; do not expose expected retention if no disk space info is available; do not report aclk as initializing when not claimed

* fix claim session key filename

* put a newline into the session key file

* more progress on claiming

* final /api/v2/claim endpoint

* after claiming, refresh our state at the output

* Fix query to fetch config

* Remove debug log

* add configuration objects

* add configuration objects - fixed

* respect the NETDATA_DISABLE_CLOUD env variable

* NETDATA_DISABLE_CLOUD env variable sets the default, but the config sets the final value

* use a new claimed_id on every claiming

* regenerate random key on claiming and wait for online status

* ignore write() return value when writing a newline

* dont show cloud status disabled when claimed_id is missing

* added ctx to alert instances

* cleanup config and transitions from /api/v2/alerts

* fix unused variable

* in /api/v2/alert_config show 1 config without an array

* show alert values conditionally, by appending options=values

* When storing host info if the key value is empty, store unknown

* added options=summary to control when the alerts summary is shown

* increased http_api_v2 to version 5

* claiming random key file is now not world readable

* added local-listeners binary that detects all the listening ports, their IPs and their command lines

---------

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-07-06 01:49:32 +03:00
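
Commit #15294 above generalizes the bitmap code "so that we can have any size of bitmaps" and later expresses the bitmap as macros. One common way to do that is sketched below. The `EXAMPLE_BITMAP_*` macros and `example_bitmap_count` are hypothetical names, and the 64-bit word size is an assumption; the actual netdata macros may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit words needed to hold nbits bits (rounded up). */
#define EXAMPLE_BITMAP_WORDS(nbits) (((nbits) + 63) / 64)

/* Declare a zero-initialized bitmap of nbits bits as a plain array. */
#define EXAMPLE_BITMAP_DECLARE(name, nbits) \
    uint64_t name[EXAMPLE_BITMAP_WORDS(nbits)] = {0}

/* Set, clear and read a single bit; bit / 64 selects the word. */
#define EXAMPLE_BITMAP_SET(bm, bit) \
    ((bm)[(bit) / 64] |= (UINT64_C(1) << ((bit) % 64)))
#define EXAMPLE_BITMAP_CLEAR(bm, bit) \
    ((bm)[(bit) / 64] &= ~(UINT64_C(1) << ((bit) % 64)))
#define EXAMPLE_BITMAP_GET(bm, bit) \
    (((bm)[(bit) / 64] >> ((bit) % 64)) & UINT64_C(1))

/* Count the set bits among the first nbits bits of the bitmap. */
size_t example_bitmap_count(const uint64_t *bm, size_t nbits) {
    size_t count = 0;
    for (size_t bit = 0; bit < nbits; bit++)
        count += (size_t)EXAMPLE_BITMAP_GET(bm, bit);
    return count;
}
```

Because the word count is computed from the requested size, the same macros serve a 32-bit flag set and a 1024-bit one; none of them bounds-check `bit`, so callers must keep it below the declared size.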
Carlo Cabrera 5b56f09dbc
Replace `info` macro with a less generic name (#15266) 2023-06-30 21:14:26 +00:00
Stelios Fragkakis 4a507c44fa
Fix coverity issues (#15005)
Fix coverity issues 384022, 384021, 383825
2023-05-03 18:08:56 +03:00
Costa Tsaousis c3d70ffcb4
WEBRTC for communication between agents and browsers (#14874)
* initial webrtc setup

* missing files

* rewrite of webrtc integration

* initialization and cleanup of webrtc connections

* make it compile without libdatachannel

* add missing webrtc_initialize() function when webrtc is not enabled

* make c++17 optional

* add build/m4/ax_compiler_vendor.m4

* add ax_cxx_compile_stdcxx.m4

* added new m4 files to makefile.am

* id all webrtc connections

* show warning when webrtc is disabled

* fixed message

* moved all webrtc error checking inside webrtc.cpp

* working webrtc connection establishment and cleanup

* remove obsolete code

* rewrote webrtc code in C to remove dependency for c++17

* fixed left-over reference

* detect binary and text messages

* minor fix

* naming of webrtc threads

* added webrtc configuration

* fix for thread_get_name_np()

* smaller web_client memory footprint

* universal web clients cache

* free web clients every 100 uses

* webrtc is now enabled by default only when compiled with internal checks

* webrtc responses to /api/ requests, including LZ4 compression

* fix for binary and text messages

* web_client_cache is now global

* unification of the internal web server API, for web requests, aclk request, webrtc requests

* more cleanup and unification of web client timings

* fixed compiler warnings

* update sent and received bytes

* eliminated almost all big buffers in web client

* registry now uses the new json generation

* cookies are now an array; fixed redirects

* fix redirects, again

* write cookies directly to the header buffer, eliminating the need for cookie structures in web client

* reset the has_cookies flag

* gathered all web client cleanup to one function

* fixes redirects

* added summary.globals in /api/v2/data response

* ars to arc in /api/v2/data

* properly handle host impersonation

* set the context of mem.numa_nodes
2023-04-20 20:49:06 +03:00
Chris Akritidis 157c0fa16b
Remove References category from learn (#14571) 2023-02-20 08:56:06 -08:00
Costa Tsaousis d2daa19bf5
JSON internal API, IEEE754 base64/hex streaming, weights endpoint optimization (#14493)
* first work on standardizing json formatting

* renamed old grouping to time_grouping and added group_by

* add dummy functions to enable compilation

* buffer json api work

* jsonwrap opening with buffer_json_X() functions

* cleanup

* storage for quotes

* optimize buffer printing for both numbers and strings

* removed ; from define

* contexts json generation using the new json functions

* fix buffer overflow at unit test

* weights endpoint using new json api

* fixes to weights endpoint

* check buffer overflow on all buffer functions

* do synchronous queries for weights

* buffer_flush() now resets json state too

* content type typedef

* print double values that are above the max 64-bit value

* str2ndd() can now parse values above UINT64_MAX

* faster number parsing by avoiding double calculations as much as possible

* faster number parsing

* faster hex parsing

* accurate printing and parsing of double values, even for very large numbers that cannot fit in 64bit integers

* full printing and parsing without using library functions - and related unit tests

* added IEEE754 streaming capability to enable streaming of double values in hex

* streaming and replication to transfer all values in hex

* use our own str2ndd for set2

* remove subnormal check from ieee

* base64 encoding for numbers, instead of hex

* when increasing double precision, also make sure the fractional number printed is aligned to the wanted precision

* str2ndd_encoded() parses all encoding formats, including integers

* prevent uninitialized use

* /api/v1/info using the new json API

* Fix error when compiling with --disable-ml

* Remove redundant 'buffer_unittest' declaration

* Fix formatting

* Fix formatting

* Fix formatting

* fix buffer unit test

* apps.plugin using the new JSON API

* make sure the metrics registry does not accept negative timestamps

* do not allow pages with negative timestamps to be loaded from db files; do not accept pages with negative timestamps in the cache

* Fix more formatting

---------

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
2023-02-15 21:16:29 +02:00
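
Commit #14493 above adds "IEEE754 streaming capability to enable streaming of double values in hex" (a later bullet in the same commit switches to base64). The point of sending the raw bits is that the receiver reconstructs exactly the same `double`, with no decimal rounding. A minimal sketch of the hex variant only, assuming `double` is a 64-bit IEEE 754 value; the function names are hypothetical and the wire format actually used by netdata is not shown here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "this sketch assumes a 64-bit double");

/* Write the 64 raw bits of d as 16 lowercase hex digits plus a NUL. */
void example_double_to_hex(double d, char out[17]) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);      /* reinterpret without aliasing issues */
    snprintf(out, 17, "%016" PRIx64, bits);
}

/* Parse exactly 16 hex digits back into a double; 0 on success, -1 on error. */
int example_hex_to_double(const char *hex, double *out) {
    uint64_t bits = 0;
    for (int i = 0; i < 16; i++) {
        char c = hex[i];
        int v;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else return -1;                  /* also rejects a string that ends early */
        bits = (bits << 4) | (uint64_t)v;
    }
    if (hex[16] != '\0') return -1;      /* reject trailing characters */
    memcpy(out, &bits, sizeof bits);
    return 0;
}
```

A value such as 0.1, which has no finite decimal expansion in binary, survives this round trip bit for bit, which is what makes the encoding attractive for streaming and replication.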
Tasos Katsoulas 9f1403de7d
Convert our documentation links to GH absolute links (#14344)
Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
2023-02-02 15:23:54 +02:00
Fotis Voutsas 0541c97e53
Introduce the new Structure of the documentation (#13915)
* Moving the cloud docs under /docs/cloud (previous location: netdata/learn/*)
* Added metadata on almost every document of the old learn site for the new ingest process of learn. 
* Map old learn document to their best fit as topic related docs.

Signed-off-by: Tasos Katsoulas <tasos@netdata.cloud>
Co-authored-by: DShreve2 <david@netdata.cloud>
Co-authored-by: hugovalente-pm <hugo@netdata.cloud>
2023-01-25 15:29:33 +02:00
vkalintiris 2d5f3acf71
Do not force internal collectors to call rrdset_next. (#13926)
* Remove calls to rrdset_next().

* Rm checks plugin

* Update documentation

* Call rrdset_next from within rrdset_done

This wraps up the removal of rrdset_next from internal collectors, which
removes a lot of unnecessary code and the need for if/else clauses in
every place.

The pluginsd parser is the only component that calls rrdset_next*()
functions because it's not strictly speaking a collector but more of a
collector manager/proxy.

With the current changes it's possible to simplify the API we expose
from RRD significantly, but this will be follow-up work in the future.

* Remove stale reference to checks.plugin

* Fix RRD unit test

rrdset_next is not meant to be called from these tests.

* Fix db engine unit test.

* Schedule rrdset_next when we have completed at least one collection.

* Mark chart creation clauses as unlikely.

* Add missing brace to fix FreeBSD plugin.
2022-11-22 04:52:15 +02:00
Timotej S efc5932b13
Fix local dashboard cloud links (#13953)
* fix kickstart message

* workaround for agent UI
2022-11-07 19:40:12 +01:00
vkalintiris ccfbdb5c3d
Remove extern from functions declared in headers. (#13790)
By default functions are declared as extern in C/C++ headers. The goal
of this PR is to reduce the wall of text that many headers have and,
more importantly, to make the declaration of extern'd variables - of
which we have many dispersed in various places - easily and quickly
identifiable.

Automatically generated with:

    $ git grep -l '^extern.*(' '**.h' | \
            grep -v libjudy | \
            grep -v 'sqlite3.h' | \
            xargs sed -i -e 's/extern \(.*(.*$\)/\1/'

This is an NFC (no functional change).
2022-10-09 16:38:49 +03:00
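
The reasoning in #13790 above rests on a C rule: a function declaration has external linkage whether or not `extern` is written, while for a variable `extern` is what separates a declaration from a definition. A small sketch with hypothetical names (`example_add`, `example_counter`), collapsed into one translation unit for brevity:

```c
/* What a header would contain: these two lines declare the same function. */
extern int example_add(int a, int b);
int example_add(int a, int b);          /* 'extern' is implied for functions */

/* For a variable, 'extern' matters: this only declares it... */
extern int example_counter;

/* ...and exactly one .c file must provide the definition. */
int example_counter = 0;

/* Definition of the function declared above; it also bumps the counter. */
int example_add(int a, int b) {
    example_counter++;
    return a + b;
}
```

Once `extern` is dropped from function declarations, every remaining `extern` in a header marks a shared variable, which is the "easily and quickly identifiable" property the commit message is after.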
Costa Tsaousis cb7af25c09
RRD structures managed by dictionaries (#13646)
* rrdset - in progress

* rrdset optimal constructor; rrdset conflict

* rrdset final touches

* re-organization of rrdset object members

* prevent use-after-free

* dictionary dfe supports also counting of iterations

* rrddim managed by dictionary

* rrd.h cleanup

* DICTIONARY_ITEM now is referencing actual dictionary items in the code

* removed rrdset linked list

* Revert "removed rrdset linked list"

This reverts commit 690d6a588b4b99619c2c5e10f84e8f868ae6def5.

* removed rrdset linked list

* added comments

* Switch chart uuid to static allocation in rrdset
Remove unused functions

* rrdset_archive() and friends...

* always create rrdfamily

* enable ml_free_dimension

* rrddim_foreach done with dfe

* most custom rrddim loops replaced with rrddim_foreach

* removed accesses to rrddim->dimensions

* removed locks that are no longer needed

* rrdsetvar is now managed by the dictionary

* set rrdset is rrdsetvar, fixes https://github.com/netdata/netdata/pull/13646#issuecomment-1242574853

* conflict callback of rrdsetvar now properly checks if it has to reset the variable

* dictionary registered callbacks accept as first parameter the DICTIONARY_ITEM

* dictionary dfe now uses internal counter to report; avoided excess variables defined with dfe

* dictionary walkthrough callbacks get dictionary acquired items

* dictionary reference counters that can be dupped from zero

* added advanced functions for get and del

* rrdvar managed by dictionaries

* thread safety for rrdsetvar

* faster rrdvar initialization

* rrdvar string lengths should match in all add, del, get functions

* rrdvar internals hidden from the rest of the world

* rrdvar is now acquired throughout netdata

* hide the internal structures of rrdsetvar

* rrdsetvar is now acquired through out netdata

* rrddimvar managed by dictionary; rrddimvar linked list removed; rrddimvar structures hidden from the rest of netdata

* better error handling

* dont create variables if not initialized for health

* dont create variables if not initialized for health again

* rrdfamily is now managed by dictionaries; references of it are acquired dictionary items

* type checking on acquired objects

* rrdcalc renaming of functions

* type checking for rrdfamily_acquired

* rrdcalc managed by dictionaries

* rrdcalc double free fix

* host rrdvars is always needed

* attempt to fix deadlock 1

* attempt to fix deadlock 2

* Remove unused variable

* attempt to fix deadlock 3

* snprintfz

* rrdcalc index in rrdset fix

* Stop storing active charts and computing chart hashes

* Remove store active chart function

* Remove compute chart hash function

* Remove sql_store_chart_hash function

* Remove store_active_dimension function

* dictionary delayed destruction

* formatting and cleanup

* zero dictionary base on rrdsetvar

* added internal error to log delayed destructions of dictionaries

* typo in rrddimvar

* added debugging info to dictionary

* debug info

* fix for rrdcalc keys being empty

* remove forgotten unlock

* remove deadlock

* Switch to metadata version 5 and drop
  chart_hash
  chart_hash_map
  chart_active
  dimension_active
  v_chart_hash

* SQL cosmetic changes

* do not busy wait while destroying a referenced dictionary

* remove deadlock

* code cleanup; re-organization;

* fast cleanup and flushing of dictionaries

* number formatting fixes

* do not delete configured alerts when archiving a chart

* rrddim obsolete linked list management outside dictionaries

* removed duplicate contexts call

* fix crash when rrdfamily is not initialized

* dont keep rrddimvar referenced

* properly cleanup rrdvar

* removed some locks

* Do not attempt to cleanup chart_hash / chart_hash_map

* rrdcalctemplate managed by dictionary

* register callbacks on the right dictionary

* removed some more locks

* rrdcalc secondary index replaced with linked-list; rrdcalc labels updates are now executed by health thread

* when looking up for an alarm look using both chart id and chart name

* host initialization a bit more modular

* init rrdlabels on host update

* preparation for dictionary views

* improved comment

* unused variables without internal checks

* service threads isolation and worker info

* more worker info in service thread

* thread cancelability debugging with internal checks

* strings data races addressed; fixes https://github.com/netdata/netdata/issues/13647

* dictionary modularization

* Remove unused SQL statement definition

* unit-tested thread safety of dictionaries; removed data race conditions on dictionaries and strings; dictionaries can now detect if the caller holds a write lock, in which case all calls automatically become their unsafe versions; all direct calls to the unsafe versions are eliminated

* remove worker_is_idle() from the exit of service functions, because we lose the lock time between loops

* rewritten dictionary to have 2 separate locks, one for indexing and another for traversal

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/cgroups.plugin/sys_fs_cgroup.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* Update collectors/proc.plugin/proc_net_dev.c

Co-authored-by: Vladimir Kobal <vlad@prokk.net>

* fix memory leak in rrdset cache_dir

* minor dictionary changes

* dont use index locks in single threaded

* obsolete dict option

* rrddim options and flags separation; rrdset_done() optimization to keep array of reference pointers to rrddim;

* fix jump on uninitialized value in dictionary; remove double free of cache_dir

* addressed codacy findings

* removed debugging code

* use the private refcount on dictionaries

* make dictionary item destructors work on dictionary destruction; stricter control on dictionary API; proper cleanup sequence on rrddim;

* more dictionary statistics

* global statistics about dictionary operations, memory, items, callbacks

* dictionary support for views - missing the public API

* removed warning about unused parameter

* chart and context name for cloud

* chart and context name for cloud, again

* dictionary statistics fixed; first implementation of dictionary views - not currently used

* only the master can globally delete an item

* context needs netdata prefix

* fix context and chart id of spins

* fix for host variables when health is not enabled

* run garbage collector on item insert too

* Fix info message; remove extra "using"

* update dict unittest for new placement of garbage collector

* we need RRDHOST->rrdvars for maintaining custom host variables

* Health initialization needs the host->host_uuid

* split STRING to its own files; no code changes other than that

* initialize health unconditionally

* unit tests do not pollute the global scope with their variables

* Skip initialization when creating archived hosts on startup. When a child connects it will initialize properly

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-19 23:46:13 +03:00
Costa Tsaousis 5e1b95cf92
Deduplicate all netdata strings (#13570)
* rrdfamily

* rrddim

* rrdset plugin and module names

* rrdset units

* rrdset type

* rrdset family

* rrdset title

* rrdset title more

* rrdset context

* rrdcalctemplate context and removal of context hash from rrdset

* strings statistics

* rrdset name

* rearranged members of rrdset

* eliminate rrdset name hash; rrdcalc chart converted to STRING

* rrdset id, eliminated rrdset hash

* rrdcalc, alarm_entry, alert_config and some of rrdcalctemplate

* rrdcalctemplate

* rrdvar

* eval_variable

* rrddimvar and rrdsetvar

* rrdhost hostname, os and tags

* fix master commits

* added thread cache; implemented string_dup without locks

* faster thread cache

* rrdset and rrddim now use dictionaries for indexing

* rrdhost now uses dictionary

* rrdfamily now uses DICTIONARY

* rrdvar using dictionary instead of AVL

* allocate the right size to rrdvar flag members

* rrdhost remaining char * members to STRING *

* better error handling on indexing

* strings now use a read/write lock to allow parallel searches to the index

* removed AVL support from dictionaries; implemented STRING with native Judy calls

* string releases should be negative

* only 31 bits are allowed for enum flags

* proper locking on strings

* string threading unittest and fixes

* fix lgtm finding

* fixed naming

* stream chart/dimension definitions at the beginning of a streaming session

* thread stack variable is undefined on thread cancel

* rrdcontext garbage collect per host on startup

* worker control in garbage collection

* relaxed deletion of rrdmetrics

* type checking on dictfe

* netdata chart to monitor rrdcontext triggers

* Group chart label updates

* rrdcontext better handling of collected rrdsets

* rrdpush incremental transmission of definitions should use as much buffer as possible

* require 1MB per chart

* empty the sender buffer before enabling metrics streaming

* fill up to 50% of buffer

* reset signaling metrics sending

* use the shared variable for status

* use separate host flag for enabling streaming of metrics

* make sure the flag is clear

* add logging for streaming

* add logging for streaming on buffer overflow

* circular_buffer proper sizing

* removed obsolete logs

* do not execute worker jobs if not necessary

* better messages about compression disabling

* proper use of flags and updating rrdset last access time every time the obsoletion flag is flipped

* monitor stream sender used buffer ratio

* Update exporting unit tests

* no need to compare label value with strcmp

* streaming send workers now monitor bandwidth

* workers now use strings

* streaming receiver monitors incoming bandwidth

* parser shift of worker ids

* minor fixes

* Group chart label updates

* Populate context with dimensions that have data

* Fix chart id

* better shift of parser worker ids

* fix for streaming compression

* properly count received bytes

* ensure LZ4 compression ring buffer does not wrap prematurely

* do not stream empty charts; do not process empty instances in rrdcontext

* need_to_send_chart_definition() does not need an rrdset lock any more

* rrdcontext objects are collected, after data have been written to the db

* better logging of RRDCONTEXT transitions

* always set all variables needed by the worker utilization charts

* implemented double linked list for most objects; eliminated alarm indexes from rrdhost; and many more fixes

* lockless strings design - string_dup() and string_freez() are totally lockless when they don't need to touch Judy - only Judy is protected with a read/write lock

* STRING code re-organization for clarity

* thread_cache improvements; double numbers precision on worker threads

* STRING_ENTRY now shadows STRING, so no duplicate definition is required; string_length() renamed to string_strlen() to follow the paradigm of all other functions; STRING internal statistics are now only compiled with NETDATA_INTERNAL_CHECKS

* rrdhost index by hostname now cleans up; aclk queries of archived hosts do not index hosts

* Add index to speed up database context searches

* Removed last_updated optimization (was also buggy after latest merge with master)

Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-09-05 19:31:06 +03:00
Costa Tsaousis 1b0f6c6b22
Labels with dictionary (#13070)
* squashed and rebased to master

* fix overflow and single character bug in sanitize; include rrd.h instead of node_info.h

* added unittest for UTF-8 multibyte sanitization

* Fix unit test compilation

* Fix CMake build

* remove double sanitizer for opentsdb; cleanup sanitize_json_string()

* rename error_description to error_message to avoid conflict with json-c

* revert last and undef error_description from json-c

* more unittests; attempt to fix protobuf map issue

* get rid of rrdlabels_get() and replace it with a safe version that writes the value to a buffer

* added dictionary sorting unittest; rrdlabels_to_buffer() now is sorted

* better sorted dictionary checking

* proper unittesting for sorted dictionaries

* call dictionary deletion callback when destroying the dictionary

* remove obsolete variable

* Fix exporting unit tests

* Fix k8s label parsing test

* workaround for cmocka and strdupz()

* Bypass cmocka memory allocation check

* Revert "Bypass cmocka memory allocation check"

This reverts commit 4c49923839.

* Revert "workaround for cmocka and strdupz()"

This reverts commit 7bebee0480.

* Bypass cmocka memory allocation checks

* respect json formatting for chart labels

* cloud sends colons

* print the value only once

* allow parentheses and spaces in values; make stream sender send quotes for values

Co-authored-by: Vladimir Kobal <vlad@prokk.net>
2022-06-13 20:35:45 +03:00
Costa Tsaousis 7784a16cc7
Dictionary with JudyHS and double linked list (#13032)
* dictionary internals isolation

* more dictionary cleanups

* added unit test

* we should use DICT internally

* disable cups in cmake

* implement DICTIONARY with Judy arrays

* operational JUDY implementation

* JUDY cleanup

* JUDY summary added

* JudyHS implementation with double linked list

* test negative searches too

* optimize destruction

* optimize set to insert first without lookup

* updated stats

* code cleanup; better organization; updated info

* more code cleanup and commenting

* more cleanup, renames and comments

* fix rename

* more cleanups

* use Judy.h from system paths

* added foreach traversal; added flag to add item in front; isolated locks to their own functions; destruction returns the number of bytes freed

* more comments; flags are now 16-bit

* completed unittesting

* addressed comments and added reference counters maintenance

* added unittest in main; tested removal of items in front, back and middle

* added read/write walkthrough and foreach; allowed walkthrough and foreach in write mode to delete the current element (used by cups.plugin); reference counters removed from the API

* DICTFE.name should be const too

* added API calls for exposing all statistics

* dictionary flags as enum and reference counters as atomic operations

* more comments; improved error handling at unit tests

* added functions to allow unsafe access while traversing the dictionary with locks in place

* check for libcups in cmake

* added delete callback; implemented statsd with this dictionary

* added missing dfe_done()

* added alternative implementation with AVL

* added documentation

* added comments and warning about AVL

* dictionary walkthrough on new code

* simplified foreach; updated docs

* updated docs

* AVL is much faster without hashes

* AVL should follow DBENGINE
2022-06-01 20:01:52 +03:00
Ilya Mashchenko 0fa55c7dce
feat: move dirs, logs, and env vars config options to separate sections (#12935) 2022-05-17 17:31:19 +03:00
Tina Luedtke c7f2647a62
Docs: Removed Google Analytics tags (#12145) 2022-02-17 10:37:46 +00:00
vkalintiris b8cd2bdc50
Remove unnecessary relative paths when including headers. (#11124)
Currently, we add the repository's top-level dir in the compiler's
header search path. This means that code in every top-level directory
within the repo can include headers from sibling top-level directories.

This patch makes header inclusion consistent when it comes to files
that are included from sibling top-level directories within the repo.
2021-05-24 17:44:50 +03:00
Vladimir Kobal f569beac51
Move global stats to a separate thread (#10991) 2021-04-19 16:46:58 +03:00
vkalintiris adec24dffa
Rename struct avl to avl_element and the typedef to avl_t (#10735)
Before:

```
struct foobar {
    avl avl;
    ...
};
```

After:

```
struct foobar {
    avl_t avl;
    ...
};
```

Which makes figuring out the type from field name easier.
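
A minimal sketch of what the rename looks like in use. The member layout of `avl_element` shown here is an assumption made for illustration, and `foobar_from_avl()` is a hypothetical helper that is not part of the commit; the point is only that the embedded node's type (`avl_t`) is now distinct from the field name (`avl`).

```c
#include <stddef.h>

/* Assumed layout, for illustration only; the real definition lives in libnetdata. */
typedef struct avl_element {
    struct avl_element *avl_link[2]; /* left and right subtrees */
    signed char avl_balance;         /* balance factor */
} avl_t;

struct foobar {
    avl_t avl;   /* the type (avl_t) and the field name (avl) no longer collide */
    int value;
};

/* Hypothetical helper: recover the enclosing struct from its embedded avl_t node. */
static struct foobar *foobar_from_avl(avl_t *node) {
    return (struct foobar *)((char *)node - offsetof(struct foobar, avl));
}
```

Because the node is embedded in the containing struct, a pointer to the node can be converted back to the struct with `offsetof`; when `avl` is the first member, a plain cast would also work.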
2021-03-10 10:37:47 +02:00
thiagoftsm 51b57dc0a5
Add new cookie to fix 8094 (#10676)
Add missing cookies to Netdata.
2021-03-02 20:00:38 +00:00
Joel Hans 46a8075c8f
Docs housekeeping for SEO and syntax, part 1 (#10388)
* First pass to get the script working right

* Finish adding analytics tags
2021-01-07 11:44:43 -07:00
Tomáš Kopal bcb9c86827
Make libnetdata headers compilable by C++. (#10185) 2020-11-07 00:10:50 +00:00
Andrew Moss 551684bc7c
Update description in registry with minor copy edits (#9441)
Co-authored-by: Megan Moore <megan@netdata.cloud>
2020-06-29 15:14:24 +02:00
Joel Hans 78ca668e50
Cleanup of main README and registry doc (#9265)
* Cleanup README and remove old link

* Additional cleanup

* One more alignment
2020-06-04 07:12:48 -07:00
Andrew Moss aa3ec552c8
Enable support for Netdata Cloud.
This PR merges the feature-branch to make the cloud live. It contains the following work:
Co-authored-by: Andrew Moss <1043609+amoss@users.noreply.github.com>
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
Co-authored-by: Austin S. Hemmelgarn <austin@netdata.cloud>
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
Co-authored-by: Markos Fountoulakis <44345837+mfundul@users.noreply.github.com>
Co-authored-by: Timotej S <6674623+underhood@users.noreply.github.com>
Co-authored-by: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
* dashboard with new navbars, v1.0-alpha.9: PR #8478
* dashboard v1.0.11: netdata/dashboard#76
Co-authored-by: Jacek Kolasa <jacek.kolasa@gmail.com>
* Added installer code to bundle JSON-c if it's not present. PR #8836
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming config PR #8843
* Adds JSON-c as hard dep. for ACLK PR #8838
* Fix SSL renegotiation errors in old versions of openssl. PR #8840. Also, we have a transient problem with the opensuse CI, so this PR disables those checks with a commit from @prologic.
Co-authored-by: James Mills <prologic@shortcircuit.net.au>
* Fix claiming error handling PR #8850
* Added CI to verify JSON-C bundling code in installer PR #8853
* Make cloud-enabled flag in web/api/v1/info be independent of ACLK build success PR #8866
* Reduce ACLK_STABLE_TIMEOUT from 10 to 3 seconds PR #8871
* remove old-cloud related UI from old dashboard (accessible now via /old suffix) PR #8858
* dashboard v1.0.13 PR #8870
* dashboard v1.0.14 PR #8904
* Provide feedback on proxy setting changes PR #8895
* Change the name of the connect message to update during an ongoing session PR #8927
* Fetch active alarms from alarm_log PR #8944
2020-05-11 16:37:27 +10:00