Additionally, update the metric used for the high-number-of-builds-timing-out alarm and double the limit of the high-number-of-build-submissions alarm, since the high worker utilization alarm should cover most of the cases the submission alarm was meant to handle.
- .build.yml
- LICENSE
- README.md
- backup_rules.yml
- build_rules.yml
- chat_rules.yml
- meta_rules.yml
- node_rules.yml
- postgres_rules.yml
- process_rules.yml
- service_rules.yml
- ssl_rules.yml
- test_rules.yml
metrics.sr.ht
This repository tracks our Prometheus alert rules. They are available as a package from mirror.sr.ht (for Alpine only) as metrics.sr.ht-rules.
Our Prometheus instance is public:
Usage instructions
- Install our package
- Add `rule_files` entries to your `prometheus.yml` for each set of rules you wish to use
- Configure Alertmanager accordingly
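As a rough sketch of the second and third steps, the `prometheus.yml` excerpt below loads three of the rule sets and points Prometheus at a local Alertmanager. The install directory `/usr/share/metrics.sr.ht-rules/` is a hypothetical placeholder, not a documented path; check where the package actually puts its files on your system.

```yaml
# prometheus.yml (excerpt)
# Hypothetical install directory for the metrics.sr.ht-rules package;
# adjust to wherever your package places the rule files.
rule_files:
  - /usr/share/metrics.sr.ht-rules/node_rules.yml
  - /usr/share/metrics.sr.ht-rules/postgres_rules.yml
  - /usr/share/metrics.sr.ht-rules/ssl_rules.yml

# Send firing alerts to Alertmanager (9093 is its default port).
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
```

List only the rule files for the services you actually run; rules whose metrics never appear simply never fire.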
Our alerts are categorized into three severity groups:
- interesting alerts are worth noting, as they may be useful in identifying trends over time, for forensic attention after an outage, or for addressing on a rainy day. Upstream, we send these to our IRC channel.
- important alerts are likely to be actionable, but do not require immediate attention.
- urgent alerts require immediate attention.
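One way to act on these groups is to route by severity in Alertmanager. The excerpt below is a sketch that assumes each rule carries a `severity` label set to `interesting`, `important`, or `urgent`; the receiver names and webhook URLs are hypothetical. Alertmanager has no built-in IRC receiver, so the `interesting` route assumes some webhook-to-IRC relay.

```yaml
# alertmanager.yml (excerpt)
# Assumes rules label alerts with severity: interesting | important | urgent.
# Receiver names and URLs are hypothetical placeholders.
route:
  receiver: important            # default for anything not matched below
  group_by: ["alertname"]
  routes:
    - matchers: ['severity="urgent"']
      receiver: urgent
    - matchers: ['severity="interesting"']
      receiver: interesting

receivers:
  - name: interesting
    webhook_configs:
      - url: "http://irc-relay.example:8080/alerts"   # hypothetical IRC relay
  - name: important
    webhook_configs:
      - url: "http://tickets.example:8080/alerts"     # hypothetical ticket queue
  - name: urgent
    webhook_configs:
      - url: "http://pager.example:8080/alerts"       # hypothetical pager
```

Routing `important` through the default receiver means any alert with an unexpected or missing severity still lands somewhere visible rather than being dropped.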