
Conrad Hoffmann 94ca073cd1 Alert on increase in unconfirmed registrations There is always some fallout, but over the past two weeks the ratio was never above 0.0001. It did go up to 0.0004 when there was an issue with email delivery, so 0.0002 seems to be a decent value to trigger an investigation.		2024-04-09 10:58:59 +02:00
.build.yml	.build.yml: upgrade to 3.17	2023-01-19 11:49:38 +01:00
LICENSE	Add LICENSE	2020-01-05 13:15:14 -05:00
README.md	Update README.md	2020-01-06 10:04:15 -05:00
backup_rules.yml	Loosen up backup rules	2024-01-08 14:59:22 +01:00
build_rules.yml	build_rules.yml: correct name of builds submitted metric	2023-10-04 11:03:41 +02:00
chat_rules.yml	chat: add alarm for synIRC	2023-10-24 13:32:49 +02:00
meta_rules.yml	Alert on increase in unconfirmed registrations	2024-04-09 10:58:59 +02:00
node_rules.yml	node_rules: take all CPU modes into account	2024-04-02 15:58:26 +02:00
postgres_rules.yml	Add postgres_rules.yml	2023-01-19 11:49:12 +01:00
process_rules.yml	Add alert for process open FDs	2022-01-18 19:13:10 +01:00
service_rules.yml	Fix High number of 500 errors alert to work instance-wide	2022-02-14 16:50:14 +01:00
ssl_rules.yml	Fix summary in SSL alarm	2020-02-25 12:23:28 -05:00
test_rules.yml	Reschedule weekly test alarm to CEST window	2021-07-29 09:18:59 +02:00

README.md

metrics.sr.ht

This repository tracks our Prometheus alert rules. They are available as a package from mirror.sr.ht (for Alpine only) as metrics.sr.ht-rules.

Our Prometheus instance is public:

https://metrics.sr.ht

Usage instructions

1. Install our package.
2. Add a rule_files entry to your prometheus.yml for each set of rules you wish to use.
3. Configure Alertmanager accordingly.
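
As a rough sketch of the second step, a prometheus.yml fragment could look like the one below. The directory /etc/prometheus/rules/ is an assumption for illustration, not the package's documented install location, so check where the package actually places the files. The file names are taken from this repository, and localhost:9093 assumes an Alertmanager running on the same host with its default port.

```yaml
# prometheus.yml (fragment)
rule_files:
  # Hypothetical install directory; adjust to wherever the package puts the rules.
  - /etc/prometheus/rules/node_rules.yml
  - /etc/prometheus/rules/ssl_rules.yml
  - /etc/prometheus/rules/postgres_rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Assumes a local Alertmanager on its default port.
            - localhost:9093
```

Listing files individually, rather than using a glob, keeps it explicit which rule sets are loaded.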

Our alerts are categorized into three severity groups:

interesting alerts are worth noting, as they may be useful in identifying trends over time, for forensic attention after an outage, or for addressing on a rainy day. Upstream, we send these to our IRC channel.
important alerts are likely to be actionable, but do not require immediate attention.
urgent alerts require immediate attention.
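
One way to act on these groups is to route on them in Alertmanager. The fragment below is a minimal sketch under several assumptions: that the rules expose the group as a label named severity, and that the receivers (pager, mail, irc) and their webhook URLs exist in your setup. All of these names are hypothetical. Only the choice to send interesting alerts to IRC mirrors what is described above.

```yaml
# alertmanager.yml (fragment)
route:
  # Fallback for alerts that match none of the routes below.
  receiver: irc
  routes:
    - matchers:
        - severity = "urgent"
      receiver: pager
    - matchers:
        - severity = "important"
      receiver: mail
    - matchers:
        - severity = "interesting"
      receiver: irc

receivers:
  # Hypothetical webhook endpoints; replace with your own integrations.
  - name: pager
    webhook_configs:
      - url: http://localhost:8080/pager
  - name: mail
    webhook_configs:
      - url: http://localhost:8080/mail
  - name: irc
    webhook_configs:
      - url: http://localhost:8080/irc
```

If the rules use a different label name or values, the matchers need to be adjusted to match.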