


This repository tracks our Prometheus alert rules. They are also available as an Alpine-only package, metrics.sr.ht-rules, from mirror.sr.ht.
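For readers new to Prometheus rule files, here is a sketch of the general shape such a file takes. The group name, alert name, expression, and the use of a `severity` label to carry the severity group are illustrative assumptions, not copied from the rules in this repository.

```yaml
# Hypothetical example rule file; names and thresholds are illustrative only.
groups:
  - name: example
    rules:
      - alert: InstanceDown
        # Fires when a scrape target has been unreachable for 5 minutes.
        expr: up == 0
        for: 5m
        labels:
          # Assumed label name/value for the severity group.
          severity: urgent
        annotations:
          summary: "{{ $labels.instance }} is down"
```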

Our Prometheus instance is public:


Usage instructions

  1. Install our package
  2. Add a rule_files entry to your prometheus.yml for each set of rules you wish to use
  3. Configure Alertmanager to route the resulting alerts

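A minimal prometheus.yml fragment for step 2 might look like the following. The rules directory is an assumption; check where the package actually installs the files on your system. The Alertmanager address is also an assumption (9093 is Alertmanager's default port).

```yaml
# Hypothetical prometheus.yml fragment: adjust paths to the package's install location.
rule_files:
  - /usr/share/prometheus/rules/node_rules.yml
  - /usr/share/prometheus/rules/ssl_rules.yml

alerting:
  alertmanagers:
    - static_configs:
        # Assumed Alertmanager address.
        - targets: ["localhost:9093"]
```

rule_files also accepts glob patterns, so a single entry such as a `*_rules.yml` pattern can load every set at once if you want all of them.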
Our alerts are categorized into three severity groups:

  • interesting alerts are worth noting: they may be useful for identifying trends over time, for forensic analysis after an outage, or for addressing on a rainy day. Upstream, we send these to our IRC channel.
  • important alerts are likely to be actionable, but do not require immediate attention.
  • urgent alerts require immediate attention.
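One way to act on these groups is to route them to different receivers in Alertmanager. The sketch below assumes the rules carry a `severity` label whose values are the group names above; the receiver names are hypothetical placeholders, and their notification integrations (pager, email, IRC bridge via webhook, etc.) are left out.

```yaml
# Hypothetical alertmanager.yml fragment; assumes a "severity" label on each alert.
route:
  receiver: default
  group_by: ["alertname", "instance"]
  routes:
    - matchers:
        - severity="urgent"
      receiver: pager
    - matchers:
        - severity="important"
      receiver: email
    - matchers:
        - severity="interesting"
      receiver: irc

# Placeholder receivers: add the integration config each one needs.
receivers:
  - name: default
  - name: pager
  - name: email
  - name: irc
```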