Drew DeVault b87b19ebd4 backup_rules.yml: bump to 72 hours 5 months ago
.build.yml build.yml: upgrade to Alpine 3.15 10 months ago
LICENSE Add LICENSE 3 years ago
README.md Update README.md 3 years ago
backup_rules.yml backup_rules.yml: bump to 72 hours 5 months ago
build_rules.yml Fix build queue length alert 9 months ago
chat_rules.yml Add chat.sr.ht rules 9 months ago
meta_rules.yml meta_rules.yml: use increase instead of delta 12 months ago
node_rules.yml Add low available memory alert 1 year ago
process_rules.yml Add alert for process open FDs 10 months ago
service_rules.yml Fix High number of 500 errors alert to work instance-wide 10 months ago
ssl_rules.yml Fix summary in SSL alarm 3 years ago
test_rules.yml Reschedule weekly test alarm to CEST window 1 year ago

README.md

metrics.sr.ht

This repository tracks our Prometheus alert rules. They are available as a
package from mirror.sr.ht (for Alpine only) as metrics.sr.ht-rules.

Our Prometheus instance is public:

https://metrics.sr.ht

Usage instructions

  1. Install our package
  2. Add rule_files entries to your prometheus.yml for each set of our rules
    you wish to use
  3. Configure alertmanager accordingly
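
As a rough sketch of step 2, the fragment below loads two of the rule sets from this repository. The file names come from this repository, but the directory /etc/prometheus/rules is an assumption; check where the package actually installs its files before copying the paths.

```yaml
# prometheus.yml (fragment)
# NOTE: /etc/prometheus/rules is an assumed install location, not one
# confirmed by the package. Adjust it to wherever the rule files land.
rule_files:
  - /etc/prometheus/rules/node_rules.yml
  - /etc/prometheus/rules/ssl_rules.yml
```

Each entry may also be a glob, so a single line such as /etc/prometheus/rules/*.yml would load every rule set at once, at the cost of enabling alerts you may not have the exporters for.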

Our alerts are categorized into three severity groups:

  • interesting alerts are worth noting, as they may be useful for identifying
    trends over time, for forensic analysis after an outage, or for addressing on
    a rainy day. Upstream, we send these to our IRC channel.
  • important alerts are likely to be actionable, but do not require immediate
    attention.
  • urgent alerts require immediate attention.
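
A minimal sketch of step 3 might route each severity group to its own receiver. It assumes the rules attach a severity label whose values are interesting, important and urgent; confirm this against the rule files. The receiver names are hypothetical and are left without any notification configuration, so they deliver nothing until you add one.

```yaml
# alertmanager.yml (fragment)
# Assumption: every alert carries a `severity` label with one of the
# three group names. Receiver names below are placeholders.
route:
  # Alerts that match no child route fall through to this receiver.
  receiver: important-sink
  routes:
    - matchers:
        - 'severity = "urgent"'
      receiver: urgent-sink
    - matchers:
        - 'severity = "interesting"'
      receiver: interesting-sink

receivers:
  - name: urgent-sink       # e.g. something that pages the on-call operator
  - name: important-sink    # e.g. email or a ticket queue
  - name: interesting-sink  # e.g. a chat channel
```

The matchers syntax requires Alertmanager 0.22 or later; older releases use match instead.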