Fix High number of 500 errors alert to work instance-wide

This alert was originally intended to look at the instance-wide stats,
but I accidentally copied the wrong query from my experiments.
Ignas Kiela 2022-02-14 17:49:20 +02:00 committed by Drew DeVault
parent 927f06f0f3
commit 74b7d859d5
1 changed file with 1 addition and 1 deletion

@@ -10,7 +10,7 @@ groups:
annotations:
summary: "{{ $labels.instance }} has a high rate of 500 errors on route {{ $labels.route }}"
- alert: High rate of 500 errors on an instance
-expr: rate(request_time_count{status="500"}[15m]) / ignoring(status) sum without(status) (rate(request_time_count[15m])) > 0.25
+expr: sum by(instance) (rate(request_time_count{status="500"}[15m])) / sum by(instance) (rate(request_time_count[15m])) > 0.25
for: 5m
labels:
severity: urgent
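
To illustrate why the two expressions behave differently, here is a minimal Python sketch of the aggregation each one performs. It is not part of the commit: the sample rates, the instance name `web1`, and the helper names `per_route_ratio` and `per_instance_ratio` are all hypothetical, and the dictionary stands in for the per-series output of `rate(request_time_count[15m])`.

```python
from collections import defaultdict

# Hypothetical per-second request rates keyed by (instance, route, status).
rates = {
    ("web1", "/login", "500"): 0.5,
    ("web1", "/login", "200"): 0.5,
    ("web1", "/", "200"): 9.0,
}

def per_route_ratio(rates):
    """Approximates the old expression: one error ratio per (instance, route)."""
    totals = defaultdict(float)
    for (instance, route, _status), r in rates.items():
        totals[(instance, route)] += r
    return {
        (instance, route): r / totals[(instance, route)]
        for (instance, route, status), r in rates.items()
        if status == "500" and totals[(instance, route)] > 0
    }

def per_instance_ratio(rates):
    """Approximates the new expression: one error ratio per instance."""
    errors = defaultdict(float)
    totals = defaultdict(float)
    for (instance, _route, status), r in rates.items():
        totals[instance] += r
        if status == "500":
            errors[instance] += r
    # Instances with zero traffic are skipped; PromQL would yield NaN there,
    # which never satisfies the "> 0.25" threshold either.
    return {i: errors[i] / totals[i] for i in errors if totals[i] > 0}

print(per_route_ratio(rates))
print(per_instance_ratio(rates))
```

With this sample data, the per-route ratio for `/login` is above the 0.25 threshold even though only a small share of the instance's total traffic is failing, while the per-instance ratio stays well below it.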