Fix High number of 500 errors alert to work instance-wide
This was originally intended to look at instance-wide stats, but I accidentally copied the wrong query from my experiments.
parent 927f06f0f3
commit 74b7d859d5
@@ -10,7 +10,7 @@ groups:
     annotations:
       summary: "{{ $labels.instance }} has a high rate of 500 errors on route {{ $labels.route }}"
   - alert: High rate of 500 errors on an instance
-    expr: rate(request_time_count{status="500"}[15m]) / ignoring(status) sum without(status) (rate(request_time_count[15m])) > 0.25
+    expr: sum by(instance) (rate(request_time_count{status="500"}[15m])) / sum by(instance) (rate(request_time_count[15m])) > 0.25
     for: 5m
     labels:
       severity: urgent
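To illustrate why the two expressions behave differently, here is a minimal Python sketch of the arithmetic, not the actual Prometheus evaluation. It assumes the series carry only `instance`, `route`, and `status` labels; the instance name, routes, and rates are hypothetical values chosen for illustration. The old expression divides each 500 series by the total for the same label set (so effectively per route), while the new one sums both sides by `instance` first.

```python
from collections import defaultdict

THRESHOLD = 0.25  # same threshold as the alert expression

# Hypothetical per-second request rates keyed by (instance, route, status).
# These numbers are made up for illustration only.
rates = {
    ("a.example", "/login", "500"): 3.0,
    ("a.example", "/login", "200"): 1.0,
    ("a.example", "/feed", "200"): 96.0,
}


def per_route_ratio(rates):
    """Mimics the old query: 500 rate over the total for the same
    (instance, route), i.e. everything except the status label."""
    totals = defaultdict(float)
    for (instance, route, _status), rate in rates.items():
        totals[(instance, route)] += rate
    return {
        (instance, route): rate / totals[(instance, route)]
        for (instance, route, status), rate in rates.items()
        if status == "500" and totals[(instance, route)] > 0
    }


def per_instance_ratio(rates):
    """Mimics the new query: sum by(instance) on both sides, then divide."""
    errors = defaultdict(float)
    totals = defaultdict(float)
    for (instance, _route, status), rate in rates.items():
        totals[instance] += rate
        if status == "500":
            errors[instance] += rate
    return {
        instance: err / totals[instance]
        for instance, err in errors.items()
        if totals[instance] > 0
    }


def firing(ratios):
    """Return the keys whose error ratio exceeds the alert threshold."""
    return [key for key, ratio in ratios.items() if ratio > THRESHOLD]


old_alerts = firing(per_route_ratio(rates))     # one low-traffic route trips it
new_alerts = firing(per_instance_ratio(rates))  # instance as a whole is healthy
```

With these sample numbers, the per-route ratio for `/login` is 3 / 4, which exceeds the threshold, while the instance-wide ratio is 3 / 100, which does not. That matches the intent of the commit: a single low-traffic route with errors should not trigger the instance-wide alert.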