Metrics Validation
Run chaos and automatically evaluate Prometheus metrics for a clear pass or fail without manual inspection.
This journey is well suited to CI/CD pipelines where you cannot watch the cluster in real time.
What you need
- Everything from Basic Run
- A Prometheus instance accessible from where Krkn runs (auto-detected on OpenShift; set via scenario flags on Kubernetes) — need to set one up? See installing Prometheus on a kind cluster
- krknctl installed
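If you do not already have a Prometheus instance, the community kube-prometheus-stack Helm chart is a common way to get one onto a local kind cluster. A minimal sketch — the cluster name, release name, and namespace are illustrative choices, not requirements; see the installing-Prometheus guide linked above for the exact steps:

```shell
# Create a throwaway local cluster (requires kind and helm on your PATH).
kind create cluster --name krkn-test

# Install the community Prometheus stack; "prometheus" and "monitoring"
# are illustrative release/namespace names.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Expose Prometheus locally so Krkn can reach it; the service name follows
# the chart's <release>-kube-prometheus-prometheus convention.
kubectl -n monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090
```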
Steps
Install krknctl — follow the installation guide.
Create your alerts profile at `config/alerts.yaml`. This defines the PromQL expressions Krkn evaluates after each scenario:

```yaml
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[2m]))[5m:]) > 0.01
  description: "etcd fsync latency too high: {{$value}}"
  severity: error
- expr: sum(kube_pod_status_phase{phase="Failed"}) > 5
  description: "Too many failed pods: {{$value}}"
  severity: error
```

Queries with `severity: error` cause Krkn to exit with a non-zero code. Queries with `severity: info` are logged only.

Run a scenario with the alerts profile mounted:
```shell
krknctl run pod-scenarios --alerts-profile config/alerts.yaml
```

Krkn evaluates the alerts profile at the end of each scenario and reports pass or fail.
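Because a failed `severity: error` query surfaces as a non-zero exit code, a CI step can gate on the command directly. A minimal sketch, reusing the scenario name and profile path from the step above:

```shell
#!/usr/bin/env sh
# Fail the pipeline when any severity: error query fires.
# Scenario name and profile path mirror the example above.
if krknctl run pod-scenarios --alerts-profile config/alerts.yaml; then
  echo "metrics validation passed"
else
  echo "metrics validation failed" >&2
  exit 1
fi
```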
Reference docs
- SLO Validation — full details on alert profiles and PromQL configuration
- krknctl usage — full flag reference for `run`
- Installing Prometheus on a kind cluster — Helm-based setup for local testing
Next steps
To persist metrics long-term for regression analysis across releases, continue to Long-Term Storage.