# 1. Install krknctl
curl -fsSL https://raw.githubusercontent.com/krkn-chaos/krknctl/refs/heads/main/install.sh | bash
# 2. Create a test workload
kubectl create namespace chaos-test
kubectl create deployment nginx-test --image=nginx --replicas=3 -n chaos-test
# 3. Run your first chaos scenario (pod disruption)
krknctl run pod-scenarios --namespace chaos-test --pod-label "app=nginx-test" --disruption-count 1
# 4. Verify pods recovered
kubectl get pods -n chaos-test -l app=nginx-test
| Requirement | Minimum Version | Check Command |
|---|---|---|
| Kubernetes or OpenShift cluster | 1.21+ | kubectl version |
| kubeconfig with cluster-admin access | — | kubectl get nodes |
| Docker or Podman | Docker 20.10+ / Podman 4.0+ | docker --version or podman --version |
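The checks in the table above can be run in one pass before installing anything. The following is a minimal sketch and not part of Krkn: the helper names `have_cmd` and `preflight` are made up for illustration, and the script only verifies that the tools exist and that the cluster answers, not that the versions meet the listed minimums.

```shell
# Hypothetical preflight helpers; the function names are illustrative only.

# Return 0 if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check the prerequisites and print one line per check.
# Returns 0 when everything is present, 1 otherwise.
preflight() {
  preflight_status=0

  if have_cmd kubectl; then
    echo "ok: kubectl found"
    # 'kubectl get nodes' only succeeds with a working kubeconfig.
    if kubectl get nodes >/dev/null 2>&1; then
      echo "ok: cluster reachable"
    else
      echo "missing: cluster not reachable with the current kubeconfig"
      preflight_status=1
    fi
  else
    echo "missing: kubectl"
    preflight_status=1
  fi

  if have_cmd docker || have_cmd podman; then
    echo "ok: container runtime found"
  else
    echo "missing: docker or podman"
    preflight_status=1
  fi

  return "$preflight_status"
}
```

Calling `preflight` prints the result of each check and returns non-zero if anything is missing, so it can sit at the top of a setup script.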
This is the best starting point if you are new to Krkn or want to explore a specific scenario quickly. No metrics, no scoring, no pipeline — just run a scenario and see what happens.
curl -fsSL https://raw.githubusercontent.com/krkn-chaos/krknctl/refs/heads/main/install.sh | bash
Verify the installation:
krknctl --version
Enable shell auto-completion for the best experience:
Bash: source <(krknctl completion bash)
Zsh: autoload -Uz compinit && compinit && source <(krknctl completion zsh)
kubectl create namespace chaos-test
kubectl create deployment nginx-test --image=nginx --replicas=3 -n chaos-test
kubectl wait --for=condition=Available deployment/nginx-test -n chaos-test --timeout=60s
krknctl list
This shows all chaos scenarios you can run. For your first test, we will use pod-scenarios.
krknctl run pod-scenarios \
--namespace chaos-test \
--pod-label "app=nginx-test" \
--disruption-count 1 \
--kill-timeout 180 \
--expected-recovery-time 120
krknctl will prompt you for required inputs interactively, or you can pass them as flags.
The scenario targets pods labeled app=nginx-test in the chaos-test namespace.
In a separate terminal, watch the pods recover:
kubectl get pods -n chaos-test -l app=nginx-test -w
You can confirm the pod was killed and replaced by checking the AGE column. The replacement pod shows a much shorter age than its neighbours:
NAME READY STATUS RESTARTS AGE
nginx-test-7d9f8b6c4-xk2pq 1/1 Running 0 8s
nginx-test-5c6d7f8b9-lm3rt 1/1 Running 0 4d2h
nginx-test-787d4945fb-nqpzj 1/1 Running 0 4d2h
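The 8s age shows the pod was created moments ago to replace the one the scenario deleted, while the others remain unaffected. RESTARTS stays at 0 because Kubernetes created a new pod rather than restarting a container in the old one.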
What success looks like: The disrupted pod is deleted and Kubernetes recreates it. The new pod reaches Ready state within the --expected-recovery-time window. The scenario exits with code 0.
{
"recovered": [
{
"pod_name": "nginx-test-7d9f8b6c4-xk2pq",
"namespace": "chaos-test",
"pod_rescheduling_time": 2.3,
"pod_readiness_time": 5.7,
"total_recovery_time": 8.0
}
],
"unrecovered": []
}
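If you save a recovery summary like this to a file, you can compare each pod's total_recovery_time with the limit you passed as --expected-recovery-time. The sketch below assumes `jq` is installed and that the JSON is already in a file; the file name `recovery.json` and the function name `slow_pods` are hypothetical, and where krknctl writes this JSON is not covered here.

```shell
# List recovered pods whose total_recovery_time exceeds a limit in seconds.
# Assumes jq is installed and the file has the shape shown above.
# Usage: slow_pods <json-file> <limit-seconds>
slow_pods() {
  jq -r --argjson limit "$2" \
    '.recovered[]
     | select(.total_recovery_time > $limit)
     | "\(.pod_name) \(.total_recovery_time)"' \
    "$1"
}
```

For the example above, `slow_pods recovery.json 120` prints nothing, because 8.0 seconds is well under the 120 second limit.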
What failure looks like: The pod does not recover within the timeout. The scenario exits with a non-zero code and logs an error.
{
"recovered": [],
"unrecovered": [
{
"pod_name": "nginx-test-7d9f8b6c4-xk2pq",
"namespace": "chaos-test",
"pod_rescheduling_time": 0.0,
"pod_readiness_time": 0.0,
"total_recovery_time": 0.0
}
]
}
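Because the exit code separates the two outcomes, a CI job can gate on it directly. The wrapper below is a sketch; `run_and_gate` is a hypothetical helper name, and it works with any command, not only krknctl.

```shell
# Run the given command and turn its exit code into a PASS/FAIL line.
# Returns the command's own exit code so the calling pipeline can fail on it.
run_and_gate() {
  if "$@"; then
    echo "PASS: scenario exited with code 0"
    return 0
  else
    status=$?
    echo "FAIL: scenario exited with code $status"
    return "$status"
  fi
}

# Example invocation, reusing the flags from the run earlier in this guide:
# run_and_gate krknctl run pod-scenarios \
#   --namespace chaos-test \
#   --pod-label "app=nginx-test" \
#   --disruption-count 1 \
#   --kill-timeout 180 \
#   --expected-recovery-time 120
```

Running the command inside the `if` condition keeps the wrapper safe under `set -e`, which many CI shells enable by default.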
kubectl delete namespace chaos-test
krknctl clean
Whether you’re running your first scenario or building a production resilience pipeline, pick the journey that matches your goals:
| Journey | I want to… | Experience level | Tools needed |
|---|---|---|---|
| Metrics Validation | Automatically pass/fail based on Prometheus metrics | Intermediate | krknctl + Prometheus |
| Resilience Score | Generate a scored report to validate an environment | Intermediate | krknctl + Prometheus |
| Long-Term Storage | Store metrics across runs for regression analysis | Advanced | krknctl + Prometheus + Elasticsearch |
| Multi-Cluster Orchestration | Run chaos across multiple clusters or clouds | Advanced | krkn-operator |
Krkn-hub runs scenarios as container images — ideal for CI/CD pipelines. Each scenario is a pre-built image on quay.io/krkn-chaos/krkn-hub.
podman run --net=host \
-v ~/.kube/config:/home/krkn/.kube/config:Z \
-e NAMESPACE=default \
-e POD_LABEL="app=my-app" \
-d quay.io/krkn-chaos/krkn-hub:pod-scenarios
See the krkn-hub installation guide for full setup instructions.
Note: Krkn-hub runs one scenario type at a time per container.
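Because `-d` starts the container in the background, `podman run` returns immediately and its own exit status says nothing about how the scenario went. One way to follow the run and recover the scenario's exit code is sketched below. The helper name `scenario_exit_code` is hypothetical, and the runtime is passed as a parameter so the same function works with docker.

```shell
# Follow a detached scenario container and print its exit code.
# Usage: scenario_exit_code <runtime> <container-id-or-name>
scenario_exit_code() {
  runtime=$1
  container=$2
  # Stream logs until the container stops; send them to stderr so that
  # stdout carries only the exit code.
  "$runtime" logs -f "$container" >&2
  # 'wait' blocks until the container has exited and prints its exit code.
  "$runtime" wait "$container"
}
```

`podman run -d` prints the new container's ID, so you can capture that ID in a variable and pass it as the second argument, for example `scenario_exit_code podman "$id"`.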
Krkn is the core chaos engine — a Python program that can run multiple scenario types in a single execution using config files.
See the krkn installation guide and configuration hints to get started.
Note: Krkn allows running multiple different scenario types and scenario files in one execution, unlike krkn-hub and krknctl.
Metrics Validation: Run chaos and automatically evaluate Prometheus metrics for a clear pass or fail without manual inspection.
Long-Term Storage: Persist metrics from every chaos run into Elasticsearch to compare behavior across releases, dates, or cluster configurations.
Resilience Score: Generate a numerical score (0–100%) that represents how well your environment held up during chaos.
Multi-Cluster Orchestration: Run chaos scenarios across multiple clusters or cloud environments from a single control point using krkn-operator.