Getting Started

Getting started with Krkn-chaos

TL;DR

# 1. Install krknctl
curl -fsSL https://raw.githubusercontent.com/krkn-chaos/krknctl/refs/heads/main/install.sh | bash

# 2. Create a test workload
kubectl create namespace chaos-test
kubectl create deployment nginx-test --image=nginx --replicas=3 -n chaos-test

# 3. Run your first chaos scenario (pod disruption)
krknctl run pod-scenarios --namespace chaos-test --pod-label "app=nginx-test" --disruption-count 1

# 4. Verify pods recovered
kubectl get pods -n chaos-test -l app=nginx-test

What you need

Requirement                            Minimum Version                Check Command
Kubernetes or OpenShift cluster        1.21+                          kubectl version
kubeconfig with cluster-admin access   -                              kubectl get nodes
Docker or Podman                       Docker 20.10+ / Podman 4.0+    docker --version or podman --version
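As a convenience, the prerequisites listed above can be checked in one pass before installing anything. The following is a minimal sketch, not part of krknctl: the helper names have_tool and preflight are hypothetical, and the script only checks that the commands are on your PATH. It does not verify versions or cluster-admin access.

```shell
# Preflight sketch for the prerequisites listed above.
# have_tool and preflight are hypothetical helper names, not krknctl commands.
have_tool() {
  # Succeeds if the named command is available on PATH.
  command -v "$1" >/dev/null 2>&1
}

preflight() {
  preflight_rc=0
  if have_tool kubectl; then
    echo "ok: kubectl found"
  else
    echo "missing: kubectl"
    preflight_rc=1
  fi
  # Either container runtime satisfies the requirement.
  if have_tool docker || have_tool podman; then
    echo "ok: container runtime found"
  else
    echo "missing: docker or podman"
    preflight_rc=1
  fi
  return "$preflight_rc"
}
```

Call preflight, then run the check commands from the table (for example kubectl get nodes) to confirm versions and access.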

Basic Run

This is the best starting point if you are new to Krkn or want to explore a specific scenario quickly. No metrics, no scoring, no pipeline — just run a scenario and see what happens.

1. Install krknctl

curl -fsSL https://raw.githubusercontent.com/krkn-chaos/krknctl/refs/heads/main/install.sh | bash

Verify the installation:

krknctl --version

2. Create a test workload

kubectl create namespace chaos-test
kubectl create deployment nginx-test --image=nginx --replicas=3 -n chaos-test
kubectl wait --for=condition=Available deployment/nginx-test -n chaos-test --timeout=60s

3. List available scenarios

krknctl list

This shows all chaos scenarios you can run. For your first test, we will use pod-scenarios.

4. Run a scenario

krknctl run pod-scenarios \
  --namespace chaos-test \
  --pod-label "app=nginx-test" \
  --disruption-count 1 \
  --kill-timeout 180 \
  --expected-recovery-time 120

krknctl will prompt you for required inputs interactively, or you can pass them as flags.

The scenario will:

  1. Find pods matching the label app=nginx-test in the chaos-test namespace
  2. Disrupt 1 pod (delete it)
  3. Wait up to 180 seconds for the pod to be removed
  4. Monitor recovery for up to 120 seconds
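To build intuition for these four steps, they can be approximated by hand with plain kubectl. This is only a sketch under stated assumptions: disrupt_one_pod is a hypothetical helper, it always picks the first matching pod, and krknctl's actual selection and monitoring logic may differ.

```shell
# Manual approximation of the four steps using plain kubectl.
# disrupt_one_pod is a hypothetical helper; krknctl's real logic may differ.
disrupt_one_pod() {
  ns="$1"; label="$2"; kill_timeout="$3"; recovery_time="$4"

  # Step 1: find the first pod matching the label in the namespace.
  pod="$(kubectl get pods -n "$ns" -l "$label" \
    -o jsonpath='{.items[0].metadata.name}')"
  if [ -z "$pod" ]; then
    echo "no pod matches $label in $ns" >&2
    return 1
  fi

  # Steps 2 and 3: delete it and wait up to kill_timeout seconds for removal.
  kubectl delete pod "$pod" -n "$ns" --wait=true \
    --timeout="${kill_timeout}s" || return 1

  # Step 4: wait up to recovery_time seconds for matching pods to be Ready.
  kubectl wait --for=condition=Ready pod -l "$label" -n "$ns" \
    --timeout="${recovery_time}s"
}
```

For example, disrupt_one_pod chaos-test app=nginx-test 180 120 mirrors the flag values used in the command above.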

5. Observe results

In a separate terminal, watch the pods recover:

kubectl get pods -n chaos-test -l app=nginx-test -w

You can confirm the pod was killed and replaced by checking the AGE column. The replacement pod has a new name and a much shorter age than its neighbours:

NAME                         READY   STATUS    RESTARTS   AGE
nginx-test-7d9f8b6c4-xk2pq   1/1     Running   0          8s
nginx-test-7d9f8b6c4-lm3rt   1/1     Running   0          4d2h
nginx-test-7d9f8b6c4-nqpzj   1/1     Running   0          4d2h

The 8s age shows that this pod was recently recreated by Kubernetes after the scenario deleted the original, while the others remain unaffected.
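If you prefer not to compare ages by eye, kubectl can sort pods by creation time so the replacement pod appears last. A minimal sketch, assuming a hypothetical helper named newest_pod that prints the name and creation timestamp of the most recently created matching pod:

```shell
# newest_pod is a hypothetical helper, not a krknctl command.
# It prints "NAME CREATED" for the most recently created matching pod.
newest_pod() {
  kubectl get pods -n "$1" -l "$2" \
    --sort-by=.metadata.creationTimestamp \
    --no-headers \
    -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
    | tail -n 1
}
```

Running newest_pod chaos-test app=nginx-test after the scenario should show a creation timestamp close to the time the scenario ran.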

What success looks like: The disrupted pod is deleted and Kubernetes recreates it. The new pod reaches Ready state within the --expected-recovery-time window. The scenario exits with code 0.

{
  "recovered": [
    {
      "pod_name": "nginx-test-7d9f8b6c4-xk2pq",
      "namespace": "chaos-test",
      "pod_rescheduling_time": 2.3,
      "pod_readiness_time": 5.7,
      "total_recovery_time": 8.0
    }
  ],
  "unrecovered": []
}

What failure looks like: The pod does not recover within the timeout. The scenario exits with a non-zero code and logs an error.

{
  "recovered": [],
  "unrecovered": [
    {
      "pod_name": "nginx-test-7d9f8b6c4-xk2pq",
      "namespace": "chaos-test",
      "pod_rescheduling_time": 0.0,
      "pod_readiness_time": 0.0,
      "total_recovery_time": 0.0
    }
  ]
}
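Because success and failure are signalled through the exit code, a script or CI job can gate on it directly. The sketch below uses a hypothetical wrapper named run_gate, which runs whatever command it is given and reports PASS or FAIL; it assumes nothing about krknctl beyond the exit-code behavior described above.

```shell
# run_gate is a hypothetical wrapper, not part of krknctl.
# It runs the given command and maps its exit code to PASS or FAIL.
run_gate() {
  if "$@"; then
    echo "PASS"
  else
    gate_rc=$?
    echo "FAIL (exit code $gate_rc)"
    return "$gate_rc"
  fi
}
```

For example: run_gate krknctl run pod-scenarios --namespace chaos-test --pod-label "app=nginx-test" --disruption-count 1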

6. Clean up

kubectl delete namespace chaos-test
krknctl clean

Where to go next

Whether you’re running your first scenario or building a production resilience pipeline, pick the journey that matches your goals:

Journey                       I want to…                                            Experience level   Tools needed
Metrics Validation            Automatically pass/fail based on Prometheus metrics   Intermediate       krknctl + Prometheus
Resilience Score              Generate a scored report to validate an environment   Intermediate       krknctl + Prometheus
Long-Term Storage             Store metrics across runs for regression analysis     Advanced           krknctl + Prometheus + Elasticsearch
Multi-Cluster Orchestration   Run chaos across multiple clusters or clouds          Advanced           krkn-operator

Alternative Methods

Krkn-hub (Containerized)

Krkn-hub runs scenarios as container images — ideal for CI/CD pipelines. Each scenario is a pre-built image on quay.io/krkn-chaos/krkn-hub.

podman run --net=host \
  -v ~/.kube/config:/home/krkn/.kube/config:Z \
  -e NAMESPACE=default \
  -e POD_LABEL="app=my-app" \
  -d quay.io/krkn-chaos/krkn-hub:pod-scenarios

See the krkn-hub installation guide for full setup instructions.

Note: Krkn-hub runs one scenario type at a time per container.
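Because the podman run command above uses -d, the container runs detached and the command returns immediately. In a CI/CD pipeline you typically want to stream the logs and then fail the job if the container exits non-zero. The following is a sketch only: hub_result is a hypothetical helper, and it assumes the container's exit code reflects the scenario outcome.

```shell
# hub_result is a hypothetical helper, not part of krkn-hub.
# It assumes the container's exit code reflects the scenario outcome.
hub_result() {
  container="$1"
  # Stream the scenario logs until the container stops.
  podman logs -f "$container"
  # podman wait blocks until the container exits and prints its exit code.
  hub_rc="$(podman wait "$container")"
  echo "container $container exited with code $hub_rc"
  return "$hub_rc"
}
```

podman run -d prints the new container's ID, so you can capture it in a variable and pass it along: hub_result "$cid".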

Krkn (Standalone Python)

Krkn is the core chaos engine — a Python program that can run multiple scenario types in a single execution using config files.

See the krkn installation guide and configuration hints to get started.

Note: Unlike krkn-hub and krknctl, Krkn can run multiple scenario types and scenario files in one execution.


Further Reading


Metrics Validation

Run chaos and automatically evaluate Prometheus metrics for a clear pass or fail without manual inspection.

Running a Chaos Scenario with Krkn

Long-Term Storage

Persist metrics from every chaos run into Elasticsearch to compare behavior across releases, dates, or cluster configurations.

Resilience Score

Generate a numerical score (0–100%) that represents how well your environment held up during chaos.

Multi-Cluster Orchestration

Run chaos scenarios across multiple clusters or cloud environments from a single control point using krkn-operator.

Last modified April 27, 2026: adding few changes (#318) (d4ab6aa)