Getting Started

Getting started with Krkn-chaos

TL;DR

# 1. Install krknctl
curl -fsSL https://raw.githubusercontent.com/krkn-chaos/krknctl/refs/heads/main/install.sh | bash

# 2. Create a test workload
kubectl create namespace chaos-test
kubectl create deployment nginx-test --image=nginx --replicas=3 -n chaos-test

# 3. Run your first chaos scenario (pod disruption)
krknctl run pod-scenarios --namespace chaos-test --pod-label "app=nginx-test" --disruption-count 1

# 4. Verify pods recovered
kubectl get pods -n chaos-test -l app=nginx-test

What you need

Requirement	Minimum Version	Check Command
Kubernetes or OpenShift cluster	1.21+	`kubectl version`
kubeconfig with cluster-admin access	—	`kubectl get nodes`
Docker or Podman	Docker 20.10+ / Podman 4.0+	`docker --version` or `podman --version`
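Before the first run, a small preflight script can confirm that the tools are on your PATH and that the cluster meets the 1.21 minimum. This is a minimal sketch and is not part of krknctl. The helper names `version_ge`, `server_version` and `preflight` are hypothetical. The version comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does. The server version is read from the API server's `/version` endpoint through `kubectl get --raw`.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of krknctl).

# version_ge A B: succeeds when version A >= version B.
# Assumes a sort that supports -V (version sort), as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# server_version: prints the API server version without the leading "v",
# e.g. "1.28.3"; a build suffix such as "+b49f9d1" is dropped.
server_version() {
  kubectl get --raw /version 2>/dev/null \
    | sed -n 's/.*"gitVersion": *"v\([0-9.]*\).*/\1/p'
}

# preflight: reports missing tools or an unsupported cluster version;
# returns non-zero if any check fails.
preflight() {
  local failed=0 v
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "missing: kubectl"
    failed=1
  fi
  if ! command -v docker >/dev/null 2>&1 && ! command -v podman >/dev/null 2>&1; then
    echo "missing: docker or podman"
    failed=1
  fi
  v=$(server_version)
  if [ -z "$v" ]; then
    echo "cannot read the server version with the current kubeconfig"
    failed=1
  elif ! version_ge "$v" 1.21; then
    echo "cluster version $v is older than 1.21"
    failed=1
  fi
  return "$failed"
}
```

After sourcing the script, run `preflight && echo ready`.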

Basic Run

This is the best starting point if you are new to Krkn or want to explore a specific scenario quickly. No metrics, no scoring, no pipeline — just run a scenario and see what happens.

1. Install krknctl

curl -fsSL https://raw.githubusercontent.com/krkn-chaos/krknctl/refs/heads/main/install.sh | bash

Verify the installation:

krknctl --version

Tip

Enable shell auto-completion for the best experience:

Bash: source <(krknctl completion bash)

Zsh: autoload -Uz compinit && compinit && source <(krknctl completion zsh)
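The `source <(...)` lines only affect the current shell. To keep completion in new shells, you can add the line to your shell's startup file once. This is a minimal sketch: `add_line_once` is a hypothetical helper. `~/.bashrc` and `~/.zshrc` are the usual startup files, but your setup may load a different one.

```shell
#!/usr/bin/env bash
# add_line_once FILE LINE (hypothetical helper): appends LINE to FILE
# unless an identical line is already present, so running it twice
# does not duplicate the entry.
add_line_once() {
  local file=$1 line=$2
  touch "$file"
  if grep -qxF -- "$line" "$file"; then
    return 0
  fi
  # Make sure the file ends with a newline before appending.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Usage (Bash):
#   add_line_once "$HOME/.bashrc" 'source <(krknctl completion bash)'
# Usage (Zsh; compinit must run before the completion script is sourced):
#   add_line_once "$HOME/.zshrc" 'autoload -Uz compinit && compinit'
#   add_line_once "$HOME/.zshrc" 'source <(krknctl completion zsh)'
```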

2. Create a test workload

kubectl create namespace chaos-test
kubectl create deployment nginx-test --image=nginx --replicas=3 -n chaos-test
kubectl wait --for=condition=Available deployment/nginx-test -n chaos-test --timeout=60s

3. List available scenarios

krknctl list

This shows all chaos scenarios you can run. For your first test, we will use pod-scenarios.

4. Run a scenario

krknctl run pod-scenarios \
  --namespace chaos-test \
  --pod-label "app=nginx-test" \
  --disruption-count 1 \
  --kill-timeout 180 \
  --expected-recovery-time 120

krknctl will prompt you for required inputs interactively, or you can pass them as flags.

The scenario will:

Find pods matching the label app=nginx-test in the chaos-test namespace
Disrupt 1 pod (delete it)
Wait up to 180 seconds for the pod to be removed
Monitor recovery for up to 120 seconds
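You may want to check the same recovery window independently, for example from a CI job running alongside krknctl. One way is to poll the Deployment until its ready replica count returns to the desired value or a timeout expires. This is a minimal sketch, not how krkn implements its own monitoring. `wait_until` and `replicas_ready` are hypothetical helpers, and the sketch assumes the Deployment's `.status.readyReplicas` field is a good enough signal of recovery.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for an independent recovery check (not krkn's own logic).

# wait_until TIMEOUT INTERVAL CMD [ARGS...]
# Re-runs CMD every INTERVAL seconds until it succeeds (returns 0)
# or TIMEOUT seconds have elapsed (returns 1).
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# replicas_ready NAMESPACE DEPLOYMENT EXPECTED
# Succeeds when the Deployment reports EXPECTED ready replicas.
# The field is absent when no replica is ready, so an empty value counts as 0.
replicas_ready() {
  local ready
  ready=$(kubectl get deployment "$2" -n "$1" \
    -o jsonpath='{.status.readyReplicas}' 2>/dev/null) || true
  [ "${ready:-0}" -eq "$3" ]
}
```

For the run above, `wait_until 120 5 replicas_ready chaos-test nginx-test 3 && echo recovered` polls every 5 seconds for up to the same 120 seconds used for `--expected-recovery-time`.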

5. Observe results

In a separate terminal, watch the pods recover:

kubectl get pods -n chaos-test -l app=nginx-test -w

You can confirm that the pod was killed and replaced by checking pod ages. The Deployment replaces the deleted pod with a new one, which shows a much shorter AGE than its neighbours:

NAME                          READY   STATUS    RESTARTS   AGE
nginx-test-7d9f8b6c4-xk2pq    1/1     Running   0          8s
nginx-test-5c6d7f8b9-lm3rt    1/1     Running   0          4d2h
nginx-test-787d4945fb-nqpzj   1/1     Running   0          4d2h

The 8s age identifies the replacement pod that the Deployment created after the scenario deleted the original. The other two pods were not affected.
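With many replicas, spotting the short AGE by eye gets harder. kubectl can do the ordering itself: `kubectl get pods -n chaos-test -l app=nginx-test --sort-by=.metadata.creationTimestamp` lists the newest pod last. Alternatively, a small filter can read the default `kubectl get pods` output and print the most recently created pod. The helper names below are hypothetical. The parser assumes AGE is the last column and that ages use kubectl's compact format (for example `8s`, `5m30s`, `4d2h`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they parse kubectl's default table output.

# age_to_seconds AGE: converts a kubectl age such as 8s, 5m30s or 4d2h
# into seconds (a year is counted as 365 days).
age_to_seconds() {
  awk -v age="$1" 'BEGIN {
    mult["s"] = 1; mult["m"] = 60; mult["h"] = 3600
    mult["d"] = 86400; mult["y"] = 31536000
    total = 0
    while (match(age, /^[0-9]+[smhdy]/)) {
      total += substr(age, 1, RLENGTH - 1) * mult[substr(age, RLENGTH, 1)]
      age = substr(age, RLENGTH + 1)
    }
    print total
  }'
}

# youngest_pod: reads `kubectl get pods` output on stdin and prints the
# name of the most recently created pod. NAME is located via the header;
# AGE is taken from the last column, which stays correct even when the
# RESTARTS column contains extra words such as "2 (5m ago)".
youngest_pod() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "NAME") n = i; next }
       { print $NF, $n }' |
    while read -r age name; do
      printf '%s %s\n' "$(age_to_seconds "$age")" "$name"
    done |
    sort -n | head -n 1 | cut -d ' ' -f 2
}
```

Usage: `kubectl get pods -n chaos-test -l app=nginx-test | youngest_pod`.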

What success looks like: the disrupted pod is deleted and Kubernetes recreates it, the new pod reaches the Ready state within the --expected-recovery-time window, and the scenario exits with code 0. The recovery report lists the pod under recovered:

{
  "recovered": [
    {
      "pod_name": "nginx-test-7d9f8b6c4-xk2pq",
      "namespace": "chaos-test",
      "pod_rescheduling_time": 2.3,
      "pod_readiness_time": 5.7,
      "total_recovery_time": 8.0
    }
  ],
  "unrecovered": []
}

What failure looks like: the pod does not recover within the timeout, so the scenario exits with a non-zero code and logs an error. The report lists the pod under unrecovered, with its timings left at 0.0:

{
  "recovered": [],
  "unrecovered": [
    {
      "pod_name": "nginx-test-7d9f8b6c4-xk2pq",
      "namespace": "chaos-test",
      "pod_rescheduling_time": 0.0,
      "pod_readiness_time": 0.0,
      "total_recovery_time": 0.0
    }
  ]
}
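The scenario exits with code 0 on success and a non-zero code on failure, so a CI job can gate on the exit code alone without parsing the report. This is a minimal sketch. `chaos_gate` is a hypothetical wrapper name, and the usage comment reuses the flags from step 4.

```shell
#!/usr/bin/env bash
# chaos_gate CMD [ARGS...] (hypothetical wrapper)
# Runs CMD, prints PASS or FAIL with the exit code, and returns CMD's
# exit code so the surrounding CI step fails when the scenario fails.
chaos_gate() {
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL (exit $rc)"
  fi
  return "$rc"
}

# Usage:
#   chaos_gate krknctl run pod-scenarios \
#     --namespace chaos-test --pod-label "app=nginx-test" \
#     --disruption-count 1 --kill-timeout 180 --expected-recovery-time 120
```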

6. Clean up

kubectl delete namespace chaos-test
krknctl clean

Where to go next

Whether you’re running your first scenario or building a production resilience pipeline, pick the journey that matches your goals:

Journey	I want to…	Experience level	Tools needed
Metrics Validation	Automatically pass/fail based on Prometheus metrics	Intermediate	krknctl + Prometheus
Resilience Score	Generate a scored report to validate an environment	Intermediate	krknctl + Prometheus
Long-Term Storage	Store metrics across runs for regression analysis	Advanced	krknctl + Prometheus + Elasticsearch
Multi-Cluster Orchestration	Run chaos across multiple clusters or clouds	Advanced	krkn-operator

Alternative Methods

Krkn-hub (Containerized)

Krkn-hub runs scenarios as container images, which makes it a good fit for CI/CD pipelines. Each scenario is published as a tag of the pre-built image quay.io/krkn-chaos/krkn-hub (for example, :pod-scenarios).

podman run --net=host \
  -v ~/.kube/config:/home/krkn/.kube/config:Z \
  -e NAMESPACE=default \
  -e POD_LABEL="app=my-app" \
  -d quay.io/krkn-chaos/krkn-hub:pod-scenarios

See the krkn-hub installation guide for full setup instructions.

Note: Krkn-hub runs one scenario type at a time per container.
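The podman example above starts the container detached (`-d`), so the shell returns immediately. To follow the output and get the scenario's result, you can capture the container ID and then use `podman logs -f` and `podman wait`. `podman wait` prints the container's exit code once it stops. This is a minimal sketch. `run_hub_pod_scenario` is a hypothetical name, and the function reuses the image, kubeconfig mount and environment variables from the command above.

```shell
#!/usr/bin/env bash
# run_hub_pod_scenario NAMESPACE POD_LABEL (hypothetical helper)
# Starts the krkn-hub pod-scenarios container detached, streams its logs,
# and returns the container's exit code.
run_hub_pod_scenario() {
  local cid rc
  cid=$(podman run -d --net=host \
    -v "$HOME/.kube/config:/home/krkn/.kube/config:Z" \
    -e NAMESPACE="$1" \
    -e POD_LABEL="$2" \
    quay.io/krkn-chaos/krkn-hub:pod-scenarios) || return 1
  # Follow the logs until the container stops.
  podman logs -f "$cid"
  # podman wait prints the exit code of the stopped container.
  rc=$(podman wait "$cid") || return 1
  echo "container $cid exited with code $rc"
  return "$rc"
}
```

Usage: `run_hub_pod_scenario default "app=my-app"`.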

Krkn (Standalone Python)

Krkn is the core chaos engine — a Python program that can run multiple scenario types in a single execution using config files.

See the krkn installation guide and configuration hints to get started.

Note: Krkn allows running multiple different scenario types and scenario files in one execution, unlike krkn-hub and krknctl.

Further Reading

Metrics Validation

Running a Chaos Scenario with Krkn

Long-Term Storage

Resilience Score

Multi-Cluster Orchestration