1 - Installation
Install krkn-operator using Helm
This guide walks you through installing krkn-operator using Helm, the recommended installation method.
Prerequisites
- Kubernetes 1.19+ or OpenShift 4.x
- Helm 3.0+
- A Kubernetes cluster (kind, minikube, or production cluster)
Quick Start (kind/minikube)
Perfect for testing and local development, this minimal installation gets krkn-operator running quickly on kind or minikube.
Replace <VERSION> in the commands below with the release you want to install. For available versions, see the releases page.
1. Install krkn-operator
helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator --version <VERSION>
This installs krkn-operator with default settings in the current namespace.
2. Verify Installation
kubectl get pods -l app.kubernetes.io/name=krkn-operator
Expected output:
NAME READY STATUS RESTARTS AGE
krkn-operator-xxxxxxxxx-xxxxx 2/2 Running 0 1m
3. Access the Console (Optional)
For local testing, use port-forwarding to access the web console:
kubectl port-forward svc/krkn-operator-console 3000:3000
Then open http://localhost:3000 in your browser.
Production Installation
For production deployments, you’ll want to customize the installation with a values.yaml file to ensure high availability, proper resource limits, monitoring integration, and secure external access.
When to Use Each Installation Method
Choose the installation method that matches your environment and requirements:
| Method | Use When | Key Features |
|---|---|---|
| Quick Start | Testing on kind/minikube, local development, POC | Minimal configuration, port-forward access, no HA |
| Production (Kubernetes) | Running on standard Kubernetes (EKS, GKE, AKS, self-managed) | Ingress for external access, HA setup, resource limits, monitoring |
| Production (OpenShift) | Running on OpenShift/OKD clusters | OpenShift Routes instead of Ingress, enhanced security contexts, HA setup |
The main differences between production installations are:
- Kubernetes can use either:
  - Gateway API (recommended) - Modern routing standard with powerful features
  - Ingress (legacy) - Traditional method, still widely supported
- OpenShift uses Routes for external access (native OpenShift feature, no additional controller needed)
- Production configurations add replica counts, resource limits, pod disruption budgets, and monitoring compared to Quick Start
All production methods support the same chaos scenarios and core functionality—the choice depends on your platform and infrastructure preferences.
Installation on Kubernetes
Kubernetes clusters can expose the web console using either Gateway API (recommended) or Ingress (legacy).
Option 1: Using Gateway API (Recommended)
Gateway API is the modern successor to Ingress and provides more powerful and flexible routing capabilities.
Prerequisites:
- Gateway API CRDs installed in your cluster (installation guide)
- A Gateway resource already deployed (usually managed by cluster admins)
Create a values.yaml file:
# Production values for Kubernetes with Gateway API

# Enable web console with Gateway API
console:
  enabled: true
  gateway:
    enabled: true
    gatewayName: krkn-gateway  # Name of your existing Gateway
    gatewayNamespace: ""       # Optional: if Gateway is in a different namespace
    hostname: krkn.example.com
    path: /
    pathType: PathPrefix

# Operator configuration
operator:
  replicaCount: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  logging:
    level: info
    format: json

# High availability
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Monitoring (if using Prometheus)
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 30s
Note: Gateway API assumes you have a Gateway resource already configured in your cluster. The chart creates only the HTTPRoute that attaches to that Gateway.
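If you administer the cluster and no Gateway exists yet, the following is a minimal sketch of one. All names here are illustrative, and the gatewayClassName depends on which Gateway controller you run (nginx, Istio, etc.):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: krkn-gateway                # must match gatewayName in values.yaml
  namespace: krkn-operator-system
spec:
  gatewayClassName: nginx           # depends on your Gateway controller
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Same                # allow HTTPRoutes from this namespace only
```

The chart's HTTPRoute attaches to this Gateway by name, so the Gateway must exist (and be accepted by its controller) before the console becomes reachable.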
Option 2: Using Ingress (Legacy)
If your cluster doesn’t support Gateway API yet, you can use traditional Ingress:
# Production values for Kubernetes with Ingress

# Enable web console with Ingress
console:
  enabled: true
  ingress:
    enabled: true
    className: nginx  # or your ingress controller
    hostname: krkn.example.com
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    tls:
      - secretName: krkn-tls
        hosts:
          - krkn.example.com

# Operator configuration
operator:
  replicaCount: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  logging:
    level: info
    format: json

# High availability
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Monitoring (if using Prometheus)
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 30s
Install with your custom values:
helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
--version <VERSION> \
--namespace krkn-operator-system \
--create-namespace \
-f values.yaml
Installation on OpenShift
OpenShift uses Routes instead of Ingress. Create an OpenShift-specific values.yaml:
# Production values for OpenShift

# Enable web console with Route
console:
  enabled: true
  route:
    enabled: true
    hostname: krkn.apps.cluster.example.com
    tls:
      termination: edge

# Operator configuration
operator:
  replicaCount: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault

# High availability
podDisruptionBudget:
  enabled: true
  minAvailable: 1
Install on OpenShift:
helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
--version <VERSION> \
--namespace krkn-operator-system \
--create-namespace \
-f values-openshift.yaml
Advanced Configuration Options
Enable ACM Integration
To enable Red Hat Advanced Cluster Management (ACM) / Open Cluster Management (OCM) integration:
acm:
  enabled: true
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
Install with ACM enabled:
helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
--version <VERSION> \
--set acm.enabled=true \
--namespace krkn-operator-system \
--create-namespace
ACM Integration: When ACM is enabled, krkn-operator-acm will automatically discover and manage ACM-controlled clusters. See the ACM Integration section in Configuration for more details.
Custom Namespace
Install in a custom namespace:
helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
--version <VERSION> \
--namespace my-chaos-platform \
--create-namespace \
--set namespaceOverride=my-chaos-platform
Image Registry Override
If you’re using a private registry or mirror:
operator:
  image: myregistry.io/krkn-chaos/krkn-operator:<VERSION>
  pullPolicy: IfNotPresent

dataProvider:
  image: myregistry.io/krkn-chaos/data-provider:<VERSION>

pullSecrets:
  - name: my-registry-secret
JWT Configuration
Customize JWT token settings for authentication:
jwtSecret: bXktc2VjdXJlLWp3dC1rZXktYmFzZTY0LWVuY29kZWQ= # Base64 encoded
jwtExpiryHours: 72 # 3 days
Security: Always generate a secure, random JWT secret for production. Do not use default or predictable values.
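One way to produce such a secret is to draw 32 random bytes and base64-encode them. The snippet below uses openssl; `head -c 32 /dev/urandom | base64` is an equivalent alternative if openssl is unavailable:

```shell
# Generate a cryptographically random 256-bit secret, base64-encoded,
# suitable for the jwtSecret value.
JWT_SECRET=$(openssl rand -base64 32)
echo "$JWT_SECRET"
```

You can then pass it at install time with --set jwtSecret="$JWT_SECRET" or place it in your values.yaml.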
Complete values.yaml Reference
Here’s a comprehensive values.yaml with all available options:
# Namespace configuration
namespaceOverride: ""

# Image configuration
operator:
  image: quay.io/krkn-chaos/krkn-operator:latest
  pullPolicy: IfNotPresent
  enabled: true
  replicaCount: 1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  dataProvider:
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi
  service:
    type: ClusterIP
    port: 8080
    grpcPort: 50051
  logging:
    level: info   # debug, info, warn, error
    format: json  # json or text
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  nodeSelector: {}
  tolerations: []
  affinity: {}
  extraEnv: []

dataProvider:
  image: quay.io/krkn-chaos/data-provider:latest

# ACM Integration (Optional)
acm:
  enabled: false
  image: quay.io/krkn-chaos/krkn-operator-acm:latest
  replicaCount: 1
  config:
    secretName: ""  # ACM cluster credentials secret
  service:
    port: 8081
  logging:
    level: info
    format: json
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
  nodeSelector: {}
  tolerations: []
  affinity: {}

# Web Console (Optional)
console:
  enabled: true
  image: quay.io/krkn-chaos/console:latest
  replicaCount: 1
  service:
    type: ClusterIP
    port: 3000
    nodePort: null  # Only for NodePort service type

  # Kubernetes Ingress (legacy)
  ingress:
    enabled: false
    className: nginx
    hostname: krkn.example.com
    annotations: {}
    tls: []

  # Gateway API (recommended for Kubernetes)
  gateway:
    enabled: false
    gatewayName: krkn-gateway
    gatewayNamespace: ""
    sectionName: ""
    hostname: krkn.example.com
    path: /
    pathType: PathPrefix
    annotations: {}

  # OpenShift Route
  route:
    enabled: false
    hostname: ""
    tls:
      termination: edge  # edge, passthrough, or reencrypt

  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 256Mi
  nodeSelector: {}
  tolerations: []
  affinity: {}

# Image pull secrets
pullSecrets: []

# JWT Authentication
jwtSecret: ""  # Base64 encoded; auto-generated if empty
jwtExpiryHours: 24

# RBAC
rbac:
  create: true

# Service Account
serviceAccount:
  create: true
  name: ""
  annotations: {}

# CRDs
crds:
  keep: true  # Keep CRDs after uninstall

# Monitoring
monitoring:
  enabled: false
  service:
    port: 8443
  serviceMonitor:
    enabled: false
    interval: 30s

# Network Policy
networkPolicy:
  enabled: false
  ingress: []
  egress: []

# Update Strategy
updateStrategy:
  type: RollingUpdate

# Pod Disruption Budget
podDisruptionBudget:
  enabled: false
  minAvailable: 1

# Common labels and annotations
commonLabels: {}
commonAnnotations: {}

# Naming
nameOverride: ""
fullnameOverride: ""
Verification
After installation, verify all components are running:
# Check operator pods
kubectl get pods -n krkn-operator-system
# Check services
kubectl get svc -n krkn-operator-system
# Check CRDs
kubectl get crds | grep krkn
# View operator logs
kubectl logs -n krkn-operator-system -l app.kubernetes.io/name=krkn-operator -c manager
Upgrading
To upgrade to a newer version:
helm upgrade krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
--version <VERSION> \
--namespace krkn-operator-system \
-f values.yaml
Uninstalling
To remove krkn-operator:
helm uninstall krkn-operator --namespace krkn-operator-system
CRDs Persistence: By default, Custom Resource Definitions (CRDs) are preserved after uninstallation to prevent data loss. To remove them manually:
kubectl delete crds -l app.kubernetes.io/name=krkn-operator
Next Steps
2 - Configuration
Configure target clusters for chaos testing
This guide walks you through configuring target Kubernetes or OpenShift clusters where you want to run chaos engineering scenarios.
Overview
Before running chaos experiments, you need to add one or more target clusters to the Krkn Operator. Target clusters are the Kubernetes/OpenShift clusters where chaos scenarios will be executed. You can add multiple target clusters and manage them through the web console.
Administrator Access Required: Adding and managing target clusters requires administrator privileges. Only users with admin access can configure target clusters through the Settings menu.
Accessing Cluster Configuration
Step 1: Open Admin Settings
Log in to the Krkn Operator Console and click on your profile in the top-right corner. Select Admin Settings from the dropdown menu.

Admin Only: If you don’t see the “Admin Settings” option, you don’t have administrator privileges. Contact your Krkn Operator administrator to request access or to add target clusters on your behalf.
Step 2: Navigate to Cluster Targets
In the Admin Settings page, click on the Cluster Targets tab in the left sidebar. This will show you a list of all configured target clusters (if any).
Adding a New Target Cluster
Step 3: Open the Add Target Dialog
Click the Add Target button in the top-right corner of the Cluster Targets page. This will open the “Add New Target” dialog.

You’ll need to provide:
Cluster Name (required): A friendly name to identify this cluster (e.g., “Production-US-East”, “Dev-Cluster”, “OpenShift-QA”)
Authentication Type (required): Choose one of three authentication methods:
- Kubeconfig - Full kubeconfig file (recommended)
- Service Account Token - Token-based authentication
- Username/Password - Basic authentication (for clusters that support it)
Authentication Methods
The Krkn Operator supports three different ways to authenticate to target clusters. Choose the method that best fits your cluster’s security configuration.
Method 1: Kubeconfig (Recommended)
This is the most common and recommended method. It uses a complete kubeconfig file to authenticate to the target cluster.
When to use:
- You have direct access to the cluster’s kubeconfig file
- You want to authenticate with certificates or tokens defined in the kubeconfig
- The cluster supports standard Kubernetes authentication
How to configure:
- Select Kubeconfig as the Authentication Type
- Obtain the kubeconfig file for your target cluster:
# For most Kubernetes clusters
kubectl config view --flatten --minify > target-cluster.kubeconfig
# For OpenShift clusters
oc login https://api.cluster.example.com:6443
oc config view --flatten > target-cluster.kubeconfig
- Open the kubeconfig file in a text editor and copy its entire contents
- Paste the kubeconfig content into the Kubeconfig text area in the dialog
- Click Create
Automatic Encoding: The kubeconfig content will be automatically base64-encoded and stored securely. You don’t need to encode it manually.
Example kubeconfig content:
apiVersion: v1
kind: Config
clusters:
  - cluster:
      certificate-authority-data: LS0tLS1CRUdJTi...
      server: https://api.cluster.example.com:6443
    name: my-cluster
contexts:
  - context:
      cluster: my-cluster
      user: admin
    name: my-cluster-context
current-context: my-cluster-context
users:
  - name: admin
    user:
      client-certificate-data: LS0tLS1CRUdJTi...
      client-key-data: LS0tLS1CRUdJTi...
Method 2: Service Account Token
Use this method if you want to authenticate using a Kubernetes Service Account token.
When to use:
- You want fine-grained RBAC control over what the operator can do
- You’re following a zero-trust security model
- You want to create a dedicated service account for chaos testing
How to configure:
Create a service account in the target cluster with appropriate permissions:
# Create the namespace and service account
kubectl create namespace krkn-system
kubectl create serviceaccount krkn-operator -n krkn-system

# Bind the cluster-admin ClusterRole to the service account
kubectl create clusterrolebinding krkn-operator-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=krkn-system:krkn-operator

# Get the service account token
kubectl create token krkn-operator -n krkn-system --duration=8760h
In the “Add New Target” dialog:
- Enter a Cluster Name
- Select Service Account Token as the Authentication Type
- Enter the API Server URL (e.g., https://api.cluster.example.com:6443)
- Paste the Service Account Token you generated
- (Optional) Provide CA Certificate data if your cluster uses a self-signed or custom Certificate Authority
- Click Create
About CA Certificate (Optional):
The CA Certificate field is optional and only needed in specific scenarios:
- When to provide it: If your cluster uses a self-signed certificate or a custom/private Certificate Authority (CA) that is not trusted by default
- When to skip it: If your cluster uses certificates from a public CA (like Let’s Encrypt, DigiCert, etc.) or standard cloud provider certificates
- What it does: The CA certificate allows the Krkn Operator to verify the identity of your cluster’s API server and establish a secure TLS connection
- How to get it: Extract the CA certificate from your cluster’s kubeconfig file (the certificate-authority-data field, base64-decoded) or from your cluster administrator
Example of extracting CA certificate from kubeconfig:
# Extract and decode CA certificate
kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.crt
Token Expiration: Service account tokens can expire. If your cluster targets stop working, check if the token has expired and generate a new one.
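Service account tokens are JWTs, so you can check when one expires by decoding its payload: the second dot-separated segment is base64url-encoded JSON containing an exp claim (a Unix timestamp). A sketch using a fabricated token (substitute the real one from kubectl create token):

```shell
# Fabricated token for illustration; a real one comes from `kubectl create token`.
PAYLOAD_JSON='{"exp":1751552000,"sub":"system:serviceaccount:krkn-system:krkn-operator"}'
TOKEN="fake-header.$(printf '%s' "$PAYLOAD_JSON" | base64 | tr -d '=\n').fake-signature"

# Extract the payload segment; real tokens use the base64url alphabet,
# so map URL-safe characters back to standard base64 and restore padding.
SEG=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#SEG} % 4 )) -ne 0 ]; do SEG="${SEG}="; done
DECODED=$(printf '%s' "$SEG" | base64 -d)
echo "$DECODED"
```

Compare the exp value against the current time (date +%s) to see whether the token is still valid.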
Method 3: Username/Password
Use basic authentication with a username and password. This method is only supported by clusters that have basic auth enabled.
When to use:
- Your cluster supports basic authentication
- You’re testing in a development environment
- You have credentials for a user with appropriate permissions
How to configure:
In the “Add New Target” dialog:
- Enter a Cluster Name
- Select Username/Password as the Authentication Type
- Enter the API Server URL (e.g., https://api.cluster.example.com:6443)
- Enter your Username
- Enter your Password
- (Optional) Provide CA Certificate data if your cluster uses a self-signed or custom Certificate Authority
- Click Create
About CA Certificate (Optional):
Same as with token authentication, the CA Certificate is optional:
- When needed: Only if your cluster uses self-signed certificates or a custom/private Certificate Authority
- When to skip: If using public CA certificates or standard cloud provider setups
- Purpose: Enables secure TLS verification when connecting to the cluster’s API server
Security Warning: Basic authentication is less secure than certificate-based or token-based authentication. It’s recommended only for development and testing environments. Most production Kubernetes/OpenShift clusters have basic auth disabled by default.
Verifying Target Cluster
After adding a target cluster, the Krkn Operator will attempt to connect to it and verify the credentials.
Successful Configuration
If the cluster is configured correctly, you’ll see it appear in the Cluster Targets list with a green status indicator. You can now use this cluster as a target for chaos scenarios.
Troubleshooting Connection Issues
If the cluster connection fails, check the following:
| Issue | Possible Cause | Solution |
|---|---|---|
| Connection timeout | Incorrect API server URL | Verify the API server URL is correct and accessible from the operator |
| Authentication failed | Invalid credentials | Re-check your kubeconfig, token, or username/password |
| Certificate error | CA certificate mismatch | Provide the correct CA certificate for clusters with custom CAs |
| Permission denied | Insufficient RBAC permissions | Ensure the service account or user has cluster-admin or necessary permissions |
| Network unreachable | Firewall or network policy | Ensure the Krkn Operator can reach the target cluster’s API server |
You can view detailed error messages in the operator logs:
kubectl logs -n krkn-operator-system -l app.kubernetes.io/name=krkn-operator -c manager
Managing Target Clusters
Navigate to Admin Settings → Cluster Targets to see all configured target clusters. Each cluster shows:
- Cluster name
- Connection status
- Last verified time
- Authentication method used
Editing a Target Cluster
To modify an existing target cluster:
- Click the Edit button next to the cluster in the list
- Update the cluster name or authentication credentials
- Click Save
Removing a Target Cluster
To remove a target cluster:
- Click the Delete button next to the cluster in the list
- Confirm the deletion
Active Scenarios: If you delete a target cluster that has running chaos scenarios, those scenarios will be terminated immediately.
Required Permissions
The service account or user used to connect to target clusters needs the following permissions:
Minimum RBAC Permissions
For most chaos scenarios, the operator needs cluster-admin privileges or at least these permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: krkn-operator-target-access
rules:
  # Pod chaos scenarios
  - apiGroups: [""]
    resources: ["pods", "pods/log", "pods/exec"]
    verbs: ["get", "list", "watch", "create", "delete", "deletecollection"]
  # Node chaos scenarios
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "update", "patch"]
  # Deployment/StatefulSet/DaemonSet scenarios
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets", "replicasets"]
    verbs: ["get", "list", "watch", "update", "patch", "delete"]
  # Service and networking scenarios
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  # Namespace scenarios
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "watch"]
  # Job creation for scenario execution
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  # Events for monitoring
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
OpenShift Clusters: For OpenShift clusters, you may also need permissions for OpenShift-specific resources like Route, DeploymentConfig, and Project.
Best Practices
Use Dedicated Service Accounts: Create a dedicated service account in each target cluster specifically for chaos testing. This makes it easier to audit and control permissions.
Rotate Credentials Regularly: Periodically rotate kubeconfig files and service account tokens to maintain security.
Test Connectivity First: After adding a target cluster, run a simple non-destructive scenario to verify connectivity before running destructive chaos tests.
Organize by Environment: Use clear naming conventions like prod-us-east-1, staging-eu-west, dev-local to easily identify clusters.
Limit Production Access: Consider restricting production cluster access to specific users or requiring additional approval workflows.
Monitor Operator Logs: Regularly check operator logs for authentication errors or connection issues.
ACM/OCM Integration (Advanced)
For organizations using Red Hat Advanced Cluster Management (ACM) or Open Cluster Management (OCM), the Krkn Operator provides seamless integration that automatically discovers and manages all ACM-controlled clusters as chaos testing targets.
What is ACM/OCM?
Advanced Cluster Management (ACM) and Open Cluster Management (OCM) are multi-cluster management platforms that allow you to manage multiple Kubernetes and OpenShift clusters from a single hub cluster. ACM/OCM provides:
- Centralized cluster lifecycle management - Deploy, upgrade, and manage multiple clusters
- Application deployment across clusters - Deploy applications to multiple clusters with policies
- Governance and compliance - Apply security and compliance policies across your fleet
- Observability - Monitor metrics, logs, and alerts from all managed clusters
How ACM Integration Works
When the ACM integration is enabled in the Krkn Operator, the krkn-operator-acm component automatically:
- Discovers all managed clusters registered with your ACM/OCM hub
- Imports them as chaos testing targets into the Krkn Operator console
- Keeps the cluster list synchronized as new clusters are added or removed from ACM
- Authenticates automatically using ACM’s ManagedServiceAccount resources—no manual credential management required
Zero Configuration: Once ACM integration is enabled, you don’t need to manually add clusters, provide kubeconfig files, or manage authentication tokens. The operator handles everything automatically through ACM’s native authentication mechanisms.
Benefits of ACM Integration
| Feature | Manual Configuration | ACM Integration |
|---|---|---|
| Cluster Discovery | Manual - add each cluster individually | Automatic - all ACM-managed clusters |
| Credential Management | Manual - maintain tokens/kubeconfig per cluster | Automatic - uses ManagedServiceAccount |
| Cluster Updates | Manual - update credentials when they change | Automatic - ACM handles rotation |
| New Clusters | Manual - must add explicitly | Automatic - discovered immediately |
| Security | Per-cluster authentication | Centralized ACM RBAC with fine-grained control |
Enabling ACM Integration
Step 1: Install with ACM Enabled
To enable ACM integration, install the Krkn Operator with the ACM component enabled via Helm:
helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
--version <VERSION> \
--set acm.enabled=true \
--namespace krkn-operator-system \
--create-namespace
Or add it to your values.yaml:
acm:
  enabled: true
  replicaCount: 1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
  logging:
    level: info
    format: json
For complete installation instructions and additional configuration options, see the Installation Guide.
Hub Cluster Requirement: The Krkn Operator must be installed on the same cluster where ACM/OCM is running (the hub cluster). It will then discover all spoke clusters managed by that ACM instance.
Step 2: Verify ACM Component
After installation, verify that the ACM component is running:
kubectl get pods -n krkn-operator-system -l app.kubernetes.io/component=acm
# Expected output:
# NAME READY STATUS RESTARTS AGE
# krkn-operator-acm-xxxxxxxxx-xxxxx 1/1 Running 0 2m
Check the ACM component logs to see cluster discovery in action:
kubectl logs -n krkn-operator-system -l app.kubernetes.io/component=acm
# You should see logs like:
# INFO Discovered 5 managed clusters from ACM
# INFO Synced cluster: production-us-east
# INFO Synced cluster: staging-eu-west
Configuring ManagedServiceAccounts (Fine-Grained Security)
One of the most powerful features of ACM integration is the ability to use ManagedServiceAccounts for authentication to target clusters. This provides fine-grained, per-cluster security control.
What are ManagedServiceAccounts?
ManagedServiceAccounts are a feature of OCM/ACM that allows the hub cluster to create and manage service accounts on spoke clusters. Instead of using a single highly-privileged service account (like open-cluster-management-agent-addon-application-manager), you can create dedicated service accounts with custom RBAC permissions for each cluster.
Configuring Per-Cluster Service Accounts
Navigate to Admin Settings → Provider Configuration → ACM to configure which ManagedServiceAccount to use for each cluster:

For each managed cluster, you can:
- Select a ManagedServiceAccount: Choose from existing ManagedServiceAccounts created on that cluster
- Customize permissions per cluster: Each cluster can use a different service account with different RBAC permissions
- Apply the configuration: The Krkn Operator will use this service account for all chaos testing operations on that cluster
Why Use Custom ManagedServiceAccounts?
By default, ACM uses the open-cluster-management-agent-addon-application-manager service account, which has cluster-admin privileges on all spoke clusters. While convenient, this violates the principle of least privilege.
Using custom ManagedServiceAccounts provides:
Enhanced Security:
- Least privilege access: Grant only the permissions needed for chaos testing (e.g., pod deletion, network policy creation) rather than full cluster-admin
- Per-cluster customization: Production clusters can have more restrictive permissions than dev/test clusters
- Audit trail: Each cluster has a dedicated service account, making it easier to track and audit chaos testing activities
Flexibility:
- Environment-specific policies: Different permissions for prod, staging, and dev environments
- Scenario-specific accounts: Create different service accounts for different types of chaos scenarios
- Compliance: Meet security and compliance requirements by limiting operator privileges
Example: Creating a Custom ManagedServiceAccount
Create a ManagedServiceAccount with limited chaos testing permissions:
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
  name: krkn-chaos-operator
  namespace: cluster-prod-us-east  # ManagedCluster namespace
spec:
  rotation: {}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: krkn-chaos-limited
rules:
  # Pod chaos - read and delete only
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "delete"]
  # Node chaos - read and cordon/drain only
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "update", "patch"]
  # Network policies - create and delete
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["get", "list", "create", "delete"]
  # No destructive operations on critical resources
  # (no namespace deletion, no service account manipulation, etc.)
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: krkn-chaos-limited-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: krkn-chaos-limited
subjects:
  - kind: ServiceAccount
    name: krkn-chaos-operator
    namespace: open-cluster-management-agent-addon
Apply this to the ACM hub cluster, and the ManagedServiceAccount will be created on the spoke cluster automatically. You can then select it in the Provider Configuration UI.
Security Best Practice: Create different ManagedServiceAccounts for different environments. For example:
- krkn-prod with minimal permissions (only non-destructive scenarios)
- krkn-staging with moderate permissions (most scenarios)
- krkn-dev with full chaos permissions (all scenarios)
Automatic Cluster Synchronization
Once ACM integration is enabled and configured, the Krkn Operator automatically:
- Syncs cluster list every 60 seconds (configurable)
- Adds new clusters as they’re imported into ACM
- Removes clusters that are deleted from ACM
- Updates cluster status based on ACM health checks
- Rotates credentials automatically when ManagedServiceAccount tokens are refreshed
You can view all ACM-discovered clusters in the Cluster Targets page. They will be marked with an ACM badge to distinguish them from manually configured clusters.
Troubleshooting ACM Integration
ACM Component Not Starting
If the ACM component fails to start, check:
# Check pod status
kubectl get pods -n krkn-operator-system -l app.kubernetes.io/component=acm
# View logs
kubectl logs -n krkn-operator-system -l app.kubernetes.io/component=acm
# Common issues:
# - ACM/OCM not installed on the hub cluster
# - Missing RBAC permissions for the operator to read ManagedCluster resources
# - Network policies blocking communication
No Clusters Discovered
If the ACM component is running but no clusters appear:
Verify ACM is managing clusters:
kubectl get managedclusters
Check if clusters are in “Ready” state:
kubectl get managedclusters -o wide
Review ACM component logs for discovery errors:
kubectl logs -n krkn-operator-system -l app.kubernetes.io/component=acm | grep -i error
ManagedServiceAccount Not Working
If a cluster shows authentication errors after configuring a ManagedServiceAccount:
Verify the ManagedServiceAccount exists and is ready:
kubectl get managedserviceaccount -n <cluster-namespace>
Check the ManagedServiceAccount has proper RBAC permissions on the spoke cluster
Ensure the ManagedServiceAccount token hasn’t expired
For more detailed troubleshooting, see the ACM Integration Troubleshooting Guide.
Next Steps
Now that you’ve configured your target clusters (manually or via ACM), you’re ready to run chaos scenarios:
3 - Usage
Learn how to run chaos scenarios with Krkn Operator
This guide walks you through the process of running chaos engineering scenarios using the Krkn Operator web interface.
Overview
The Krkn Operator provides an intuitive web interface for executing chaos scenarios against your Kubernetes clusters. The workflow is straightforward: select your target clusters, choose a scenario registry, pick a scenario, configure it, and launch the experiment. The operator handles all the complexity of scheduling, execution, and monitoring.
Step 1: Starting a Scenario Run
From the Krkn Operator home page, you’ll see the main dashboard with an overview of your configured targets and recent scenario runs.

To begin running a chaos scenario, click the Run Scenario button. This will launch the scenario configuration wizard that guides you through the setup process.
Step 2: Selecting Target Clusters
The first step in the wizard is selecting which clusters you want to target with your chaos experiment.

One of the powerful features of Krkn Operator is its ability to run scenarios across multiple clusters simultaneously. If you have configured multiple target providers (such as manual targets and ACM-managed clusters), all available clusters will be presented in a unified view.
Key capabilities:
- Multi-cluster selection: Select one or more target clusters to run the same scenario across multiple environments
- Unified view: All clusters from all configured providers (manual targets, ACM, etc.) are displayed together
- Parallel execution: When multiple targets are selected, the scenario will execute on all of them concurrently
This is particularly useful for testing:
- Consistency of behavior across environments (dev, staging, production)
- Regional cluster resilience
- Multi-tenant cluster configurations
- Different Kubernetes distributions or versions
Step 3: Selecting a Scenario Registry
After selecting your target clusters, you’ll choose where to pull the chaos scenario container images from.

Krkn Operator supports two types of registries:
Quay.io (Default)
The default option is the official Krkn Chaos registry on Quay.io, which contains all the pre-built, tested chaos scenarios maintained by the Krkn community. This is the recommended choice for most users as it provides:
- Immediate access to 20+ chaos scenarios
- Regular updates and new scenario releases
- Pre-validated and tested scenario images
Private Registry
For organizations with specific requirements, you can configure a private container registry. This is useful when you need to:
- Run custom or modified chaos scenarios
- Operate in restricted network environments
- Maintain full control over scenario versions
- Meet compliance or security requirements
Air-Gapped and Disconnected Environments: Krkn Operator stores and retrieves scenario metadata through standard OCI registry APIs, using the registry itself as the metadata backend. With a private registry configured, the operator can therefore run entirely in disconnected or air-gapped environments, with no external connectivity required: all scenario definitions, metadata, and images come from your private registry.
To use a private registry, you’ll need to:
- Configure the private registry in the Configuration section
- Push the Krkn scenario images to your private registry
- Ensure the operator has proper authentication credentials
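As a sketch, mirroring a scenario image into a private registry and confirming it is visible through the OCI tag-listing API might look like this (registry host, repository paths, and credentials are placeholders for your environment):

```shell
# Mirror a scenario image from the public registry into the private one
skopeo copy --all \
  docker://quay.io/krkn-chaos/krkn-hub:pod-scenarios \
  docker://registry.example.com/krkn-chaos/krkn-hub:pod-scenarios

# Verify the OCI API the operator relies on can list the mirrored tags
curl -u <user>:<password> \
  https://registry.example.com/v2/krkn-chaos/krkn-hub/tags/list
```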
Step 4: Selecting a Chaos Scenario
After choosing your registry, you’ll be presented with a list of available chaos scenarios to run against your target clusters.

The scenario selection page displays all available chaos scenarios from the chosen registry. Each scenario card shows:
- Scenario name and description
- Scenario type (pod, node, network, etc.)
- Version information
Browse through the available scenarios and select the one that matches your chaos engineering objectives. For detailed information about each scenario and what it does, refer to the Scenarios documentation.
Step 5: Configuring Scenario Parameters
Once you’ve selected a scenario, you’ll move to the configuration phase where you can customize the scenario’s behavior to match your testing requirements.
Mandatory Parameters

Mandatory parameters are scenario-specific settings that must be configured before running the chaos experiment. When a scenario has mandatory parameters, you cannot proceed without providing values for them.
Important notes:
- Required when present: If a scenario displays mandatory parameters, you must fill them in—there are no defaults
- Not all scenarios have them: Some scenarios can run without any mandatory configuration
- Scenario-specific: Different scenarios have different mandatory parameters based on what they’re testing
If a scenario has no mandatory parameters, it can technically run with just the built-in defaults. However, running with defaults alone may not produce the desired chaos effect on your cluster, as the scenario won’t be tailored to your specific environment and applications.
Best Practice: Even when mandatory parameters aren’t present, review the optional parameters to ensure the scenario targets the right resources and behaves as expected in your environment. For example, a pod deletion scenario might run with defaults, but you’ll want to configure it to target your specific application namespace and workloads.
Optional Parameters

Optional parameters provide fine-grained control over the scenario’s behavior. These parameters:
- Allow you to customize the chaos experiment beyond the basic configuration
- Are entirely optional—scenarios run perfectly fine without setting them
- Enable advanced testing patterns (custom filters, label selectors, timing controls, etc.)
Examples of optional parameters might include:
- Label selectors to target specific pods
- Duration and interval settings
- Percentage of resources to affect
- Custom filters or exclusion rules
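For instance, a pod deletion scenario might expose optional parameters along these lines (the names and values are illustrative, not an exact schema):

```
NAMESPACE=payments         # namespace to target
POD_LABEL=app=checkout     # label selector for victim pods
DISRUPTION_COUNT=1         # number of pods to delete
WAIT_DURATION=120          # seconds to wait for recovery
```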
Global Options

Global options control the behavior of the Krkn framework itself, not the specific scenario. These settings enable integration with observability and monitoring tools:
- Elasticsearch integration: Send scenario metrics and results to Elasticsearch
- Prometheus integration: Export chaos metrics to Prometheus
- Alert collection: Capture and analyze alerts triggered during the chaos experiment
- Custom dashboards: Configure metrics export for custom monitoring dashboards
- Cerberus integration: Enable health monitoring during chaos runs
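For illustration, global options enabling the Prometheus, Elasticsearch, and Cerberus integrations might translate into a Krkn configuration fragment like the one below; the keys shown are representative, so consult the Krkn configuration reference for the exact schema in your version:

```yaml
# Illustrative Krkn global options; exact keys may differ by version
performance_monitoring:
  prometheus_url: "https://prometheus.example.com"
  check_critical_alerts: true
elastic:
  enable_elastic: true
  elastic_url: "https://elastic.example.com"
cerberus:
  cerberus_enabled: true
  cerberus_url: "http://cerberus.example.com:8080"
```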
Default Value Handling: Global options are only applied if you modify them from their default values in the form. If you leave a global option at its default setting, it will not be included in the scenario configuration. This prevents unnecessary configuration bloat and ensures only intentional customizations are applied.
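This filtering behavior can be pictured with a small shell sketch, where the option names and default values are hypothetical:

```shell
# Hypothetical global options: the default value and the value
# submitted from the form. Only options changed from their defaults
# end up in the final scenario configuration.
default_enable_alerts="false"
default_wait_duration="60"

enable_alerts="true"    # user changed this in the form
wait_duration="60"      # left at its default

config=""
if [ "$enable_alerts" != "$default_enable_alerts" ]; then
  config="$config enable_alerts=$enable_alerts"
fi
if [ "$wait_duration" != "$default_wait_duration" ]; then
  config="$config wait_duration=$wait_duration"
fi

echo "submitted:$config"
# prints: submitted: enable_alerts=true
```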
After configuring all parameters, click Run Scenario to launch the chaos experiment.
Monitoring Scenario Runs
Once you launch a scenario, you can monitor its execution in real-time through the Krkn Operator web interface.
Active Scenarios Dashboard

The home page displays all active scenario runs across all target clusters. Each scenario card shows:
- Scenario name and type
- Target cluster(s) where it’s running
- Current status (running, completed, failed)
- Start time and duration
- User who initiated the run
From this dashboard, you can:
- View all running experiments at a glance
- Click on a scenario to see detailed execution information
- Stop or cancel running scenarios (if you have permissions)
Scenario Run Details

Clicking on a running scenario opens the detailed view, which provides:
- Real-time container logs: Watch the chaos scenario execute with live log streaming
- Execution timeline: See when the scenario started, its current phase, and expected completion
- Configuration details: Review the parameters that were used for this run
- Target information: Verify which cluster(s) the scenario is affecting
- Status updates: Real-time status changes as the scenario progresses through its phases
The live log streaming is particularly useful for:
- Debugging scenario failures
- Understanding what the chaos experiment is currently doing
- Verifying that the chaos is being injected as expected
- Capturing evidence for post-experiment analysis
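Because the scenario runs as a container, the same stream can usually also be tailed with kubectl outside the web console; the label selector below is an assumption, so check the actual labels on the scenario pod in your cluster:

```shell
# Locate the scenario pod (label is illustrative) and stream its logs
kubectl get pods -A -l app.kubernetes.io/created-by=krkn-operator
kubectl logs -f <scenario-pod-name> -n <namespace>
```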
User Permissions and Visibility
Role-Based Access Control: Scenario visibility and management capabilities depend on your user role.
Administrator users can:
- View all scenario runs from all users
- Manage any running scenario
- Cancel experiments initiated by any user
Regular users can:
- View only their own scenario runs
- Manage only scenarios they initiated
- Cannot see scenarios started by other users
This role-based access control ensures that teams can work independently while administrators maintain oversight and control of all chaos engineering activities.
What’s Next?
Now that you understand how to run and monitor chaos scenarios with Krkn Operator, you might want to: