What is krkn-operator?

Kubernetes Operator for Krkn Chaos Engineering

Overview

krkn-operator is a Kubernetes Operator that orchestrates Krkn-based chaos scenarios, using Kubernetes itself as the execution platform rather than Docker/Podman as krknctl does.

Cloud-Native Architecture

krkn-operator is built following cloud-native best practices:

  • All component interactions happen through Kubernetes Custom Resource Definitions (CRDs)
  • Fully declarative configuration
  • Native integration with Kubernetes security model

Important: Multi-Cluster Design

A critical architectural principle of krkn-operator is that the cluster running the operator does NOT execute chaos scenarios against itself. Instead:

  • The control plane cluster runs krkn-operator and orchestrates chaos execution
  • Target clusters are where chaos scenarios are actually injected
  • This design preserves the original Krkn architecture where chaos testing is performed from an external control point

This separation ensures that chaos experiments cannot destabilize the orchestration layer itself.

Security Benefits

One of the major advantages of krkn-operator over previous approaches (krknctl, krkn-hub containers) is enhanced credential security:

Previous Approach (krknctl / krkn-hub)

  • Users needed direct access to target cluster credentials (kubeconfig files, service account tokens)
  • Credential sharing made user onboarding/offboarding complex and risky
  • Each user managed their own credentials, increasing the attack surface

krkn-operator Approach

  • Target cluster credentials are configured once by the krkn-operator administrator
  • Users are granted access through the KrknUser CRD, a custom resource that manages user permissions
  • No cluster credentials are shared with end users
  • User permissions are managed declaratively through KrknUser resources
  • Simplified and secure onboarding/offboarding process
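A KrknUser resource might look like the sketch below. This is illustrative only: the API group, version, and spec fields shown here are assumptions and may differ from the actual CRD schema shipped with your krkn-operator version.

```yaml
# Hypothetical KrknUser sketch -- apiVersion and spec fields are assumed,
# not taken from the actual CRD; consult the installed CRD for the real schema.
apiVersion: krkn-chaos.dev/v1alpha1   # assumed API group/version
kind: KrknUser
metadata:
  name: jane-doe
spec:
  # Target clusters this user may run chaos scenarios against (assumed field)
  targets:
    - dev-cluster
    - staging-eu-west
```

The point of the pattern is that only cluster names appear here; the actual credentials stay with the operator administrator.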

Modular Design

krkn-operator features a modular, extensible architecture that supports integration with various target providers:

  • Exposes well-defined interfaces for target provider integration operators
  • Allows extending chaos capabilities to different cluster management platforms
  • Example: krkn-operator-acm provides integration with Red Hat Advanced Cluster Management (ACM) and Open Cluster Management (OCM)

This design enables organizations to integrate krkn-operator with their existing cluster management infrastructure seamlessly.

Getting Started

Installation and configuration are covered in the guides below.

1 - Installation

Install krkn-operator using Helm

This guide walks you through installing krkn-operator using Helm, the recommended installation method.

Prerequisites

  • Kubernetes 1.19+ or OpenShift 4.x
  • Helm 3.0+
  • A Kubernetes cluster (kind, minikube, or production cluster)

Quick Start (kind/minikube)

Perfect for testing and local development, this minimal installation gets krkn-operator running quickly on kind or minikube.

Replace <VERSION> in the commands below with the release you want to install. For available versions, see the releases page.

1. Install krkn-operator

helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator --version <VERSION>

This installs krkn-operator with default settings in the current namespace.

2. Verify Installation

kubectl get pods -l app.kubernetes.io/name=krkn-operator

Expected output:

NAME                              READY   STATUS    RESTARTS   AGE
krkn-operator-xxxxxxxxx-xxxxx     2/2     Running   0          1m

3. Access the Console (Optional)

For local testing, use port-forwarding to access the web console:

kubectl port-forward svc/krkn-operator-console 3000:3000

Then open http://localhost:3000 in your browser.


Production Installation

For production deployments, you’ll want to customize the installation with a values.yaml file to ensure high availability, proper resource limits, monitoring integration, and secure external access.

When to Use Each Installation Method

Choose the installation method that matches your environment and requirements:

| Method | Use When | Key Features |
| --- | --- | --- |
| Quick Start | Testing on kind/minikube, local development, POC | Minimal configuration, port-forward access, no HA |
| Production (Kubernetes) | Standard Kubernetes (EKS, GKE, AKS, self-managed) | Gateway API or Ingress for external access, HA setup, resource limits, monitoring |
| Production (OpenShift) | OpenShift/OKD clusters | OpenShift Routes instead of Ingress, enhanced security contexts, HA setup |

The main differences between production installations are:

  • Kubernetes can use either:
    • Gateway API (recommended) - Modern routing standard with powerful features
    • Ingress (legacy) - Traditional method, still widely supported
  • OpenShift uses Routes for external access (native OpenShift feature, no additional controller needed)
  • Production configurations add replica counts, resource limits, pod disruption budgets, and monitoring compared to Quick Start

All production methods support the same chaos scenarios and core functionality—the choice depends on your platform and infrastructure preferences.

Installation on Kubernetes

Kubernetes clusters can expose the web console using either Gateway API (recommended) or Ingress (legacy).

Option 1: Using Gateway API (Recommended)

Gateway API is the modern successor to Ingress and provides more powerful and flexible routing capabilities.

Prerequisites:

  • Gateway API CRDs installed in your cluster (installation guide)
  • A Gateway resource already deployed (usually managed by cluster admins)

Create a values.yaml file:

# Production values for Kubernetes with Gateway API

# Enable web console with Gateway API
console:
  enabled: true
  gateway:
    enabled: true
    gatewayName: krkn-gateway  # Name of your existing Gateway
    gatewayNamespace: ""  # Optional: if Gateway is in a different namespace
    hostname: krkn.example.com
    path: /
    pathType: PathPrefix

# Operator configuration
operator:
  replicaCount: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  logging:
    level: info
    format: json

# High availability
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Monitoring (if using Prometheus)
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 30s

Note: Gateway API assumes you have a Gateway resource already configured in your cluster. The chart creates only the HTTPRoute that attaches to that Gateway.
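For context, a minimal Gateway that the chart’s HTTPRoute could attach to is sketched below. Such a Gateway is typically created by cluster admins, and the gatewayClassName depends on which Gateway controller is installed in your cluster.

```yaml
# Minimal example Gateway (standard Gateway API v1 schema); the class name
# and namespace are placeholders that depend on your environment.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: krkn-gateway
  namespace: default
spec:
  gatewayClassName: nginx  # depends on your installed Gateway controller
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      hostname: krkn.example.com
      allowedRoutes:
        namespaces:
          from: All  # allow HTTPRoutes from other namespaces, e.g. krkn-operator-system
```

If the Gateway restricts allowedRoutes to its own namespace, the chart’s HTTPRoute in another namespace will not attach, so check this setting with your cluster admins.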

Option 2: Using Ingress (Legacy)

If your cluster doesn’t support Gateway API yet, you can use traditional Ingress:

# Production values for Kubernetes with Ingress

# Enable web console with Ingress
console:
  enabled: true
  ingress:
    enabled: true
    className: nginx  # or your ingress controller
    hostname: krkn.example.com
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    tls:
      - secretName: krkn-tls
        hosts:
          - krkn.example.com

# Operator configuration
operator:
  replicaCount: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  logging:
    level: info
    format: json

# High availability
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Monitoring (if using Prometheus)
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 30s

Install with your custom values:

helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
  --version <VERSION> \
  --namespace krkn-operator-system \
  --create-namespace \
  -f values.yaml

Installation on OpenShift

OpenShift uses Routes instead of Ingress. Create an OpenShift-specific values.yaml:

# Production values for OpenShift

# Enable web console with Route
console:
  enabled: true
  route:
    enabled: true
    hostname: krkn.apps.cluster.example.com
    tls:
      termination: edge

# Operator configuration
operator:
  replicaCount: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault

# High availability
podDisruptionBudget:
  enabled: true
  minAvailable: 1

Install on OpenShift:

helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
  --version <VERSION> \
  --namespace krkn-operator-system \
  --create-namespace \
  -f values-openshift.yaml

Advanced Configuration Options

Enable ACM Integration

To enable Red Hat Advanced Cluster Management (ACM) / Open Cluster Management (OCM) integration:

acm:
  enabled: true
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi

Install with ACM enabled:

helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
  --version <VERSION> \
  --set acm.enabled=true \
  --namespace krkn-operator-system \
  --create-namespace

Custom Namespace

Install in a custom namespace:

helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
  --version <VERSION> \
  --namespace my-chaos-platform \
  --create-namespace \
  --set namespaceOverride=my-chaos-platform

Image Registry Override

If you’re using a private registry or mirror:

operator:
  image: myregistry.io/krkn-chaos/krkn-operator:<VERSION>
  pullPolicy: IfNotPresent

dataProvider:
  image: myregistry.io/krkn-chaos/data-provider:<VERSION>

pullSecrets:
  - name: my-registry-secret

JWT Configuration

Customize JWT token settings for authentication:

jwtSecret: bXktc2VjdXJlLWp3dC1rZXktYmFzZTY0LWVuY29kZWQ=  # Base64 encoded
jwtExpiryHours: 72  # 3 days
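Since jwtSecret must be a base64-encoded value, one simple way to generate a strong random key is with openssl (assuming it is installed):

```shell
# Generate 32 random bytes and base64-encode them for use as jwtSecret
openssl rand -base64 32
```

Paste the output as the jwtSecret value in values.yaml, or pass it at install time with --set jwtSecret=<output>.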

Complete values.yaml Reference

Here’s a comprehensive values.yaml with all available options:

# Namespace configuration
namespaceOverride: ""

# Image configuration
operator:
  image: quay.io/krkn-chaos/krkn-operator:latest
  pullPolicy: IfNotPresent
  enabled: true
  replicaCount: 1

  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi

  dataProvider:
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi

  service:
    type: ClusterIP
    port: 8080
    grpcPort: 50051

  logging:
    level: info  # debug, info, warn, error
    format: json  # json or text

  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault

  nodeSelector: {}
  tolerations: []
  affinity: {}
  extraEnv: []

dataProvider:
  image: quay.io/krkn-chaos/data-provider:latest

# ACM Integration (Optional)
acm:
  enabled: false
  image: quay.io/krkn-chaos/krkn-operator-acm:latest
  replicaCount: 1

  config:
    secretName: ""  # ACM cluster credentials secret

  service:
    port: 8081

  logging:
    level: info
    format: json

  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault

  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi

  nodeSelector: {}
  tolerations: []
  affinity: {}

# Web Console (Optional)
console:
  enabled: true
  image: quay.io/krkn-chaos/console:latest
  replicaCount: 1

  service:
    type: ClusterIP
    port: 3000
    nodePort: null  # Only for NodePort service type

  # Kubernetes Ingress (legacy)
  ingress:
    enabled: false
    className: nginx
    hostname: krkn.example.com
    annotations: {}
    tls: []

  # Gateway API (recommended for Kubernetes)
  gateway:
    enabled: false
    gatewayName: krkn-gateway
    gatewayNamespace: ""
    sectionName: ""
    hostname: krkn.example.com
    path: /
    pathType: PathPrefix
    annotations: {}

  # OpenShift Route
  route:
    enabled: false
    hostname: ""
    tls:
      termination: edge  # edge, passthrough, or reencrypt

  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 256Mi

  nodeSelector: {}
  tolerations: []
  affinity: {}

# Image pull secrets
pullSecrets: []

# JWT Authentication
jwtSecret: ""  # Base64 encoded; auto-generated if empty
jwtExpiryHours: 24

# RBAC
rbac:
  create: true

# Service Account
serviceAccount:
  create: true
  name: ""
  annotations: {}

# CRDs
crds:
  keep: true  # Keep CRDs after uninstall

# Monitoring
monitoring:
  enabled: false
  service:
    port: 8443
  serviceMonitor:
    enabled: false
    interval: 30s

# Network Policy
networkPolicy:
  enabled: false
  ingress: []
  egress: []

# Update Strategy
updateStrategy:
  type: RollingUpdate

# Pod Disruption Budget
podDisruptionBudget:
  enabled: false
  minAvailable: 1

# Common labels and annotations
commonLabels: {}
commonAnnotations: {}

# Naming
nameOverride: ""
fullnameOverride: ""

Verification

After installation, verify all components are running:

# Check operator pods
kubectl get pods -n krkn-operator-system

# Check services
kubectl get svc -n krkn-operator-system

# Check CRDs
kubectl get crds | grep krkn

# View operator logs
kubectl logs -n krkn-operator-system -l app.kubernetes.io/name=krkn-operator -c manager

Upgrading

To upgrade to a newer version:

helm upgrade krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
  --version <VERSION> \
  --namespace krkn-operator-system \
  -f values.yaml

Uninstalling

To remove krkn-operator:

helm uninstall krkn-operator --namespace krkn-operator-system

Next Steps

Continue to the Configuration guide below to add target clusters.

2 - Configuration

Configure target clusters for chaos testing

This guide walks you through configuring target Kubernetes or OpenShift clusters where you want to run chaos engineering scenarios.

Overview

Before running chaos experiments, you need to add one or more target clusters to the Krkn Operator. Target clusters are the Kubernetes/OpenShift clusters where chaos scenarios will be executed. You can add multiple target clusters and manage them through the web console.


Accessing Cluster Configuration

Step 1: Open Admin Settings

Log in to the Krkn Operator Console and click on your profile in the top-right corner. Select Admin Settings from the dropdown menu.

Admin Settings Menu

Step 2: Navigate to Cluster Targets

In the Admin Settings page, click on the Cluster Targets tab in the left sidebar. This will show you a list of all configured target clusters (if any).


Adding a New Target Cluster

Step 3: Open the Add Target Dialog

Click the Add Target button in the top-right corner of the Cluster Targets page. This will open the “Add New Target” dialog.

Add New Target Dialog

Step 4: Enter Cluster Information

You’ll need to provide:

  1. Cluster Name (required): A friendly name to identify this cluster (e.g., “Production-US-East”, “Dev-Cluster”, “OpenShift-QA”)

  2. Authentication Type (required): Choose one of three authentication methods:

    • Kubeconfig - Full kubeconfig file (recommended)
    • Service Account Token - Token-based authentication
    • Username/Password - Basic authentication (for clusters that support it)

Authentication Methods

The Krkn Operator supports three different ways to authenticate to target clusters. Choose the method that best fits your cluster’s security configuration.

Method 1: Kubeconfig (Recommended)

This is the most common and recommended method. It uses a complete kubeconfig file to authenticate to the target cluster.

When to use:

  • You have direct access to the cluster’s kubeconfig file
  • You want to authenticate with certificates or tokens defined in the kubeconfig
  • The cluster supports standard Kubernetes authentication

How to configure:

  1. Select Kubeconfig as the Authentication Type
  2. Obtain the kubeconfig file for your target cluster:
    # For most Kubernetes clusters
    kubectl config view --flatten --minify > target-cluster.kubeconfig
    
    # For OpenShift clusters
    oc login https://api.cluster.example.com:6443
    oc config view --flatten > target-cluster.kubeconfig
    
  3. Open the kubeconfig file in a text editor and copy its entire contents
  4. Paste the kubeconfig content into the Kubeconfig text area in the dialog
  5. Click Create

Example kubeconfig content:

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTi...
    server: https://api.cluster.example.com:6443
  name: my-cluster
contexts:
- context:
    cluster: my-cluster
    user: admin
  name: my-cluster-context
current-context: my-cluster-context
users:
- name: admin
  user:
    client-certificate-data: LS0tLS1CRUdJTi...
    client-key-data: LS0tLS1CRUdJTi...

Method 2: Service Account Token

Use this method if you want to authenticate using a Kubernetes Service Account token.

When to use:

  • You want fine-grained RBAC control over what the operator can do
  • You’re following a zero-trust security model
  • You want to create a dedicated service account for chaos testing

How to configure:

  1. Create a service account in the target cluster with appropriate permissions:

    # Create service account
    kubectl create serviceaccount krkn-operator -n krkn-system
    
    # Bind the service account to cluster-admin (or a more restrictive custom ClusterRole)
    kubectl create clusterrolebinding krkn-operator-admin \
      --clusterrole=cluster-admin \
      --serviceaccount=krkn-system:krkn-operator
    
    # Get the service account token
    kubectl create token krkn-operator -n krkn-system --duration=8760h
    
  2. In the “Add New Target” dialog:

    • Enter a Cluster Name
    • Select Service Account Token as the Authentication Type
    • Enter the API Server URL (e.g., https://api.cluster.example.com:6443)
    • Paste the Service Account Token you generated
    • (Optional) Provide CA Certificate data if your cluster uses a self-signed or custom Certificate Authority
    • Click Create

About CA Certificate (Optional):

The CA Certificate field is optional and only needed in specific scenarios:

  • When to provide it: If your cluster uses a self-signed certificate or a custom/private Certificate Authority (CA) that is not trusted by default
  • When to skip it: If your cluster uses certificates from a public CA (like Let’s Encrypt, DigiCert, etc.) or standard cloud provider certificates
  • What it does: The CA certificate allows the Krkn Operator to verify the identity of your cluster’s API server and establish a secure TLS connection
  • How to get it: Extract the CA certificate from your cluster’s kubeconfig file (the certificate-authority-data field, base64-decoded) or from your cluster administrator

Example of extracting CA certificate from kubeconfig:

# Extract and decode CA certificate
kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.crt
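Before pasting the CA certificate into the dialog, you can sanity-check that the extracted file is a valid PEM certificate (assuming openssl is installed):

```shell
# Print the certificate subject and expiry; fails if ca.crt is not a valid certificate
openssl x509 -in ca.crt -noout -subject -enddate
```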

Method 3: Username/Password

Use basic authentication with a username and password. This method is only supported by clusters that have basic auth enabled.

When to use:

  • Your cluster supports basic authentication
  • You’re testing in a development environment
  • You have credentials for a user with appropriate permissions

How to configure:

  1. In the “Add New Target” dialog:
    • Enter a Cluster Name
    • Select Username/Password as the Authentication Type
    • Enter the API Server URL (e.g., https://api.cluster.example.com:6443)
    • Enter your Username
    • Enter your Password
    • (Optional) Provide CA Certificate data if your cluster uses a self-signed or custom Certificate Authority
    • Click Create

About CA Certificate (Optional):

Same as with token authentication, the CA Certificate is optional:

  • When needed: Only if your cluster uses self-signed certificates or a custom/private Certificate Authority
  • When to skip: If using public CA certificates or standard cloud provider setups
  • Purpose: Enables secure TLS verification when connecting to the cluster’s API server

Verifying Target Cluster

After adding a target cluster, the Krkn Operator will attempt to connect to it and verify the credentials.

Successful Configuration

If the cluster is configured correctly, you’ll see it appear in the Cluster Targets list with a green status indicator. You can now use this cluster as a target for chaos scenarios.

Troubleshooting Connection Issues

If the cluster connection fails, check the following:

| Issue | Possible Cause | Solution |
| --- | --- | --- |
| Connection timeout | Incorrect API server URL | Verify the API server URL is correct and accessible from the operator |
| Authentication failed | Invalid credentials | Re-check your kubeconfig, token, or username/password |
| Certificate error | CA certificate mismatch | Provide the correct CA certificate for clusters with custom CAs |
| Permission denied | Insufficient RBAC permissions | Ensure the service account or user has cluster-admin or the necessary permissions |
| Network unreachable | Firewall or network policy | Ensure the Krkn Operator can reach the target cluster’s API server |

You can view detailed error messages in the operator logs:

kubectl logs -n krkn-operator-system -l app.kubernetes.io/name=krkn-operator -c manager

Managing Target Clusters

Viewing Configured Clusters

Navigate to Admin SettingsCluster Targets to see all configured target clusters. Each cluster shows:

  • Cluster name
  • Connection status
  • Last verified time
  • Authentication method used

Editing a Target Cluster

To modify an existing target cluster:

  1. Click the Edit button next to the cluster in the list
  2. Update the cluster name or authentication credentials
  3. Click Save

Removing a Target Cluster

To remove a target cluster:

  1. Click the Delete button next to the cluster in the list
  2. Confirm the deletion

Required Permissions

The service account or user used to connect to target clusters needs the following permissions:

Minimum RBAC Permissions

For most chaos scenarios, the operator needs cluster-admin privileges or at least these permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: krkn-operator-target-access
rules:
# Pod chaos scenarios
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/exec"]
  verbs: ["get", "list", "watch", "create", "delete", "deletecollection"]

# Node chaos scenarios
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch", "update", "patch"]

# Deployment/StatefulSet/DaemonSet scenarios
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets", "replicasets"]
  verbs: ["get", "list", "watch", "update", "patch", "delete"]

# Service and networking scenarios
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]

- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]

# Namespace scenarios
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list", "watch"]

# Job creation for scenario execution
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]

# Events for monitoring
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "list", "watch"]
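To grant these permissions to the service account used for target access, bind the ClusterRole with a ClusterRoleBinding. The subject below matches the service account created in the token-auth example earlier; adjust the name and namespace to your setup.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: krkn-operator-target-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: krkn-operator-target-access
subjects:
- kind: ServiceAccount
  name: krkn-operator    # service account created on the target cluster
  namespace: krkn-system # namespace used in the token-auth example
```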

Best Practices

  1. Use Dedicated Service Accounts: Create a dedicated service account in each target cluster specifically for chaos testing. This makes it easier to audit and control permissions.

  2. Rotate Credentials Regularly: Periodically rotate kubeconfig files and service account tokens to maintain security.

  3. Test Connectivity First: After adding a target cluster, run a simple non-destructive scenario to verify connectivity before running destructive chaos tests.

  4. Organize by Environment: Use clear naming conventions like prod-us-east-1, staging-eu-west, dev-local to easily identify clusters.

  5. Limit Production Access: Consider restricting production cluster access to specific users or requiring additional approval workflows.

  6. Monitor Operator Logs: Regularly check operator logs for authentication errors or connection issues.


ACM/OCM Integration (Advanced)

For organizations using Red Hat Advanced Cluster Management (ACM) or Open Cluster Management (OCM), the Krkn Operator provides seamless integration that automatically discovers and manages all ACM-controlled clusters as chaos testing targets.

What is ACM/OCM?

Advanced Cluster Management (ACM) and Open Cluster Management (OCM) are multi-cluster management platforms that allow you to manage multiple Kubernetes and OpenShift clusters from a single hub cluster. ACM/OCM provides:

  • Centralized cluster lifecycle management - Deploy, upgrade, and manage multiple clusters
  • Application deployment across clusters - Deploy applications to multiple clusters with policies
  • Governance and compliance - Apply security and compliance policies across your fleet
  • Observability - Monitor metrics, logs, and alerts from all managed clusters

How ACM Integration Works

When the ACM integration is enabled in the Krkn Operator, the krkn-operator-acm component automatically:

  1. Discovers all managed clusters registered with your ACM/OCM hub
  2. Imports them as chaos testing targets into the Krkn Operator console
  3. Keeps the cluster list synchronized as new clusters are added or removed from ACM
  4. Authenticates automatically using ACM’s ManagedServiceAccount resources—no manual credential management required

Benefits of ACM Integration

| Feature | Manual Configuration | ACM Integration |
| --- | --- | --- |
| Cluster Discovery | Manual - add each cluster individually | Automatic - all ACM-managed clusters |
| Credential Management | Manual - maintain tokens/kubeconfig per cluster | Automatic - uses ManagedServiceAccount |
| Cluster Updates | Manual - update credentials when they change | Automatic - ACM handles rotation |
| New Clusters | Manual - must add explicitly | Automatic - discovered immediately |
| Security | Per-cluster authentication | Centralized ACM RBAC with fine-grained control |

Enabling ACM Integration

Step 1: Install with ACM Enabled

To enable ACM integration, install the Krkn Operator with the ACM component enabled via Helm:

helm install krkn-operator oci://quay.io/krkn-chaos/charts/krkn-operator \
  --version <VERSION> \
  --set acm.enabled=true \
  --namespace krkn-operator-system \
  --create-namespace

Or add it to your values.yaml:

acm:
  enabled: true
  replicaCount: 1

  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi

  logging:
    level: info
    format: json

For complete installation instructions and additional configuration options, see the Installation Guide.

Step 2: Verify ACM Component

After installation, verify that the ACM component is running:

kubectl get pods -n krkn-operator-system -l app.kubernetes.io/component=acm

# Expected output:
# NAME                                  READY   STATUS    RESTARTS   AGE
# krkn-operator-acm-xxxxxxxxx-xxxxx     1/1     Running   0          2m

Check the ACM component logs to see cluster discovery in action:

kubectl logs -n krkn-operator-system -l app.kubernetes.io/component=acm

# You should see logs like:
# INFO  Discovered 5 managed clusters from ACM
# INFO  Synced cluster: production-us-east
# INFO  Synced cluster: staging-eu-west

Configuring ManagedServiceAccounts (Fine-Grained Security)

One of the most powerful features of ACM integration is the ability to use ManagedServiceAccounts for authentication to target clusters. This provides fine-grained, per-cluster security control.

What are ManagedServiceAccounts?

ManagedServiceAccounts are a feature of OCM/ACM that allows the hub cluster to create and manage service accounts on spoke clusters. Instead of using a single highly-privileged service account (like open-cluster-management-agent-addon-application-manager), you can create dedicated service accounts with custom RBAC permissions for each cluster.

Configuring Per-Cluster Service Accounts

Navigate to Admin SettingsProvider ConfigurationACM to configure which ManagedServiceAccount to use for each cluster:

ACM Provider Configuration

For each managed cluster, you can:

  1. Select a ManagedServiceAccount: Choose from existing ManagedServiceAccounts created on that cluster
  2. Customize permissions per cluster: Each cluster can use a different service account with different RBAC permissions
  3. Apply the configuration: The Krkn Operator will use this service account for all chaos testing operations on that cluster

Why Use Custom ManagedServiceAccounts?

By default, ACM uses the open-cluster-management-agent-addon-application-manager service account, which has cluster-admin privileges on all spoke clusters. While convenient, this violates the principle of least privilege.

Using custom ManagedServiceAccounts provides:

Enhanced Security:

  • Least privilege access: Grant only the permissions needed for chaos testing (e.g., pod deletion, network policy creation) rather than full cluster-admin
  • Per-cluster customization: Production clusters can have more restrictive permissions than dev/test clusters
  • Audit trail: Each cluster has a dedicated service account, making it easier to track and audit chaos testing activities

Flexibility:

  • Environment-specific policies: Different permissions for prod, staging, and dev environments
  • Scenario-specific accounts: Create different service accounts for different types of chaos scenarios
  • Compliance: Meet security and compliance requirements by limiting operator privileges

Example: Creating a Custom ManagedServiceAccount

Create a ManagedServiceAccount with limited chaos testing permissions:

apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
  name: krkn-chaos-operator
  namespace: cluster-prod-us-east  # ManagedCluster namespace
spec:
  rotation: {}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: krkn-chaos-limited
rules:
# Pod chaos - read and delete only
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "delete"]

# Node chaos - read and cordon/drain only
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch", "update", "patch"]

# Network policies - create and delete
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies"]
  verbs: ["get", "list", "create", "delete"]

# No destructive operations on critical resources
# (no namespace deletion, no service account manipulation, etc.)
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: krkn-chaos-limited-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: krkn-chaos-limited
subjects:
- kind: ServiceAccount
  name: krkn-chaos-operator
  namespace: open-cluster-management-agent-addon

Apply this to the ACM hub cluster, and the ManagedServiceAccount will be created on the spoke cluster automatically. You can then select it in the Provider Configuration UI.


Automatic Cluster Synchronization

Once ACM integration is enabled and configured, the Krkn Operator automatically:

  • Syncs cluster list every 60 seconds (configurable)
  • Adds new clusters as they’re imported into ACM
  • Removes clusters that are deleted from ACM
  • Updates cluster status based on ACM health checks
  • Rotates credentials automatically when ManagedServiceAccount tokens are refreshed

You can view all ACM-discovered clusters in the Cluster Targets page. They will be marked with an ACM badge to distinguish them from manually configured clusters.


Troubleshooting ACM Integration

ACM Component Not Starting

If the ACM component fails to start, check:

# Check pod status
kubectl get pods -n krkn-operator-system -l app.kubernetes.io/component=acm

# View logs
kubectl logs -n krkn-operator-system -l app.kubernetes.io/component=acm

# Common issues:
# - ACM/OCM not installed on the hub cluster
# - Missing RBAC permissions for the operator to read ManagedCluster resources
# - Network policies blocking communication

No Clusters Discovered

If the ACM component is running but no clusters appear:

  1. Verify ACM is managing clusters:

    kubectl get managedclusters
    
  2. Check if clusters are in “Ready” state:

    kubectl get managedclusters -o wide
    
  3. Review ACM component logs for discovery errors:

    kubectl logs -n krkn-operator-system -l app.kubernetes.io/component=acm | grep -i error
    

ManagedServiceAccount Not Working

If a cluster shows authentication errors after configuring a ManagedServiceAccount:

  1. Verify the ManagedServiceAccount exists and is ready:

    kubectl get managedserviceaccount -n <cluster-namespace>
    
  2. Check the ManagedServiceAccount has proper RBAC permissions on the spoke cluster

  3. Ensure the ManagedServiceAccount token hasn’t expired

For more detailed troubleshooting, see the ACM Integration Troubleshooting Guide.


Next Steps

Now that you’ve configured your target clusters (manually or via ACM), you’re ready to run chaos scenarios:

3 - Usage

Learn how to run chaos scenarios with Krkn Operator

This guide walks you through the process of running chaos engineering scenarios using the Krkn Operator web interface.

Overview

The Krkn Operator provides an intuitive web interface for executing chaos scenarios against your Kubernetes clusters. The workflow is straightforward: select your target clusters, choose a scenario registry, pick a scenario, configure it, and launch the experiment. The operator handles all the complexity of scheduling, execution, and monitoring.

Step 1: Starting a Scenario Run

From the Krkn Operator home page, you’ll see the main dashboard with an overview of your configured targets and recent scenario runs.

Krkn Operator Main Screen

To begin running a chaos scenario, click the Run Scenario button. This will launch the scenario configuration wizard that guides you through the setup process.

Step 2: Selecting Target Clusters

The first step in the wizard is selecting which clusters you want to target with your chaos experiment.

Select Target Clusters

One of the powerful features of Krkn Operator is its ability to run scenarios across multiple clusters simultaneously. If you have configured multiple target providers (such as manual targets and ACM-managed clusters), all available clusters will be presented in a unified view.

Key capabilities:

  • Multi-cluster selection: Select one or more target clusters to run the same scenario across multiple environments
  • Unified view: All clusters from all configured providers (manual targets, ACM, etc.) are displayed together
  • Parallel execution: When multiple targets are selected, the scenario will execute on all of them concurrently

This is particularly useful for testing:

  • Consistency of behavior across environments (dev, staging, production)
  • Regional cluster resilience
  • Multi-tenant cluster configurations
  • Different Kubernetes distributions or versions

Step 3: Selecting a Scenario Registry

After selecting your target clusters, you’ll choose where to pull the chaos scenario container images from.

Select Scenario Registry

Krkn Operator supports two types of registries:

Quay.io (Default)

The default option is the official Krkn Chaos registry on Quay.io, which contains all the pre-built, tested chaos scenarios maintained by the Krkn community. This is the recommended choice for most users as it provides:

  • Immediate access to 20+ chaos scenarios
  • Regular updates and new scenario releases
  • Pre-validated and tested scenario images

Private Registry

For organizations with specific requirements, you can configure a private container registry. This is useful when you need to:

  • Run custom or modified chaos scenarios
  • Operate in restricted network environments
  • Maintain full control over scenario versions
  • Meet compliance or security requirements

To use a private registry, you’ll need to:

  1. Configure the private registry in the Configuration section
  2. Push the Krkn scenario images to your private registry
  3. Ensure the operator has proper authentication credentials
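Step 3 typically boils down to a standard Kubernetes image-pull secret. A minimal sketch, assuming the operator reads the credential from its own namespace (the secret name is illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: private-registry-creds    # illustrative name
  namespace: krkn-operator-system
type: kubernetes.io/dockerconfigjson
data:
  # base64-encoded ~/.docker/config.json containing your registry auth
  .dockerconfigjson: <base64-encoded-docker-config>
```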

Step 4: Selecting a Chaos Scenario

After choosing your registry, you’ll be presented with a list of available chaos scenarios to run against your target clusters.

Select Chaos Scenario

The scenario selection page displays all available chaos scenarios from the chosen registry. Each scenario card shows:

  • Scenario name and description
  • Scenario type (pod, node, network, etc.)
  • Version information

Browse through the available scenarios and select the one that matches your chaos engineering objectives. For detailed information about each scenario and what it does, refer to the Scenarios documentation.

Step 5: Configuring Scenario Parameters

Once you’ve selected a scenario, you’ll move to the configuration phase where you can customize the scenario’s behavior to match your testing requirements.

Mandatory Parameters

Mandatory Scenario Parameters

Mandatory parameters are scenario-specific settings that must be configured before running the chaos experiment. When a scenario has mandatory parameters, you cannot proceed without providing values for them.

Important notes:

  • Required when present: If a scenario displays mandatory parameters, you must fill them in—there are no defaults
  • Not all scenarios have them: Some scenarios can run without any mandatory configuration
  • Scenario-specific: Different scenarios have different mandatory parameters based on what they’re testing

If a scenario has no mandatory parameters, it can technically run with just the built-in defaults. However, running with defaults alone may not produce the desired chaos effect on your cluster, as the scenario won’t be tailored to your specific environment and applications.

Best Practice: Even when mandatory parameters aren’t present, review the optional parameters to ensure the scenario targets the right resources and behaves as expected in your environment. For example, a pod deletion scenario might run with defaults, but you’ll want to configure it to target your specific application namespace and workloads.

Optional Parameters

Optional Scenario Parameters

Optional parameters provide fine-grained control over the scenario’s behavior. These parameters:

  • Allow you to customize the chaos experiment beyond the basic configuration
  • Are entirely optional—scenarios run perfectly fine without setting them
  • Enable advanced testing patterns (custom filters, label selectors, timing controls, etc.)

Examples of optional parameters might include:

  • Label selectors to target specific pods
  • Duration and interval settings
  • Percentage of resources to affect
  • Custom filters or exclusion rules
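As a concrete illustration, a pod deletion scenario configured through parameters like these might end up with the following values. The parameter names follow krkn-hub conventions and are shown as an assumption here; verify them against the documentation of the scenario you selected:

```yaml
# Illustrative values only -- the authoritative parameter names and defaults
# come from the selected scenario's own documentation.
NAMESPACE: my-app        # target a specific application namespace
POD_LABEL: app=frontend  # label selector narrowing which pods are affected
DISRUPTION_COUNT: "1"    # number of pods to delete per iteration
KILL_TIMEOUT: "180"      # seconds to wait for the pod to be killed
```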

Global Options

Global Scenario Options

Global options control the behavior of the Krkn framework itself, not the specific scenario. These settings enable integration with observability and monitoring tools:

  • Elasticsearch integration: Send scenario metrics and results to Elasticsearch
  • Prometheus integration: Export chaos metrics to Prometheus
  • Alert collection: Capture and analyze alerts triggered during the chaos experiment
  • Custom dashboards: Configure metrics export for custom monitoring dashboards
  • Cerberus integration: Enable health monitoring during chaos runs

After configuring all parameters, click Run Scenario to launch the chaos experiment.


Monitoring Scenario Runs

Once you launch a scenario, you can monitor its execution in real-time through the Krkn Operator web interface.

Active Scenarios Dashboard

Active Scenario Runs

The home page displays all active scenario runs across all target clusters. Each scenario card shows:

  • Scenario name and type
  • Target cluster(s) where it’s running
  • Current status (running, completed, failed)
  • Start time and duration
  • User who initiated the run

From this dashboard, you can:

  • View all running experiments at a glance
  • Click on a scenario to see detailed execution information
  • Stop or cancel running scenarios (if you have permissions)

Scenario Run Details

Scenario Run Details with Live Logs

Clicking on a running scenario opens the detailed view, which provides:

  • Real-time container logs: Watch the chaos scenario execute with live log streaming
  • Execution timeline: See when the scenario started, its current phase, and expected completion
  • Configuration details: Review the parameters that were used for this run
  • Target information: Verify which cluster(s) the scenario is affecting
  • Status updates: Real-time status changes as the scenario progresses through its phases

The live log streaming is particularly useful for:

  • Debugging scenario failures
  • Understanding what the chaos experiment is currently doing
  • Verifying that the chaos is being injected as expected
  • Capturing evidence for post-experiment analysis

User Permissions and Visibility

Role-Based Access Control: Scenario visibility and management capabilities depend on your user role.

Administrator users can:

  • View all scenario runs from all users
  • Manage any running scenario
  • Cancel experiments initiated by any user

Regular users can:

  • View only their own scenario runs
  • Manage only the scenarios they initiated

Scenarios started by other users are not visible to regular users.

This role-based access control ensures that teams can work independently while administrators maintain oversight and control of all chaos engineering activities.


What’s Next?

Now that you understand how to run and monitor chaos scenarios with Krkn Operator, you might want to: