Developers Guide

Developers Guide Overview

This document describes how to develop and add to Krkn. Before you start, it is recommended that you read the following documents first:

  1. Krkn Main README
  2. List of all Supported Scenarios

Be sure to properly install Krkn. Then you can start to develop krkn. The following documents will help you get started:

  1. Add k8s functionality to krkn-lib
  2. Add a New Chaos Scenario using Plugin API: Adding a new scenario into krkn
  3. Test your changes

NOTE: All base kubernetes functionality should be added into krkn-lib and called from krkn

Once a scenario is added to krkn, changes will also be needed in krkn-hub and krknctl. See the steps below for help editing krkn-hub and krknctl.

Questions?

For any questions or further guidance, feel free to reach out to us on the Kubernetes workspace in the #krkn channel. We’re happy to assist. Now, release the Krkn!

Follow Contribution Guide

Once you’re happy with your changes, follow the contribution guide to create your own branch and squash your commits.

1 - Krkn-lib

Krkn-lib contains the base kubernetes python functions

PyPI

krkn-lib

Krkn Chaos and resiliency testing tool Foundation Library

Contents

The library contains classes, models and helper functions used in Krkn to interact with Kubernetes, OpenShift and other external APIs. Its goal is to give developers the building blocks for new chaos scenarios and to increase the testability and modularity of the Krkn codebase.

Packages

The library is subdivided in several Packages under src/krkn_lib

  • ocp: Openshift Integration
  • k8s: Kubernetes Integration
  • elastic: Collection of ElasticSearch functions for posting telemetry
  • prometheus: Collection of prometheus functions for collecting metrics and alerts
  • telemetry:
    • k8s: Kubernetes Telemetry collection and distribution
    • ocp: Openshift Telemetry collection and distribution
  • models: Krkn shared data models
    • k8s: Kubernetes objects model
    • krkn: Krkn base models
    • telemetry: Telemetry collection model
    • elastic: Elastic model for data
  • utils: common functions

Documentation and Available Functions

The Library documentation of available functions is here. The documentation is automatically generated by Sphinx on top of the reStructuredText Docstring Format comments present in the code.

Installation

Git

Clone the repository

git clone https://github.com/krkn-chaos/krkn-lib
cd krkn-lib

Install the dependencies

krkn-lib uses poetry for its dependency management and packaging. To install the required packages, run:

$ pip install poetry
$ poetry install --no-interaction

Testing your changes

To see how to configure and test your changes, see Testing your changes.

2 - Adding scenarios via plugin api

Scenario Plugin API:

This API enables seamless integration of Scenario Plugins for Krkn. Plugins are automatically detected and loaded by the plugin loader, provided they extend the AbstractPluginScenario abstract class, implement the required methods, and adhere to the specified naming conventions.

Plugin folder:

The plugin loader automatically loads plugins found in the krkn/scenario_plugins directory, relative to the Krkn root folder. Each plugin must reside in its own directory and can consist of one or more Python files. The entry point for each plugin is a Python class that extends the AbstractPluginScenario abstract class and implements its required methods.

__init__ file

For the plugin to be found by the plugin API, there needs to be an __init__.py file in the plugin's base folder.

For example: __init__.py

AbstractPluginScenario abstract class:

This abstract class defines the contract between the plugin and krkn. It consists of two methods:

  • run(...)
  • get_scenario_types()

Most IDEs, such as PyCharm, can automatically suggest and implement the abstract methods defined in AbstractPluginScenario.

run(...)

    def run(
        self,
        run_uuid: str,
        scenario: str,
        krkn_config: dict[str, any],
        lib_telemetry: KrknTelemetryOpenshift,
        scenario_telemetry: ScenarioTelemetry,
    ) -> int:

This method represents the entry point of the plugin and the first method that will be executed.

Parameters:

  • run_uuid:
    • the uuid of the chaos run generated by krkn for every single run.
  • scenario:
    • the config file of the scenario that is currently executed
  • krkn_config:
    • the full dictionary representation of the config.yaml
  • lib_telemetry
    • it is a composite object of all the krkn-lib objects and methods needed by a krkn plugin to run.
  • scenario_telemetry
    • the ScenarioTelemetry object of the scenario that is currently executed

Return value:

Returns 0 if the scenario succeeds and 1 if it fails.

get_scenario_types():

    def get_scenario_types(self) -> list[str]:

Indicates the scenario types specified in the config.yaml. For the plugin to be properly loaded, recognized and executed, it must be implemented and must return one or more strings matching scenario_type strings set in the config.
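
The contract described above can be sketched as follows. This is a minimal illustration, not code from the krkn repository: the AbstractPluginScenario class here is a local stand-in so the snippet runs on its own (a real plugin imports the class from krkn), the telemetry parameters are typed loosely, and the `example_scenarios` type is a hypothetical name.

```python
from abc import ABC, abstractmethod
from typing import Any


class AbstractPluginScenario(ABC):
    """Local stand-in for krkn's abstract class, for illustration only."""

    @abstractmethod
    def run(self, run_uuid: str, scenario: str, krkn_config: dict[str, Any],
            lib_telemetry: Any, scenario_telemetry: Any) -> int:
        ...

    @abstractmethod
    def get_scenario_types(self) -> list[str]:
        ...


class ExampleScenarioPlugin(AbstractPluginScenario):
    def run(self, run_uuid: str, scenario: str, krkn_config: dict[str, Any],
            lib_telemetry: Any, scenario_telemetry: Any) -> int:
        try:
            # The chaos logic would go here; this sketch only checks its input.
            if not scenario:
                raise ValueError("no scenario config file given")
            return 0  # scenario succeeded
        except Exception as error:
            print(f"ExampleScenarioPlugin failed: {error}")
            return 1  # scenario failed

    def get_scenario_types(self) -> list[str]:
        # Hypothetical type; it must match a scenario_type set in config.yaml.
        return ["example_scenarios"]
```

Catching exceptions inside run(...) and mapping them to the return value 1 keeps the success/failure contract explicit for the caller.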

Naming conventions:

A key requirement for developing a plugin that will be properly loaded by the plugin loader is following the established naming conventions. These conventions are enforced to maintain a uniform and readable codebase, making it easier to onboard new developers from the community.

plugin folder:

  • the plugin folder must be placed in the krkn/scenario_plugins folder, relative to the krkn root folder
  • the plugin folder name cannot contain the words
    • plugin
    • scenario

plugin file name and class name:

  • the plugin file containing the main plugin class must be named in snake case and must have the suffix _scenario_plugin:
    • example_scenario_plugin.py
  • the main plugin class must be named in capital camel case and must have the suffix ScenarioPlugin:
    • ExampleScenarioPlugin
  • the file name must match the class name in the respective syntax:
    • example_scenario_plugin.py -> ExampleScenarioPlugin
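
The file-to-class naming rule can be checked with a small helper before running Krkn. The sketch below is an assumption about how such a check could look; it is not the plugin loader's actual code, and the function names are hypothetical.

```python
import re

FILE_SUFFIX = "_scenario_plugin"


def expected_class_name(file_name: str) -> str:
    """Convert e.g. example_scenario_plugin.py to ExampleScenarioPlugin."""
    module = file_name[:-3] if file_name.endswith(".py") else file_name
    if not re.fullmatch(r"[a-z][a-z0-9]*(_[a-z0-9]+)*", module):
        raise ValueError(f"{file_name} is not snake case")
    if not module.endswith(FILE_SUFFIX):
        raise ValueError(f"{file_name} must end with {FILE_SUFFIX}.py")
    # Capitalize each snake-case part and join them into capital camel case.
    return "".join(part.capitalize() for part in module.split("_"))


def folder_name_is_valid(folder: str) -> bool:
    """The plugin folder name cannot contain 'plugin' or 'scenario'."""
    return "plugin" not in folder and "scenario" not in folder
```

For example, `expected_class_name("service_hijacking_scenario_plugin.py")` yields `ServiceHijackingScenarioPlugin`, and `folder_name_is_valid("example_scenario")` is false.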

scenario type:

  • the scenario type must be unique between all the scenarios.
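
To illustrate why scenario types must be unique, the sketch below maps each type to the class that declares it and rejects duplicates. It is a simplified illustration under stated assumptions, not the implementation of krkn's plugin factory; the two plugin classes and their types are hypothetical.

```python
class ExampleScenarioPlugin:
    def get_scenario_types(self) -> list[str]:
        return ["example_scenarios"]


class OtherScenarioPlugin:
    def get_scenario_types(self) -> list[str]:
        return ["other_scenarios", "another_scenarios"]


def map_types_to_classes(plugin_classes: list[type]) -> dict[str, type]:
    """Build a type -> class mapping, failing if a type is declared twice."""
    mapping: dict[str, type] = {}
    for plugin_class in plugin_classes:
        for scenario_type in plugin_class().get_scenario_types():
            if scenario_type in mapping:
                raise ValueError(
                    f"scenario type {scenario_type} is declared by both "
                    f"{mapping[scenario_type].__name__} and {plugin_class.__name__}"
                )
            mapping[scenario_type] = plugin_class
    return mapping
```

With a one-to-one mapping, a scenario_type from the config always resolves to exactly one plugin class.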

logging:

If your new scenario does not adhere to the naming conventions, an error log will be generated in the Krkn standard output, providing details about the issue:

2024-10-03 18:06:31,136 [INFO] 📣 `ScenarioPluginFactory`: types from config.yaml mapped to respective classes for execution:
2024-10-03 18:06:31,136 [INFO]   ✅ type: application_outages_scenarios ➡️ `ApplicationOutageScenarioPlugin` 
2024-10-03 18:06:31,136 [INFO]   ✅ types: [hog_scenarios, arcaflow_scenario] ➡️ `ArcaflowScenarioPlugin` 
2024-10-03 18:06:31,136 [INFO]   ✅ type: container_scenarios ➡️ `ContainerScenarioPlugin` 
2024-10-03 18:06:31,136 [INFO]   ✅ type: managedcluster_scenarios ➡️ `ManagedClusterScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ types: [pod_disruption_scenarios, pod_network_scenario, vmware_node_scenarios, ibmcloud_node_scenarios] ➡️ `NativeScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: network_chaos_scenarios ➡️ `NetworkChaosScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: node_scenarios ➡️ `NodeActionsScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: pvc_scenarios ➡️ `PvcScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: service_disruption_scenarios ➡️ `ServiceDisruptionScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: service_hijacking_scenarios ➡️ `ServiceHijackingScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: cluster_shut_down_scenarios ➡️ `ShutDownScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: syn_flood_scenarios ➡️ `SynFloodScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: time_scenarios ➡️ `TimeActionsScenarioPlugin` 
2024-10-03 18:06:31,137 [INFO]   ✅ type: zone_outages_scenarios ➡️ `ZoneOutageScenarioPlugin`

2024-09-18 14:48:41,735 [INFO] Failed to load Scenario Plugins:

2024-09-18 14:48:41,735 [ERROR] ⛔ Class: ExamplePluginScenario Module: krkn.scenario_plugins.example.example_scenario_plugin
2024-09-18 14:48:41,735 [ERROR] ⚠️ scenario plugin class name must start with a capital letter, end with `ScenarioPlugin`, and cannot be just `ScenarioPlugin`.

ExampleScenarioPlugin

The ExampleScenarioPlugin class included in the tests folder can be used as a scaffolding for new plugins and it is considered part of the documentation.

Adding CI tests

Depending on the complexity of the new scenario, it would be much appreciated if a CI test of the scenario were added to our GitHub Actions workflow that runs on each PR.
To add a test:

3 - Adding New Scenario to Krkn-hub

Adding/Editing a New Scenario to Krkn-hub

  1. Create folder with scenario name under krkn-hub

  2. Create generic scenario template with environment variables

    a. See scenario.yaml for example

    b. Almost all parameters should be set using a variable (these will be set in the env.sh file or through the command line environment variables)

  3. Add defaults for any environment variables in an “env.sh” file

    a. See env.sh for example

  4. Create run.sh script to run the chaos scenario

    a. See run.sh for example

    b. edit line 16 with your scenario yaml template

    c. edit line 17 and 23 with your yaml config location

  5. Create Dockerfile template

    a. See dockerfile template for example

    b. Lines to edit

     i. 12: replace "application-outages" with your folder name
    
     ii. 14: replace "application-outages" with your folder name
    
     iii. 17: replace "application-outages" with your scenario name
    
     iv. 18: replace description with a description of your new scenario
    
  6. Add service/scenario to docker-compose.yaml file following syntax of other services

  7. Point the dockerfile parameter in your docker-compose to the Dockerfile file in your new folder

  8. Add the folder name to the list of scenarios in build.sh

  9. Update the krkn website and main README with new scenario type

NOTE:

  1. If you added any main configuration variables or new sections be sure to update config.yaml.template
  2. Similar to above, also add the default parameter values to env.sh
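
The relationship between the scenario template, the defaults in env.sh, and command line environment variables can be illustrated with a short sketch. krkn-hub itself handles this in its shell scripts (env.sh and run.sh); the Python below only mirrors the idea, and the template fields and variable names are hypothetical.

```python
from string import Template

# Hypothetical scenario template: every parameter is a variable.
SCENARIO_TEMPLATE = Template(
    "duration: $DURATION\n"
    "namespace: $NAMESPACE\n"
)

# Hypothetical defaults, playing the role of env.sh.
DEFAULTS = {"DURATION": "60", "NAMESPACE": "default"}


def render(template: Template, defaults: dict[str, str],
           environ: dict[str, str]) -> str:
    """Fill the template, letting environment values override the defaults."""
    overrides = {key: value for key, value in environ.items() if key in defaults}
    return template.substitute({**defaults, **overrides})
```

Calling `render(SCENARIO_TEMPLATE, DEFAULTS, os.environ)` (after `import os`) would use the caller's environment; any variable that is not set falls back to its default.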

4 - Adding New Scenario to Krknctl

Adding Scenario to Krknctl

Adding a New Scenario to Krknctl

krknctl finds the parameters of a scenario through a krknctl input JSON file. Once this file is added to krkn-hub, krknctl will be able to find it along with the details of how to run the scenario.

Add KrknCtl Input Json

This file defines every environment variable that is set up for krkn-hub as a flag of the krknctl CLI command. There are a number of different types of variables that you can use, each with its own required fields. See below for an example of each variable type.

An example krknctl-input.json file can be found here

Enum Type Required Key/Values

{
    "name": "<name>",
    "short_description":"<short-description>",
    "description":"<longer-description>",
    "variable":"<variable_name>", //this needs to match environment variable in krkn-hub
    "type": "enum",
    "allowed_values": "<value>,<value>",
    "separator": ",",
    "default":"", // any default value
    "required":"<true_or_false>" // true or false if required to set when running
}

String Type Required Key/Values

{
    "name": "<name>",
    "short_description":"<short-description>",
    "description":"<longer-description>",
    "variable":"<variable_name>", //this needs to match environment variable in krkn-hub
    "type": "string",
    "default": "", // any default value
    "required":"<true_or_false>" // true or false if required to set when running
}

Number Type Required Key/Values

{
    "name": "<name>",
    "short_description": "<short-description>",
    "description": "<longer-description>",
    "variable": "<variable_name>", //this needs to match environment variable in krkn-hub
    "type": "number",  // options: enum, string, number, file, file_base64
    "default": "", // any default value
    "required": "<true_or_false>" // true or false if required to set when running
}

File Type Required Key/Values

{
    "name": "<name>",
    "short_description":"<short-description>",
    "description":"<longer-description>",
    "variable":"<variable_name>", //this needs to match environment variable in krkn-hub
    "type":"file",  
    "mount_path": "/home/krkn/<file_loc>", // file location to mount to, using /home/krkn as the base has correct read/write locations
    "required":"<true_or_false>" // true or false if required to set when running
}

File Base 64 Type Required Key/Values

{
    "name": "<name>",
    "short_description":"<short-description>",
    "description":"<longer-description>",
    "variable":"<variable_name>", //this needs to match environment variable in krkn-hub
    "type":"file_base64",  
    "required":"<true_or_false>" // true or false if required to set when running
}
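
The required keys listed for each type can be checked before committing a krknctl-input.json file. The sketch below is an assumption based only on the examples above; krknctl's own validation may differ, and the function name is hypothetical. It works on entries that are already parsed into Python dictionaries (the `//` comments shown in the examples are not valid JSON and must be removed first).

```python
# Keys that appear in every example above.
COMMON_KEYS = {"name", "short_description", "description", "variable", "type", "required"}

# Additional keys shown for each type.
EXTRA_KEYS = {
    "enum": {"allowed_values", "separator", "default"},
    "string": {"default"},
    "number": {"default"},
    "file": {"mount_path"},
    "file_base64": set(),
}


def missing_keys(entry: dict) -> set[str]:
    """Return the keys the entry lacks for its declared type."""
    field_type = entry.get("type")
    if field_type not in EXTRA_KEYS:
        raise ValueError(f"unknown type: {field_type!r}")
    return (COMMON_KEYS | EXTRA_KEYS[field_type]) - set(entry)
```

An empty result means the entry carries every key shown for its type.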

5 - Testing your changes

This page explains how to configure a kind cluster so you can test changes at every level, from krkn-lib (the lowest level of the krkn-chaos repos) up through krknctl (the highest-level repo and our easiest way to run scenarios).

Configure Kind Testing Environment

  1. Install kind

  2. Create cluster using kind-config.yml under krkn-lib base folder

kind create cluster --wait 300s --config=kind-config.yml

Install Elasticsearch and Prometheus

To run the full test suite, you need to have elasticsearch and prometheus properly configured on the cluster

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add stable https://charts.helm.sh/stable
helm repo update

Prometheus

Deploy prometheus on your cluster

kubectl create namespace monitoring
helm install \
--wait --timeout 360s \
kind-prometheus \
prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.service.nodePort=30000 \
--set prometheus.service.type=NodePort \
--set grafana.service.nodePort=31000 \
--set grafana.service.type=NodePort \
--set alertmanager.service.nodePort=32000 \
--set alertmanager.service.type=NodePort \
--set prometheus-node-exporter.service.nodePort=32001 \
--set prometheus-node-exporter.service.type=NodePort \
--set prometheus.prometheusSpec.maximumStartupDurationSeconds=300

ElasticSearch

Set the elasticsearch environment variables

export ELASTIC_URL="https://localhost"
export ELASTIC_PORT="9091"
export ELASTIC_USER="elastic"
export ELASTIC_PASSWORD="test"

Deploy elasticsearch on your cluster

helm install \
--wait --timeout 360s \
elasticsearch \
oci://registry-1.docker.io/bitnamicharts/elasticsearch \
--set master.masterOnly=false \
--set master.replicaCount=1 \
--set data.replicaCount=0 \
--set coordinating.replicaCount=0 \
--set ingest.replicaCount=0 \
--set service.type=NodePort \
--set service.nodePorts.restAPI=32766 \
--set security.elasticPassword=test \
--set security.enabled=true \
--set image.tag=7.17.23-debian-12-r0 \
--set security.tls.autoGenerated=true

Testing Changes in Krkn-lib

To be able to run all the tests in the krkn-lib suite, you’ll need to have prometheus and elastic properly configured. See above steps for details

Install poetry

Using a virtual environment, install poetry and the krkn-lib requirements

$ pip install poetry
$ poetry install --no-interaction

Run tests

poetry run python3 -m coverage run -a -m unittest discover -v src/krkn_lib/tests/

Adding tests

If you add any new functions or functionality, be sure to add unit tests for them. We want to keep coverage above 80% in this repo since it's our base functionality
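
As a rough illustration of the kind of unit test expected, the sketch below tests a small helper with the standard unittest module, which is what the discover command above runs. The helper `split_label_selector` is hypothetical and not part of krkn-lib; a real test would import the function under test instead of defining it in the test file.

```python
import unittest


def split_label_selector(selector: str) -> dict[str, str]:
    """Hypothetical helper: turn 'a=b,c=d' into {'a': 'b', 'c': 'd'}.

    Items without '=' raise ValueError.
    """
    pairs = (item.split("=", 1) for item in selector.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


class SplitLabelSelectorTest(unittest.TestCase):
    def test_multiple_labels(self):
        self.assertEqual(
            split_label_selector("scenario=outage,app=web"),
            {"scenario": "outage", "app": "web"},
        )

    def test_empty_selector(self):
        self.assertEqual(split_label_selector(""), {})

    def test_missing_equals_sign(self):
        with self.assertRaises(ValueError):
            split_label_selector("scenario")
```

Covering the normal case, the empty case, and the error case for each new function helps keep the coverage figure above the threshold.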

Testing Changes in Krkn

Configuring test Cluster

After creating a kind cluster with the steps above, create these test pods on your cluster

kubectl apply -f CI/templates/outage_pod.yaml
kubectl wait --for=condition=ready pod -l scenario=outage --timeout=300s
kubectl apply -f CI/templates/container_scenario_pod.yaml
kubectl wait --for=condition=ready pod -l scenario=container --timeout=300s
kubectl create namespace namespace-scenario
kubectl apply -f CI/templates/time_pod.yaml
kubectl wait --for=condition=ready pod -l scenario=time-skew --timeout=300s
kubectl apply -f CI/templates/service_hijacking.yaml
kubectl wait --for=condition=ready pod -l "app.kubernetes.io/name=proxy" --timeout=300s

Install Requirements

$ python3.9 -m venv chaos
$ source chaos/bin/activate
$ pip install -r requirements.txt

Run Tests

  1. Add prometheus configuration variables to the test config file
yq -i '.kraken.port="8081"' CI/config/common_test_config.yaml
yq -i '.kraken.signal_address="0.0.0.0"' CI/config/common_test_config.yaml
yq -i '.kraken.performance_monitoring="localhost:9090"' CI/config/common_test_config.yaml
  2. Add tests to the list of functional tests to run
echo "test_service_hijacking" > ./CI/tests/functional_tests
echo "test_app_outages" >> ./CI/tests/functional_tests
echo "test_container" >> ./CI/tests/functional_tests
echo "test_pod" >> ./CI/tests/functional_tests
echo "test_namespace" >> ./CI/tests/functional_tests
echo "test_net_chaos" >> ./CI/tests/functional_tests
echo "test_time" >> ./CI/tests/functional_tests
echo "test_cpu_hog" >> ./CI/tests/functional_tests
echo "test_memory_hog" >> ./CI/tests/functional_tests
echo "test_io_hog" >> ./CI/tests/functional_tests
  3. Run tests
./CI/run.sh

Results can be seen in ./CI/results.markdown

Adding Tests

If you add a new scenario, be sure to add tests for it that can run on a single-node kind cluster.

The tests live here

Testing Changes for Krkn-hub

Install Podman/Docker Compose

You can use either podman-compose or docker-compose for this step

NOTE: Podman might not work on Macs

pip3 install docker-compose

OR

To get the latest podman-compose features that we need, use this installation command

pip3 install https://github.com/containers/podman-compose/archive/devel.tar.gz

Build Your Changes

  1. Run build.sh to create Dockerfiles for each scenario
  2. Edit the docker-compose.yaml file to point to your quay.io repository (optional; required if you want to push or are testing krknctl)

    For example, change

    image: containers.krkn-chaos.dev/krkn-chaos/krkn-hub:chaos-recommender

    to

    image: quay.io/<user>/krkn-hub:chaos-recommender

  3. Build your image(s) from base krkn-hub directory

    Builds all images in docker-compose file

    docker-compose build
    

    Builds single image defined by service/scenario name

    docker-compose build <scenario_type>
    

    OR

    Builds all images in podman-compose file

    podman-compose build
    

    Builds single image defined by service/scenario name

    podman-compose build <scenario_type>
    

Push Images to your quay.io

Push all Images using docker-compose

docker-compose push

Push a single image using docker-compose

docker-compose push <scenario_type>

OR

Push a single image using podman-compose (with podman, images have to be pushed one at a time)

podman-compose push <scenario_type>

OR

podman push quay.io/<username>/krkn-hub:<scenario_type>

Run your new scenario

docker run -d -v <kube_config_path>:/root/.kube/config:Z quay.io/<username>/krkn-hub:<scenario_type>

OR

podman run -d -v <kube_config_path>:/root/.kube/config:Z quay.io/<username>/krkn-hub:<scenario_type>

See krkn-hub documentation for each scenario to see all possible variables to use

Testing Changes in Krknctl

Once you’ve created a krknctl-input.json file using the steps here, test those changes using the steps below. You will need either podman or docker installed, as well as a quay account.

Build and Push to personal Quay

First, build your krkn-hub changes and push them to your own quay repository for testing

Run Krknctl with Personal Image

Once you have your images in quay, you are all set to configure krknctl to look for these new images. Edit the krknctl config file found here and set quay_org to your quay username

With these updates to your config, build your personal krknctl binary and you’ll be all set to start testing your new scenario and config options.

If any krknctl code changes are required, you’ll have to make the changes and rebuild the krknctl binary each time to test them as well