Getting Started with Krkn-AI
This documentation details how to deploy a sample microservice application on a Kubernetes cluster and run a Krkn-AI test.
Prerequisites
- Follow this guide to install the Krkn-AI CLI.
- Krkn-AI uses Thanos Querier to fetch SLO metrics via PromQL. You can install it by setting up the prometheus-operator in your cluster.
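If your cluster does not already have a monitoring stack, one common way to set up the prometheus-operator is the community kube-prometheus-stack Helm chart. The commands below are a minimal sketch, assuming Helm is available; the release and namespace names are arbitrary, and enabling Thanos components may require extra chart values depending on your environment.
# Install the prometheus-operator via the community kube-prometheus-stack chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace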
Deploy Sample Microservice
For demonstration purposes, we will deploy a sample microservice application called robot-shop on the cluster:
# Change to Krkn-AI project directory
cd krkn-ai/
# Namespace in which to deploy the microservice application
export DEMO_NAMESPACE=robot-shop
# Whether the K8s cluster is an OpenShift cluster
export IS_OPENSHIFT=true
./scripts/setup-demo-microservice.sh
# Set context to the demo namespace
oc config set-context --current --namespace=$DEMO_NAMESPACE
# If you are using kubectl:
# kubectl config set-context --current --namespace=$DEMO_NAMESPACE
# Check whether pods are running
oc get pods
We will also deploy an NGINX reverse proxy and a LoadBalancer service in the cluster to expose routes for some of the pods.
# Setup NGINX reverse proxy for external access
./scripts/setup-nginx.sh
# Check nginx pod
oc get pods -l app=nginx-proxy
# Test application endpoints
./scripts/test-nginx-routes.sh
export HOST="http://$(kubectl get service rs -o json | jq -r '.status.loadBalancer.ingress[0].hostname')"
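Before moving on, you can sanity-check that the application is reachable through the proxy. The curl call below is a minimal check using the catalogue route that also appears in the health checks later; if your LoadBalancer reports an IP address instead of a hostname, read .ip rather than .hostname from the service status.
# Quick sanity check against one of the exposed routes
curl -s "$HOST/catalogue/categories"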
Note
If your cluster uses an Ingress or custom annotations to expose the services, make sure to follow those steps instead.
Generate Configuration
Krkn-AI uses YAML configuration files to define experiments. You can generate a sample config file dynamically by running the Krkn-AI discover command.
$ uv run krkn_ai discover --help
Usage: krkn_ai discover [OPTIONS]
  Discover components for Krkn-AI tests

Options:
  -k, --kubeconfig TEXT   Path to cluster kubeconfig file.
  -o, --output TEXT       Path to save config file.  [default: ./krkn-ai.yaml]
  -n, --namespace TEXT    Namespace(s) to discover components in. Supports
                          Regex and comma separated values.  [default: .*]
  -pl, --pod-label TEXT   Pod Label Keys(s) to filter. Supports Regex and
                          comma separated values.  [default: .*]
  -nl, --node-label TEXT  Node Label Keys(s) to filter. Supports Regex and
                          comma separated values.  [default: .*]
  -v, --verbose           Increase verbosity of output.  [default: 0]
  --help                  Show this message and exit.
# Discover components in cluster to generate the config
$ uv run krkn_ai discover -k ./path/to/kubeconfig.yaml -n "robot-shop" -pl "service" -o ./krkn-ai.yaml
The discover command generates a YAML file as output that contains the initial boilerplate for testing. You can modify this file to include custom SLO definitions and cluster components, and to configure algorithm settings for your testing use case.
# Path to your kubeconfig file
kubeconfig_file_path: "./path/to/kubeconfig.yaml"

# Genetic algorithm parameters
generations: 5
population_size: 10
composition_rate: 0.3
population_injection_rate: 0.1

# Fitness function configuration for defining the SLO
# In the example below, we use total container restarts in the "robot-shop" namespace as the SLO
fitness_function:
  query: 'sum(kube_pod_container_status_restarts_total{namespace="robot-shop"})'
  type: point

# Whether to include non-zero exit code status in the fitness function scoring
include_krkn_failure: true

# Health endpoints for synthetic monitoring of applications
health_checks:
  stop_watcher_on_failure: false
  applications:
    - name: cart
      url: "$HOST/cart/add/1/Watson/1"
    - name: catalogue
      url: "$HOST/catalogue/categories"

# Chaos scenarios to consider during testing
scenario:
  pod-scenarios:
    enable: true
  application-outages:
    enable: true
  container-scenarios:
    enable: false
  node-cpu-hog:
    enable: false
  node-memory-hog:
    enable: false

# Cluster components to consider for Krkn-AI testing
cluster_components:
  namespaces:
    - name: robot-shop
      pods:
        - containers:
            - name: cart
          labels:
            service: cart
          name: cart-7cd6c77dbf-j4gsv
        - containers:
            - name: catalogue
          labels:
            service: catalogue
          name: catalogue-94df6b9b-pjgsr
  nodes:
    - labels:
        kubernetes.io/hostname: node-1
      name: node-1
    - labels:
        kubernetes.io/hostname: node-2
      name: node-2
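The fitness function accepts any PromQL expression, so you can score scenarios against a different SLO if container restarts are not the signal you care about. The snippet below is an illustrative sketch, assuming your workloads expose an HTTP request metric such as http_requests_total to Prometheus; adjust the metric and labels to what your cluster actually records.
# Hypothetical alternative SLO: rate of HTTP 5xx responses in the namespace
fitness_function:
  query: 'sum(rate(http_requests_total{namespace="robot-shop", status=~"5.."}[5m]))'
  type: point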
Running Krkn-AI
Once your test configuration is set, you can start Krkn-AI testing with the run command. This command initializes a random population of chaos experiments based on the Krkn-AI configuration, then starts the evolutionary algorithm, which runs the experiments, gathers feedback, and keeps evolving scenarios until the total number of generations defined in the config is reached.
$ uv run krkn_ai run --help
Usage: krkn_ai run [OPTIONS]
  Run Krkn-AI tests

Options:
  -c, --config TEXT                    Path to Krkn-AI config file.
  -o, --output TEXT                    Directory to save results.
  -f, --format [json|yaml]             Format of the output file.  [default: yaml]
  -r, --runner-type [krknctl|krknhub]  Type of chaos engine to use.
  -p, --param TEXT                     Additional parameters for config file in
                                       key=value format.
  -v, --verbose                        Increase verbosity of output.  [default: 0]
  --help                               Show this message and exit.
# Configure Prometheus
# (Optional) In an OpenShift cluster, the framework automatically looks for the Thanos Querier in the openshift-monitoring namespace.
export PROMETHEUS_URL='https://Thanos-Querier-url'
export PROMETHEUS_TOKEN='enter-access-token'
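# One way to derive these values on OpenShift (illustrative; assumes the default
# openshift-monitoring stack and that you are logged in via oc):
# export PROMETHEUS_URL="https://$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')"
# export PROMETHEUS_TOKEN="$(oc whoami -t)"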
# Start Krkn-AI test
uv run krkn_ai run -vv -c ./krkn-ai.yaml -o ./tmp/results/ -p HOST=$HOST
Understanding the Results
In the ./tmp/results directory, you will find the results from testing. The final results contain information about each scenario, their fitness evaluation scores, reports, and graphs, which you can use for further investigation.
.
└── results/
    ├── reports/
    │   ├── best_scenarios.yaml
    │   ├── health_check_report.csv
    │   └── graphs/
    │       ├── best_generation.png
    │       ├── scenario_1.png
    │       ├── scenario_2.png
    │       └── ...
    ├── yaml/
    │   ├── generation_0/
    │   │   ├── scenario_1.yaml
    │   │   ├── scenario_2.yaml
    │   │   └── ...
    │   └── generation_1/
    │       └── ...
    ├── log/
    │   ├── scenario_1.log
    │   ├── scenario_2.log
    │   └── ...
    └── krkn-ai.yaml
Reports Directory:
- health_check_report.csv: Summary of application health checks, with details about the scenario, component, failure status, and latency.
- best_scenarios.yaml: YAML file containing information about the best scenario identified in each generation.
- best_generation.png: Visualization of the best fitness score found in each generation.
- scenario_<id>.png: Visualization of the response time line plot for health checks and a heatmap of successes and failures.
YAML:
- scenario_<id>.yaml: YAML file detailing the executed chaos scenario, including the krknctl command, fitness scores, health check metrics, etc. These files are organized under each generation folder.
Log:
- scenario_<id>.log: Logs captured from the krknctl scenario run.
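For example, to review the key reports from a run saved under ./tmp/results/ (paths as shown in the directory tree above):
# Inspect the best scenario per generation and the health check summary
cat ./tmp/results/reports/best_scenarios.yaml
cat ./tmp/results/reports/health_check_report.csv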