Using this type of scenario configuration, you can delete crucial objects in a specific namespace, or in any namespace matching a given regex pattern.
Service Disruption Scenarios
1 - Service Disruption Scenarios using Krkn
Configuration Options:
namespace: Specific namespace, or a regex pattern matching the namespaces you want to delete. Targets all namespaces if not specified; set to "" if you want to use the label_selector field instead.
Set to ‘^.*$’ and label_selector to "" to randomly select any namespace in your cluster.
label_selector: Label on the namespace you want to delete. Set to "" if you are using the namespace variable.
delete_count: Number of namespaces to kill in each run, chosen from those matching the namespace and label specified; default is 1.
runs: Number of runs/iterations of namespace killing; default is 1.
sleep: Number of seconds to wait between each iteration/count of killing namespaces; defaults to 10 seconds if not set.
Refer to the namespace_scenarios_example config file.
scenarios:
  - namespace: "^.*$"
    runs: 1
  - namespace: "^.*ingress.*$"
    runs: 1
    sleep: 15
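A label-based variant might look like the sketch below. The chaos=true label value is purely illustrative, and combining delete_count and sleep in a single entry is an assumption based on the options listed above:
scenarios:
  - namespace: ""
    label_selector: "chaos=true"
    delete_count: 2
    runs: 2
    sleep: 30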
Steps
Depending on the configuration, this scenario selects one or more namespaces, kills all objects of the types listed below in each of them, and waits for them to be Running again in the post action (a manual way to watch this is sketched after the list):
- Services
- DaemonSets
- StatefulSets
- ReplicaSets
- Deployments
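If you want to watch the recovery yourself while the scenario runs, you could follow the affected object types with kubectl; this is just a manual observation aid, not something Krkn requires:
$ kubectl get deployments,daemonsets,statefulsets,replicasets,services -n <namespace> -w # Watch objects in the targeted namespace recover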
Post Action
We run a post-chaos check to wait and verify that the specific objects in each namespace are Ready.
There are two options here:
- Pass a custom script in the main config's scenario list; it runs before the chaos, and its output is checked to match the output of the same script after the chaos (a rough sketch of such a script follows the snippet below).
See scenarios/post_action_namespace.py for an example
- namespace_scenarios:
    - - scenarios/regex_namespace.yaml
      - scenarios/post_action_namespace.py
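As a rough illustration of the idea (this is not the actual contents of scenarios/post_action_namespace.py, and the openshift-ingress namespace is just an example), a post action script simply prints some cluster state, and Kraken compares the pre-chaos and post-chaos output:
#!/usr/bin/env python3
# Hypothetical sketch of a post action script. Kraken runs it before and
# after the chaos and compares the two outputs, so identical output
# indicates the namespace recovered.
import subprocess

# Print the names of deployments in an example namespace.
result = subprocess.run(
    ["kubectl", "get", "deployments", "-n", "openshift-ingress",
     "-o", "jsonpath={.items[*].metadata.name}"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)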
- Allow Kraken to wait and check that all killed objects in the namespaces become ‘Running’ again. Kraken keeps a list of the specific objects that were killed in each namespace, to verify that everything affected recovers properly.
wait_time: <seconds to wait for namespace to recover>
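For example, assuming wait_time sits alongside the other per-scenario keys in the scenario file (an assumption based on the options above):
scenarios:
  - namespace: "^.*ingress.*$"
    runs: 1
    sleep: 15
    wait_time: 120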
2 - Service Disruption Scenario using Krkn-Hub
This scenario deletes main objects within a namespace in your Kubernetes/OpenShift cluster. More information can be found here.
Run
If enabling Cerberus to monitor the cluster and pass/fail the scenario post-chaos, refer to the docs. Make sure to start it before injecting the chaos, and set the CERBERUS_ENABLED environment variable for the chaos injection container to autoconnect.
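For example, since the podman command below passes the host environment through with --env-host=true, exporting the variable on the host should be enough (True is the value we'd expect the integration to check for):
$ export CERBERUS_ENABLED=True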
$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario
$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios
OR
$ docker run -e <VARIABLE>=<value> --net=host -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario
Tip
Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:
kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:<scenario>
Supported parameters
The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:
example:
export <parameter_name>=<value>
See the list of variables that apply to all scenarios here; these can be used/set in addition to the scenario-specific variables below.
Parameter | Description | Default
---|---|---
LABEL_SELECTOR | Label of the namespace to target. Set this parameter only if NAMESPACE is not set | ""
NAMESPACE | Name of the namespace you want to target. Set this parameter only if LABEL_SELECTOR is not set | "openshift-etcd"
SLEEP | Number of seconds to wait before polling to see if the namespace exists again | 15
DELETE_COUNT | Number of namespaces to kill in each run, based on the matching namespace and label specified | 1
RUNS | Number of runs to execute the action | 1
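For example, to delete two randomly matched ingress-related namespaces per run (illustrative values, assuming NAMESPACE accepts the same regex form as the Krkn scenario config above):
$ export NAMESPACE="^.*ingress.*$"
$ export DELETE_COUNT=2
$ export RUNS=2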
Note
In case of using a custom metrics profile or alerts profile when CAPTURE_METRICS or ENABLE_ALERTS is enabled, mount the profiles from the host on which the container is run using podman/docker under /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts.
$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios
Demo
You can find a link to a demo of the scenario here.