Service Disruption Scenarios

This scenario configuration lets you delete crucial objects in a specific namespace, or in any namespace whose name matches a given regex.

1 - Service Disruption Scenarios using Krkn

Configuration Options:

namespace: Name of the namespace to target, or a regex that matches namespace names. All namespaces are considered if it is not specified; set it to "" if you want to use the label_selector field.

Set namespace to "^.*$" and label_selector to "" to randomly select any namespace in your cluster.

label_selector: Label on the namespace you want to target. Set it to "" if you are using the namespace field.

delete_count: Number of namespaces to kill in each run, chosen from the namespaces that match the namespace and label specified. Defaults to 1.

runs: Number of runs (iterations) of killing namespaces. Defaults to 1.

sleep: Number of seconds to wait between iterations of killing namespaces. Defaults to 10 seconds if not set.

Refer to the namespace_scenarios_example config file.

scenarios:
- namespace: "^.*$"
  runs: 1
- namespace: "^.*ingress.*$"
  runs: 1
  sleep: 15
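To make the selection rules above concrete, here is a minimal Python sketch of how a namespace regex and delete_count could be applied to a list of namespace names. It is an illustration only, not Krkn's actual implementation: the function name select_namespaces, the choice of re.search for matching, and the sample namespace list are all assumptions.

```python
import random
import re


def select_namespaces(all_namespaces, namespace_pattern, delete_count=1, rng=None):
    """Pick up to delete_count namespaces whose names match namespace_pattern.

    Hypothetical helper for illustration; Krkn's own selection logic may differ.
    """
    if rng is None:
        rng = random.Random()
    pattern = re.compile(namespace_pattern)
    # Keep only the namespace names that the regex matches.
    candidates = [ns for ns in all_namespaces if pattern.search(ns)]
    # Never ask for more namespaces than actually matched.
    count = min(delete_count, len(candidates))
    # Choose the targets at random, without repeats.
    return rng.sample(candidates, count)


# Made-up namespace names standing in for a real cluster listing.
cluster_namespaces = ["default", "openshift-ingress", "openshift-etcd", "ingress-canary"]
chosen = select_namespaces(cluster_namespaces, r"^.*ingress.*$", delete_count=1)
print(chosen)
```

With the pattern "^.*$" every namespace is a candidate, which corresponds to the "randomly select any namespace" case described above.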

Steps

This scenario selects one or more namespaces, depending on the configuration, and kills all objects of the types listed below in each selected namespace. In the post action, it waits for them to be Running again.

  1. Services
  2. Daemonsets
  3. Statefulsets
  4. Replicasets
  5. Deployments
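The steps above can be sketched as a simple loop over runs, namespaces, and object types. This is a rough outline under stated assumptions, not Krkn's code: run_scenario, the delete_objects callback, and the in-memory fake cluster are hypothetical, and a real implementation would call the Kubernetes API instead.

```python
import time

# Object types listed in the steps above, in the same order.
OBJECT_KINDS = ["services", "daemonsets", "statefulsets", "replicasets", "deployments"]


def run_scenario(namespaces, delete_objects, runs=1, sleep_seconds=10, sleeper=time.sleep):
    """Delete every listed object kind in each namespace and record what was killed.

    delete_objects(namespace, kind) is a hypothetical callback that deletes the
    objects and returns the names it removed.
    """
    killed = []
    for run in range(runs):
        for namespace in namespaces:
            for kind in OBJECT_KINDS:
                for name in delete_objects(namespace, kind):
                    # Remember each killed object so recovery can be verified later.
                    killed.append((namespace, kind, name))
        # Pause between iterations, but not after the last one.
        if run < runs - 1:
            sleeper(sleep_seconds)
    return killed


# In-memory stand-in for a cluster (names are made up).
fake_cluster = {
    ("openshift-ingress", "deployments"): ["router-default"],
    ("openshift-ingress", "services"): ["router-internal-default"],
}


def fake_delete(namespace, kind):
    # Removing the entry simulates deletion and returns the deleted names.
    return fake_cluster.pop((namespace, kind), [])


killed = run_scenario(["openshift-ingress"], fake_delete, runs=1, sleeper=lambda s: None)
print(killed)
```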

Post Action

After the chaos, a post-action check waits for and verifies that the affected objects in each namespace are Ready.

There are two options:

  1. Pass a custom script in the main config scenario list. It runs before the chaos, and its output is verified to match the output after the chaos scenario. See scenarios/post_action_namespace.py for an example.

- namespace_scenarios:
    - - scenarios/regex_namespace.yaml
      - scenarios/post_action_namespace.py

  2. Allow Kraken to wait and check that all killed objects in the namespaces become "Running" again. Kraken keeps a list of the specific objects that were killed in each namespace so it can verify that everything affected recovers properly.

wait_time: <seconds to wait for namespace to recover>
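The second option, waiting up to wait_time seconds for the killed objects to come back, amounts to a polling loop with a deadline. The sketch below illustrates that idea only. The names wait_for_recovery, is_running, and poll_interval are assumptions, and the readiness check is faked rather than taken from a cluster.

```python
import time


def wait_for_recovery(killed_objects, is_running, wait_time, poll_interval=5,
                      clock=time.monotonic, sleeper=time.sleep):
    """Poll until every killed object is running again or wait_time seconds elapse.

    is_running(obj) is a hypothetical readiness check supplied by the caller.
    Returns (recovered, still_pending).
    """
    deadline = clock() + wait_time
    pending = set(killed_objects)
    while True:
        # Drop every object that has recovered since the last poll.
        pending = {obj for obj in pending if not is_running(obj)}
        if not pending:
            return True, pending
        if clock() >= deadline:
            return False, pending
        sleeper(poll_interval)


# Fake readiness check: the object reports Running from the third poll onward.
polls = {"count": 0}


def fake_is_running(obj):
    polls["count"] += 1
    return polls["count"] > 2


recovered, still_pending = wait_for_recovery(
    [("openshift-ingress", "deployments", "router-default")],
    fake_is_running,
    wait_time=60,
    sleeper=lambda s: None,  # skip real sleeping in this demo
)
print(recovered, sorted(still_pending))
```

Returning the set of objects that are still pending makes it possible to report exactly which objects failed to recover when the deadline passes.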

2 - Service Disruption Scenario using Krkn-Hub

This scenario deletes main objects within a namespace in your Kubernetes/OpenShift cluster. More information can be found here.

Run

If you enable Cerberus to monitor the cluster and pass/fail the scenario post chaos, refer to the docs. Make sure to start it before injecting the chaos, and set the CERBERUS_ENABLED environment variable for the chaos injection container to autoconnect.

$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios
OR
$ docker run -e <VARIABLE>=<value> --net=host -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario
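If you want to check the result from a script rather than by eye, the inspect command above can be wrapped as in the following sketch. It assumes the usual convention that exit code 0 means pass and anything else means fail, which the text above does not spell out. The helper names parse_exit_code and scenario_passed are hypothetical.

```python
import subprocess


def parse_exit_code(inspect_output):
    """Convert the text printed by the inspect command into an integer exit code."""
    return int(inspect_output.strip())


def scenario_passed(container, runtime="podman"):
    """Run '<runtime> inspect' on the container and treat exit code 0 as a pass.

    Assumption: 0 means pass, non-zero means fail.
    """
    result = subprocess.run(
        [runtime, "inspect", container, "--format", "{{.State.ExitCode}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_exit_code(result.stdout) == 0


# Parsing alone can be tried without a container runtime installed.
print(parse_exit_code("0\n") == 0)
```

Pass runtime="docker" to scenario_passed if the container was started with docker instead of podman.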

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

example: export <parameter_name>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

| Parameter | Description | Default |
| --- | --- | --- |
| LABEL_SELECTOR | Label of the namespace to target. Set this parameter only if NAMESPACE is not set | "" |
| NAMESPACE | Name of the namespace you want to target. Set this parameter only if LABEL_SELECTOR is not set | "openshift-etcd" |
| SLEEP | Number of seconds to wait before polling to see if namespace exists again | 15 |
| DELETE_COUNT | Number of namespaces to kill in each run, based on matching namespace and label specified | 1 |
| RUNS | Number of runs to execute the action | 1 |
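As an illustration of passing these parameters with docker's -e flag, the following sketch assembles the docker run command shown earlier from a dictionary of values. The helper build_docker_command and the kubeconfig path are hypothetical. The flags and image name are the ones used in the commands above.

```python
import shlex

IMAGE = "quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios"


def build_docker_command(kubeconfig_path, params):
    """Build a 'docker run' argument list with one -e NAME=value pair per parameter."""
    command = ["docker", "run"]
    for name, value in params.items():
        command += ["-e", f"{name}={value}"]
    command += [
        "--net=host",
        "-v", f"{kubeconfig_path}:/home/krkn/.kube/config:Z",
        "-d", IMAGE,
    ]
    return command


# The path and parameter values below are placeholders.
cmd = build_docker_command(
    "/path/to/kubeconfig",
    {"NAMESPACE": "openshift-etcd", "RUNS": 1, "SLEEP": 15},
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if a value, such as a label selector, contains spaces or special characters.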

For example, to mount a custom metrics profile and a custom alerts profile into the container:

$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:service-disruption-scenarios

Demo

You can find a link to a demo of the scenario here.