This scenario introduces network latency, packet loss, and bandwidth restriction on the node's host network interface. The purpose is to observe how the cluster behaves under faults caused by random variations in the network.
1 - Network Chaos Scenario using Krkn
Sample scenario config for egress traffic shaping
```yaml
network_chaos:                                    # Scenario to create an outage by simulating random variations in the network
  duration: 300                                   # In seconds - duration for which network chaos will be applied
  node_name:                                      # Comma-separated node names on which the scenario has to be injected
  label_selector: node-role.kubernetes.io/master  # When node_name is not specified, a node with a matching label_selector is selected for running the scenario
  instance_count: 1                               # Number of nodes on which to execute network chaos
  interfaces:                                     # List of interfaces on which to apply the network restriction
    - "ens5"                                      # The interface name is the kernel host network interface name
  execution: serial|parallel                      # Execute each of the egress options as a single scenario (parallel) or as separate scenarios (serial)
  egress:
    latency: 500ms
    loss: 50%                                     # Percentage
    bandwidth: 10mbit
```
Sample scenario config for ingress traffic shaping (using a plugin)
```yaml
- id: network_chaos
  config:
    node_interface_name:                              # Dictionary with key as node name(s) and value as a list of its interfaces to test
      ip-10-0-128-153.us-west-2.compute.internal:
        - ens5
        - genev_sys_6081
    label_selector: node-role.kubernetes.io/master    # When node_interface_name is not specified, nodes with a matching label_selector are selected for chaos injection
    instance_count: 1                                 # Number of nodes matching the label selector on which to perform the action
    kubeconfig_path: ~/.kube/config                   # Path to the Kubernetes config file; defaults to ~/.kube/config if not specified
    execution_type: parallel                          # Execute each of the ingress options as a single scenario (parallel) or as separate scenarios (serial)
    network_params:
      latency: 500ms
      loss: '50%'
      bandwidth: 10mbit
    wait_duration: 120                                # Should be at least about twice test_duration
    test_duration: 60
```
Note: For ingress traffic shaping, ensure that your node doesn't have any [IFB](https://wiki.linuxfoundation.org/networking/ifb) interfaces already present. The scenario relies on creating IFBs to do the shaping, and they are deleted at the end of the scenario.
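To verify, you can list IFB interfaces before running the scenario (a minimal check using iproute2; no output means none are present):

```bash
# List any IFB interfaces already present on the node; expect empty output
ip link show type ifb
```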
##### Steps
- Pick the nodes on which to introduce the network anomaly, either from node_name or label_selector.
- Verify the interface list on one of the nodes, or use the interface with the default route as the test interface if the user specifies none.
- Set the traffic shaping config on the node's interface using tc and netem (see the sketch after this list).
- Wait for the specified duration.
- Remove the traffic shaping config from the node's interface.
- Remove the job that spawned the pod.
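For reference, the shaping step is conceptually equivalent to tc/netem commands like the following (a minimal sketch with illustrative values; the exact qdisc setup Krkn generates may differ):

```bash
# Apply egress latency, loss, and rate limiting on the target interface
tc qdisc add dev ens5 root netem delay 500ms loss 50% rate 10mbit
# ... wait for the scenario duration ...
# Remove the shaping rules, restoring normal traffic
tc qdisc del dev ens5 root
```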
2 - Network Chaos Scenario using Krkn-Hub
This scenario introduces network latency, packet loss, and bandwidth restriction in the egress traffic of a node's interface using tc and netem. For more information, refer to the following documentation.
Run
If enabling Cerberus to monitor the cluster and pass/fail the scenario post chaos, refer to the docs. Make sure to start it before injecting the chaos, and set the CERBERUS_ENABLED environment variable for the chaos injection container to autoconnect.
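For example (assuming Cerberus is already running against the cluster):

```bash
# Let the chaos injection container autoconnect to Cerberus
export CERBERUS_ENABLED=True
```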
```bash
$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:network-chaos
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario
```

```bash
$ docker run -e <VARIABLE>=<value> --net=host -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:network-chaos
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario
```
Tip
Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it into the container. You can achieve this with the following commands:

```bash
kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:<scenario>
```
Supported parameters
The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:
Example:

```bash
export <parameter_name>=<value>
```
Note
Set `export TRAFFIC_TYPE=egress` for egress scenarios and `export TRAFFIC_TYPE=ingress` for ingress scenarios. See the list of variables that apply to all scenarios here; these can be used/set in addition to the scenario-specific variables below.
Egress Scenarios
Parameter | Description | Default |
---|---|---|
DURATION | Duration in seconds for which network chaos will be applied | 300 |
NODE_NAME | Node name to inject faults into when targeting a specific node; multiple node names can be set, separated by commas | "" |
LABEL_SELECTOR | When NODE_NAME is not specified, a node with a matching label_selector is selected for running | node-role.kubernetes.io/master |
INSTANCE_COUNT | Targeted instance count matching the label selector | 1 |
INTERFACES | List of interfaces on which to apply the network restriction | [] |
EXECUTION | Execute each of the egress options as a single scenario (parallel) or as separate scenarios (serial) | parallel |
EGRESS | Dictionary of values to set network latency (latency: 50ms), packet loss (loss: 0.02), bandwidth restriction (bandwidth: 100mbit) | {bandwidth: 100mbit} |
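For example, to run the egress scenario against nodes matching the default label selector (values are illustrative):

```bash
export TRAFFIC_TYPE=egress
export DURATION=300
export INSTANCE_COUNT=1
export EGRESS="{latency: 500ms, loss: 50%, bandwidth: 10mbit}"
```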
Ingress Scenarios
Parameter | Description | Default |
---|---|---|
DURATION | Duration in seconds for which network chaos will be applied | 300 |
TARGET_NODE_AND_INTERFACE | Dictionary with key as node name(s) and value as a list of its interfaces to test. For example: {ip-10-0-216-2.us-west-2.compute.internal: [ens5]} | "" |
LABEL_SELECTOR | When TARGET_NODE_AND_INTERFACE is not specified, a node with a matching label_selector is selected for running | node-role.kubernetes.io/master |
INSTANCE_COUNT | Targeted instance count matching the label selector | 1 |
EXECUTION | Used to specify whether you want to apply filters on interfaces one at a time (serial) or all at once (parallel) | parallel |
NETWORK_PARAMS | latency, loss and bandwidth are the three supported network parameters to alter for the chaos test. For example: {latency: 50ms, loss: '0.02'} | "" |
WAIT_DURATION | Ensure that it is at least about twice the test_duration | 300 |
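For example, to run the ingress scenario against a specific node and interface (node name and values are illustrative):

```bash
export TRAFFIC_TYPE=ingress
export TARGET_NODE_AND_INTERFACE="{ip-10-0-216-2.us-west-2.compute.internal: [ens5]}"
export NETWORK_PARAMS="{latency: 50ms, loss: '0.02'}"
export WAIT_DURATION=300
```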
Note
In case of using a custom metrics profile or alerts profile when CAPTURE_METRICS or ENABLE_ALERTS is enabled, mount the profiles from the host on which the container is run using podman/docker under /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts.

```bash
$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:container-scenarios
```