Network Chaos Scenario

This scenario introduces network latency, packet loss, and bandwidth restriction on a node's host network interface. Its purpose is to observe faults caused by random variations in the network.

1 - Network Chaos Scenario using Krkn

Sample scenario config for egress traffic shaping
network_chaos:                                    # Scenario to create an outage by simulating random variations in the network.
  duration: 300                                   # Duration in seconds for which network chaos will be applied.
  node_name:                                      # Comma-separated names of the nodes on which the scenario is injected.
  label_selector: node-role.kubernetes.io/master  # When node_name is not specified, a node with a matching label_selector is selected for running the scenario.
  instance_count: 1                               # Number of nodes on which to execute network chaos.
  interfaces:                                     # List of interfaces on which to apply the network restriction.
  - "ens5"                                        # Interface name as seen by the kernel on the host network.
  execution: parallel                             # Either serial or parallel. Execute the egress options together as a single scenario (parallel) or as separate scenarios (serial).
  egress:
    latency: 500ms
    loss: 50%                                    # percentage
    bandwidth: 10mbit
Sample scenario config for ingress traffic shaping (using a plugin)
- id: network_chaos
  config:
    node_interface_name:                            # Dictionary with key as node name(s) and value as a list of its interfaces to test
      ip-10-0-128-153.us-west-2.compute.internal:
        - ens5
        - genev_sys_6081
    label_selector: node-role.kubernetes.io/master  # When node_interface_name is not specified, nodes with a matching label_selector are selected for node chaos scenario injection
    instance_count: 1                               # Number of nodes matching the label selector on which to perform the action
    kubeconfig_path: ~/.kube/config                 # Path to the Kubernetes config file. If not specified, it defaults to ~/.kube/config
    execution_type: parallel                        # Execute the ingress options together as a single scenario (parallel) or as separate scenarios (serial).
    network_params:
        latency: 500ms
        loss: '50%'
        bandwidth: 10mbit
    wait_duration: 120
    test_duration: 60

Note: For ingress traffic shaping, ensure that your node doesn't have any [IFB](https://wiki.linuxfoundation.org/networking/ifb) interfaces already present. The scenario relies on creating IFBs to do the shaping, and they are deleted at the end of the scenario.
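For context, netem acts on outgoing packets, so the common Linux approach to shaping ingress traffic is to redirect incoming packets to an IFB device and apply netem there. The following is a minimal sketch of that general technique, not necessarily the exact commands this scenario runs. The function name `print_ingress_shaping_cmds` and the device name `ifb0` are assumptions made for illustration, and the commands are only printed so they can be reviewed before anyone runs them as root on a node.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints (does not execute) commands that redirect the ingress
# traffic of an interface to an IFB device and apply netem on that device.
# Assumptions: "ifb0" is a free IFB device name; this is not Krkn's own code.
print_ingress_shaping_cmds() {
  local iface="$1" ifb="$2" latency="$3" loss="$4"
  # Load the IFB module with a single device and bring the device up.
  echo "modprobe ifb numifbs=1"
  echo "ip link set dev ${ifb} up"
  # Attach an ingress qdisc and redirect all incoming IPv4 packets to the IFB.
  echo "tc qdisc add dev ${iface} handle ffff: ingress"
  echo "tc filter add dev ${iface} parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ${ifb}"
  # Shape the redirected traffic as it leaves the IFB device.
  echo "tc qdisc add dev ${ifb} root netem delay ${latency} loss ${loss}"
}
```

For example, `print_ingress_shaping_cmds ens5 ifb0 500ms 50%` prints the five commands for the sample values used above. To check a node for IFB devices that already exist, `ip link show type ifb` lists them.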


##### Steps
 - Pick the nodes on which to introduce the network anomaly, using either node_name or label_selector.
 - Verify the interface list on one of the nodes. If the user specifies no interface, use the interface with the default route as the test interface.
 - Set the traffic shaping config on the node's interface using tc and netem.
 - Wait for the configured duration.
 - Remove the traffic shaping config from the node's interface.
 - Remove the job that spawned the pod.
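
The steps above can be sketched as shell commands. This is a minimal sketch under stated assumptions, not Krkn's actual implementation: the helper names `default_route_iface` and `print_egress_chaos_cmds` are hypothetical, and the tc commands are printed rather than executed because running them requires root on the node. All three egress options are combined in one netem qdisc here, which is one way to read the parallel execution mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ip route show default` output on stdin and
# prints the interface name that follows the "dev" keyword.
default_route_iface() {
  awk '$1 == "default" { for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: prints (does not execute) the commands that apply
# netem shaping to an interface, wait, and then remove the shaping again.
print_egress_chaos_cmds() {
  local iface="$1" duration="$2" latency="$3" loss="$4" bandwidth="$5"
  echo "tc qdisc add dev ${iface} root netem delay ${latency} loss ${loss} rate ${bandwidth}"
  echo "sleep ${duration}"
  echo "tc qdisc del dev ${iface} root"
}
```

On a node, the interface could be discovered with `ip route show default | default_route_iface`, and the sample egress config would correspond to `print_egress_chaos_cmds ens5 300 500ms 50% 10mbit`.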

2 - Network Chaos Scenario using Krkn-Hub

This scenario introduces network latency, packet loss, and bandwidth restriction in the egress traffic of a node's interface using tc and netem. For more information, refer to the following documentation.

Run

If enabling Cerberus to monitor the cluster and pass/fail the scenario post chaos, refer to the docs. Make sure to start it before injecting the chaos, and set the CERBERUS_ENABLED environment variable for the chaos injection container so that it connects automatically.

$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:network-chaos
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

$ docker run -e <VARIABLE>=<value> --net=host -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:network-chaos
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario
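
To use the exit code in a script, it can be mapped to a verdict. The sketch below assumes the usual convention that exit code 0 means pass and any other value means fail; the function name `scenario_verdict` is hypothetical and is not part of Krkn-Hub.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a container exit code to a pass/fail verdict.
# Assumption: 0 means the scenario passed, any other value means it failed.
scenario_verdict() {
  local exit_code="$1"
  if [ "${exit_code}" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL (exit code ${exit_code})"
  fi
}
```

It could be combined with the inspect command shown above, for example `scenario_verdict "$(podman inspect <container_name> --format '{{.State.ExitCode}}')"`.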

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

example: export <parameter_name>=<value>

See the list of variables that apply to all scenarios here. They can be set in addition to these scenario-specific variables.

Egress Scenarios
| Parameter | Description | Default |
| --- | --- | --- |
| DURATION | Duration in seconds during which network chaos will be applied. | 300 |
| NODE_NAME | Node name to inject faults into when targeting a specific node. Multiple node names can be set, separated by commas. | "" |
| LABEL_SELECTOR | When NODE_NAME is not specified, a node with a matching label_selector is selected for running the scenario. | node-role.kubernetes.io/master |
| INSTANCE_COUNT | Targeted instance count matching the label selector. | 1 |
| INTERFACES | List of interfaces on which to apply the network restriction. | [] |
| EXECUTION | Execute the egress options together as a single scenario (parallel) or as separate scenarios (serial). | parallel |
| EGRESS | Dictionary of values to set network latency (latency: 50ms), packet loss (loss: 0.02), and bandwidth restriction (bandwidth: 100mbit). | {bandwidth: 100mbit} |
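
As an illustration of how the egress variables might be set before starting the container, consider the sketch below. The values are arbitrary examples chosen for this illustration, not recommendations. With podman, `--env-host=true` passes the host's environment into the container; with docker, each variable would instead be passed with `-e`.

```shell
#!/usr/bin/env bash
# Example values only (assumptions for illustration); adjust to the cluster under test.
export DURATION=120                                      # Apply chaos for 120 seconds.
export LABEL_SELECTOR="node-role.kubernetes.io/worker"   # Used because NODE_NAME is not set.
export INSTANCE_COUNT=1                                  # Target one matching node.
export EXECUTION=serial                                  # Apply the egress options as separate scenarios.
export EGRESS='{latency: 50ms, loss: 0.02}'              # Quoted so the shell keeps the braces and spaces intact.
```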
Ingress Scenarios
| Parameter | Description | Default |
| --- | --- | --- |
| DURATION | Duration in seconds during which network chaos will be applied. | 300 |
| TARGET_NODE_AND_INTERFACE | Dictionary with node name(s) as keys and a list of each node's interfaces to test as values. For example: {ip-10-0-216-2.us-west-2.compute.internal: [ens5]} | "" |
| LABEL_SELECTOR | When NODE_NAME is not specified, a node with a matching label_selector is selected for running the scenario. | node-role.kubernetes.io/master |
| INSTANCE_COUNT | Targeted instance count matching the label selector. | 1 |
| EXECUTION | Specifies whether to apply filters on interfaces one at a time or all at once. | parallel |
| NETWORK_PARAMS | latency, loss, and bandwidth are the three supported network parameters to alter for the chaos test. For example: {latency: 50ms, loss: '0.02'} | "" |
| WAIT_DURATION | Should be at least about twice test_duration. | 300 |
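
The WAIT_DURATION guidance can be checked before launching a run. The sketch below is a hypothetical pre-flight check, not something Krkn-Hub provides; the function name `wait_duration_ok` is an assumption, and both arguments are whole seconds.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeeds (exit status 0) when the wait
# duration is at least twice the test duration, both given in seconds.
wait_duration_ok() {
  local wait_duration="$1" test_duration="$2"
  [ "${wait_duration}" -ge $(( 2 * test_duration )) ]
}
```

With the sample ingress config shown earlier (wait_duration 120, test_duration 60), `wait_duration_ok 120 60` succeeds, while `wait_duration_ok 100 60` fails.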

For example, to mount a custom metrics profile and a custom alerts profile into the container:

$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:network-chaos