This scenario introduces network latency, packet loss, and bandwidth restriction on the node's host network interface. The purpose is to observe how the cluster behaves under faults caused by random variations in the network.
1 - Network Chaos Scenario using Krkn
Sample scenario config for egress traffic shaping
```yaml
network_chaos:                                    # Scenario to create an outage by simulating random variations in the network
  duration: 300                                   # In seconds - duration for which network chaos will be applied
  node_name:                                      # Comma-separated node names on which the scenario has to be injected
  label_selector: node-role.kubernetes.io/master  # When node_name is not specified, a node with a matching label_selector is selected for running the scenario
  instance_count: 1                               # Number of nodes on which to execute network chaos
  interfaces:                                     # List of interfaces on which to apply the network restriction
    - "ens5"                                      # The interface name is the kernel host network interface name
  execution: serial|parallel                      # Execute each of the egress options as a single scenario (parallel) or as separate scenarios (serial)
  egress:
    latency: 500ms
    loss: 50%                                     # Percentage
    bandwidth: 10mbit
```
Sample scenario config for ingress traffic shaping (using a plugin)
```yaml
- id: network_chaos
  config:
    node_interface_name:                              # Dictionary with key as node name(s) and value as a list of its interfaces to test
      ip-10-0-128-153.us-west-2.compute.internal:
        - ens5
        - genev_sys_6081
    label_selector: node-role.kubernetes.io/master    # When node_interface_name is not specified, nodes with a matching label_selector are selected for chaos injection
    instance_count: 1                                 # Number of nodes matching the label selector on which to perform the action
    kubeconfig_path: ~/.kube/config                   # Path to the Kubernetes config file; defaults to ~/.kube/config if not specified
    execution_type: parallel                          # Execute each of the ingress options as a single scenario (parallel) or as separate scenarios (serial)
    network_params:
      latency: 500ms
      loss: '50%'
      bandwidth: 10mbit
    wait_duration: 120                                # Should be at least about twice test_duration
    test_duration: 60
```
Note: For ingress traffic shaping, ensure that your node doesn't have any [IFB](https://wiki.linuxfoundation.org/networking/ifb) interfaces already present. The scenario relies on creating IFBs to do the shaping, and they are deleted at the end of the scenario.
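To verify, you can list IFB interfaces before running the scenario (a minimal check using iproute2; no output means none are present):

```bash
# List any IFB interfaces already present on the node; expect empty output
ip link show type ifb
```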
##### Steps
- Pick the nodes on which to introduce the network anomaly, either from node_name or label_selector.
- Verify the interface list on one of the nodes, or use the interface with the default route as the test interface if the user specifies none.
- Set the traffic shaping config on the node's interface using tc and netem (see the sketch after this list).
- Wait for the specified duration.
- Remove the traffic shaping config from the node's interface.
- Remove the job that spawned the pod.
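For reference, the shaping step is conceptually equivalent to tc/netem commands like the following (a minimal sketch with illustrative values; the exact qdisc setup Krkn generates may differ):

```bash
# Apply egress latency, loss, and rate limiting on the target interface
tc qdisc add dev ens5 root netem delay 500ms loss 50% rate 10mbit
# ... wait for the scenario duration ...
# Remove the shaping rules, restoring normal traffic
tc qdisc del dev ens5 root
```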
2 - Network Chaos Scenario using Krkn-Hub
This scenario introduces network latency, packet loss, and bandwidth restriction in the egress traffic of a node's interface using tc and netem. For more information, refer to the following documentation.
Run
If enabling Cerberus to monitor the cluster and pass/fail the scenario post chaos, refer to the docs. Make sure to start it before injecting the chaos, and set the CERBERUS_ENABLED environment variable for the chaos injection container to autoconnect.
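For example (assuming Cerberus is already running against the cluster):

```bash
# Let the chaos injection container autoconnect to Cerberus
export CERBERUS_ENABLED=True
```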
```bash
$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:network-chaos
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario
```

```bash
$ docker run -e <VARIABLE>=<value> --net=host -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:network-chaos
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario
```
Tip
Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it into the container. You can achieve this with the following commands:

```bash
kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:<scenario>
```
Supported parameters
The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:
Example:

```bash
export <parameter_name>=<value>
```
Note
Set `export TRAFFIC_TYPE=egress` for egress scenarios and `export TRAFFIC_TYPE=ingress` for ingress scenarios. See the list of variables that apply to all scenarios here; these can be used/set in addition to the scenario-specific variables below.
Egress Scenarios
Parameter | Description | Default |
---|---|---|
DURATION | Duration in seconds for which network chaos will be applied | 300 |
NODE_NAME | Node name to inject faults into when targeting a specific node; multiple node names can be set, separated by commas | "" |
LABEL_SELECTOR | When NODE_NAME is not specified, a node with a matching label_selector is selected for running | node-role.kubernetes.io/master |
INSTANCE_COUNT | Targeted instance count matching the label selector | 1 |
INTERFACES | List of interfaces on which to apply the network restriction | [] |
EXECUTION | Execute each of the egress options as a single scenario (parallel) or as separate scenarios (serial) | parallel |
EGRESS | Dictionary of values to set network latency (latency: 50ms), packet loss (loss: 0.02), bandwidth restriction (bandwidth: 100mbit) | {bandwidth: 100mbit} |
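For example, to run the egress scenario against nodes matching the default label selector (values are illustrative):

```bash
export TRAFFIC_TYPE=egress
export DURATION=300
export INSTANCE_COUNT=1
export EGRESS="{latency: 500ms, loss: 50%, bandwidth: 10mbit}"
```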
Ingress Scenarios
Parameter | Description | Default |
---|---|---|
DURATION | Duration in seconds for which network chaos will be applied | 300 |
TARGET_NODE_AND_INTERFACE | Dictionary with key as node name(s) and value as a list of its interfaces to test. For example: {ip-10-0-216-2.us-west-2.compute.internal: [ens5]} | "" |
LABEL_SELECTOR | When TARGET_NODE_AND_INTERFACE is not specified, a node with a matching label_selector is selected for running | node-role.kubernetes.io/master |
INSTANCE_COUNT | Targeted instance count matching the label selector | 1 |
EXECUTION | Used to specify whether you want to apply filters on interfaces one at a time (serial) or all at once (parallel) | parallel |
NETWORK_PARAMS | latency, loss and bandwidth are the three supported network parameters to alter for the chaos test. For example: {latency: 50ms, loss: '0.02'} | "" |
WAIT_DURATION | Ensure that it is at least about twice the test_duration | 300 |
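For example, to run the ingress scenario against a specific node and interface (node name and values are illustrative):

```bash
export TRAFFIC_TYPE=ingress
export TARGET_NODE_AND_INTERFACE="{ip-10-0-216-2.us-west-2.compute.internal: [ens5]}"
export NETWORK_PARAMS="{latency: 50ms, loss: '0.02'}"
export WAIT_DURATION=300
```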
Note
In case of using a custom metrics profile or alerts profile when CAPTURE_METRICS or ENABLE_ALERTS is enabled, mount the profiles from the host on which the container is run using podman/docker under /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts.

```bash
$ podman run --name=<container_name> --net=host --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:container-scenarios
```