# VMI Network Chaos

## How to Run VMI Network Chaos Scenarios
Choose your preferred method to run VMI network chaos scenarios:
Example scenario file: `virt_network_chaos.yaml`

### Configuration
```yaml
- id: vmi_network_chaos
  image: "quay.io/krkn-chaos/krkn-network-chaos:latest"
  wait_duration: 300
  test_duration: 120
  label_selector: ""
  service_account: ""
  taints: []
  namespace: "my-namespace"
  instance_count: 1
  execution: serial
  target: ".*"
  interfaces: []
  ingress: true
  egress: true
  latency: "100ms"
  loss: "10"
  bandwidth: "100mbit"
```
For the common module settings, please refer to the documentation.
- `target`: regex to match VMI names within the namespace (e.g. `"<vmi-name-prefix>-.*"` or `".*"` for all)
- `namespace`: namespace containing the target VMIs (required; also supports regex to match multiple namespaces)
- `interfaces`: list of tap interface names to target. Leave empty to auto-detect the tap device in the virt-launcher network namespace
- `ingress`: shape incoming traffic to the VM
- `egress`: shape outgoing traffic from the VM
- `latency`: artificial network latency added to packets (e.g. `"100ms"`, `"500ms"`)
- `loss`: percentage of packets to drop (e.g. `"10"` for 10%, `"50"` for 50%)
- `bandwidth`: maximum throughput cap (e.g. `"100mbit"`, `"1gbit"`, `"500kbit"`)
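Under the hood, these settings correspond to Linux `tc` traffic-control operations applied to the tap device inside the virt-launcher network namespace. As an illustration only — the actual injection logic lives in the `krkn-network-chaos` image, and the exact qdisc layout sketched here (a `tbf` rate limiter with a `netem` child) is an assumption — a Python helper that assembles the conceptually equivalent `tc` commands:

```python
def build_tc_commands(interface, latency="", loss="", bandwidth=""):
    """Assemble illustrative `tc` commands for one tap interface.

    NOTE: hypothetical sketch -- the real krkn-network-chaos workload
    may structure its qdiscs differently.
    """
    netem_opts = []
    if latency:
        netem_opts += ["delay", latency]      # e.g. "100ms"
    if loss:
        netem_opts += ["loss", f"{loss}%"]    # e.g. "10" -> "10%"

    cmds = []
    if bandwidth:
        # Rate-limit with a token bucket filter (one common approach),
        # then attach netem as a child qdisc if delay/loss is requested.
        cmds.append(f"tc qdisc add dev {interface} root handle 1: "
                    f"tbf rate {bandwidth} burst 32kbit latency 400ms")
        if netem_opts:
            cmds.append(f"tc qdisc add dev {interface} parent 1:1 "
                        f"netem {' '.join(netem_opts)}")
    elif netem_opts:
        cmds.append(f"tc qdisc add dev {interface} root netem "
                    f"{' '.join(netem_opts)}")
    return cmds

for cmd in build_tc_commands("tap0", latency="100ms", loss="10"):
    print(cmd)
```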
Note
At least one of `latency`, `loss`, or `bandwidth` should be set. Setting all three simultaneously compounds the degradation.

### Catastrophic Configurations
The following combinations produce the most impactful chaos:
Complete network degradation (maximum chaos):
```yaml
latency: "2000ms"
loss: "50"
bandwidth: "1mbit"
```
Combines severe latency with heavy packet loss and near-complete bandwidth exhaustion.
DNS blackout via latency (cascading failures):
```yaml
latency: "5000ms"
loss: "0"
bandwidth: ""
```
5-second latency causes DNS timeouts across every service in the VM, producing cascading failures without a hard cut.
Bandwidth starvation:
```yaml
latency: ""
loss: "0"
bandwidth: "100kbit"
```
Throttles the VMI to 100 kbit/s — enough to keep connections alive but too slow for most application traffic.
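A back-of-envelope calculation shows why the combined profile is so crippling for TCP traffic. Using the classic Mathis throughput bound (rate ≈ 1.22 · MSS / (RTT · √p)) — admittedly far outside the model's validity range at 50% loss, but directionally useful — and assuming the configured delay is applied to both directions (so added RTT ≈ 2 × latency):

```python
import math

def tcp_throughput_mbit(mss_bytes, rtt_s, loss_fraction):
    """Rough upper bound on steady-state TCP throughput (Mathis model):
    rate ~ 1.22 * MSS / (RTT * sqrt(p)), ignoring retransmission timeouts."""
    bps = (mss_bytes * 8 * 1.22) / (rtt_s * math.sqrt(loss_fraction))
    return bps / 1e6

# "Maximum chaos" profile: 2000ms added latency per direction -> ~4s RTT,
# 50% loss. The result is a tiny fraction of even the 1mbit cap.
print(tcp_throughput_mbit(1460, 4.0, 0.50))
```

In other words, with those settings the latency and loss alone starve TCP long before the bandwidth cap is reached.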
### Usage
To enable VMI network chaos scenarios, edit the Kraken config file: in the `kraken -> chaos_scenarios` section of the YAML structure, add a new element named `network_chaos_ng_scenarios` to the list, pointing to the scenario YAML file.
```yaml
kraken:
  ...
  chaos_scenarios:
    - network_chaos_ng_scenarios:
        - scenarios/openshift/virt_network_chaos.yaml
```
Note
You can specify multiple scenario files of the same type by adding additional paths to the list:
```yaml
kraken:
  chaos_scenarios:
    - network_chaos_ng_scenarios:
        - scenarios/openshift/virt_network_chaos.yaml
        - scenarios/openshift/virt_network_chaos_2.yaml
```
You can also combine multiple different scenario types in the same config.yaml file:
```yaml
kraken:
  chaos_scenarios:
    - network_chaos_ng_scenarios:
        - scenarios/openshift/virt_network_chaos.yaml
    - pod_disruption_scenarios:
        - scenarios/pod-kill.yaml
```
### Run

```bash
python run_kraken.py --config config/config.yaml
```
### Run

```bash
$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code, which can be considered pass/fail for the scenario
```
```bash
$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
```

OR

```bash
$ docker run -e <VARIABLE>=<value> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code, which can be considered pass/fail for the scenario
```
TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:
```bash
kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
```
### Supported parameters
The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:
Example:

```bash
export <parameter_name>=<value>
```
See the list of variables that apply to all scenarios here; these can be used/set in addition to the scenario-specific variables below.
| Parameter | Description | Type | Default |
|---|---|---|---|
| TOTAL_CHAOS_DURATION | Chaos duration in seconds | number | 120 |
| NAMESPACE | Namespace containing the target VMIs (required) | string | |
| VMI_NAME | Regex to match VMI names (e.g. virt-server-.* or .* for all) | string | .* |
| LABEL_SELECTOR | Label selector to filter VMIs (e.g. app=myapp) | string | "" |
| INSTANCE_COUNT | Maximum number of VMIs to target | number | 1 |
| EXECUTION | Execution mode: serial or parallel | enum | serial |
| INGRESS | Shape incoming traffic to the VM | boolean | true |
| EGRESS | Shape outgoing traffic from the VM | boolean | true |
| INTERFACES | Comma-separated tap interface names (empty to auto-detect) | string | "" |
| LATENCY | Artificial latency added to packets (e.g. 100ms, 500ms) | string | "" |
| LOSS | Packet loss percentage (e.g. 10 for 10%) | string | "" |
| BANDWIDTH | Maximum throughput cap (e.g. 100mbit, 1gbit) | string | "" |
| WAIT_DURATION | Seconds to wait before running the next scenario in the same file | number | 300 |
| IMAGE | Network chaos injection workload image | string | quay.io/krkn-chaos/krkn-network-chaos:latest |
| TAINTS | List of taints for which tolerations are created (e.g. ["node-role.kubernetes.io/master:NoSchedule"]) | string | [] |
| SERVICE_ACCOUNT | Optional service account for the scenario workload | string | "" |
NOTE: When using a custom metrics profile or alerts profile with CAPTURE_METRICS or ENABLE_ALERTS enabled, mount the profiles from the host running the container (via podman/docker) at /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts. For example:
```bash
$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
```
```bash
krknctl run vmi-network-chaos [--<parameter> <value>]
```
You can also set any of the global variables listed here.
### VMI Network Chaos Parameters
| Argument | Type | Description | Required | Default Value |
|---|---|---|---|---|
| --chaos-duration | number | Chaos duration in seconds | false | 120 |
| --namespace | string | Namespace containing the target VMIs | true | |
| --target | string | Regex to match VMI names (e.g. <vmi-name-prefix>-.* or .* for all) | false | .* |
| --label-selector | string | Label selector to filter VMIs (e.g. app=myapp) | false | |
| --instance-count | number | Maximum number of VMIs to target | false | 1 |
| --execution | enum | Execution mode: parallel or serial | false | serial |
| --ingress | boolean | Shape incoming traffic to the VM | false | true |
| --egress | boolean | Shape outgoing traffic from the VM | false | true |
| --interfaces | string | Comma-separated tap interface names (empty to auto-detect) | false | |
| --latency | string | Artificial latency added to packets (e.g. 100ms, 500ms) | false | |
| --loss | string | Packet loss percentage (e.g. 10 for 10%) | false | |
| --bandwidth | string | Maximum throughput cap (e.g. 100mbit, 1gbit, 500kbit) | false | |
| --image | string | Network chaos injection workload image | false | quay.io/krkn-chaos/krkn-network-chaos:latest |
| --taints | string | Comma-separated taints for which tolerations are created (e.g. node-role.kubernetes.io/master:NoSchedule) | false | |
| --service-account | string | Optional service account for the scenario workload | false | |
| --wait-duration | number | Seconds to wait before running the next scenario in the same file | false | 300 |
### Parameter Format Details
**VMI Selection:**

- `--namespace`: required; supports regex to match multiple namespaces (e.g. `virt-density-.*`)
- `--target`: regex matched against VMI names (e.g. `<vmi-name-prefix>-.*` targets all VMIs whose name starts with that prefix)
- `--label-selector`: Kubernetes label selector in `key=value` format
- Use `--instance-count` to limit how many matching VMIs are targeted
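The selection rules above can be sketched as follows. This is a hypothetical helper for illustration — the real krkn implementation queries the cluster and may differ in detail — showing how a name regex, an optional `key=value` label selector, and the instance cap combine:

```python
import re

def select_vmis(vmis, target=".*", label_selector="", instance_count=1):
    """Pick VMIs by name regex and optional label selector, capped at
    instance_count. `vmis` is a list of (name, labels_dict) tuples.
    Hypothetical sketch of the documented selection rules."""
    wanted = {}
    if label_selector:
        key, _, value = label_selector.partition("=")
        wanted = {key: value}
    pattern = re.compile(target)
    matched = [name for name, labels in vmis
               if pattern.fullmatch(name)
               and all(labels.get(k) == v for k, v in wanted.items())]
    return matched[:instance_count]

vmis = [("virt-server-0", {"app": "db"}),
        ("virt-server-1", {"app": "web"}),
        ("other-vmi", {"app": "web"})]
print(select_vmis(vmis, target="virt-server-.*", instance_count=2))
```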
**Traffic Shaping Values:**

- `--latency`: any value accepted by Linux `tc netem delay` (e.g. `100ms`, `1s`, `500ms`)
- `--loss`: integer percentage without the `%` symbol (e.g. `10` = 10%)
- `--bandwidth`: any value accepted by Linux `tc` HTB rate (e.g. `100mbit`, `1gbit`, `500kbit`)
- At least one of `--latency`, `--loss`, or `--bandwidth` should be set
Interface Detection:
- Leave
--interfacesempty to let the scenario auto-detect the tap device inside the virt-launcher network namespace - Specify explicitly (e.g.
tap0) only if auto-detection fails or you want to target a specific interface
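Conceptually, the auto-detection amounts to picking the `tap*` devices from the interface listing of the virt-launcher network namespace. A minimal sketch, assuming (hypothetically) that the tap device is identified purely by its name prefix:

```python
def detect_tap_interfaces(interfaces, explicit=None):
    """Return the tap devices to shape: the explicit list if given,
    otherwise every interface whose name starts with 'tap'.
    Hypothetical sketch of the documented auto-detection behavior."""
    if explicit:
        return list(explicit)
    return [name for name in interfaces if name.startswith("tap")]

# Typical interface listing inside a virt-launcher network namespace
# (illustrative names only).
ns_ifaces = ["lo", "eth0", "tap0"]
print(detect_tap_interfaces(ns_ifaces))
```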
### Example Commands
Add latency and packet loss to all VMIs in a namespace:
```bash
krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target ".*" \
  --latency 100ms \
  --loss 10 \
  --chaos-duration 120
```
Bandwidth cap on a specific VMI:
```bash
krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target "<vmi-name>" \
  --bandwidth 1mbit \
  --ingress true \
  --egress true \
  --chaos-duration 300
```
Catastrophic combined degradation:
```bash
krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target "<vmi-name-prefix>-.*" \
  --instance-count 3 \
  --execution parallel \
  --latency 2000ms \
  --loss 50 \
  --bandwidth 1mbit \
  --chaos-duration 180
```
DNS blackout simulation (high latency, no packet drop):
```bash
krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target ".*" \
  --latency 5000ms \
  --chaos-duration 60
```
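When verifying the injected latency from inside a VM (e.g. with `ping`), keep in mind that with both `--ingress` and `--egress` enabled the configured delay can be applied to each direction independently, so the observed round-trip increase is roughly double the configured value. A small sanity-check helper, assuming (hypothetically) per-direction delay, which depends on how the actual qdiscs are placed:

```python
def expected_added_rtt_ms(latency_ms, ingress=True, egress=True):
    """Expected added round-trip time when a netem delay is installed
    independently on each enabled direction (hypothetical assumption)."""
    return latency_ms * (int(ingress) + int(egress))

# First example command above: --latency 100ms with both directions on.
print(expected_added_rtt_ms(100))
```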