Krkn-Hub All Scenarios Variables

These variables set the top-level configuration template that is shared by all scenarios in Krkn-hub.

Each section below corresponds to a section in the Krkn config reference. Set variables on the host running the container:

export <parameter_name>=<value>
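As a minimal sketch, a shell session might override a couple of defaults before the container is started. The variable names come from the Tunings section of this document; the values are illustrative only.

```shell
# Illustrative values only; both variables are documented in the Tunings section.
export WAIT_DURATION=30   # seconds to wait between scenarios (default 60)
export ITERATIONS=3       # run the scenarios three times (default 1)

# List what a container inheriting this environment would see.
env | grep -E '^(WAIT_DURATION|ITERATIONS)='
```

How the exported values reach the container depends on the runtime. With Podman, for example, passing `--env-host=true` to `podman run` is one way to hand the host environment to the container.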

Kraken

Signal and status publishing settings. See Kraken config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| PUBLISH_KRAKEN_STATUS | Publish Kraken status to the signal address | True |
| SIGNAL_ADDRESS | Address to publish Kraken status to | 0.0.0.0 |
| PORT | Port to publish Kraken status to | 8081 |
| SIGNAL_STATE | When set to PAUSE, waits for the RUN signal before running the scenarios; refer to the docs for more details | RUN |
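A sketch of a paused start, using only the parameters listed for this section. The `STATUS_ENDPOINT` helper variable and the `http://` scheme are assumptions made for illustration; how the RUN signal is actually sent is covered in the Kraken docs.

```shell
# Start paused: the scenarios wait until a RUN signal arrives.
export PUBLISH_KRAKEN_STATUS=True
export SIGNAL_ADDRESS=0.0.0.0
export PORT=8081
export SIGNAL_STATE=PAUSE

# Hypothetical helper: the address assembled from the two settings above
# (the http:// scheme is an assumption).
STATUS_ENDPOINT="http://${SIGNAL_ADDRESS}:${PORT}"
echo "Kraken status endpoint: ${STATUS_ENDPOINT}"
```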

Cerberus

Cluster health monitoring integration. See Cerberus config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| CERBERUS_ENABLED | Set this to true if Cerberus is running and monitoring the cluster | False |
| CERBERUS_URL | URL to poll for the go/no-go signal | http://0.0.0.0:8080 |
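A sketch of enabling the integration, with an optional reachability check before a run. The `check_cerberus` function is a hypothetical helper, not part of Krkn-hub; it only prints whatever the endpoint returns and assumes `curl` is installed.

```shell
export CERBERUS_ENABLED=True
export CERBERUS_URL=http://0.0.0.0:8080

# Hypothetical pre-flight helper: print the endpoint's response,
# or report that it could not be reached.
check_cerberus() {
  if ! curl --silent --fail --max-time 5 "$CERBERUS_URL"; then
    echo "Cerberus not reachable at $CERBERUS_URL" >&2
    return 1
  fi
}
```

Calling `check_cerberus` before starting the container can catch a wrong URL early, which is cheaper than finding out mid-run.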

Performance Monitoring

Prometheus metrics collection and alert evaluation. See Performance Monitoring config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| DEPLOY_DASHBOARDS | Deploys a mutable Grafana instance loaded with dashboards that visualize performance metrics pulled from the in-cluster Prometheus. The dashboard is exposed as a route. | False |
| CAPTURE_METRICS | Captures metrics as specified in the profile from the in-cluster Prometheus. Default metrics captures are listed here | False |
| ENABLE_ALERTS | Evaluates expressions from the in-cluster Prometheus and exits 0 or 1 based on the severity set. Default profile. | False |
| ALERTS_PATH | Path to the alerts file to use when ENABLE_ALERTS is set | config/alerts |
| CHECK_CRITICAL_ALERTS | When enabled, checks Prometheus for critical alerts firing after chaos | False |
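Because ENABLE_ALERTS makes the run exit 0 or 1, a wrapper script can branch on that code. The sketch below enables alert evaluation and defines `report_result`, a hypothetical helper that takes an exit code obtained however your container runtime exposes it.

```shell
export ENABLE_ALERTS=True
export ALERTS_PATH=config/alerts
export CHECK_CRITICAL_ALERTS=True

# Hypothetical helper: translate the run's exit code into a short verdict.
report_result() {
  if [ "$1" -eq 0 ]; then
    echo "pass"
  else
    echo "fail (exit $1)"
  fi
}

report_result 0   # prints "pass"
report_result 1   # prints "fail (exit 1)"
```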

Resiliency Score

Resiliency scoring configuration. See Resiliency Score config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| RESILIENCY_RUN_MODE | Resiliency scoring mode: standalone embeds the score in telemetry, detailed prints a JSON report to stdout, disabled turns off scoring | standalone |
| RESILIENCY_FILE | Path to a YAML file containing SLO definitions; defaults to the alerts profile or config/alerts.yaml | config/alerts.yaml |
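Since RESILIENCY_RUN_MODE accepts only the three values listed above, a wrapper script can reject typos before the run starts. `is_valid_resiliency_mode` is a hypothetical helper written for this sketch.

```shell
# Hypothetical helper: succeed only for the three documented modes.
is_valid_resiliency_mode() {
  case "$1" in
    standalone|detailed|disabled) return 0 ;;
    *) return 1 ;;
  esac
}

mode=detailed
if is_valid_resiliency_mode "$mode"; then
  export RESILIENCY_RUN_MODE="$mode"
  export RESILIENCY_FILE=config/alerts.yaml
else
  echo "unsupported RESILIENCY_RUN_MODE: $mode" >&2
fi
```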

Elastic

Elasticsearch storage for telemetry and metrics. See Elastic config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| ELASTIC_SERVER | URL of the Elasticsearch instance to store telemetry data | blank |
| ELASTIC_INDEX | Elasticsearch index pattern to post results to | blank |

Tunings

Execution timing and iteration controls. See Tunings config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| WAIT_DURATION | Duration in seconds to wait between each chaos scenario | 60 |
| ITERATIONS | Number of times to execute the scenarios | 1 |
| DAEMON_MODE | Sets iterations to infinity, so Kraken keeps causing chaos indefinitely | False |

Telemetry

Run data collection and upload settings. See Telemetry config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| TELEMETRY_ENABLED | Enables/disables the telemetry collection feature | False |
| TELEMETRY_API_URL | Telemetry service endpoint | https://ulnmf9xv7j.execute-api.us-west-2.amazonaws.com/production |
| TELEMETRY_USERNAME | Telemetry service username | redhat-chaos |
| TELEMETRY_PASSWORD | Telemetry service password | No default |
| TELEMETRY_PROMETHEUS_BACKUP | Enables/disables Prometheus data collection | True |
| TELEMTRY_FULL_PROMETHEUS_BACKUP | If set to False, only the /prometheus/wal folder will be downloaded | False |
| TELEMETRY_BACKUP_THREADS | Number of telemetry download/upload threads | 5 |
| TELEMETRY_ARCHIVE_PATH | Local path where the archive files will be temporarily stored | /tmp |
| TELEMETRY_MAX_RETRIES | Maximum number of upload retries (if 0, retries forever) | 0 |
| TELEMETRY_RUN_TAG | If set, this will be appended to the run folder in the bucket (useful to group the runs) | chaos |
| TELEMETRY_GROUP | If set, archives the telemetry in the S3 bucket in a folder named after the value | default |
| TELEMETRY_ARCHIVE_SIZE | The size of the Prometheus data archive in KB | 1000 |
| TELEMETRY_LOGS_BACKUP | Enables backup of logs to S3 | False |
| TELEMETRY_FILTER_PATTER | Filter logs based on certain timestamp patterns | ["(\\w{3}\\s\\d{1,2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d+).+", ...] |
| TELEMETRY_CLI_PATH | OC CLI path; if not specified, it is searched for in $PATH | blank |
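A sketch of a telemetry setup that groups runs and caps retries. The tag and group values are hypothetical. Because TELEMETRY_PASSWORD has no default, the sketch warns when it is missing rather than hard-coding a secret.

```shell
export TELEMETRY_ENABLED=True
export TELEMETRY_RUN_TAG=nightly    # hypothetical tag appended to the run folder
export TELEMETRY_GROUP=team-a       # hypothetical folder name in the bucket
export TELEMETRY_MAX_RETRIES=3      # the default of 0 retries forever

# TELEMETRY_PASSWORD has no default, so check that it was supplied
# by the surrounding environment instead of writing it into a script.
if [ -z "${TELEMETRY_PASSWORD:-}" ]; then
  echo "TELEMETRY_PASSWORD is not set" >&2
fi
```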

Health Checks

Application endpoint monitoring during chaos. See Health Checks config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| HEALTH_CHECK_URL | URL to check continually to detect downtime | blank |
| HEALTH_CHECK_INTERVAL | Interval in seconds at which to run health checks | 2 |
| HEALTH_CHECK_BEARER_TOKEN | Bearer token used for authenticating to the health check URL | blank |
| HEALTH_CHECK_AUTH | Tuple of (username, password) used for authenticating to the health check URL | blank |
| HEALTH_CHECK_EXIT_ON_FAILURE | If True, exits when the health check fails for the application | blank |
| HEALTH_CHECK_VERIFY | Whether to validate SSL for the health check URL | False |
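A sketch of a health check configuration for a single endpoint. The URL is a placeholder, and the whole-number check on the interval is a convention of this sketch rather than a documented requirement.

```shell
export HEALTH_CHECK_URL=https://example.com/healthz   # placeholder endpoint
export HEALTH_CHECK_INTERVAL=5                        # seconds between checks
export HEALTH_CHECK_EXIT_ON_FAILURE=True
export HEALTH_CHECK_VERIFY=True                       # validate the endpoint's SSL certificate

# This sketch expects the interval to be a whole number of seconds.
case "$HEALTH_CHECK_INTERVAL" in
  ''|*[!0-9]*) echo "HEALTH_CHECK_INTERVAL is not a whole number: $HEALTH_CHECK_INTERVAL" >&2 ;;
esac
```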

Virt Checks

KubeVirt VMI SSH connection monitoring during chaos. See Virt Checks config for full details.

| Parameter | Description | Default |
| --- | --- | --- |
| KUBE_VIRT_CHECK_INTERVAL | Interval in seconds at which to test KubeVirt connections | 2 |
| KUBE_VIRT_NAMESPACE | Namespace in which to find and watch VMIs | blank |
| KUBE_VIRT_NAME | Regex-style name to match VMIs to watch | blank |
| KUBE_VIRT_FAILURES | If True, reports only when SSH connections to a VMI fail | blank |
| KUBE_VIRT_DISCONNECTED | Use the disconnected check, bypassing the cluster API | False |
| KUBE_VIRT_NODE_NAME | If set, filters VMs to track only those running on the specified node | blank |
| KUBE_VIRT_EXIT_ON_FAIL | Fails the run if VMs still have a false status at the end of the run | False |
| KUBE_VIRT_SSH_NODE | If set, provides a backup way to SSH to a node. Should be a node not targeted in chaos | blank |
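Since KUBE_VIRT_NAME is a regex-style pattern, it can help to try the pattern against a few names before a run. The namespace and VMI names below are hypothetical, and `grep -E` is only a stand-in: the regex dialect Krkn applies may differ in edge cases.

```shell
export KUBE_VIRT_NAMESPACE=demo-vms     # hypothetical namespace
export KUBE_VIRT_NAME='^vm-web-.*'      # watch only the web VMIs
export KUBE_VIRT_CHECK_INTERVAL=2

# Preview which of these hypothetical VMI names the pattern selects.
MATCHED=$(printf '%s\n' vm-web-1 vm-db-1 vm-web-2 | grep -E "$KUBE_VIRT_NAME")
echo "$MATCHED"   # prints vm-web-1 and vm-web-2, one per line
```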