Krkn-Hub All Scenarios Variables

These variables configure the top-level configuration template shared by all scenarios in Krkn-hub.

Each section below corresponds to a section in the Krkn config reference. Set variables on the host running the container:

export <parameter_name>=<value>
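The variables are read from the host environment, so they need to be exported, not just assigned. The sketch below uses ITERATIONS purely as an illustration. It shows the difference from the point of view of a child process, for example a container runtime started from this shell.

```shell
# A plain assignment stays local to the current shell, so a child process
# does not see it (assuming ITERATIONS was not already exported earlier).
ITERATIONS=2
sh -c 'echo "child sees ITERATIONS=${ITERATIONS:-<unset>}"'

# Exporting the variable puts it in the environment inherited by
# child processes started from this shell.
export ITERATIONS=2
sh -c 'echo "child sees ITERATIONS=${ITERATIONS:-<unset>}"'
```

Any variable left unset keeps the default listed in its table.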

Kraken

Signal and status publishing settings. See Kraken config for full details.

| Parameter | Description | Default |
|---|---|---|
| KRKN_KUBE_CONFIG | Path to the kubeconfig file used for cluster access | Required |
| PUBLISH_KRAKEN_STATUS | Publish the kraken status to the signal address | True |
| SIGNAL_ADDRESS | Address to publish the kraken status to | 0.0.0.0 |
| PORT | Port to publish the kraken status to | 8081 |
| SIGNAL_STATE | When set to PAUSE, kraken waits for the RUN signal before running the scenarios; refer to the docs for more details | RUN |
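A minimal sketch of the Kraken settings. It starts the run paused and keeps status publishing on the defaults from the table. The kubeconfig path is an assumption about where the file lives on your host. The existence check is only a local convenience and is not something kraken itself runs.

```shell
# Kubeconfig used for cluster access; this path is a placeholder for your host.
export KRKN_KUBE_CONFIG="$HOME/.kube/config"

# Start paused: kraken waits for the RUN signal before running the scenarios.
export SIGNAL_STATE=PAUSE

# Publish the status on the default address and port.
export PUBLISH_KRAKEN_STATUS=True
export SIGNAL_ADDRESS=0.0.0.0
export PORT=8081

# KRKN_KUBE_CONFIG has no default, so warn early if the file is missing.
if [ ! -f "$KRKN_KUBE_CONFIG" ]; then
  echo "warning: kubeconfig not found at $KRKN_KUBE_CONFIG" >&2
fi
```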

Cerberus

Cluster health monitoring integration. See Cerberus config for full details.

| Parameter | Description | Default |
|---|---|---|
| CERBERUS_ENABLED | Set to True if Cerberus is running and monitoring the cluster | False |
| CERBERUS_URL | URL to poll for the go/no-go signal | http://0.0.0.0:8080 |
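The sketch below enables the Cerberus integration, using the default URL as a stand-in for wherever your Cerberus instance listens. The commented curl line assumes curl is installed and Cerberus is reachable at that URL. It is a manual way to look at the same go/no-go signal kraken polls.

```shell
# Tell kraken that Cerberus is monitoring the cluster.
export CERBERUS_ENABLED=True
# Default endpoint; replace with the address of your Cerberus instance.
export CERBERUS_URL=http://0.0.0.0:8080

# Optional manual look at the go/no-go signal; uncomment to try it:
# curl -s "$CERBERUS_URL"
```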

Performance Monitoring

Prometheus metrics collection and alert evaluation. See Performance Monitoring config for full details.

| Parameter | Description | Default |
|---|---|---|
| PROMETHEUS_URL | URL of the Prometheus instance; auto-detected on OpenShift, required on Kubernetes | blank |
| PROMETHEUS_TOKEN | Bearer token for Prometheus authentication; auto-detected on OpenShift, required on Kubernetes | blank |
| UUID | UUID for the run; auto-generated if not set | blank |
| CAPTURE_METRICS | Captures the metrics specified in the metrics profile from the in-cluster Prometheus | False |
| METRICS_PATH | Path to the metrics profile to use when CAPTURE_METRICS is set | config/metrics-aggregated.yaml |
| ENABLE_ALERTS | Evaluates the expressions in the alerts profile against the in-cluster Prometheus and exits 0 or 1 based on the severity set | False |
| ALERTS_PATH | Path to the alerts profile to use when ENABLE_ALERTS is set | config/alerts |
| CHECK_CRITICAL_ALERTS | When enabled, checks Prometheus for critical alerts firing after chaos | False |
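On plain Kubernetes, Prometheus is not auto-detected, so both the URL and token have to be supplied. The sketch below shows that case with a placeholder URL and token. It turns on metrics capture and alert evaluation and leaves ALERTS_PATH at its default. The final guard is a hypothetical local check, not part of Krkn.

```shell
# Plain Kubernetes: supply Prometheus explicitly (placeholders shown).
export PROMETHEUS_URL=https://prometheus.example.com
export PROMETHEUS_TOKEN='<bearer-token>'

# Capture metrics using the default metrics profile.
export CAPTURE_METRICS=True
export METRICS_PATH=config/metrics-aggregated.yaml

# Evaluate the default alerts profile and check for critical alerts after chaos.
export ENABLE_ALERTS=True
export CHECK_CRITICAL_ALERTS=True

# Warn if metrics capture is on but no Prometheus URL was given
# (only a problem on Kubernetes; OpenShift auto-detects it).
if [ "$CAPTURE_METRICS" = "True" ] && [ -z "$PROMETHEUS_URL" ]; then
  echo "warning: PROMETHEUS_URL is empty" >&2
fi
```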

Resiliency Score

Resiliency scoring configuration. See Resiliency Score config for full details.

| Parameter | Description | Default |
|---|---|---|
| RESILIENCY_RUN_MODE | Resiliency scoring mode: standalone embeds the score in telemetry, detailed prints a JSON report to stdout, disabled turns scoring off | standalone |
| RESILIENCY_FILE | Path to a YAML file containing SLO definitions; if unset, falls back to the alerts profile or config/alerts.yaml | config/alerts.yaml |
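This sketch selects the detailed mode so the JSON report goes to stdout. It also adds a small local check that the mode is one of the three documented values. The check is a hypothetical convenience and does not reflect how Krkn validates the setting.

```shell
# Print a detailed JSON resiliency report to stdout.
export RESILIENCY_RUN_MODE=detailed
# SLO definitions; this is the documented default path.
export RESILIENCY_FILE=config/alerts.yaml

# Catch typos before launching the run.
case "$RESILIENCY_RUN_MODE" in
  standalone|detailed|disabled) echo "resiliency mode: $RESILIENCY_RUN_MODE" ;;
  *) echo "unknown RESILIENCY_RUN_MODE: $RESILIENCY_RUN_MODE" >&2 ;;
esac
```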

Elastic

Elasticsearch storage for telemetry and metrics. See Elastic config for full details.

| Parameter | Description | Default |
|---|---|---|
| ENABLE_ES | Enable Elasticsearch integration | False |
| ES_VERIFY_CERTS | Verify SSL certificates when connecting to Elasticsearch | True |
| ES_SERVER | URL of the Elasticsearch instance | blank |
| ES_PORT | Port of the Elasticsearch instance | blank |
| ES_USERNAME | Username for Elasticsearch authentication | blank |
| ES_PASSWORD | Password for Elasticsearch authentication | blank |
| ES_METRICS_INDEX | Elasticsearch index for metrics data | blank |
| ES_ALERTS_INDEX | Elasticsearch index for alerts data | blank |
| ES_TELEMETRY_INDEX | Elasticsearch index for telemetry data | blank |
| ES_RUN_TAG | Tag to identify the run in Elasticsearch | blank |
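A sketch of an Elasticsearch setup, since most of these variables default to blank. The host, credentials, index names and run tag are all hypothetical placeholders. Port 9200 is the standard Elasticsearch HTTP port and may differ for your deployment.

```shell
export ENABLE_ES=True
export ES_SERVER=https://elasticsearch.example.com   # hypothetical host
export ES_PORT=9200
export ES_VERIFY_CERTS=True
export ES_USERNAME=elastic                           # placeholder user
export ES_PASSWORD='<password>'                      # placeholder secret

# Hypothetical index names, one per data type.
export ES_METRICS_INDEX=krkn-metrics
export ES_ALERTS_INDEX=krkn-alerts
export ES_TELEMETRY_INDEX=krkn-telemetry

# Hypothetical tag used to find this run later.
export ES_RUN_TAG=nightly-run
```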

Tunings

Execution timing and iteration controls. See Tunings config for full details.

| Parameter | Description | Default |
|---|---|---|
| WAIT_DURATION | Duration in seconds to wait between chaos scenarios | 60 |
| ITERATIONS | Number of times to run the scenarios | 1 |
| DAEMON_MODE | When True, iterations are set to infinity, so kraken causes chaos indefinitely | False |
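The sketch below runs a scenario three times with a two-minute pause. It also prints a rough estimate of the idle time this adds. The estimate assumes one wait per iteration of a single scenario. That is an assumption made for illustration, not documented behaviour.

```shell
export ITERATIONS=3
export WAIT_DURATION=120
# True would set iterations to infinity and ignore ITERATIONS.
export DAEMON_MODE=False

# Rough idle time, assuming one WAIT_DURATION per iteration.
echo "approx. wait time: $(( ITERATIONS * WAIT_DURATION )) seconds"
```

With these values the estimate is 360 seconds.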

Telemetry

Run data collection and upload settings. See Telemetry config for full details.

| Parameter | Description | Default |
|---|---|---|
| TELEMETRY_ENABLED | Enables/disables the telemetry collection feature | False |
| TELEMETRY_API_URL | Telemetry service endpoint | https://ulnmf9xv7j.execute-api.us-west-2.amazonaws.com/production |
| TELEMETRY_USERNAME | Telemetry service username | redhat-chaos |
| TELEMETRY_PASSWORD | Telemetry service password | No default |
| TELEMETRY_PROMETHEUS_BACKUP | Enables/disables Prometheus data collection | True |
| TELEMETRY_FULL_PROMETHEUS_BACKUP | If set to False, only the /prometheus/wal folder is downloaded | False |
| TELEMETRY_BACKUP_THREADS | Number of telemetry download/upload threads | 5 |
| TELEMETRY_ARCHIVE_PATH | Local path where the archive files are temporarily stored | /tmp |
| TELEMETRY_MAX_RETRIES | Maximum number of upload retries (0 retries forever) | 0 |
| TELEMETRY_RUN_TAG | If set, appended to the run folder in the bucket (useful for grouping runs) | chaos |
| TELEMETRY_GROUP | If set, archives the telemetry in the S3 bucket under a folder named after the value | default |
| TELEMETRY_ARCHIVE_SIZE | Size of the Prometheus data archive in KB | 1000 |
| TELEMETRY_LOGS_BACKUP | Enables/disables logs backup to S3 | False |
| TELEMETRY_FILTER_PATTERN | Patterns used to filter logs by timestamp format | ["(\\w{3}\\s\\d{1,2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d+).+", ...] |
| TELEMETRY_CLI_PATH | Path to the oc CLI; if not specified, it is searched for in $PATH | blank |
| TELEMETRY_EVENTS_BACKUP | Enables/disables events backup to S3 | False |
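A minimal telemetry setup needs TELEMETRY_ENABLED and a password, because TELEMETRY_PASSWORD has no default. In the sketch below, the password, group and run tag are hypothetical placeholders. The retry count overrides the default of 0, which would retry forever. The final guard is a local convenience only.

```shell
export TELEMETRY_ENABLED=True
export TELEMETRY_USERNAME=redhat-chaos        # documented default user
export TELEMETRY_PASSWORD='<password>'        # placeholder; no default exists
export TELEMETRY_GROUP=my-team                # hypothetical bucket folder
export TELEMETRY_RUN_TAG=node-chaos-test      # hypothetical run tag

# Back up only /prometheus/wal instead of the full Prometheus data.
export TELEMETRY_PROMETHEUS_BACKUP=True
export TELEMETRY_FULL_PROMETHEUS_BACKUP=False
export TELEMETRY_ARCHIVE_PATH=/tmp
# Give up after 5 upload retries instead of retrying forever.
export TELEMETRY_MAX_RETRIES=5

if [ "$TELEMETRY_ENABLED" = "True" ] && [ -z "$TELEMETRY_PASSWORD" ]; then
  echo "warning: TELEMETRY_PASSWORD must be set when telemetry is enabled" >&2
fi
```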

Health Checks

Application endpoint monitoring during chaos. See Health Checks config for full details.

| Parameter | Description | Default |
|---|---|---|
| HEALTH_CHECK_URL | URL to check continuously to detect downtime | blank |
| HEALTH_CHECK_INTERVAL | Interval in seconds at which to run health checks | 2 |
| HEALTH_CHECK_BEARER_TOKEN | Bearer token used to authenticate to the health check URL | blank |
| HEALTH_CHECK_AUTH | Tuple of (username, password) used to authenticate to the health check URL | blank |
| HEALTH_CHECK_EXIT_ON_FAILURE | If True, exits when the health check fails for the application | blank |
| HEALTH_CHECK_VERIFY | Enables SSL validation for the health check URL | False |
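The sketch below watches a hypothetical application endpoint every five seconds, authenticating with a placeholder bearer token. HEALTH_CHECK_AUTH is left out because its expected string format for a tuple is not specified here. The commented curl probe is a manual equivalent in spirit only. It is not kraken's implementation and it assumes curl is installed.

```shell
# Hypothetical application endpoint to watch during chaos.
export HEALTH_CHECK_URL=https://my-app.example.com/healthz
export HEALTH_CHECK_INTERVAL=5
# Placeholder token sent as a bearer token to the URL above.
export HEALTH_CHECK_BEARER_TOKEN='<token>'
export HEALTH_CHECK_EXIT_ON_FAILURE=True
export HEALTH_CHECK_VERIFY=True

# Manual probe printing the HTTP status code; uncomment to try it:
# curl -s -o /dev/null -w '%{http_code}\n' \
#   -H "Authorization: Bearer $HEALTH_CHECK_BEARER_TOKEN" "$HEALTH_CHECK_URL"
```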

Virt Checks

KubeVirt VMI SSH connection monitoring during chaos. See Virt Checks config for full details.

| Parameter | Description | Default |
|---|---|---|
| KUBE_VIRT_CHECK_INTERVAL | Interval in seconds at which to test KubeVirt connections | 2 |
| KUBE_VIRT_NAMESPACE | Namespace in which to find and watch VMIs | blank |
| KUBE_VIRT_NAME | Regex matching the names of the VMIs to watch | blank |
| KUBE_VIRT_FAILURES | If True, only reports when SSH connections to a VMI fail | blank |
| KUBE_VIRT_DISCONNECTED | Use the disconnected check, bypassing the cluster API | False |
| KUBE_VIRT_NODE_NAME | If set, only tracks VMs running on the specified node | blank |
| KUBE_VIRT_EXIT_ON_FAIL | Fails the run if any VM still has a false status at the end of the run | False |
| KUBE_VIRT_SSH_NODE | If set, used as a backup way to SSH to a node; should be a node not targeted by chaos | blank |
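KUBE_VIRT_NAME is a regex, so it is worth testing the pattern before a run. In the sketch below, the namespace and VMI names are hypothetical. grep -E is only used to approximate how the pattern matches. Krkn's own regex engine may differ in edge cases.

```shell
export KUBE_VIRT_NAMESPACE=vm-workloads     # hypothetical namespace
export KUBE_VIRT_NAME='^web-vm-.*'          # VMIs whose names start with web-vm-
export KUBE_VIRT_CHECK_INTERVAL=5
export KUBE_VIRT_FAILURES=True              # report only failed SSH connections
export KUBE_VIRT_EXIT_ON_FAIL=True

# Preview which (hypothetical) VMI names the pattern selects.
printf '%s\n' web-vm-1 web-vm-2 db-vm-1 | grep -E "$KUBE_VIRT_NAME"
```

With the sample names above, only web-vm-1 and web-vm-2 match.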