Long-Term Storage

Persist metrics from every chaos run into Elasticsearch to compare behavior across releases, dates, or cluster configurations.

Tags:

Docs

Categories:

Getting Started

Goal: Persist metrics from every chaos run into Elasticsearch so you can compare behavior across releases, dates, or cluster configurations.

This journey enables regression analysis — for example, detecting that API server latency during a node failure has increased between software versions.

What you need

Everything from Metrics Validation
An Elasticsearch instance (self-hosted or managed) — need to set one up? See installing Elasticsearch on a kind cluster
krknctl installed

Steps

Complete Metrics Validation first to confirm Prometheus evaluation is working.

Define your metrics profile at config/metrics.yaml. This controls which Prometheus metrics are snapshotted and stored per run:

- query: irate(apiserver_request_total{verb="POST"}[2m])
  metricName: apiserverRequestRate

- query: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[2m]))
  metricName: etcdFsyncLatencyP99

Run a scenario with both profiles mounted:
```
krknctl run pod-scenarios \
  --metrics-profile config/metrics.yaml \
  --alerts-profile config/alerts.yaml
```
After each scenario, metrics snapshots are stored alongside run metadata (scenario type, duration, cluster version, exit status).

Deploy krkn-visualize to query your data through pre-built Grafana dashboards for API performance, etcd health, node and pod scenarios, and more:

krknctl visualize \
  --es-url https://elasticsearch.example.com \
  --es-username elastic \
  --es-password <your-password> \
  --prometheus-url https://prometheus.example.com \
  --prometheus-bearer-token <your-token> \
  --grafana-password <grafana-admin-password>

This deploys krkn-visualize to your cluster and wires it to both Elasticsearch and Prometheus. To tear it down later:

krknctl visualize --delete

Reference docs

krknctl usage — full flag reference for run and visualize
Performance Dashboards — krkn-visualize dashboards and manual deploy script
Telemetry — understanding the data Krkn captures and stores per run
Installing Elasticsearch on a kind cluster — Helm-based setup for local testing

Next steps

To generate a numerical resilience score on top of your Prometheus data, continue to Resilience Score.

Last modified May 6, 2026: docs: fix broken performance dashboards links in long term storage user journey (#341) (f85007e)