Container Scenarios
Recovery Time Metrics in Krkn Telemetry
Krkn tracks three key recovery time metrics for each affected container:
pod_rescheduling_time - The time (in seconds) that the Kubernetes cluster took to reschedule the pod after it was killed. This measures the cluster’s scheduling efficiency and covers the interval from pod deletion until the replacement pod is scheduled on a node. In some cases, when a container is killed, the pod is not fully rescheduled, so pod_rescheduling_time may be 0.0 seconds.
pod_readiness_time - The time (in seconds) the pod took to become ready after being scheduled. This measures application startup time, including container image pulls, initialization, and readiness probe success.
total_recovery_time - The total amount of time (in seconds) from pod deletion until the replacement pod became fully ready and available to serve traffic. This is the sum of rescheduling time and readiness time.
These metrics appear in the telemetry output under PodsStatus.recovered for successfully recovered pods. Pods that fail to recover within the timeout period appear under PodsStatus.unrecovered without timing data.
Example telemetry output:
{
  "recovered": [
    {
      "pod_name": "backend-7d8f9c-xyz",
      "namespace": "production",
      "pod_rescheduling_time": 43.62235879898071,
      "pod_readiness_time": 0.0,
      "total_recovery_time": 43.62235879898071
    }
  ],
  "unrecovered": []
}
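As a rough illustration of how this output might be consumed, the Python sketch below parses the example payload, checks that total_recovery_time matches the sum of the other two metrics, and reports the slowest pod. The helper name summarize_pods_status and the tolerance parameter are hypothetical and not part of Krkn; the sketch assumes the PodsStatus object has already been extracted from the telemetry file as JSON.

```python
import json

# Sample payload copied from the example telemetry output in this section.
# In practice this would be read from the PodsStatus part of a telemetry file.
SAMPLE = """
{
  "recovered": [
    {
      "pod_name": "backend-7d8f9c-xyz",
      "namespace": "production",
      "pod_rescheduling_time": 43.62235879898071,
      "pod_readiness_time": 0.0,
      "total_recovery_time": 43.62235879898071
    }
  ],
  "unrecovered": []
}
"""


def summarize_pods_status(pods_status, tolerance=1e-6):
    """Hypothetical helper (not a Krkn API): summarize recovery metrics."""
    recovered = pods_status.get("recovered", [])
    # Unrecovered pods carry no timing data, so they are only counted.
    unrecovered = pods_status.get("unrecovered", [])

    # Flag pods whose total is not the sum of rescheduling and readiness time.
    inconsistent = []
    for pod in recovered:
        expected = pod["pod_rescheduling_time"] + pod["pod_readiness_time"]
        if abs(expected - pod["total_recovery_time"]) > tolerance:
            inconsistent.append(pod["pod_name"])

    # Pick the recovered pod with the largest total recovery time, if any.
    slowest = max(recovered, key=lambda p: p["total_recovery_time"], default=None)

    return {
        "recovered_count": len(recovered),
        "unrecovered_count": len(unrecovered),
        "slowest_pod": slowest["pod_name"] if slowest else None,
        "max_total_recovery_time": slowest["total_recovery_time"] if slowest else None,
        "inconsistent_pods": inconsistent,
    }


summary = summarize_pods_status(json.loads(SAMPLE))
print(summary)
```

A non-empty inconsistent_pods list or a non-zero unrecovered_count could be used as a signal that a chaos run needs closer inspection; the thresholds for what counts as acceptable recovery time are left to the reader.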