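You can use the kube_pod_container_status_last_terminated_reason metric to find the cause of the problem. First, query the metric in Prometheus to find out why the container last terminated. If the reason is OOMKilled (the container was killed for running out of memory), check the application's memory usage and consider raising the Pod's memory limit. If the reason is Error or CrashLoopBackOff (note that kube-state-metrics reports CrashLoopBackOff as a waiting reason, in kube_pod_container_status_waiting_reason, rather than as a terminated reason), you may need to check...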
I guess one could introduce a kube_pod_container_status_last_terminated_reason metric. It seems @brancz is fine with accepting a pull request for this, and I am happy to help anyone who wants to tackle it. The changes would need to touch pod.go. cs.LastTerminationState should give you ...
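As a rough sketch of the idea in the comment above, the following Go snippet reads the reason from a container's last termination state. It is not the actual kube-state-metrics code in pod.go. The struct types are simplified local stand-ins that mirror only the relevant fields of the Kubernetes `ContainerStatus` API type, so the example runs without external dependencies, and `lastTerminatedReason` is a hypothetical helper name.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes core/v1 types; only the fields
// needed for this example are included.
type ContainerStateTerminated struct {
	Reason   string
	ExitCode int32
}

type ContainerState struct {
	// Terminated is nil when the state does not describe a termination.
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name                 string
	LastTerminationState ContainerState
}

// lastTerminatedReason returns the reason of the container's previous
// termination, or false if the container has no recorded termination.
func lastTerminatedReason(cs ContainerStatus) (string, bool) {
	t := cs.LastTerminationState.Terminated
	if t == nil || t.Reason == "" {
		return "", false
	}
	return t.Reason, true
}

func main() {
	cs := ContainerStatus{
		Name: "app",
		LastTerminationState: ContainerState{
			Terminated: &ContainerStateTerminated{Reason: "OOMKilled", ExitCode: 137},
		},
	}
	if reason, ok := lastTerminatedReason(cs); ok {
		fmt.Printf("container %s last terminated: %s\n", cs.Name, reason)
	}
}
```

The nil check matters because a container that has never restarted has no last termination state, in which case no sample should be reported for it.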