
Fix Kubernetes CrashLoopBackOff: 8 Causes and Solutions

Quick Fix

Pod stuck in CrashLoopBackOff? Start with these three commands:

# See why the pod is crashing
kubectl describe pod <pod-name> -n <namespace>

# Check logs from the crashed container
kubectl logs <pod-name> -n <namespace> --previous

# Check events for the namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20

CrashLoopBackOff is not an error itself. It is Kubernetes telling you that a container keeps crashing and that it is applying exponential backoff before restarting it (10s, 20s, 40s, 80s, 160s, capped at 300s). The actual error is whatever is causing the container to exit. This guide covers 8 specific causes, the exact kubectl describe output you will see for each, and the fix.
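The backoff schedule is simply a capped doubling. This illustrative sketch (not kubelet source, just the arithmetic) reproduces the delays listed above:

```shell
# Illustrative sketch of the restart backoff: the delay doubles after each
# crash and is capped at 300 seconds.
delay=10
for attempt in 1 2 3 4 5 6; do
  echo "restart $attempt: wait ${delay}s"
  delay=$(( delay * 2 > 300 ? 300 : delay * 2 ))
done
```

Note that the kubelet resets the backoff to 10s once a container has run successfully for 10 minutes.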

Understanding the kubectl describe Output

Before diving into causes, understand how to read the output. Run kubectl describe pod <pod-name> and look at these sections:

State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error        # or OOMKilled, Completed
  Exit Code:    1            # or 137, 126, 127, 0
  Started:      Wed, 02 Apr 2026 14:00:00 +0000
  Finished:     Wed, 02 Apr 2026 14:00:03 +0000
Restart Count:  5

Key fields:

  • Exit Code 0: Container exited successfully. It was not meant to be long-running, or it completed and Kubernetes keeps restarting it.
  • Exit Code 1: Application error. Check logs.
  • Exit Code 126: Command found but not executable (permission issue).
  • Exit Code 127: Command not found (wrong binary path or missing from image).
  • Exit Code 137: SIGKILL. Usually OOM killed. Check if Reason says OOMKilled.
  • Exit Code 139: Segfault (SIGSEGV).
  • Exit Code 143: SIGTERM. Graceful shutdown, but the container did not stay running.
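For codes above 128, the fatal signal number is recoverable by simple arithmetic (code = 128 + signal). A quick sketch, with the kubectl query left as a comment since it needs a live cluster and assumes the first container:

```shell
# Exit codes above 128 encode the fatal signal: code - 128 = signal number.
code=137
sig=$(( code - 128 ))
echo "signal number: $sig"   # 9
kill -l "$sig"               # prints the signal name: KILL

# Reading the exit code straight from the API (first container assumed):
# kubectl get pod <pod-name> -n <namespace> \
#   -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```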

Cause 1: OOMKilled (Exit Code 137)

What You See

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

Why It Happens

The container exceeded its resources.limits.memory. The kernel OOM killer sent SIGKILL. This is the single most common cause of CrashLoopBackOff in production clusters.

The Fix

spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "1Gi"    # Increase this

Before blindly increasing the limit, check actual usage:

# Current memory usage across all pods in deployment
kubectl top pods -n <namespace> -l app=your-app

# If metrics-server is not installed (cgroup v2 path shown;
# cgroup v1 uses /sys/fs/cgroup/memory/memory.usage_in_bytes)
kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/memory.current

Set the limit to 125-150% of observed peak usage. For Java, set -XX:MaxRAMPercentage=75.0 so the heap scales with the container limit. For Node.js, set --max-old-space-size to roughly 75% of the limit in MiB.
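One way to wire those runtime flags in is via environment variables in the container spec. The env var names are standard for each runtime; the values here are illustrative for a 1Gi limit:

```yaml
env:
- name: JAVA_TOOL_OPTIONS            # picked up automatically by the JVM
  value: "-XX:MaxRAMPercentage=75.0"
- name: NODE_OPTIONS                 # Node.js old-space cap, ~75% of 1Gi in MiB
  value: "--max-old-space-size=768"
```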

Cause 2: Missing ConfigMap or Secret

What You See

Events:
  Warning  FailedMount  kubelet  MountVolume.SetUp failed for volume "app-config":
    configmap "app-config" not found

Or for environment variables:

Events:
  Warning  Failed  kubelet  Error: configmap "app-config" not found

Why It Happens

The pod spec references a ConfigMap or Secret that does not exist in the namespace. Common after deploying to a new namespace, after a Helm values error, or after someone deleted the ConfigMap.

The Fix

# Check if the ConfigMap exists
kubectl get configmap app-config -n <namespace>

# Check if the Secret exists
kubectl get secret app-secret -n <namespace>

# If missing, create it
kubectl create configmap app-config \
  --from-file=config.yaml=./config.yaml \
  -n <namespace>

# Or for secrets
kubectl create secret generic app-secret \
  --from-literal=DB_PASSWORD=yourpassword \
  -n <namespace>

To prevent this from blocking pod startup entirely, mark the reference as optional:

env:
- name: DB_HOST
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: db_host
      optional: true    # Pod starts even if ConfigMap is missing

Cause 3: Failed Liveness or Startup Probe

What You See

Events:
  Warning  Unhealthy  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 503
  Normal   Killing    kubelet  Container app failed liveness probe, will be restarted

Why It Happens

The liveness probe is declaring the container unhealthy, causing Kubernetes to kill and restart it. The most common subcauses:

  • The application takes longer to start than the probe allows
  • The probe endpoint is wrong (path, port, or scheme)
  • The application is genuinely unhealthy (deadlocked, resource exhaustion)

The Fix

For slow-starting applications, use a startup probe instead of relying on initialDelaySeconds:

spec:
  containers:
  - name: app
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
      # Allows up to 300 seconds (5 min) for startup
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
      # Only runs after startupProbe succeeds

Verify the probe works manually:

# Port-forward and test the health endpoint
kubectl port-forward <pod-name> 8080:8080 -n <namespace>
curl -v http://localhost:8080/healthz

Cause 4: Missing Secret for Image Pull (ImagePullBackOff)

What You See

Events:
  Warning  Failed   kubelet  Failed to pull image "registry.example.com/app:v1.2":
    rpc error: code = Unknown desc = failed to pull and unpack image: unauthorized
  Warning  Failed   kubelet  Error: ErrImagePull
  Normal   BackOff  kubelet  Back-off pulling image "registry.example.com/app:v1.2"
  Warning  Failed   kubelet  Error: ImagePullBackOff

Why It Happens

The node cannot authenticate with the container registry: the imagePullSecrets entry is missing, the credentials have expired, or it references a secret that does not exist in the namespace. Strictly speaking the pod reports ImagePullBackOff rather than CrashLoopBackOff, but the debugging workflow is the same.

The Fix

# Create the image pull secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=deploy \
  --docker-password=your-token \
  -n <namespace>

# Reference it in the pod spec
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: app
    image: registry.example.com/app:v1.2

For ECR, tokens expire every 12 hours. Use a CronJob or a controller like ecr-credential-helper to refresh the secret automatically. Or better, use IRSA (IAM Roles for Service Accounts) so nodes pull directly without stored credentials.
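A rough sketch of the CronJob approach, assuming an image that bundles both the aws CLI and kubectl, a ServiceAccount with RBAC to manage secrets, and IAM access to ECR from the pod (all assumptions; adjust to your setup):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ecr-cred-refresh
spec:
  schedule: "0 */8 * * *"          # ECR tokens last 12h; refresh every 8h
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          serviceAccountName: ecr-cred-refresh   # needs RBAC to manage secrets
          containers:
          - name: refresh
            image: your-image-with-aws-cli-and-kubectl   # assumption
            command: ["sh", "-c"]
            args:
            - |
              kubectl create secret docker-registry regcred \
                --docker-server=<account>.dkr.ecr.<region>.amazonaws.com \
                --docker-username=AWS \
                --docker-password="$(aws ecr get-login-password)" \
                --dry-run=client -o yaml | kubectl apply -f -
```

The create-then-apply pattern makes the refresh idempotent: it works whether or not the secret already exists.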

Decode Kubernetes Secrets Safely

Use SecureBin's Base64 decoder to inspect Kubernetes secrets without exposing them in shell history.


Cause 5: Permission Denied (Exit Code 126)

What You See

Last State:     Terminated
  Reason:       Error
  Exit Code:    126

# In logs:
/bin/sh: ./entrypoint.sh: Permission denied

Why It Happens

The entrypoint script or binary is not executable. Common when building Docker images on macOS/Windows (file permissions not preserved) or when running as a non-root user that cannot execute the file.

The Fix

In your Dockerfile:

COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

If the issue is a securityContext preventing execution:

spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000    # Ensures mounted volumes are group-accessible
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true

If readOnlyRootFilesystem: true is set and your app writes to the filesystem, mount writable volumes:

volumeMounts:
- name: tmp
  mountPath: /tmp
- name: app-data
  mountPath: /app/data
volumes:
- name: tmp
  emptyDir: {}
- name: app-data
  emptyDir: {}

Cause 6: Port Conflict or Bind Failure

What You See

# In logs:
Error: listen EADDRINUSE: address already in use :::8080
# or
bind: address already in use

Why It Happens

Another process in the same pod (usually a sidecar container, or a native sidecar init container that keeps running) is already bound to the port. Or the application is trying to bind to a privileged port (below 1024) without the NET_BIND_SERVICE capability.

The Fix

Check if multiple containers in the pod use the same port:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].ports[*].containerPort}' -n <namespace>

Each container in a pod shares the same network namespace, so ports must be unique across all containers. Change one container's port.

For privileged port binding:

spec:
  containers:
  - name: app
    securityContext:
      capabilities:
        add: ["NET_BIND_SERVICE"]

Or better, change the application to bind to a non-privileged port (8080 instead of 80) and use the Service to map external port 80 to container port 8080.
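The Service-level mapping looks like this (names and selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector:
    app: your-app
  ports:
  - port: 80          # port clients and other services connect to
    targetPort: 8080  # non-privileged port the container actually binds
```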

Cause 7: Init Container Failure

What You See

Init Containers:
  wait-for-db:
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Exit Code: 1
Containers:
  app:
    State:       Waiting
      Reason:    PodInitializing

Why It Happens

An init container must complete successfully (exit code 0) before the main containers start. If it fails, Kubernetes restarts it with backoff, and the main container never starts. Common init containers that fail: database migration jobs, wait-for-dependency scripts, config generators.

The Fix

# Check init container logs
kubectl logs <pod-name> -c wait-for-db -n <namespace>

# Common wait-for-db init container (ensure it has correct host/port)
initContainers:
- name: wait-for-db
  image: busybox:1.36
  command: ['sh', '-c',
    'until nc -z db-service 5432; do echo "waiting for db"; sleep 2; done']

Verify the target service is reachable from the pod's namespace:

# Run a debug pod in the same namespace
kubectl run debug --rm -it --image=busybox:1.36 -n <namespace> -- sh
# Then test connectivity
nc -zv db-service 5432
nslookup db-service

Cause 8: Application Exits Immediately (Exit Code 0)

What You See

Last State:     Terminated
  Reason:       Completed
  Exit Code:    0
Restart Count:  8

Why It Happens

The container runs a command that exits successfully and then terminates. Kubernetes restarts it because the restartPolicy is Always (the default for Deployments). This happens when:

  • The Dockerfile CMD runs a one-shot script instead of a long-running process
  • The application starts a background process and the foreground process exits
  • The entrypoint script does not exec the main process

The Fix

Ensure the main process runs in the foreground:

# Bad: starts in background, shell exits
CMD ["sh", "-c", "nginx &"]

# Good: runs in foreground
CMD ["nginx", "-g", "daemon off;"]

In entrypoint scripts, always use exec to replace the shell with the application process:

#!/bin/sh
# Setup tasks
echo "Configuring..."
envsubst < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf

# exec replaces the shell - PID 1 becomes nginx
exec nginx -g "daemon off;"

If the container is intentionally a one-shot job, use a Job or CronJob instead of a Deployment:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  template:
    spec:
      restartPolicy: Never    # Do not restart on success
      containers:
      - name: migrate
        image: your-app
        command: ["python", "manage.py", "migrate"]

Advanced Debugging Techniques

Ephemeral Debug Containers (Kubernetes 1.23+)

If the crashing container has no shell or debugging tools:

# Attach a debug container to the crashing pod
kubectl debug -it <pod-name> \
  --image=busybox:1.36 \
  --target=app \
  -n <namespace>

# Now you can inspect the app container's filesystem, processes, etc.
ls /proc/1/root/app/
cat /proc/1/environ | tr '\0' '\n'

Override the Entrypoint

If you need to inspect the container without it crashing immediately, override the command to keep it running:

# Temporarily change the command in the deployment
kubectl patch deployment your-app -n <namespace> --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/command","value":["sleep","3600"]}]'

# Now exec into the running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Manually run the original entrypoint to see the error
./entrypoint.sh

Remember to revert the patch after debugging.
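Two ways to revert (deployment name and namespace follow the patch example above):

```shell
# Roll back to the previous revision
kubectl rollout undo deployment/your-app -n <namespace>

# Or drop the command override so the image's ENTRYPOINT applies again
kubectl patch deployment your-app -n <namespace> --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/containers/0/command"}]'
```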

Check Resource Quotas and LimitRanges

A pod might fail to start because namespace resource quotas are exhausted:

# Check quotas
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>

# Check LimitRange defaults
kubectl get limitrange -n <namespace> -o yaml

A LimitRange can inject default memory limits that are too low, causing unexpected OOM kills even though you did not set explicit limits in your pod spec.
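For reference, a LimitRange like this (values illustrative) silently injects defaults into any container that omits them:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults
spec:
  limits:
  - type: Container
    default:
      memory: "256Mi"       # injected as the limit when none is set
    defaultRequest:
      memory: "128Mi"       # injected as the request when none is set
```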

Validate Your Kubernetes YAML

Use SecureBin's YAML Validator to catch syntax errors in your Kubernetes manifests before deploying.


The CrashLoopBackOff Decision Tree

  1. Run kubectl describe pod. Check Events and Last State.
  2. Exit code 137 + OOMKilled? Increase memory limits. See Docker OOM Killed Fix.
  3. Exit code 1? Run kubectl logs --previous. The application logged the error before crashing.
  4. Exit code 0? The app is not long-running. Fix CMD/entrypoint or use a Job.
  5. Exit code 126/127? Binary not found or not executable. Check image and permissions.
  6. Events show FailedMount? ConfigMap, Secret, or PVC is missing.
  7. Events show Unhealthy + Killing? Probe is failing. Increase timeouts or fix the health endpoint.
  8. Init container failing? Check init container logs separately with -c flag.
  9. No logs at all? Container crashes before writing output. Override entrypoint with sleep and debug interactively.

The Bottom Line

CrashLoopBackOff is a symptom, not a diagnosis. The exit code in kubectl describe pod narrows the cause. The logs from kubectl logs --previous give you the specific error. In the vast majority of cases, it is one of these 8 causes: OOM kill, missing config, failed probe, image pull auth, permission denied, port conflict, init container failure, or a non-long-running process. Identify which one you are hitting, apply the targeted fix, and your pod will stabilize.

Related Articles

Continue reading: Fix Docker OOM Killed, Fix AWS EFS Permission Denied, Kubernetes Secrets Management, Fix Let's Encrypt Renewal Failed, API Key Rotation Best Practices.

Written by Usman Khan
DevOps Engineer | MSc Cybersecurity | CEH | AWS Solutions Architect

Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.