Docker Build Cache: Why Your Builds Are Stale and How to Fix Cache Invalidation
Your Docker build takes 15 minutes when it should take 2. Or worse, it finishes fast but serves stale code because a cached layer did not pick up your changes. Both problems trace back to the same root cause: misunderstanding how Docker's build cache decides what to rebuild and what to skip.
TL;DR
Docker caches each instruction as a layer. When any layer changes, that layer and every layer after it gets rebuilt. The fix: put rarely-changing instructions (system packages, dependency installs) before frequently-changing ones (application source code). Use .dockerignore to prevent irrelevant file changes from busting the cache. Use --no-cache to force a full rebuild when you suspect stale layers.
How Docker Build Cache Works
Before you can fix cache problems, you need to understand what Docker is actually caching. Every instruction in a Dockerfile (FROM, RUN, COPY, ADD, ENV, etc.) produces a layer. Docker stores these layers and reuses them on subsequent builds if nothing has changed.
The critical detail is how Docker determines "nothing has changed." It depends on the instruction type:
- RUN instructions: Cached based on the command string only. If the command text is identical, Docker reuses the cached layer. This means RUN apt-get update will use the cache even if new packages are available upstream, because the string has not changed.
- COPY and ADD instructions: Cached based on the content hash of the files being copied. Docker computes a checksum of every file in the build context that matches the instruction. If any file changes, the cache is invalidated.
- ENV, ARG, LABEL, EXPOSE: Cached based on the instruction string and its values. Changing an ARG value invalidates the cache for that layer.
- FROM: Cached based on the image digest. If the upstream image has a new digest (even with the same tag), the cache breaks.
The cascade rule is what catches most people off guard. When layer N is invalidated, layers N+1, N+2, and every subsequent layer must also be rebuilt, even if their instructions have not changed. This is because each layer depends on the filesystem state produced by all previous layers.
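The cascade rule can be sketched as a toy model. This is a simplification, not Docker's actual hashing: real COPY keys hash file contents, which the example fakes with a stand-in token, and the function names are invented for illustration. The point it demonstrates is that each layer's key incorporates its parent's key, so one miss invalidates everything after it:

```python
import hashlib

def cache_key(instruction: str, parent_key: str) -> str:
    # A layer's key covers its own instruction AND its parent's key,
    # which is why invalidating one layer cascades to all later ones.
    return hashlib.sha256((parent_key + instruction).encode()).hexdigest()

def record_build(instructions):
    """Simulate a first build: return the chain of layer cache keys."""
    keys, parent = [], ""
    for ins in instructions:
        parent = cache_key(ins, parent)
        keys.append(parent)
    return keys

def plan_rebuild(instructions, cached_keys):
    """Return ('CACHED'|'REBUILD', instruction) pairs for a second build."""
    plan, parent, valid = [], "", True
    for i, ins in enumerate(instructions):
        key = cache_key(ins, parent)
        if valid and i < len(cached_keys) and cached_keys[i] == key:
            plan.append(("CACHED", ins))
        else:
            valid = False  # first miss: every later layer rebuilds
            plan.append(("REBUILD", ins))
        parent = key
    return plan

first = [
    "FROM python:3.12-slim",
    "COPY requirements.txt <hash:aaa>",  # stand-in for the content hash
    "RUN pip install -r requirements.txt",
    "COPY . <hash:bbb>",
]
cached = record_build(first)

# Second build: only the source tree changed, so only the last COPY's
# content hash differs. Earlier layers stay cached.
second = first[:3] + ["COPY . <hash:ccc>"]
for action, ins in plan_rebuild(second, cached):
    print(action, ins)
```

Running it prints CACHED for the first three layers and REBUILD only for the final COPY, which is exactly the behavior the good Dockerfile in Step 1 exploits.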
Step 1: Layer Ordering, the Number One Optimization
The single most impactful change you can make to your Dockerfile is reordering instructions so that frequently-changing layers come last. In my experience running CI pipelines for enterprise Magento deployments, this alone can cut build times by 70% or more.
Here is a common Python Dockerfile that wastes cache on every build:
# Bad: COPY . invalidates pip install cache on every code change
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
RUN python -m pytest
CMD ["python", "app.py"]
Every time you change a single line of application code, Docker invalidates the COPY . . layer. Because pip install comes after it, all dependencies get reinstalled from scratch. On a project with 200+ dependencies, that is 3 to 5 minutes wasted per build.
The fix is simple. Copy the dependency manifest first, install dependencies, then copy everything else:
# Good: dependencies only reinstall when requirements.txt changes
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN python -m pytest
CMD ["python", "app.py"]
Now pip install only reruns when requirements.txt actually changes. Application code changes only invalidate the second COPY . . layer and everything after it, which is typically just the test and CMD layers.
The same pattern applies to every language ecosystem. For Node.js, copy package.json and package-lock.json before npm install. For Go, copy go.mod and go.sum before go mod download. For PHP/Composer, copy composer.json and composer.lock before composer install.
Step 2: The .dockerignore Gap
You optimized your layer ordering, but your builds are still busting cache unexpectedly. The likely culprit: files in your build context that should not be there.
When Docker executes COPY . ., it computes content hashes for every file in the build context. If any of those files changes, the layer is invalidated. This includes files you never intended to copy: .git/ directory (changes on every commit), node_modules/ (rebuilt locally), __pycache__/ (Python bytecode), IDE settings, local environment files, and build artifacts.
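The effect of .dockerignore on that content hash can be modeled in a few lines. This is a rough sketch, not Docker's real implementation (actual .dockerignore matching follows Go's filepath.Match semantics with extra rules for `**` and `!` exceptions), and the pattern list is a hypothetical example:

```python
import fnmatch
import hashlib

# Hypothetical ignore patterns, mirroring a minimal .dockerignore
IGNORE = [".git/*", "__pycache__/*", "*.pyc", ".env"]

def is_ignored(path: str) -> bool:
    return any(fnmatch.fnmatch(path, pat) for pat in IGNORE)

def context_hash(files: dict) -> str:
    """Checksum every non-ignored file (path + content), roughly how a
    broad COPY . . decides whether its cached layer is still valid."""
    h = hashlib.sha256()
    for path in sorted(files):
        if not is_ignored(path):
            h.update(path.encode())
            h.update(files[path])
    return h.hexdigest()

ctx = {"app.py": b"print('hi')", ".git/objects/ab12": b"blob", ".env": b"KEY=1"}
before = context_hash(ctx)

ctx[".git/objects/cd34"] = b"new commit"  # every commit adds .git objects
assert context_hash(ctx) == before        # ignored, so no cache bust

ctx["app.py"] = b"print('bye')"           # a real code change
assert context_hash(ctx) != before        # invalidates, as it should
```

The assertions capture the whole argument of this step: commits touching only .git leave the hash unchanged once the directory is ignored, while genuine code changes still invalidate the layer.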
A proper .dockerignore is not optional. It is a cache correctness requirement:
# .dockerignore
.git
.gitignore
.dockerignore
Dockerfile*
docker-compose*.yml
README.md
LICENSE
docs/
# Dependencies (installed inside container)
node_modules
vendor
__pycache__
*.pyc
.venv
# Build artifacts
dist
build
*.egg-info
# IDE and OS
.vscode
.idea
*.swp
*.swo
.DS_Store
Thumbs.db
# Environment and secrets
.env
.env.*
*.pem
*.key
I have seen builds where the .git directory alone was 500MB. Every commit changed the Git objects, invalidating the COPY layer and forcing a full rebuild. Adding .git to .dockerignore fixed the cache hit rate overnight.
There is a security angle here too. Without a .dockerignore, sensitive files like .env, private keys, and credentials can end up baked into your image layers. Even if a later layer deletes them, they remain extractable from the image history. Check your images with the SecureBin Exposure Checker to verify you are not leaking secrets through your infrastructure.
Step 3: Multi-Stage Builds for Independent Caching
Multi-stage builds let you separate build-time dependencies from runtime dependencies. Each stage has its own cache chain, so changes to build tooling do not invalidate your runtime image, and vice versa.
Here is a real-world multi-stage Dockerfile for a Node.js application:
# Stage 1: Install dependencies
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --production
# Stage 2: Build the application
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 3: Production runtime
FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production
# Copy only what we need from previous stages
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]
This gives you three separate cache chains. The deps stage only rebuilds when package.json or package-lock.json changes. The builder stage rebuilds when source code changes, but it does not affect the dependency install. The runtime stage produces a minimal image with no build tools, no dev dependencies, and no source code.
For compiled languages, the savings are even more dramatic. A Go multi-stage build can produce a final image under 20MB with a scratch or distroless base, while the build stage carries the entire Go toolchain:
# Build stage with full toolchain
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server ./cmd/server
# Runtime stage: ~5MB final image
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]
Step 4: BuildKit Cache Mounts
BuildKit introduced cache mounts (--mount=type=cache) that persist package manager caches across builds without baking them into image layers. This is different from layer caching. Instead of caching the entire layer output, it caches specific directories (like pip's download cache or npm's cache) on the host and mounts them into the build container.
Here are cache mount examples for the most common package managers:
# Python pip
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
# Node.js npm
RUN --mount=type=cache,target=/root/.npm \
npm ci
# Debian/Ubuntu apt
RUN --mount=type=cache,target=/var/cache/apt \
--mount=type=cache,target=/var/lib/apt/lists \
apt-get update && apt-get install -y curl git
# PHP Composer
RUN --mount=type=cache,target=/root/.composer/cache \
composer install --no-dev --optimize-autoloader
# Go modules
RUN --mount=type=cache,target=/go/pkg/mod \
go mod download
The key advantage: even when the layer cache is invalidated (because you changed requirements.txt), the package manager can reuse previously downloaded packages from its own cache. Instead of downloading 200 packages from PyPI, pip only downloads the 3 that actually changed. In practice, this turns a 4-minute pip install into a 20-second one.
To use cache mounts, you need BuildKit enabled. Set DOCKER_BUILDKIT=1 as an environment variable, or use docker buildx build which enables BuildKit by default.
If you use docker buildx build --cache-from type=gha --cache-to type=gha in GitHub Actions, be aware that BuildKit caches aggressively, including ENV metadata layers. When you use build args to set environment variables for different markets or environments (e.g., --build-arg MARKET=eu), the cached layer may serve a stale ENV MARKET=us from a previous build. The build arg is passed correctly, but the cached ENV layer overrides it silently.
I hit this exact problem running CI for a multi-region e-commerce platform. EU builds were deploying with US configuration because the GHA cache served the US ENV layer. The workaround: override the environment variable in your Kubernetes deployment manifests or Docker Compose files, so the runtime value always takes precedence over whatever was baked into the image. Alternatively, inject configuration at container startup via an entrypoint script that reads from environment variables rather than relying on build-time ENV instructions.
Step 5: CI/CD Cache Strategies
Local Docker caching works well on developer machines, but CI environments are ephemeral. Every GitHub Actions run, every GitLab CI job, every Jenkins agent starts with an empty Docker cache by default. Without an external cache backend, every CI build is a cold build.
BuildKit supports several external cache backends. Each has different tradeoffs:
Registry Cache
Pushes cache layers to a container registry alongside your images. Any runner that can pull from the registry can reuse the cache. This is the most portable option:
# Push cache to registry
docker buildx build \
--cache-from type=registry,ref=myregistry.com/myapp:cache \
--cache-to type=registry,ref=myregistry.com/myapp:cache,mode=max \
-t myregistry.com/myapp:latest \
--push .
The mode=max flag caches all layers including intermediate ones, not just the final image layers. This gives the best cache hit rate but uses more registry storage.
GitHub Actions Cache
Uses GitHub's built-in cache storage (10 GB per repo). Fast for GHA workflows, but not shared across different CI systems:
name: Build and Push
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ env.ECR_REGISTRY }}/myapp:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            COMMIT_SHA=${{ github.sha }}
S3 Cache
Stores cache blobs in an S3 bucket. Useful when you want cache sharing across multiple CI systems, or when you need more than the 10 GB GitHub cache limit:
docker buildx build \
--cache-from type=s3,region=us-east-1,bucket=my-docker-cache,name=myapp \
--cache-to type=s3,region=us-east-1,bucket=my-docker-cache,name=myapp,mode=max \
-t myapp:latest .
Step 6: Debugging Cache Misses
When your build is slower than expected, you need to identify which layer is breaking the cache. BuildKit's plain-text progress output shows exactly what happened:
DOCKER_BUILDKIT=1 docker build --progress=plain -t myapp . 2>&1 | head -80
Look for CACHED vs DONE in the output. Cached layers show:
#5 [2/6] COPY requirements.txt .
#5 CACHED
#6 [3/6] RUN pip install -r requirements.txt
#6 CACHED
#7 [4/6] COPY . .
#7 DONE 0.8s
In this example, layers 2 and 3 hit the cache (dependency install was reused), and layer 4 was rebuilt because application code changed. That is exactly the behavior we want.
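For long builds, eyeballing the log gets tedious. A small script can summarize the cache hit rate from the same output. This is a simplified parser written for this article (the function name is invented), and it assumes the step-numbered line format shown above:

```python
import re

def cache_stats(progress_log: str):
    """Count CACHED vs rebuilt (DONE) steps in --progress=plain output.
    Simplified parsing; assumes BuildKit's '#N ...' step-numbered lines."""
    cached = len(re.findall(r"^#\d+ CACHED\s*$", progress_log, re.MULTILINE))
    rebuilt = len(re.findall(r"^#\d+ DONE\b", progress_log, re.MULTILINE))
    return cached, rebuilt

log = """\
#5 [2/6] COPY requirements.txt .
#5 CACHED
#6 [3/6] RUN pip install -r requirements.txt
#6 CACHED
#7 [4/6] COPY . .
#7 DONE 0.8s
"""
print(cache_stats(log))  # → (2, 1)
```

Pipe your real build log into it (e.g. `docker build --progress=plain . 2>&1`) and a sudden drop in the CACHED count points you straight at the layer where invalidation starts.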
If you see a layer that should be cached but shows DONE instead, investigate what changed. Use docker history to inspect existing image layers:
docker history myapp:latest --no-trunc --format "table {{.CreatedBy}}\t{{.Size}}"
Common causes of unexpected cache misses:
- Timestamps in generated files: Build tools that embed timestamps into output files (Java's .class files, Python's .pyc files) cause COPY layers to see different content hashes on every build. Exclude these in .dockerignore.
- Changing build context: A new file in the project directory (even one you do not use) can bust the cache if your COPY instruction is broad enough to include it.
- Upstream image updates: If your FROM image uses a mutable tag like :latest or :3.12, the digest can change when the publisher pushes an update. Pin to a specific digest or SHA for deterministic caching.
- Build arguments: Different --build-arg values produce different cache keys. If your CI passes a commit SHA or timestamp as a build arg, every build will be unique.
Cache Backend Comparison
| Backend | Speed | Shared Across Runners | Size Limit | Setup Complexity | Best For |
|---|---|---|---|---|---|
| Local | Fastest | No | Disk space | None | Local dev, self-hosted CI with persistent storage |
| Inline | Fast | Yes (via registry) | Image size | Low | Simple setups, single-stage builds |
| Registry | Medium | Yes | Registry quota | Low | Cross-platform CI, multi-team sharing |
| GHA | Fast | Same repo only | 10 GB per repo | Low | GitHub Actions workflows |
| S3 | Medium | Yes | Unlimited (pay per GB) | Medium | Large teams, multi-CI, big images |
Cost Savings: The Business Case for Cache Optimization
Build cache optimization is not just a developer convenience. It directly impacts infrastructure costs. In my experience managing CI for a platform with 50+ daily builds, proper caching reduced average build time from 15 minutes to 3 minutes. That is an 80% reduction in CI compute minutes.
Here is what that looks like financially:
- Before optimization: 50 builds/day x 15 min = 750 CI minutes/day = 22,500 min/month
- After optimization: 50 builds/day x 3 min = 150 CI minutes/day = 4,500 min/month
- Savings: 18,000 CI minutes/month
At GitHub Actions pricing ($0.008/min for Linux runners), that is $144/month saved. On larger runners or self-hosted infrastructure with higher compute costs, the savings multiply. For teams running hundreds of builds per day across multiple repositories, annual savings can reach tens of thousands of dollars.
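The arithmetic behind those numbers is easy to rerun against your own pipeline. This sketch uses the article's figures and assumes a 30-day month and GitHub-hosted Linux runner pricing of $0.008/minute; swap in your own build counts and rates:

```python
# Assumptions: the article's figures, a 30-day month, and GitHub-hosted
# Linux runner pricing of $0.008 per minute.
builds_per_day = 50
minutes_before, minutes_after = 15, 3
price_per_minute = 0.008

monthly_before = builds_per_day * minutes_before * 30  # 22,500 min/month
monthly_after = builds_per_day * minutes_after * 30    # 4,500 min/month
saved_minutes = monthly_before - monthly_after

print(saved_minutes)                      # → 18000 minutes saved per month
print(saved_minutes * price_per_minute)   # → 144.0 dollars per month
```

At higher rates (larger runners, macOS runners) or higher build volumes, the same three lines of arithmetic scale the savings accordingly.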
Beyond raw compute costs, faster builds improve developer productivity. A 15-minute build loop means engineers context-switch, lose focus, and ship slower. A 3-minute build keeps them in flow. The productivity gain is harder to quantify but often more valuable than the infrastructure savings.
Build caching also reduces network bandwidth. Without caching, every build downloads base images, system packages, and language dependencies from external registries. That is gigabytes of redundant downloads per day. With proper caching, those downloads happen once and get reused.
Common Mistakes That Kill Your Cache
After auditing dozens of Dockerfiles across production environments, these are the mistakes I see most frequently:
- COPY . . before dependency install: The number one cache killer. Always copy dependency manifests first, install, then copy application code.
- Missing or incomplete .dockerignore: Without it, IDE configs, local builds, and the .git directory all contribute to cache-busting COPY hashes.
- Embedding timestamps or commit SHAs in early layers: Putting ARG BUILD_DATE or ARG COMMIT_SHA before your dependency install means every single build has a unique cache key for all subsequent layers. Move these args to the end of the Dockerfile, or into a LABEL in the final stage only.
- Using --no-cache unnecessarily: Some teams add --no-cache to their CI pipeline "just to be safe." This forces a complete rebuild every time, defeating the entire purpose of caching. Only use --no-cache when you specifically need to pull fresh base images or verify a clean build. For routine CI, trust the cache.
- Not pinning base image versions: FROM python:3 resolves to whatever the latest 3.x image is today. When Python publishes 3.13, your cache breaks across every service. Pin to FROM python:3.12.3-slim or, better, pin to a digest: FROM python:3.12.3-slim@sha256:abc123...
- Ignoring BuildKit entirely: The legacy builder does not support cache mounts, multi-platform builds, or external cache backends. If you are still using DOCKER_BUILDKIT=0 or have not configured buildx, you are leaving significant performance on the table.
- Running apt-get update and install in separate RUN instructions: RUN apt-get update gets cached. RUN apt-get install -y curl also gets cached. Later, when you add git to the install line, the update layer is still cached with stale package lists, potentially installing outdated or missing packages. Always combine them: RUN apt-get update && apt-get install -y curl git.
- Secrets in build args: Build arguments are baked into the image layer metadata. Anyone with docker history access can see them. Use BuildKit's --mount=type=secret instead, which mounts secrets as temporary files that never appear in image layers. For a broader look at infrastructure security, see our AWS security checklist for production.
Frequently Asked Questions
How do I force Docker to rebuild everything from scratch?
Use docker build --no-cache -t myapp . to ignore all cached layers. With BuildKit, you can also use docker builder prune to clear the build cache entirely. Note that --no-cache only affects the build cache. It does not re-pull base images. To force a fresh base image pull, add --pull: docker build --no-cache --pull -t myapp .
Why does my COPY layer keep invalidating even though I did not change any files?
Check for files that change outside your control: .git objects, __pycache__ bytecode, .DS_Store files, IDE metadata, or build outputs. Add them to .dockerignore. Also check if your build tool generates files with embedded timestamps. Some tools write a build manifest or version file that differs on every run.
What is the difference between inline cache and registry cache?
Inline cache (--cache-to type=inline) embeds cache metadata into the image itself. It is simple but limited: it only caches layers that appear in the final image, not intermediate build stages. Registry cache (--cache-to type=registry) stores all layers (with mode=max) in a separate cache image. Registry cache is better for multi-stage builds because it caches the builder stages too.
Can I cache Docker builds across different branches?
Yes. With external cache backends (registry, GHA, S3), you can reference cache from any previous build. A common pattern is to always push cache from the main branch, and have feature branches pull cache from main: --cache-from type=registry,ref=myapp:main-cache. Feature branches benefit from the main branch cache for shared layers (base image, dependencies) while building only their changed code.
Does Docker BuildKit cache work with docker-compose?
Yes, but you need to enable BuildKit. Set DOCKER_BUILDKIT=1 in your environment, or add x-bake configuration to your compose file. With Docker Compose v2 (the docker compose plugin, not the standalone docker-compose binary), BuildKit is enabled by default. For cache mounts in your Dockerfile, no compose changes are needed. For external cache backends, use docker buildx bake with a compose file as input.
The Bottom Line
Docker build cache is not magic. It follows simple, deterministic rules: hash the instruction, compare to the cache, rebuild if different, cascade to all subsequent layers. Once you internalize these rules, every cache problem becomes solvable.
Start with layer ordering (it is free and gives the biggest win). Add a proper .dockerignore. Adopt multi-stage builds for non-trivial applications. Enable BuildKit and use cache mounts. For CI/CD, pick a cache backend that matches your infrastructure. And when things go wrong, use --progress=plain to see exactly where the cache breaks.
Faster builds mean faster feedback loops, lower CI costs, and happier engineers. That is worth the 30 minutes it takes to audit and optimize your Dockerfiles.
Related reading: AWS Security Checklist for Production, Exposure Checker, Docker to Compose Converter, ENV Validator, and 70+ more free tools.
Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.