
Docker Multi-Stage Builds: Cut Image Size by 90% (2026 Guide)

Build tools, compilers, and dev dependencies have no place in production containers. Multi-stage builds let you compile inside Docker and ship only what the runtime actually needs - shrinking images from gigabytes to megabytes.

The Problem: Bloated Production Images

Here is a scenario most backend engineers have encountered. You write a Node.js API, create a Dockerfile that copies your source code and runs npm install, and push the result to a container registry. The image is 1.4 GB. The actual application logic is maybe 5 MB of JavaScript. The rest is Node.js internals, every dev dependency from your package.json, TypeScript, ESLint, test frameworks, and the full Linux userland of node:20.
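The naive Dockerfile behind that scenario usually looks something like this (a hypothetical sketch; paths and scripts are illustrative):

FROM node:20
WORKDIR /app
COPY . .
RUN npm install        # pulls devDependencies: TypeScript, ESLint, test frameworks
RUN npm run build
CMD ["node", "dist/server.js"]

Everything above ships to production, build tools included.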

Large images cause real operational pain:

  • Slow CI/CD pipelines: A 1.4 GB image takes 3–5 minutes to push and pull. A 150 MB image takes under 30 seconds.
  • Higher attack surface: Every package in your image is a potential CVE vector. Build tools like gcc and make have no reason to exist in a production container.
  • Increased registry costs: AWS ECR, Docker Hub, and GCR all charge for storage and data transfer. Multiply a 1 GB image by hundreds of builds per month and the cost adds up.
  • Slower Kubernetes scaling: When a pod needs to start on a new node, the node must pull the image. A slim image starts in seconds; a bloated image starts in minutes.

The solution is Docker multi-stage builds, a feature available since Docker 17.05 that lets you use multiple FROM statements in a single Dockerfile. Each FROM starts a new build stage. You can copy artifacts from earlier stages into later ones, and only the final stage ends up in the published image.

How Multi-Stage Builds Work

The core idea is to separate your build environment from your runtime environment. In a single-stage build, everything used to compile or bundle your code stays in the final image. In a multi-stage build, you throw away the build environment and only carry over the compiled output.

The syntax uses named stages with AS <name> and copies files between stages with COPY --from=<name>:

# Stage 1: give this stage a name
FROM node:20-alpine AS builder
WORKDIR /app
# ... build steps here ...

# Stage 2: start fresh from a minimal base
FROM node:20-alpine
WORKDIR /app
# Copy ONLY the compiled output from stage 1
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]

Docker builds every stage the final image depends on (BuildKit skips stages the target never references), but only the final stage gets tagged and pushed. The intermediate stages exist only during the build process and are discarded.
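A quick way to confirm this (the myapp tag is illustrative): build the image, then inspect what was actually published.

docker build -t myapp:latest .
docker images myapp            # only the final stage's size shows up here
docker history myapp:latest    # builder-stage layers do not appear; only the copied artifacts do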

Step-by-Step: Node.js TypeScript API

This is the most common use case. A TypeScript application needs the TypeScript compiler at build time but only needs the compiled JavaScript at runtime.

# ---- Stage 1: Build ----
FROM node:20-alpine AS builder
WORKDIR /app

# Copy dependency manifests first (layer cache optimization)
COPY package.json package-lock.json ./
RUN npm ci

# Copy source and compile TypeScript to JavaScript
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build
# At this point /app/dist contains compiled JS
# /app/node_modules contains ALL deps including devDeps

# ---- Stage 2: Production Runtime ----
FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production

# Only install production dependencies
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Copy compiled output from builder stage
COPY --from=builder /app/dist ./dist

# Run as non-root user for security
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

Typical size reduction for this pattern:

Approach                       Base image                    Image size
Single-stage (all deps)        node:20                       ~1.4 GB
Single-stage (alpine)          node:20-alpine                ~420 MB
Multi-stage (prod deps only)   node:20-alpine                ~150 MB
Multi-stage (distroless)       gcr.io/distroless/nodejs20    ~95 MB
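To reach the distroless row, swap the final stage. A hedged sketch: the prod-deps stage name is ours, and since distroless images have no shell or npm, production dependencies must be installed in a separate stage and copied in. Distroless Node.js images invoke node as their entrypoint, so CMD takes only the script path:

# ---- Extra stage: production dependencies only ----
FROM node:20-alpine AS prod-deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# ---- Final stage: distroless (no shell, no npm) ----
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=prod-deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
CMD ["dist/server.js"]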

Step-by-Step: Go Application

Go is where multi-stage builds shine brightest. The Go toolchain (compiler, standard library sources, build cache) is around 350 MB. But a compiled Go binary is completely self-contained - it has no runtime dependencies and can run in a container with nothing but the binary itself.

# ---- Stage 1: Build ----
FROM golang:1.22-alpine AS builder
WORKDIR /app

# Download dependencies first (cached unless go.mod changes)
COPY go.mod go.sum ./
RUN go mod download

# Build a statically linked binary
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-w -s" \
    -o /app/server \
    ./cmd/server

# ---- Stage 2: Minimal runtime ----
# scratch is a completely empty base image - literally nothing
FROM scratch
COPY --from=builder /app/server /server

# Copy TLS certificates so HTTPS calls work
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

EXPOSE 8080
ENTRYPOINT ["/server"]

The result: a Go service that was 380 MB in a full Go image compiles down to an image that is 8–15 MB - just the binary and TLS certs. No shell, no package manager, no attack surface.

The -ldflags="-w -s" flags strip debug symbols and the symbol table from the binary, reducing its size by 20–40% with no impact on runtime performance.
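To see the flags' effect, build the binary both ways and compare (commands are illustrative; exact sizes vary by project):

go build -o server-full ./cmd/server
go build -ldflags="-w -s" -o server-stripped ./cmd/server
ls -lh server-full server-stripped   # stripped binary is typically 20-40% smaller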

Step-by-Step: Python Django Application

Python images are trickier because Python itself must be present at runtime. The gains come from separating the compilation of native extension packages (which needs gcc and build headers) from the runtime.

# ---- Stage 1: Build wheels ----
FROM python:3.12-slim AS builder
WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Build binary wheels for all dependencies
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# ---- Stage 2: Runtime ----
FROM python:3.12-slim AS runtime
WORKDIR /app

# Shared libraries that compiled extensions link against at runtime
# (psycopg2 built from source still needs libpq after the wheel is built)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq5 \
    && rm -rf /var/lib/apt/lists/*

# Install only the pre-built wheels (no gcc needed)
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links /wheels /wheels/*.whl \
    && rm -rf /wheels

# Copy application source
COPY . .

# Non-root user
RUN useradd -m appuser && chown -R appuser /app
USER appuser

EXPOSE 8000
CMD ["gunicorn", "myproject.wsgi:application", "--bind", "0.0.0.0:8000"]

This pattern eliminates gcc, make, and all the -dev header packages from the production image while still allowing native extensions like psycopg2 and Pillow to compile correctly.
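A quick sanity check (image tag is illustrative) confirms the compiler never reaches the runtime image:

docker build -t django-app:prod .
docker run --rm django-app:prod sh -c 'command -v gcc || echo "gcc not present"'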


Advanced Techniques

Targeting a Specific Stage

You can build only up to a specific stage using --target. This is useful for running tests in CI without building the production image:

# Run tests in the builder stage without creating the production image
docker build --target builder -t myapp:test .
docker run --rm myapp:test npm test

# Build the full production image separately
docker build --target runtime -t myapp:prod .

Caching Dependencies Efficiently

Layer caching is Docker's most powerful optimization. The key rule: copy dependency manifests before source code. Docker only re-runs a layer if its inputs change. If you copy package.json first and npm install second, Docker re-uses the cached install layer on every build where only your source changed.

# GOOD: dependencies cached separately from source
COPY package.json package-lock.json ./
RUN npm ci                          # cached unless package-lock.json changes
COPY src/ ./src/                    # source changes don't bust the npm cache
RUN npm run build

# BAD: any source change invalidates the npm install cache
COPY . .
RUN npm ci
RUN npm run build

Using BuildKit Cache Mounts

Docker BuildKit (the default builder since Docker Engine 23.0) supports cache mounts that persist between builds on the same machine, dramatically speeding up dependency installs:

# syntax=docker/dockerfile:1
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci
COPY . .
RUN npm run build
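The same mount pattern applies to other package managers; only the cache path changes. The paths below are the tools' default cache locations in the official images:

# pip caches downloads under /root/.cache/pip
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Go's module cache lives at /go/pkg/mod in the golang images
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download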

Passing Build Arguments Between Stages

Build arguments (ARG) do not automatically carry between stages. Re-declare them in each stage that needs them:

ARG APP_VERSION=latest

FROM node:20-alpine AS builder
ARG APP_VERSION          # re-declare to use in this stage
RUN echo "Building version $APP_VERSION"

FROM node:20-alpine AS runtime
ARG APP_VERSION          # re-declare again
ENV APP_VERSION=$APP_VERSION
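Passing the value at build time looks like this (version and tag are illustrative):

docker build --build-arg APP_VERSION=1.2.3 -t myapp:1.2.3 .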

Choosing the Right Base Image

The base image for your final stage has a large impact on image size and security posture:

  • alpine: ~5 MB base, uses musl libc. Excellent default for most applications, though some native extensions expect glibc and need patches or rebuilds.
  • slim variants: Debian-based, ~30 MB, broader compatibility with native packages than alpine.
  • distroless (Google): No shell, no package manager, no OS utilities. Minimal attack surface. Harder to debug.
  • scratch: Empty image. Only viable for statically compiled binaries (Go, Rust).

For most teams, alpine or a language-specific -alpine variant gives the best tradeoff between image size, compatibility, and debuggability.

Security Benefits

Smaller images are also more secure images. Every tool present in a container is a potential lateral movement vector if an attacker gains code execution. Multi-stage builds naturally enforce the principle of least privilege at the filesystem level:

  • No compiler or linker means an attacker cannot compile new binaries inside the container
  • No package manager (apt, apk) means an attacker cannot install new tools
  • No shell in distroless images means many common exploit techniques do not apply
  • Fewer packages means fewer CVEs reported by vulnerability scanners like Trivy or Snyk

Running container vulnerability scans (trivy image myapp:prod) on multi-stage builds typically shows 80–95% fewer HIGH/CRITICAL CVEs compared to naive single-stage builds based on full OS images.
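For example, to report only the severities most teams gate on (assuming Trivy is installed locally):

trivy image --severity HIGH,CRITICAL myapp:prod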

Frequently Asked Questions

Does multi-stage build slow down the Docker build?

No - with proper layer caching, multi-stage builds are typically the same speed or faster. The builder stage can be cached between runs. The additional COPY --from step adds milliseconds. The net effect is usually a faster overall pipeline because pushing and pulling the smaller final image is significantly faster.

Can I use multi-stage builds with Docker Compose?

Yes. Docker Compose builds the final stage by default. You can specify a target stage in your compose.yml:

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
      target: runtime   # build only up to this stage

You can also use our Docker to Compose tool to generate a Compose file from your run commands.

What is the difference between COPY --from and ADD?

COPY --from=<stage> copies files from another build stage. ADD is an older instruction that can also fetch remote URLs and auto-extract archives. For copying between stages, always use COPY --from. Avoid ADD in general - its implicit behaviors are a common source of confusion.

How do I debug a multi-stage build when the final image has no shell?

Build to an intermediate stage that does have a shell: docker build --target builder -t debug . and then docker run -it debug sh. For distroless production images, use docker debug myapp:prod (Docker Desktop feature) or ephemeral debug containers in Kubernetes: kubectl debug -it <pod> --image=busybox --target=app.

Do multi-stage builds work with GitHub Actions and CI/CD?

Yes, and they work particularly well. Use --cache-from with registry-based caching to reuse layers across CI runs:

docker buildx build \
  --cache-from type=registry,ref=myrepo/myapp:cache \
  --cache-to type=registry,ref=myrepo/myapp:cache,mode=max \
  --target runtime \
  -t myrepo/myapp:latest \
  --push .

Should I always use multi-stage builds?

For any image that goes to a registry or runs in production: yes. The only scenario where single-stage makes sense is a local development-only container where image size does not matter and you want the full toolchain available for debugging. Even then, Docker Compose overrides and volume mounts are usually a better approach for dev environments.
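A sketch of that dev setup using a Compose override file (service name, paths, and the dev script are illustrative):

# docker-compose.override.yml - applied automatically by `docker compose up`
services:
  api:
    build:
      target: builder          # keep the full toolchain for local development
    volumes:
      - ./src:/app/src         # mount source for live edits, no rebuild needed
    command: npm run dev       # assumes a dev script exists in package.json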

The Bottom Line

Multi-stage builds are one of the highest-impact Docker best practices available. They require no external tools, no additional infrastructure, and no changes to your application code. A single Dockerfile refactor can reduce image size by 80%+, cut CI/CD pipeline time, lower registry costs, and significantly improve your container security posture.

The pattern is always the same: use a fat image to build, copy the output to a slim image to run. Start with the Node.js or Go examples above and adapt them to your stack.

Open our free Docker to Compose tool →

Convert any docker run command into a proper docker-compose.yml file. Supports ports, volumes, env vars, networks, and more. No signup required.
Written by Usman Khan
DevOps Engineer | MSc Cybersecurity | CEH | AWS Solutions Architect

Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.