
PDF Optimization Guide: Reduce Size for Web & Email

Sending a 15 MB PDF when a 500 KB file would do is a solved problem, but most people do not know which levers to pull. This guide covers the main optimization techniques, from a quick browser-based route to full command-line control with Ghostscript, with specific settings for web delivery, email, and print.

Web vs. Email vs. Print: Different Goals, Different Settings

Before optimizing, know your target. Optimal settings differ significantly:

Use case            Image DPI   JPEG quality   Target file size   Linearize?
Web download        96–150      80%            < 1 MB             Yes
Email attachment    150         82%            < 5 MB             No
Embedded (iframe)   96          75%            < 500 KB           Yes
Print on demand     300         95%            No limit           No
Archival            300         Lossless       No limit           No

Linearization (also called "fast web view") is a PDF optimization specifically for web serving. A linearized PDF allows the first page to start rendering in the browser before the entire file has downloaded. For multi-page PDFs served on a website, this is a meaningful UX improvement.

The Five Pillars of PDF Optimization

1. Image DPI and Resolution

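Images are typically the largest component of a PDF. The key insight: at 100% zoom, a standard monitor displays a page at roughly 96 pixels per inch, and only high-DPI screens or zooming in reveal more detail, so 150 DPI leaves comfortable headroom for on-screen reading. A page scanned at 600 DPI contains 16x more image data than the same page at 150 DPI. Downsampling to 150 DPI keeps only about 6% of the original pixel data, with little or no visible difference when the document is read on screen at normal zoom.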

The formula: halving the DPI cuts an image's pixel count, and roughly its stored size, to about 25% of the original, because area scales as the square of the linear dimension. Going from 600 DPI to 150 DPI is a 4x linear reduction, which means roughly 16x less data per image.
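The arithmetic above can be checked with a few lines of shell. This is a sketch: `page_pixels` and `remaining_percent` are hypothetical helper names, and page dimensions are passed in hundredths of an inch so that US Letter (8.5 × 11 in) stays in integer math.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the DPI arithmetic (not part of any tool).

# Pixels in a full-page image: (width_in * dpi) * (height_in * dpi).
# Width and height are in hundredths of an inch (850 x 1100 = US Letter).
page_pixels() {
    local dpi="$1" w="$2" h="$3"
    echo $(( (w * dpi / 100) * (h * dpi / 100) ))
}

# Percentage of pixel data left after resampling from one DPI to another.
# Pixel count scales with the square of the linear resolution.
remaining_percent() {
    awk -v from="$1" -v to="$2" 'BEGIN { printf "%.2f\n", (to / from) ^ 2 * 100 }'
}

page_pixels 600 850 1100    # 33660000 (5100 x 6600)
page_pixels 150 850 1100    # 2103750  (1275 x 1650)
remaining_percent 600 150   # 6.25
remaining_percent 300 150   # 25.00
```

The 16x ratio is exact for pixel count; the compressed size usually shrinks by a similar but not identical factor, since compression ratios vary with image content.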

2. Image Compression Algorithm

Images inside PDFs can be stored in several formats:

  • Uncompressed (raw bitmap): Maximum quality, enormous size. Rare in practice.
  • Lossless (Flate/ZIP, LZW): No quality loss. Good for screenshots and graphics with flat colors; poor for photographs.
  • JPEG: Lossy, tunable quality. Excellent for photographs. Quality 80–85 is typically indistinguishable from lossless for photographic content.
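  • JBIG2: Specialized lossless or lossy compression for black-and-white (1-bit) images such as scanned text, often several times smaller than older fax-style (CCITT) compression for this use case. Lossy JBIG2 can substitute similar-looking characters, so prefer lossless mode where exact digits matter.
  • JPEG 2000: Usually smaller than JPEG at equivalent quality, but slower to decode and requires PDF 1.5 or later, so it is not available when targeting PDF 1.4.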
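In Ghostscript, these choices map onto pdfwrite's distiller parameters. The helper below is a hypothetical sketch that turns a content type into the matching options. It only covers filters those parameters can name (DCTEncode for JPEG, FlateEncode for lossless, CCITTFaxEncode for 1-bit images); as far as we know, JBIG2 and JPEG 2000 encoding need other tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick Ghostscript pdfwrite image-filter options by content type.
# AutoFilter*=false makes pdfwrite use the named filter instead of choosing per image.
image_filter_args() {
    case "$1" in
        photo)      # lossy JPEG for photographs
            printf '%s\n' "-dAutoFilterColorImages=false -dColorImageFilter=/DCTEncode" ;;
        graphics)   # lossless Flate for screenshots and flat-color artwork
            printf '%s\n' "-dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode" ;;
        scan-bw)    # CCITT fax compression for 1-bit scans
            printf '%s\n' "-dMonoImageFilter=/CCITTFaxEncode" ;;
        *)          # let pdfwrite decide per image
            printf '%s\n' "-dAutoFilterColorImages=true" ;;
    esac
}

image_filter_args photo
```

The output is meant to be word-split into a gs command line, for example `gs -sDEVICE=pdfwrite $(image_filter_args photo) ...` (left unquoted on purpose).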

3. Font Subsetting and Embedding

A fully embedded font includes all glyphs in the typeface - often thousands of characters. Subsetting includes only the characters actually present in the document. For Latin-script documents using a fraction of available glyphs, subsetting typically reduces font data by 85–95%.

Example: a 300 KB font embedded in full becomes 15 KB when subsetted for a document using 50 characters. Multiply by 3–4 fonts per document and you save 800 KB+ from fonts alone.
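The savings in the example are simple arithmetic, sketched below with the same figures. The second helper relies on a convention from the PDF specification: a subset font's name starts with a tag of six uppercase letters and a plus sign, such as `ABCDEF+Helvetica`. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: KB saved = (full size - subset size) x number of fonts.
subset_savings_kb() {
    local full_kb="$1" subset_kb="$2" fonts="$3"
    echo $(( (full_kb - subset_kb) * fonts ))
}

# Hypothetical helper: succeeds if a font name carries the six-letter subset tag.
is_subset_name() {
    [[ "$1" =~ ^[[:upper:]]{6}\+ ]]
}

subset_savings_kb 300 15 3   # 855
subset_savings_kb 300 15 4   # 1140
is_subset_name "ABCDEF+Helvetica" && echo "subset"
```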

4. Metadata and Hidden Objects

A PDF created by Microsoft Word or Adobe Acrobat may contain:

  • Embedded page thumbnails (a small raster preview image per page)
  • XMP document metadata (author, title, subject, creation date, modification history)
  • ICC color profiles (the common "sRGB IEC61966-2.1" profile is only about 3 KB, but CMYK print profiles can exceed 500 KB)
  • Unreferenced objects from incremental saves and edit sessions
  • Document structure tags for accessibility (valuable to keep if the PDF needs to be accessible)

Stripping thumbnails and unreferenced objects has no visual effect, and on a heavily edited file the leftovers from incremental saves can account for a large share of its size. Removing a large ICC profile can save hundreds of KB, but it may shift colors slightly and breaks PDF/A or PDF/X conformance, so keep profiles in files meant for print or archiving.
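A rough way to see whether a file carries thumbnails or ICC profiles is to count the relevant dictionary keys in its raw bytes. This is a heuristic sketch only, with `pdf_key_count` as a hypothetical name: it works when objects are stored uncompressed (typical of PDF 1.4 output) and will undercount when dictionaries sit inside compressed object streams (PDF 1.5+).

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: count occurrences of a PDF name such as /Thumb or /ICCBased
# in the raw file. -a treats the binary file as text; -o prints each match on its own line.
pdf_key_count() {
    local file="$1" key="$2"
    local n
    n=$(grep -a -o -- "$key" "$file" | wc -l)
    echo $(( n ))
}

# Usage:
#   pdf_key_count document.pdf /Thumb      # page thumbnails
#   pdf_key_count document.pdf /ICCBased   # ICC-based color spaces
```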

5. PDF Version and Feature Compatibility

Targeting an older PDF version (1.4 instead of 1.7) gives up some newer features, including the object streams and cross-reference streams added in PDF 1.5, which compress a file's internal structure and can make it slightly smaller. In exchange, 1.4 maximizes compatibility: for web delivery it is a safe baseline that every current PDF viewer supports.
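To see which version a file declares, read its header: every PDF starts with a line such as `%PDF-1.4`. A minimal sketch follows (`pdf_header_version` is a hypothetical name). From PDF 1.4 on, a /Version entry in the document catalog can override the header, so treat this as a first check rather than the final word.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version from a PDF's header line ("%PDF-1.4" -> "1.4").
# LC_ALL=C keeps sed from choking on the binary bytes that often follow the header.
pdf_header_version() {
    head -c 16 "$1" | LC_ALL=C sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p' | head -n 1
}

# Usage: pdf_header_version input.pdf
```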

Ghostscript: Full Control in One Command

Ghostscript is a free, open-source PostScript and PDF interpreter whose pdfwrite device rewrites a PDF with new image, font, and compression settings. Many PDF compression tools use it under the hood:

# A good default for web download (/ebook preset: 150 DPI images)
gs -sDEVICE=pdfwrite \
   -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dBATCH -dQUIET \
   -sOutputFile=optimized.pdf \
   input.pdf

Understanding PDFSETTINGS Presets

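# /screen   = 72 DPI, JPEG ~72% - smallest possible, visible artifacts
# /ebook    = 150 DPI, JPEG ~82% - recommended for web/email
# /printer  = 300 DPI, JPEG ~92% - for documents that will be printed
# /prepress = 300 DPI, highest-quality settings, keeps color data - professional print production
# /default  = general-purpose; does not downsample images, so output can stay large
# JPEG figures are approximate: each preset sets quality internally, not as a percentage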
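One way to encode the use-case table in a script is a small lookup from use case to preset. The mapping below is our own judgment call rather than anything Ghostscript defines: /screen is the closest preset to the embedded row, /ebook to web and email, and /printer and /prepress to the print rows. `preset_for` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical lookup from the use-case table to a Ghostscript PDFSETTINGS preset.
preset_for() {
    case "$1" in
        web|email) echo "/ebook" ;;
        embedded)  echo "/screen" ;;
        print)     echo "/printer" ;;
        archival)  echo "/prepress" ;;
        *)         echo "unknown use case: $1" >&2; return 1 ;;
    esac
}

# Example: gs -sDEVICE=pdfwrite -dPDFSETTINGS="$(preset_for email)" ...
preset_for email   # /ebook
```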

Custom Settings for Web Optimization

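# Note: some Ghostscript versions apply -dJPEGQ only to the JPEG image device, not pdfwrite;
# if quality does not change, set /QFactor in the ColorImageDict distiller parameter instead.
gs -sDEVICE=pdfwrite \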
   -dCompatibilityLevel=1.4 \
   -dNOPAUSE -dBATCH -dQUIET \
   -dDownsampleColorImages=true \
   -dColorImageResolution=150 \
   -dDownsampleGrayImages=true \
   -dGrayImageResolution=150 \
   -dDownsampleMonoImages=true \
   -dMonoImageResolution=300 \
   -dColorImageDownsampleType=/Bicubic \
   -dAutoFilterColorImages=false \
   -dColorImageFilter=/DCTEncode \
   -dJPEGQ=82 \
   -dEmbedAllFonts=true \
   -dSubsetFonts=true \
   -dCompressPages=true \
   -dUseCIEColor=false \
   -dPreserveEPSInfo=false \
   -dPreserveOPIComments=false \
   -sOutputFile=web_optimized.pdf \
   input.pdf

Linearization for Fast Web View

A linearized PDF allows progressive loading: the first page can render as soon as its data arrives while the rest downloads in the background, provided the server supports HTTP range requests (most static file servers, including Nginx, do). This matters most for large PDFs served on a website. Without linearization, many browser viewers wait for the entire file before rendering any page.

# Linearize with qpdf (install: brew install qpdf or apt install qpdf)
qpdf --linearize input.pdf linearized.pdf

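# Optimize, then linearize. An intermediate file is safer than piping,
# since Ghostscript can print warnings to stdout and corrupt piped output.
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dBATCH -dQUIET \
   -sOutputFile=optimized.pdf input.pdf && \
qpdf --linearize optimized.pdf web_ready.pdf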

Verify linearization: qpdf --check web_ready.pdf should report "File is linearized" among its checks, and qpdf --check-linearization web_ready.pdf checks the linearization data specifically.
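Where qpdf isn't installed, a quick heuristic is to look for the linearization dictionary: the PDF specification requires it to sit entirely within the first 1024 bytes of a linearized file. The sketch below (`is_linearized` is a hypothetical name) only checks that the marker is present; it does not validate the hint tables the way qpdf --check-linearization does.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: a linearized file has its linearization parameter
# dictionary (with the key /Linearized) within the first 1024 bytes.
is_linearized() {
    head -c 1024 "$1" | LC_ALL=C grep -a -q '/Linearized'
}

# Usage:
# if is_linearized web_ready.pdf; then echo "linearized"; else echo "not linearized"; fi
```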

Step-by-Step: Optimize a PDF for a Website

  1. Check the current state. Check the file size with ls -lh document.pdf, or, for a PDF already on your site, in the Network tab of your browser's DevTools.
  2. Run Ghostscript with /ebook settings. This handles image resampling, JPEG compression, font subsetting, and metadata removal in one pass.
  3. Compare the result. Check file size and open the PDF to visually verify quality at 100% zoom. If quality is acceptable, proceed.
  4. Linearize with qpdf. Add the linearization pass for web serving.
  5. Set Content-Disposition headers. On your web server, serve PDFs with: Content-Disposition: inline (to display in browser) and Content-Type: application/pdf. For Nginx:
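location ~* \.pdf$ {
    # Content-Type: application/pdf already comes from the standard mime.types
    # mapping (included by the default nginx.conf); repeating it with add_header
    # would send a second Content-Type header.
    add_header Content-Disposition inline;
    add_header Cache-Control "public, max-age=86400";
}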
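For step 3, a small helper can report the reduction as a percentage. It is a sketch: `size_bytes` and `reduction_percent` are hypothetical names, and `wc -c` is used because it behaves the same on Linux and macOS, unlike `stat`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: size of a file in bytes (portable across GNU and BSD).
size_bytes() {
    local n
    n=$(wc -c < "$1")
    echo $(( n ))
}

# Hypothetical helper: percentage saved going from original to optimized, rounded down.
reduction_percent() {
    local before after
    before=$(size_bytes "$1")
    after=$(size_bytes "$2")
    if [ "$before" -eq 0 ]; then
        echo "empty file: $1" >&2
        return 1
    fi
    echo $(( (before - after) * 100 / before ))
}

# Usage: reduction_percent document.pdf web_ready.pdf
```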

Optimize Your PDF Instantly - No Software Needed

Our free PDF Compressor handles image resampling, JPEG compression, font subsetting, and metadata removal in your browser. No upload, no account required.


Optimization for Email Attachments

Most email services cap attachments at 10–25 MB. Gmail allows up to 25 MB of attachments per message; Outlook limits them to 20 MB. Some corporate mail gateways enforce limits as low as 5 MB. A well-optimized PDF for email should target under 5 MB to pass the most common limits.
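One detail the nominal limits hide: attachments travel base64-encoded, which turns every 3 bytes into 4 characters, so the message grows by about a third. A sketch of the arithmetic follows; `encoded_size` and `fits_limit` are hypothetical names, the MIME line breaks added every 76 characters are ignored, and whether a provider counts its limit before or after encoding is an assumption you should check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: base64 output size, 4 characters per 3 input bytes (rounded up).
encoded_size() {
    local bytes="$1"
    echo $(( (bytes + 2) / 3 * 4 ))
}

# Hypothetical helper: succeeds if a file of the given size (bytes) still fits under
# a limit (bytes) once encoded. Ignoring MIME line breaks makes it slightly optimistic.
fits_limit() {
    local bytes="$1" limit="$2"
    [ "$(encoded_size "$bytes")" -le "$limit" ]
}

encoded_size 3000000                          # 4000000
fits_limit 18000000 25000000 && echo "fits"   # 24000000 <= 25000000
fits_limit 20000000 25000000 || echo "too big"
```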

Key differences from web optimization:

  • Linearization is not needed - the recipient downloads the full file before opening
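  • Slightly higher quality than for web (150 DPI, JPEG ~82%) is appropriate since the PDF may be printed by the recipient
  • Preserve document structure tags if accessibility is important. Ghostscript's pdfwrite does not carry tags into its output, so tagged PDFs need a different tool or an accessibility check after optimizing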
  • Do not remove all metadata - preserve title and author for the recipient's reference

Automated Optimization in a CI/CD Pipeline

For document-heavy applications (report generators, contract systems), automate PDF optimization on every generated file:

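# Shell function for automated optimization (bash)
optimize_pdf() {
    local input="$1"
    local output="${input%.pdf}_optimized.pdf"
    local web="${output%.pdf}_web.pdf"

    # Recompress with the /ebook preset; stop if Ghostscript fails
    gs -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS=/ebook \
       -dNOPAUSE -dBATCH -dQUIET \
       -sOutputFile="$output" \
       "$input" || { echo "gs failed on $input" >&2; return 1; }

    # Linearize for web serving (qpdf exits 3 when it finished with warnings only)
    qpdf --linearize "$output" "$web"
    local rc=$?
    if [ "$rc" -ne 0 ] && [ "$rc" -ne 3 ]; then
        echo "qpdf failed on $output" >&2
        return 1
    fi

    echo "Original:  $(du -h "$input" | cut -f1)"
    echo "Optimized: $(du -h "$web" | cut -f1)"
}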

# Usage
optimize_pdf monthly_report.pdf
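In CI it can also help to fail the build when a generated PDF exceeds its size budget, so regressions are caught before deployment. A sketch, with `check_pdf_budget` as a hypothetical name and the budget in KB chosen per use case (for example, 1024 for the web target in the table above):

```shell
#!/usr/bin/env bash
# Hypothetical CI gate: non-zero exit when a file is larger than its budget in KB.
check_pdf_budget() {
    local file="$1" budget_kb="$2"
    local bytes
    bytes=$(wc -c < "$file") || return 2
    bytes=$(( bytes ))
    if [ "$bytes" -gt $(( budget_kb * 1024 )) ]; then
        echo "FAIL: $file is $(( bytes / 1024 )) KB, budget is ${budget_kb} KB" >&2
        return 1
    fi
    echo "OK: $file is $(( bytes / 1024 )) KB (budget ${budget_kb} KB)"
}

# Usage in a CI step, after optimize_pdf has run:
# check_pdf_budget monthly_report_optimized_web.pdf 1024 || exit 1
```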

Use Our Free Tool

For one-off optimization without installing any software, use our free browser-based tool at securebin.ai/tools/pdf-compressor/. It applies the image and metadata optimizations described in this guide automatically.

Frequently Asked Questions

What is PDF linearization and do I need it?

Linearization (fast web view) restructures a PDF so the data needed for the first page appears at the beginning of the file. When a PDF is served over HTTP with range-request support, a linearized file can start rendering in the browser as soon as that data arrives, while a non-linearized PDF is usually downloaded in full first. You need it for PDFs embedded in web pages or linked from high-traffic websites. For email attachments or local documents, linearization provides no benefit.

What is the difference between /ebook and /screen Ghostscript settings?

Both are optimization presets but at different quality levels. /screen targets 72 DPI and JPEG quality ~72: optimized for viewing on screen at 100% zoom, it may look soft when zoomed. /ebook targets 150 DPI and JPEG quality ~82: it looks sharp at normal on-screen zoom and is acceptable for casual printing. For most web and email use, /ebook is the right choice. Use /screen only when minimizing file size is the absolute priority.

My PDF has no images but is still large. Why?

Text-only PDFs can be large due to: (1) fully embedded fonts that were not subsetted, (2) embedded ICC color profiles, (3) a large structure tree of accessibility tags, (4) incremental-save bloat from multiple edit sessions. Running the file through Ghostscript with -dSubsetFonts=true and -dCompressPages=true rewrites it without these leftovers and often reduces such PDFs by 40–70%.

Does PDF optimization affect text searchability?

No. Text in a PDF is drawn with text operators and embedded fonts, separately from any raster images on the page, so image compression leaves it untouched. A PDF that was searchable before optimization remains fully searchable after. The exception is a tool that rasterizes pages (converts each entire page to an image), which destroys the text data.

How do I check if a PDF is already optimized?

Use pdfinfo input.pdf (part of the poppler-utils package) to see the PDF version, page count, and whether the file is linearized (the "Optimized" line). To inspect image resolutions, pdfimages -list input.pdf shows every image with its resolution (x-ppi, y-ppi) and compression type (enc). If all images are at or below 150 DPI, further image optimization will not help much. For fonts, pdffonts input.pdf lists each font: "no" in the emb column means the font is not embedded, and "no" in the sub column means it is embedded in full rather than subsetted.
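To turn the pdfimages check into a list of offenders, filter its output for images above a threshold. This sketch assumes the column layout of current poppler versions (two header lines, then x-ppi and y-ppi in columns 13 and 14, because the object ID spans two columns); `images_above_dpi` is a hypothetical name, and the layout may differ in other versions.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `pdfimages -list` output on stdin and print every image
# whose x-ppi or y-ppi exceeds the threshold (default 150).
images_above_dpi() {
    local max="${1:-150}"
    awk -v max="$max" 'NR > 2 && ($13 > max || $14 > max) {
        printf "page %s image %s: %sx%s ppi\n", $1, $2, $13, $14
    }'
}

# Usage: pdfimages -list input.pdf | images_above_dpi 150
```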

Written by Usman Khan
DevOps Engineer | MSc Cybersecurity | CEH | AWS Solutions Architect

Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools.