Linux Disk Space Full: Find and Fix in 60 Seconds
Your server just threw a "No space left on device" error. Deployments are failing, logs stopped writing, and your application is down. You have about 60 seconds before someone notices. Here is exactly what to do.
TL;DR
df -h shows which mount is full. du -sh /* | sort -rh | head -20 finds the biggest directory. Common culprits: /var/log, Docker images, old kernels, and /tmp. Scroll to the specific section that matches your situation, or read the full guide for a systematic approach.
Step 1: Identify Which Disk Is Full
Before you start deleting anything, you need to know exactly which filesystem ran out of space. The df command is your starting point.
df -h
This shows all mounted filesystems with human readable sizes. You are looking for any row where Use% is at or near 100%. Here is what typical output looks like:
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 50G 49G 512M 99% /
/dev/xvdb 200G 45G 155G 23% /data
tmpfs 7.8G 0 7.8G 0% /dev/shm
In this example, the root filesystem / is at 99%. That is the one we need to fix.
But here is something most tutorials skip: sometimes the disk has plenty of space but you still get "No space left on device." That means you have run out of inodes, not disk space. Check inodes separately:
df -ih
If the IUse% column is at 100% on any filesystem, jump to Step 9: Inode Exhaustion. Otherwise, keep reading.
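The Use% check is easy to script for reuse. Here is a minimal sketch (the `check_df` helper name is mine, not a standard tool) that parses `df -h`-style output and prints any mount above a given threshold, demonstrated against the sample output above:

```shell
#!/bin/sh
# Print mount points whose Use% exceeds a threshold.
# Reads `df -h`-style output on stdin; the threshold is the first argument.
check_df() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                 # strip the percent sign
    if (use + 0 > limit) print $6, use "%"
  }'
}

# On a live system you would pipe real output: df -h | check_df 80
sample='Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 50G 49G 512M 99% /
/dev/xvdb 200G 45G 155G 23% /data'
result=$(printf '%s\n' "$sample" | check_df 80)
echo "$result"    # -> / 99%
```

Only the root filesystem at 99% is flagged; /data at 23% stays below the 80% cutoff.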
Step 2: Find the Biggest Directories
Now that you know which filesystem is full, you need to drill down and find where the space went. Start at the root and work your way in:
du -sh /* 2>/dev/null | sort -rh | head -20
The 2>/dev/null suppresses permission errors from virtual filesystems like /proc and /sys. This gives you the top 20 largest directories at the root level.
Typical output might look like this:
15G /var
8.2G /usr
6.1G /home
4.3G /opt
2.1G /tmp
Once you spot the culprit (usually /var), drill deeper:
du -sh /var/* 2>/dev/null | sort -rh | head -10
Then keep drilling. /var/log is the number one offender in my experience. After that, Docker storage under /var/lib/docker comes in second. Keep narrowing until you find the specific files or directories eating your space.
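To see the drill-down pipeline in action without touching a real filesystem, here is a self-contained demo that builds a scratch directory with known sizes (all paths are throwaway `mktemp` names):

```shell
#!/bin/sh
# Demo: the du | sort | head pipeline ranks directories by size, largest first.
dir=$(mktemp -d)
mkdir -p "$dir/big" "$dir/small"
head -c 5242880 /dev/zero > "$dir/big/data.bin"    # 5 MiB
head -c 1048576 /dev/zero > "$dir/small/data.bin"  # 1 MiB

# Same pipeline as above, pointed at the scratch directory
top=$(du -sh "$dir"/* 2>/dev/null | sort -rh | head -1)
echo "$top"    # the 5 MiB "big" directory is listed first

rm -rf "$dir"
```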
Step 3: Clean Log Files
Log files are the most common reason Linux disks fill up. A busy web server, a chatty application, or a misconfigured log level can produce gigabytes of logs per day. Here is the important part: do not delete active log files with rm. If a process still has the file open, the space will not be freed. Instead, truncate them:
# Truncate a log file (frees space immediately, process keeps writing)
truncate -s 0 /var/log/syslog
# Or use this if truncate is not available
> /var/log/syslog
The difference matters. Deleting a file with rm removes the directory entry, but if a process has the file open, the file descriptor still points to the old inode. The space stays allocated until that process closes the file or gets restarted. Truncating sets the file size to zero while keeping the same inode, so the process can keep writing without issues.
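You can verify this behavior safely in a sandbox. The sketch below creates a throwaway 1 MiB file, truncates it, and confirms the inode number is unchanged while the size drops to zero (uses GNU `stat`, standard on Linux):

```shell
#!/bin/sh
# Demo: truncate zeroes the file but keeps the same inode.
f=$(mktemp)
head -c 1048576 /dev/zero > "$f"      # write 1 MiB of real data
inode_before=$(stat -c %i "$f")

truncate -s 0 "$f"                    # same call as on a real log file

inode_after=$(stat -c %i "$f")
size=$(stat -c %s "$f")
echo "inode before=$inode_before after=$inode_after size=$size"
rm -f "$f"
```

Because the inode survives, a process with the file open keeps writing to the same place, and the space is freed immediately.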
Set Up Logrotate
After you have freed the immediate space, set up logrotate to prevent this from happening again. Create or edit a logrotate config:
# /etc/logrotate.d/myapp
/var/log/myapp/*.log {
daily
missingok
rotate 7
compress
delaycompress
notifempty
create 0640 www-data www-data
sharedscripts
postrotate
systemctl reload myapp > /dev/null 2>&1 || true
endscript
}
This rotates logs daily, keeps 7 days of history, and compresses old files. The delaycompress directive waits one rotation cycle before compressing, which avoids issues with processes that still have the previous log open.
Clean Systemd Journal
On systemd based systems, the journal can quietly grow to several gigabytes. Check its size and clean it:
# Check journal size
journalctl --disk-usage
# Keep only the last 500MB
journalctl --vacuum-size=500M
# Or keep only the last 7 days
journalctl --vacuum-time=7d
To make this permanent, edit /etc/systemd/journald.conf and set:
SystemMaxUse=500M
Then restart the journal service: systemctl restart systemd-journald
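If you script this change, rehearse the edit on a scratch copy first. A minimal sketch using GNU sed, assuming the stock journald.conf still carries the commented `#SystemMaxUse=` line:

```shell
#!/bin/sh
# Rehearse the journald.conf edit on a scratch copy before touching /etc.
conf=$(mktemp)
printf '[Journal]\n#SystemMaxUse=\n' > "$conf"    # mimic the stock file

# Uncomment-and-set in one pass; on the real system the target would be:
#   sudo sed -i 's/^#\?SystemMaxUse=.*/SystemMaxUse=500M/' /etc/systemd/journald.conf
sed -i 's/^#\?SystemMaxUse=.*/SystemMaxUse=500M/' "$conf"

line=$(grep '^SystemMaxUse=' "$conf")
echo "$line"    # -> SystemMaxUse=500M
rm -f "$conf"
```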
Step 4: Clean Docker
If you run Docker, this is probably where your disk space went. Docker is notoriously aggressive about accumulating images, containers, volumes, and build cache. A single build heavy project can easily consume 20 to 50 GB.
# See exactly how much Docker is using
docker system df
This breaks down usage by images, containers, volumes, and build cache. Now clean it up:
# Remove stopped containers, unused networks, dangling images, and build cache
docker system prune
# Nuclear option: also remove ALL unused images (not just dangling)
docker system prune -a
# Remove unused volumes too (WARNING: data loss risk)
docker system prune -a --volumes
Be careful with --volumes on production servers. This deletes any volume not currently attached to a running container. If you have database volumes from stopped containers, that data is gone.
For a more targeted approach, clean specific categories:
# Remove dangling images only
docker image prune
# Remove images older than 24 hours
docker image prune -a --filter "until=24h"
# Remove build cache
docker builder prune
# Remove stopped containers
docker container prune
For a deeper dive into Docker disk management, check out our Docker Build Cache Invalidation Guide.
Step 5: Clean Old Kernels
On Ubuntu and Debian systems, old kernel packages can accumulate over time. Each kernel version takes around 200 to 300 MB, and if you have never cleaned them up, you might have 10 or more old versions sitting on disk. This is especially common on long running servers that get regular apt upgrades.
# List all installed kernel packages
dpkg -l | grep linux-image
# See which kernel you are currently running (DO NOT remove this one)
uname -r
# Remove old kernels automatically
sudo apt autoremove --purge
# If apt autoremove does not clean enough, manually remove specific old kernels
sudo apt remove --purge linux-image-5.4.0-42-generic
On RHEL and CentOS systems:
# List installed kernels
rpm -q kernel
# Keep only the 2 most recent kernels (package-cleanup comes from the yum-utils
# package; on dnf-based systems, installonly_limit in /etc/dnf/dnf.conf controls this)
sudo package-cleanup --oldkernels --count=2
Never remove the currently running kernel. Always verify with uname -r before removing any kernel package. If you accidentally remove the running kernel and reboot, the system will not come back up.
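Before removing anything manually, it helps to build the removal list with the running kernel already filtered out. A small illustrative sketch (the `removable_kernels` helper name is mine, not a standard command):

```shell
#!/bin/sh
# Print installed kernel packages except the one currently running.
# On a real system: dpkg -l | awk '/^ii.*linux-image/{print $2}' | removable_kernels "$(uname -r)"
removable_kernels() {
  grep -vF "$1"    # -F treats the kernel release as a literal string, not a regex
}

installed='linux-image-5.4.0-42-generic
linux-image-5.4.0-48-generic
linux-image-5.4.0-58-generic'
safe=$(printf '%s\n' "$installed" | removable_kernels "5.4.0-58-generic")
echo "$safe"    # the two older kernels; 5.4.0-58 is excluded
```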
Step 6: Find Large Files
Sometimes the culprit is not logs or packages but a random large file someone left behind. Maybe a database dump, a core dump, a tar archive, or a forgotten backup. Find them all:
# Find files larger than 100MB
find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | sort -k5 -rh
# Find files larger than 1GB
find / -type f -size +1G -exec ls -lh {} \; 2>/dev/null
# Find the 20 largest files on the system
find / -type f -exec du -h {} + 2>/dev/null | sort -rh | head -20
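One subtlety worth knowing: find counts sizes in whole units and rounds up, so -size +1M means "strictly more than one 1 MiB block." A scratch-directory demo makes the behavior concrete:

```shell
#!/bin/sh
# Demo: -size +1M matches the 2 MiB file but not the 1 KiB one,
# because find rounds sizes up to whole 1 MiB units and +1 means "more than 1".
dir=$(mktemp -d)
head -c 2097152 /dev/zero > "$dir/big.bin"    # 2 MiB
head -c 1024    /dev/zero > "$dir/small.bin"  # 1 KiB

matches=$(find "$dir" -type f -size +1M)
echo "$matches"    # only .../big.bin

rm -rf "$dir"
```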
Common places where large forgotten files hide:
- /tmp and /var/tmp - temporary files that never got cleaned up
- /root - database dumps and backups an admin forgot about
- /home - user downloads, logs redirected to files, core dumps
- /opt - old application releases that were never removed
- /var/cache - package manager cache files
Step 7: Clean Package Manager Cache
Every time you install or upgrade a package, the package manager downloads the .deb or .rpm file and caches it locally. Over time, this cache grows significantly.
# Debian/Ubuntu: see cache size, then clean it
du -sh /var/cache/apt/archives/
sudo apt clean
# RHEL/CentOS: clean all cached packages
sudo yum clean all
# or on newer systems
sudo dnf clean all
# Clear pip cache
pip cache purge
# Clear npm cache
npm cache clean --force
On a server that has been running for a year or more, apt clean alone can free 1 to 5 GB. It is completely safe because it only removes cached package files. The installed packages themselves are untouched.
Step 8: Handle Deleted Files Still Held Open
This is the one that catches people off guard. You deleted a 10 GB log file with rm, but df still shows the disk as full. What happened?
The file is gone from the directory listing, but the process that was writing to it still has a file descriptor open. The kernel will not release the disk space until that file descriptor is closed. Find these phantom files:
# Find deleted files still held open by processes
lsof +L1
This lists all files with a link count less than 1 (meaning deleted from the filesystem but still open). The output shows you the process name, PID, and the size of the deleted file.
You have two options to reclaim the space:
# Option 1: Restart the process (safest)
sudo systemctl restart apache2
# Option 2: Truncate the file descriptor (if you cannot restart)
# Find the fd number from lsof output, then:
sudo truncate -s 0 /proc/PID/fd/FD_NUMBER
Option 2 is a neat trick for production situations where you cannot restart the service. It truncates the open file descriptor directly, freeing the space immediately without affecting the running process.
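The whole phenomenon can be reproduced safely in one self-contained script: the shell holds a deleted 1 MiB file open on fd 3, and truncating through /proc reclaims the space (here against the shell's own PID rather than one taken from lsof output):

```shell
#!/bin/sh
# Demo: disk space survives rm while a descriptor stays open,
# and truncating via /proc/PID/fd reclaims it.
tmp=$(mktemp)
head -c 1048576 /dev/zero > "$tmp"    # 1 MiB of real data
exec 3<> "$tmp"                       # this shell now holds the file open on fd 3
rm "$tmp"                             # gone from the directory listing...

size_before=$(wc -c < /proc/$$/fd/3)  # ...but the data is still allocated
truncate -s 0 /proc/$$/fd/3           # the Option 2 trick
size_after=$(wc -c < /proc/$$/fd/3)

echo "before=$size_before after=$size_after"   # -> before=1048576 after=0
exec 3>&-                             # close the descriptor
```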
Step 9: Inode Exhaustion
If df -ih shows 100% inode usage, you have a different problem. You are not out of disk space. You are out of inodes. Every file and directory on the filesystem uses one inode, and the total number of inodes is fixed when the filesystem is created.
This typically happens when an application creates millions of tiny files. Common culprits include PHP session files, mail queue directories, cache directories, and temporary files from crashed processes.
# Check inode usage
df -ih
# Find directories with the most files
find / -xdev -type d -exec sh -c 'echo "$(find "$1" -maxdepth 1 -type f | wc -l) $1"' _ {} \; 2>/dev/null | sort -rn | head -20
That second command takes a while to run because it counts files in every directory. A faster approach is to check the usual suspects directly:
# Count files in common problem directories
find /tmp -type f | wc -l
find /var/spool -type f | wc -l
find /var/lib/php/sessions -type f | wc -l
find /var/cache -type f | wc -l
Once you find the directory with millions of files, clean it up. But be careful with rm. If you try rm -rf /path/to/millions/of/files/*, the shell will try to expand the glob first and you will get an "Argument list too long" error. Instead, use find:
# Delete files older than 7 days in the problem directory
find /var/lib/php/sessions -type f -mtime +7 -delete
# Or delete all files in batches
find /tmp/problem-dir -type f -print0 | xargs -0 rm -f
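The difference is easy to demonstrate in a scratch directory: find walks the tree itself and never builds a giant argument list, so it handles any file count:

```shell
#!/bin/sh
# Demo: find -delete removes a thousand files without any glob expansion.
dir=$(mktemp -d)
i=1
while [ "$i" -le 1000 ]; do
  : > "$dir/f$i"                      # create 1000 empty files
  i=$((i + 1))
done

before=$(find "$dir" -type f | wc -l)
find "$dir" -type f -delete           # no shell glob, no "Argument list too long"
after=$(find "$dir" -type f | wc -l)

echo "before=$before after=$after"    # -> before=1000 after=0
rmdir "$dir"
```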
Pro Tip: ncdu Is the Best Interactive Disk Usage Tool
If you want a faster, more visual way to hunt down disk space hogs, ncdu is hands down the best tool for the job. Think of it as a terminal based version of WinDirStat or Disk Inventory X.
# Install it
sudo apt install ncdu # Debian/Ubuntu
sudo yum install ncdu # RHEL/CentOS
# Run it on the root filesystem
ncdu /
It scans the filesystem and presents an interactive, sorted view of disk usage. Navigate with arrow keys, press Enter to drill into directories, and press d to delete files or directories on the spot. Press q to quit.
What makes ncdu so good is that it gives you the same information as chaining du and sort commands, but you can explore interactively without running a new command for every directory. On a 50 GB filesystem, the initial scan takes about 10 to 15 seconds, and after that, navigation is instant.
For servers where you cannot install packages (locked down environments), you can download a static binary from the ncdu website and run it without installation.
Disk Cleanup Command Comparison
Here is a quick reference table for the most common cleanup operations, how much space they typically free, and the risk level of each:
| Target | Command | Typical Space Freed | Risk |
|---|---|---|---|
| Log files | truncate -s 0 /var/log/*.log | 1 - 20 GB | Low |
| Journal logs | journalctl --vacuum-size=500M | 500 MB - 5 GB | Low |
| APT cache | apt clean | 1 - 5 GB | Low |
| Old kernels | apt autoremove --purge | 500 MB - 3 GB | Medium |
| Docker (safe) | docker system prune | 2 - 10 GB | Low |
| Docker (aggressive) | docker system prune -a --volumes | 10 - 50 GB | High |
| /tmp files | find /tmp -mtime +7 -delete | 100 MB - 5 GB | Low |
| npm/pip cache | npm cache clean --force | 200 MB - 2 GB | Low |
| Core dumps | find / -name "core.*" -delete | 1 - 10 GB | Low |
Prevention: Stop It From Happening Again
Fixing a full disk is the easy part. Making sure it never happens again is where the real work is. Here is what I set up on every server I manage:
Set Up Disk Alerts at 80%
You want to know about disk space issues when the disk hits 80%, not 100%. At 80%, you have time to investigate and clean up. At 100%, you are in firefighting mode.
If you are on AWS, set up a CloudWatch alarm on the disk_used_percent metric. For self hosted servers, a simple cron job works:
# /etc/cron.d/disk-alert
*/15 * * * * root df -h / | awk 'NR==2{gsub(/%/,""); if($5>80) system("echo Disk at "$5"% | mail -s DISK_WARNING ops@example.com")}'
For production environments, use proper monitoring with Prometheus + node_exporter and set up alerting rules:
# Prometheus alert rule
- alert: DiskSpaceLow
expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 20
for: 5m
labels:
severity: warning
annotations:
summary: "Disk space below 20% on {{ $labels.instance }}"
Automate Log Rotation
Every application that writes logs should have a logrotate config. No exceptions. If you are deploying a new service, adding a logrotate config should be part of your deployment checklist. Set sensible defaults: rotate daily, keep 7 to 14 days, compress old files.
Schedule Docker Cleanup
If you run Docker, set up a weekly prune job:
# /etc/cron.weekly/docker-cleanup
#!/bin/bash
docker system prune -f --filter "until=168h" 2>/dev/null
docker image prune -f --filter "until=168h" 2>/dev/null
This removes images, containers, and networks that have not been used in the last 7 days (168 hours). Safe for most setups.
Expand the Volume
Sometimes cleanup is just buying time. If your application legitimately needs more space, expand the volume. On AWS, EBS volumes can be expanded online without downtime:
# After expanding the EBS volume in the AWS console:
sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1 # for ext4
# or
sudo xfs_growfs / # for xfs
No reboot required. The filesystem grows while the server is running and serving traffic.
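Which resize command applies depends on the filesystem type (check with df -T / or findmnt -no FSTYPE /). A tiny illustrative helper (the `grow_cmd` name is mine) that picks the right one, reflecting that ext tools take the device while xfs_growfs takes the mount point:

```shell
#!/bin/sh
# Pick the correct grow command for a filesystem type.
# ext* tools take the device; xfs_growfs takes the mount point.
grow_cmd() {
  fstype=$1; device=$2; mountpoint=$3
  case "$fstype" in
    ext2|ext3|ext4) echo "resize2fs $device" ;;
    xfs)            echo "xfs_growfs $mountpoint" ;;
    *)              echo "unsupported filesystem: $fstype" >&2; return 1 ;;
  esac
}

grow_cmd ext4 /dev/xvda1 /    # -> resize2fs /dev/xvda1
grow_cmd xfs  /dev/xvda1 /    # -> xfs_growfs /
```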
Common Mistakes to Avoid
I have seen every one of these mistakes in production. Learn from other people's outages:
- Running rm -rf on the wrong directory. Always double check the path before pressing Enter. Use ls first to verify what you are about to delete. And never run rm -rf / with a space before the path. rm -rf / var/log is very different from rm -rf /var/log.
- Deleting active log files with rm instead of truncating. As covered in Step 3, this does not actually free space if a process has the file open. Use truncate -s 0 instead.
- Not checking for inode exhaustion. If df -h shows space available but you still get "No space left on device," you almost certainly have an inode problem. Always run df -ih as part of your diagnosis.
- Ignoring /tmp on long running servers. The /tmp directory is supposed to be temporary, but on servers that never reboot, it accumulates files forever. Set up systemd-tmpfiles-clean or a cron job to clean old temp files.
- Running docker system prune -a --volumes on production. This deletes all unused volumes, including database data from stopped containers. On production, use docker system prune (without -a or --volumes) first and check what will be removed with docker system df -v.
- Cleaning up without finding the root cause. If you freed 10 GB today, but the same application fills it up again tomorrow, you have not fixed anything. Always investigate why the space was consumed and put a permanent fix in place (log rotation, cron cleanup, volume expansion).
Check What Your Server Is Exposing
While you are fixing disk issues, make sure your server is not exposing sensitive files to the internet. Run a quick exposure check to find .env files, open directories, and misconfigurations.
Frequently Asked Questions
Why does df show disk full but I cannot find the large files?
This usually means a process is holding a deleted file open. When you delete a file that a running process still has open, the disk space is not actually freed until that process releases the file handle. Run lsof +L1 to find these phantom files. You can either restart the process or, if safe, truncate the file descriptor under /proc/PID/fd/. See Step 8 for the full walkthrough.
What is inode exhaustion and how do I fix it?
Inodes are data structures that store metadata about files. Each filesystem has a fixed number of inodes set at creation time. If you create millions of tiny files (common with session files, mail queues, or cache directories), you can run out of inodes even though you have plenty of disk space. Run df -ih to check inode usage. Fix it by finding and removing the directories with excessive small files using find /path -type f | wc -l to count files per directory.
Is it safe to run docker system prune -a on a production server?
It depends. The -a flag removes ALL unused images, not just dangling ones. On production, this means any image not currently used by a running container gets deleted. If you need to quickly roll back to a previous image version, it will not be available locally and will need to be pulled again. On production, consider using docker system prune without the -a flag first, which only removes dangling images, stopped containers, and unused networks. Always verify what will be removed with docker system df before pruning.
The Bottom Line
A full disk is one of the most common and most preventable server issues. The fix is almost always the same: check df -h, drill down with du, clean the biggest offender, and set up monitoring so it does not happen again. The whole process takes about 60 seconds once you know what to look for.
The commands in this guide will handle 95% of disk full situations you will ever encounter. Bookmark it, share it with your team, and save yourself the 3 AM panic next time a disk fills up.
Related tools: Exposure Checker, Chmod Calculator, Cron Parser, Crontab Generator, SSH Config Generator, and 70+ more free tools.
Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.