Redis Maxmemory Reached: How to Fix OOM Errors Without Losing Data
Your Redis instance just started rejecting writes with "OOM command not allowed when used memory > maxmemory." Everything downstream is breaking. Here is exactly how to diagnose, fix, and prevent Redis memory exhaustion in production.
TL;DR
Run redis-cli INFO memory to check used_memory vs maxmemory. If maxmemory is 0, Redis will consume all system RAM until the OOM killer strikes. Set maxmemory to 75% of available RAM and choose the right eviction policy: allkeys-lru for cache workloads, noeviction for persistent data stores. Separate your cache, sessions, and full page cache into different Redis instances or databases.
What Happens When Redis Hits Maxmemory
Redis stores everything in RAM. That is its superpower and its biggest operational risk. When Redis runs out of memory, one of three things happens depending on your configuration:
- Write rejection (noeviction policy): Redis returns OOM command not allowed when used memory > maxmemory for every write command. Reads still work, but your application cannot store new data.
- Key eviction (LRU/LFU/TTL policies): Redis silently deletes existing keys to make room for new ones. Your application keeps working, but cached data disappears without warning.
- System OOM kill (maxmemory=0): This is the dangerous one. With no limit set, Redis grows until the Linux OOM killer terminates the process entirely. You lose everything in memory.
In my experience running Redis for enterprise Magento platforms, the third scenario is the most common in production incidents. A fresh Redis install defaults to maxmemory 0, which means "use all available memory." On a dedicated Redis server, that might be acceptable temporarily. On a shared server running your application, database, and Redis together, it is a ticking time bomb.
The error message itself varies by client library. In PHP you will see something like RedisException: OOM command not allowed. In Node.js it surfaces as ReplyError: OOM command not allowed when used memory > 'maxmemory'. Python's redis-py raises redis.exceptions.ResponseError. Regardless of the language, the root cause is always the same: Redis has no more memory to allocate.
Step 1: Diagnose Memory Usage
Before you change anything, understand exactly where you stand. Connect to your Redis instance and run the memory info command:
redis-cli INFO memory
Here is what real output looks like on a production server under memory pressure:
# Memory
used_memory:3221225472
used_memory_human:3.00G
used_memory_rss:3758096384
used_memory_rss_human:3.50G
used_memory_peak:3865470566
used_memory_peak_human:3.60G
maxmemory:4294967296
maxmemory_human:4.00G
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.17
mem_allocator:jemalloc-5.2.1
The critical fields to examine are:
- used_memory: Total bytes allocated by Redis. This is the actual data footprint.
- used_memory_rss: Resident Set Size, the actual physical memory consumed as seen by the operating system. This is normally equal to or larger than used_memory; if it is smaller, part of the Redis address space has been swapped out.
- maxmemory: The configured memory ceiling. If this is 0, there is no limit.
- maxmemory_policy: What Redis does when it hits the ceiling.
- mem_fragmentation_ratio: The ratio of RSS to used_memory. A healthy value is between 1.0 and 1.5.
If mem_fragmentation_ratio exceeds 1.5, you have a fragmentation problem. Redis allocated and freed many differently-sized objects, leaving holes in memory that cannot be reused efficiently. In extreme cases (ratio above 2.0), you may be wasting half your RAM on fragmentation overhead. The fix is to restart Redis (which compacts memory) or enable activedefrag yes in Redis 4.0 and later.
If used_memory is close to or equal to maxmemory, you are at the threshold. Any new write will trigger either eviction or rejection depending on your policy.
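This check is easy to script against the INFO memory output shown above. Here is a minimal sketch in Python (the field names match real INFO output; the 1.5 fragmentation threshold is the rule of thumb from this section):

```python
def parse_info_memory(raw: str) -> dict:
    """Parse the key:value lines of a redis-cli INFO memory dump."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip section headers and blank lines
        key, _, value = line.partition(":")
        fields[key] = value
    return fields

def memory_health(fields: dict) -> dict:
    used = int(fields["used_memory"])
    maxmem = int(fields["maxmemory"])
    frag = float(fields["mem_fragmentation_ratio"])
    return {
        # maxmemory of 0 means "no limit" -- usage percentage is undefined
        "usage_pct": round(100 * used / maxmem, 1) if maxmem else None,
        "fragmented": frag > 1.5,
        "policy": fields["maxmemory_policy"],
    }

sample = """# Memory
used_memory:3221225472
maxmemory:4294967296
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.17
"""
print(memory_health(parse_info_memory(sample)))
# usage_pct 75.0, fragmented False, policy noeviction
```

Feed it the output of redis-cli INFO memory from a cron job and you have the beginnings of a memory alert.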
Step 2: Find the Big Keys
Once you know you are running low on memory, the next question is: what is consuming it all? Redis ships with a built-in big key scanner:
redis-cli --bigkeys
This scans the entire keyspace using SCAN internally (so it is safe for production) and reports the largest key of each data type:
# Scanning the entire keyspace to find biggest keys
[00.00%] Biggest string found so far 'session:abc123def456' with 524288 bytes
[12.50%] Biggest hash found so far 'cache:product:14523' with 89 fields
[35.00%] Biggest list found so far 'queue:email:pending' with 145023 items
[67.50%] Biggest set found so far 'fpc:tag:category_42' with 8934 members
-------- summary -------
Sampled 284713 keys in the keyspace!
Total key length in bytes is 11238456 (avg len 39.47)
Biggest string found 'session:abc123def456' has 524288 bytes
Biggest list found 'queue:email:pending' has 145023 items
Biggest set found 'fpc:tag:category_42' has 8934 members
Biggest hash found 'cache:product:14523' has 89 fields
142356 strings with 892145678 bytes (50.00% of keys, avg size 6267.89)
85427 hashes with 234567890 bytes (30.00% of keys, avg size 2745.12)
42513 lists with 123456789 bytes (14.93% of keys, avg size 2904.23)
14417 sets with 45678901 bytes (05.06% of keys, avg size 3168.45)
For a more precise measurement of individual keys, use the MEMORY USAGE command (available in Redis 4.0+):
redis-cli MEMORY USAGE session:abc123def456
# (integer) 524352
redis-cli MEMORY USAGE cache:product:14523
# (integer) 15234
Common memory hogs I have found in production include:
- Serialized session objects: PHP and Node.js session serializers can store entire user state objects. A single session key can balloon to 500KB+ if the application stores cart contents, browsing history, or form state.
- Full page cache entries: Magento FPC stores entire rendered HTML pages. A complex category page can easily be 200KB per cache entry, and with thousands of products across multiple store views, this adds up fast.
- Keys without TTL: The silent killer. Keys that were supposed to be temporary but were never given an expiration. They accumulate over weeks and months until they fill your entire allocation.
- Abandoned queues: Job queues (Sidekiq, Bull, Laravel Horizon) that stopped being consumed but kept receiving items. A single list key with millions of items can consume gigabytes.
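--bigkeys finds individual outliers, but it is just as useful to know which namespace dominates overall. A small Python sketch that groups (key, size) pairs by their first colon-delimited segment — the key names and sizes below are made-up examples; in practice you would feed it the results of SCAN plus MEMORY USAGE:

```python
from collections import defaultdict

def usage_by_namespace(key_sizes):
    """Sum memory usage per key prefix (the part before the first ':')."""
    totals = defaultdict(int)
    for key, size in key_sizes:
        prefix = key.split(":", 1)[0]
        totals[prefix] += size
    # largest consumers first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical sample of (key, MEMORY USAGE bytes) pairs
sample = [
    ("session:abc123", 524288),
    ("session:def456", 498000),
    ("cache:product:14523", 15234),
    ("queue:email:pending", 2097152),
]
for prefix, total in usage_by_namespace(sample):
    print(f"{prefix}: {total} bytes")
```

This only works if your keys are namespaced consistently, which is covered in the naming conventions section later on.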
Step 3: Choose the Right Eviction Policy
Redis supports eight eviction policies. Choosing the wrong one is responsible for more production incidents than the actual memory limit. Here is what each one does:
- noeviction: Returns errors on write commands when memory is full. No data is ever deleted automatically. Use this for persistent data stores where losing a key is unacceptable.
- allkeys-lru: Evicts the least recently used key across the entire keyspace. The best general-purpose policy for cache workloads. In my experience, this is the correct default for 80% of Redis deployments.
- volatile-lru: Evicts the least recently used key, but only among keys that have a TTL set. Keys without an expiration are never evicted. Useful when you mix persistent and temporary data in the same instance.
- allkeys-lfu: Evicts the least frequently used key (available in Redis 4.0+). Better than LRU when you have a few hot keys that are accessed constantly and many cold keys that should be evicted first.
- volatile-lfu: Same as allkeys-lfu but only considers keys with a TTL.
- volatile-ttl: Evicts keys with the shortest remaining TTL first. Useful when you want keys that are about to expire anyway to be the first to go.
- allkeys-random: Evicts random keys. Simple but unpredictable. Only use this if all keys have equal importance.
- volatile-random: Evicts random keys among those with a TTL set.
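To make the LRU/LFU distinction concrete, here is a toy eviction simulation. Note this is not how Redis actually evicts — Redis uses an approximation that samples a few random candidates — but the exact version shows the behavioral difference:

```python
def simulate(policy, capacity, operations):
    """Tiny cache: evicts the exact LRU or LFU victim when at capacity."""
    last_used, freq, clock = {}, {}, 0
    evicted = []
    for key in operations:
        clock += 1
        if key not in last_used and len(last_used) >= capacity:
            if policy == "lru":
                victim = min(last_used, key=last_used.get)  # oldest access
            else:  # lfu: lowest access count, ties broken by recency
                victim = min(freq, key=lambda k: (freq[k], last_used[k]))
            last_used.pop(victim)
            freq.pop(victim)
            evicted.append(victim)
        last_used[key] = clock
        freq[key] = freq.get(key, 0) + 1
    return evicted

# "hot" is accessed often but not recently; a and b are recent one-offs
ops = ["hot", "hot", "hot", "a", "b", "c"]
print(simulate("lru", 3, ops))  # ['hot'] -- oldest access, so LRU drops it
print(simulate("lfu", 3, ops))  # ['a']   -- LFU keeps the frequent key
```

This is exactly the scenario where allkeys-lfu beats allkeys-lru: a genuinely hot key survives even when it has not been touched in the last few operations.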
To change the eviction policy at runtime without restarting Redis:
# Check current policy
redis-cli CONFIG GET maxmemory-policy
# 1) "maxmemory-policy"
# 2) "noeviction"
# Change to allkeys-lru (takes effect immediately)
redis-cli CONFIG SET maxmemory-policy allkeys-lru
# Persist the change so it survives a restart
redis-cli CONFIG REWRITE
A critical warning: if you are running volatile-lru or volatile-ttl and none of your keys have a TTL set, Redis behaves exactly like noeviction. I have seen this catch teams off guard more than once. They set a volatile policy thinking it would clean up old cache keys, but the application never set TTLs, so Redis filled up and started rejecting writes.
Step 4: Set Proper maxmemory
The rule of thumb is straightforward: set maxmemory to 75% of available RAM on a dedicated Redis server. The remaining 25% is reserved for three things:
- RDB/AOF fork overhead: When Redis performs a background save (BGSAVE) or AOF rewrite, it forks the process. The child process needs memory for copy-on-write pages. Under heavy write load, this can temporarily double memory usage.
- Memory fragmentation: jemalloc's internal overhead typically adds 10-15% on top of used_memory.
- OS and buffer overhead: The kernel needs memory for network buffers, page tables, and file system cache.
Here is how to calculate it for a server with 16 GB of RAM:
# Total RAM: 16 GB
# Reserved for OS and other processes: 2 GB
# Available for Redis: 14 GB
# maxmemory (75% of available): 10.5 GB
redis-cli CONFIG SET maxmemory 11274289152
redis-cli CONFIG REWRITE
If you are running Redis alongside your application on the same server (which I do not recommend for production, but it is common in staging), be more conservative. Allocate no more than 25-30% of total RAM to Redis and monitor the system closely.
For RDB persistence, the fork can temporarily require up to 2x memory. If you have 10 GB of data and trigger a BGSAVE, the server might briefly need 20 GB. If it does not have that, the fork fails with Can't save in background: fork: Cannot allocate memory. You can mitigate this by setting vm.overcommit_memory = 1 in sysctl, but understand the tradeoff: the OOM killer becomes more aggressive when the system actually runs out.
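The sizing arithmetic above is easy to script. A small helper — the 75% factor and 2 GB OS reservation are the rules of thumb from this section, not hard limits:

```python
GIB = 1024 ** 3

def recommended_maxmemory(total_ram_bytes: int,
                          os_reserved_bytes: int = 2 * GIB,
                          factor: float = 0.75) -> int:
    """maxmemory = 75% of the RAM left after the OS reservation."""
    available = total_ram_bytes - os_reserved_bytes
    if available <= 0:
        raise ValueError("not enough RAM after OS reservation")
    return int(available * factor)

# 16 GB server, 2 GB reserved -> 14 GB available -> 10.5 GB maxmemory
print(recommended_maxmemory(16 * GIB))  # 11274289152
```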
Finally, separate workloads by function. Cache data should use allkeys-lru because losing a cache entry just means a cache miss. Sessions should use noeviction because evicting a session logs out the user unexpectedly. In Kubernetes, deploy separate Redis StatefulSets per function. I run three Redis pods for production Magento: db0 for object cache, db1 for FPC, and db2 for sessions, each with their own maxmemory and eviction policy.
Step 5: Key Expiry Strategy
Every cache key should have a TTL. This is the single most effective way to prevent memory exhaustion over time. Keys without expiration accumulate silently, and you will not notice until Redis hits its ceiling weeks or months later.
First, audit your current keyspace to find keys without a TTL. Here is a bash script that uses SCAN (production-safe, non-blocking) to identify offending keys:
#!/bin/bash
# Find Redis keys without TTL (no expiration set)
# Usage: ./audit_redis_ttl.sh [host] [port] [db]
HOST=${1:-127.0.0.1}
PORT=${2:-6379}
DB=${3:-0}
CURSOR=0
NO_TTL_COUNT=0
TOTAL_SCANNED=0
echo "Scanning Redis $HOST:$PORT db$DB for keys without TTL..."
while true; do
RESULT=$(redis-cli -h "$HOST" -p "$PORT" -n "$DB" SCAN "$CURSOR" COUNT 1000)
CURSOR=$(echo "$RESULT" | head -1)
KEYS=$(echo "$RESULT" | tail -n +2)
for KEY in $KEYS; do
TTL=$(redis-cli -h "$HOST" -p "$PORT" -n "$DB" TTL "$KEY")
TOTAL_SCANNED=$((TOTAL_SCANNED + 1))
if [ "$TTL" = "-1" ]; then
SIZE=$(redis-cli -h "$HOST" -p "$PORT" -n "$DB" MEMORY USAGE "$KEY" 2>/dev/null || echo "unknown")
echo "NO_TTL: $KEY (size: $SIZE bytes)"
NO_TTL_COUNT=$((NO_TTL_COUNT + 1))
fi
done
if [ "$CURSOR" = "0" ]; then
break
fi
done
echo ""
echo "Scan complete: $TOTAL_SCANNED keys scanned, $NO_TTL_COUNT keys have no TTL"
Once you have identified keys without TTL, set reasonable expirations based on the data type:
# Set TTL on individual keys
redis-cli EXPIRE session:abc123def456 3600 # 1 hour for sessions
redis-cli EXPIRE cache:product:14523 86400 # 24 hours for product cache
redis-cli EXPIRE fpc:homepage:en_us 7200 # 2 hours for FPC
# Bulk set TTL on keys matching a pattern (use with caution)
redis-cli --scan --pattern "cache:product:*" | while read KEY; do
redis-cli EXPIRE "$KEY" 86400
done
The more important fix is at the application level. Every SET command should include an EX or PX argument. If your application writes keys without TTLs, the audit script is just a bandage. Fix the source.
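One way to enforce this at the application layer is a thin wrapper that refuses writes without a TTL. Here is a sketch: the client can be anything with a set(key, value, ex=...) method, such as redis-py's Redis; the FakeClient stub is only there so the idea works without a live server:

```python
class TTLRequiredCache:
    """Wrapper that rejects SET calls without an expiration."""
    def __init__(self, client, max_ttl=86400):
        self.client = client
        self.max_ttl = max_ttl

    def set(self, key, value, ex=None):
        if ex is None:
            raise ValueError(f"refusing to SET {key!r} without a TTL")
        # clamp to the cap so no key outlives the policy
        return self.client.set(key, value, ex=min(ex, self.max_ttl))

class FakeClient:
    """Stand-in for redis.Redis, records calls for illustration."""
    def __init__(self):
        self.calls = []
    def set(self, key, value, ex=None):
        self.calls.append((key, value, ex))
        return True

cache = TTLRequiredCache(FakeClient(), max_ttl=86400)
cache.set("cache:product:14523", "...", ex=604800)  # TTL clamped to 86400
try:
    cache.set("tmp_data", "...")  # no TTL -> rejected
except ValueError as e:
    print(e)
```

With a guard like this, the audit script becomes a verification step rather than a cleanup tool.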
Step 6: ElastiCache and Managed Redis
If you are running Redis on AWS ElastiCache (or Azure Cache for Redis, or Google Memorystore), the diagnosis and fix are the same, but you get better monitoring out of the box. The key CloudWatch metrics to watch are:
- DatabaseMemoryUsagePercentage: The most important metric. Set an alarm at 70% to give yourself time to react before writes start failing.
- Evictions: The number of keys evicted per period. A sudden spike means memory pressure. A consistent non-zero value means your data set is larger than your allocation.
- CurrConnections: Connection count. A spike after an OOM event often indicates a reconnection storm where every application instance simultaneously reconnects.
- SwapUsage: Should always be 0. If Redis is swapping, performance degrades catastrophically. Scale up immediately.
- ReplicationLag: If you have read replicas, lag increases under memory pressure because the primary cannot fork for RDB transfer.
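The reconnection storm behind a CurrConnections spike is usually tamed client-side with exponential backoff plus jitter, so application instances do not all reconnect at the same instant. A sketch (the base and cap values are illustrative):

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Exponential backoff with full jitter: before reconnect attempt n,
    wait a random time in [0, min(cap, base * 2^n)] seconds."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays

# With jitter, 100 clients spread their reconnects instead of stampeding
print(backoff_delays(5, rng=random.Random(42)))
```

Most mature clients (redis-py, ioredis) support a retry strategy hook where this logic plugs in.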
ElastiCache Serverless (launched in late 2023) automatically scales memory and compute based on demand. You pay per GB-hour of data stored and per ECPU (ElastiCache Compute Unit) consumed. For workloads with variable memory needs, this eliminates the maxmemory problem entirely, but at a cost premium of roughly 2-3x compared to provisioned instances.
For provisioned ElastiCache, scaling up (larger instance type) is the fast fix. Scaling out (adding shards to a cluster-mode-enabled cluster) distributes keys across nodes and is the long-term answer for data sets that keep growing. You can add shards online with zero downtime, but the resharding process itself consumes memory temporarily, so do it before you hit 80%.
Step 7: Magento and WordPress Specific Tuning
Magento 2 is one of the heaviest Redis consumers I have worked with. A typical Magento installation uses Redis for three distinct purposes, each with different requirements:
- Object cache (db0): Stores compiled configuration, EAV attribute data, and layout XML. Moderate size, high read frequency. Eviction policy: allkeys-lru.
- Full page cache / FPC (db1): Stores entire rendered HTML pages. Very large entries (100-300KB each), many keys (one per URL per store view per customer group). Eviction policy: allkeys-lru.
- Sessions (db2): Stores PHP session data for logged-in customers. Smaller entries but critical for user experience. Eviction policy: noeviction.
The most common misconfiguration I see is running all three in the same Redis database with noeviction. When memory fills up, neither cache nor FPC can write new entries, and your entire site slows to a crawl because every request becomes a cache miss that falls through to MySQL.
Here is the proper Magento 2 env.php Redis configuration with separate databases:
'cache' => [
'frontend' => [
'default' => [
'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
'backend_options' => [
'server' => '127.0.0.1',
'port' => '6379',
'database' => '0',
'compress_data' => '1'
]
],
'page_cache' => [
'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
'backend_options' => [
'server' => '127.0.0.1',
'port' => '6379',
'database' => '1',
'compress_data' => '0'
]
]
]
],
'session' => [
'save' => 'redis',
'redis' => [
'host' => '127.0.0.1',
'port' => '6379',
'database' => '2',
'max_concurrency' => '20',
'break_after_frontend' => '5',
'break_after_adminhtml' => '30'
]
]
For WordPress, the situation is simpler. Most WordPress Redis setups use a single database for object cache via plugins like Redis Object Cache or W3 Total Cache. The key settings are:
# wp-config.php
define('WP_REDIS_HOST', '127.0.0.1');
define('WP_REDIS_PORT', 6379);
define('WP_REDIS_DATABASE', 0);
define('WP_REDIS_MAXTTL', 86400); // Force max TTL of 24 hours
The WP_REDIS_MAXTTL constant is critical. Without it, some plugins write cache entries with no expiration, leading to the same slow memory leak described earlier.
Eviction Policy Comparison
| Policy | Behavior When Full | Data Safety | Best For | Risk |
|---|---|---|---|---|
| noeviction | Rejects all writes with OOM error | High: no data lost | Sessions, persistent queues | Application errors on write |
| allkeys-lru | Evicts least recently used keys | Low: any key can be evicted | General-purpose cache | Hot keys safe, cold keys lost |
| volatile-lru | Evicts LRU keys with TTL only | Medium: persistent keys safe | Mixed persistent + cache data | Acts like noeviction if no TTLs set |
| allkeys-lfu | Evicts least frequently used keys | Low: any key can be evicted | Cache with hot/cold access patterns | New keys evicted before they become "frequent" |
| volatile-ttl | Evicts keys with shortest remaining TTL | Medium: only expiring keys affected | Time-sensitive cache data | Acts like noeviction if no TTLs set |
| allkeys-random | Evicts random keys | Low: any key can be evicted | Uniform importance keys | Unpredictable, may evict hot keys |
Preventing Future OOM Events
Fixing the immediate OOM is only half the job. Without proper monitoring and guardrails, you will be back here in a few weeks. Here is the checklist I follow for every Redis deployment:
Monitoring and Alerting
- Alert at 70% memory usage: This gives you time to investigate and act before writes start failing. Use redis_memory_used_bytes / redis_memory_max_bytes in Prometheus, or DatabaseMemoryUsagePercentage in CloudWatch.
- Alert on eviction rate: A sudden spike in evictions means your working set exceeds your allocation. Track evicted_keys from INFO stats.
- Alert on fragmentation ratio: If mem_fragmentation_ratio exceeds 1.5, schedule a restart during your next maintenance window.
- Dashboard the key count: A steadily growing key count with no plateau usually means keys are being written without TTLs.
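For the Prometheus case, the 70% alert can be expressed as a rule like the following. The metric names are the ones exposed by redis_exporter; the alert name, duration, and labels are placeholders to adapt to your setup:

```yaml
groups:
  - name: redis-memory
    rules:
      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.70
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory above 70% on {{ $labels.instance }}"
```

Note the expression is meaningless when maxmemory is 0 (redis_memory_max_bytes reports 0), which is one more reason to always set an explicit limit.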
Key Namespace Conventions
Adopt a consistent key naming pattern so you can quickly identify what is consuming memory:
# Good: namespaced, identifiable, scannable
cache:product:14523
fpc:category:42:store:en_us
session:a1b2c3d4e5f6
queue:email:pending
# Bad: opaque, unscannable
14523
a1b2c3d4e5f6
tmp_data
Namespaced keys let you use SCAN with a pattern to audit memory usage by function. You can instantly answer "how much memory are sessions using?" with redis-cli --scan --pattern "session:*" piped into a size calculation.
Automated TTL Enforcement
Add a cron job or Kubernetes CronJob that runs the TTL audit script weekly. If the count of keys without TTLs grows, investigate which application path is creating them. Prevention at the application layer is always better than cleanup after the fact.
Common Mistakes That Cause Redis OOM
After troubleshooting dozens of Redis memory incidents, these are the patterns I see over and over again:
- Leaving maxmemory at 0 in production: The default. Redis will happily consume every byte of RAM on your server. Always set an explicit limit.
- Using noeviction for cache workloads: If your Redis instance is a cache (not a primary data store), use allkeys-lru. A cache that rejects writes is not a cache at all.
- Not separating cache from sessions: When cache eviction deletes a session key, a customer gets logged out. When noeviction protects sessions, cache writes fail. These workloads have fundamentally different requirements and should never share eviction policies.
- Ignoring fragmentation: A fragmentation ratio of 2.0 means you are using twice as much RSS as your actual data requires. Restart Redis or enable activedefrag.
- Running FLUSHALL in production during an incident: The instinct when Redis is full is to flush everything. But FLUSHALL triggers a reconnection storm as every application instance discovers an empty cache simultaneously and hammers your database with uncached queries. Instead, selectively delete the offending keys or let eviction handle it.
- Not monitoring after a fix: You set maxmemory, configured eviction, and moved on. Without alerts, you will not know when the same pattern recurs in three months.
- Using the KEYS command in production: KEYS * blocks the Redis event loop while scanning the entire keyspace. On a database with millions of keys, this can block for seconds, causing timeouts across your entire application. Always use SCAN instead.
Frequently Asked Questions
Can I increase maxmemory without restarting Redis?
Yes. Run redis-cli CONFIG SET maxmemory <bytes> followed by CONFIG REWRITE to persist the change. The new limit takes effect immediately with no downtime. This is the fastest way to relieve an active OOM situation if the server has available RAM.
What is the difference between used_memory and used_memory_rss?
used_memory is the total bytes Redis has allocated for data storage. used_memory_rss is the actual physical memory reported by the operating system, which includes fragmentation overhead, allocator metadata, and memory pages that were allocated but not yet returned to the OS. The ratio between them is mem_fragmentation_ratio. When RSS is significantly higher than used_memory, your Redis instance is suffering from memory fragmentation.
Should I use Redis Cluster or a single instance with more RAM?
For data sets under 25 GB, a single instance with replicas is simpler to operate and performs better (no cross-slot overhead). Above 25 GB, or when you need write throughput that exceeds a single core (Redis is single-threaded for commands), Redis Cluster distributes data across shards. Each shard handles a subset of the 16384 hash slots, so maxmemory applies per shard. If you are on AWS, ElastiCache cluster-mode-enabled handles the sharding automatically.
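The 16384-slot mapping is simple enough to sketch: Redis Cluster hashes the key with CRC16 (the XMODEM variant) modulo 16384, honoring {...} hash tags so related keys can be forced into the same slot. A minimal Python version:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Slot 0-16383 for a key, honoring {hash tag} sub-keys."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing a hash tag always land in the same slot (and shard)
print(key_slot("{user:42}:cart") == key_slot("{user:42}:profile"))  # True
```

This is why multi-key operations on a cluster require hash tags: commands touching keys in different slots are rejected with a CROSSSLOT error.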
How do I handle Redis OOM in Kubernetes?
Set resource limits on your Redis pod that match your maxmemory configuration plus 25% overhead. If maxmemory is 4 GB, set the container memory limit to 5 GB. Without this, Kubernetes may OOM-kill the pod even before Redis hits its own limit (because RSS exceeds used_memory). Also set redis.conf values via ConfigMap rather than relying on runtime CONFIG SET, so pods that restart always come up with the correct configuration.
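That headroom rule looks like this in a pod spec (image, names, and sizes are illustrative):

```yaml
containers:
  - name: redis
    image: redis:7
    args: ["redis-server", "--maxmemory", "4gb",
           "--maxmemory-policy", "allkeys-lru"]
    resources:
      requests:
        memory: "5Gi"
      limits:
        memory: "5Gi"  # maxmemory (4 GB) + 25% headroom for fork and fragmentation
```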
Is it safe to enable activedefrag in production?
Yes, with caveats. Active defragmentation (available since Redis 4.0) runs incrementally in the background and is designed to be production-safe. It consumes a small amount of CPU (configurable via active-defrag-cycle-min and active-defrag-cycle-max). Enable it if your fragmentation ratio consistently exceeds 1.5. The default thresholds (active-defrag-threshold-lower 10, meaning fragmentation above 10%) are reasonable for most workloads. Monitor CPU usage after enabling it, and disable it if you observe latency spikes during peak traffic.
The Bottom Line
Redis maxmemory errors are almost always preventable. Set an explicit maxmemory limit (75% of available RAM), choose the right eviction policy for your workload, ensure every cache key has a TTL, and separate cache from session storage. Monitor memory usage, alert at 70%, and audit your keyspace regularly for keys without expiration. These practices have kept my production Redis instances stable across millions of requests per day.
The worst Redis OOM incidents I have seen were not caused by traffic spikes or sudden load. They were caused by slow, silent memory leaks: keys written without TTLs, growing queues that stopped being consumed, and cache entries that outlived their usefulness. Fix the leak, not just the symptom.
Related reading: Open Ports Security Risks, AWS Security Checklist for Production, Fix Docker Container OOM Killed, Kubernetes Secrets Management, and 70+ free DevOps and security tools.
Usman has 10+ years of experience running Redis at scale for enterprise e-commerce platforms, managing high-traffic Kubernetes clusters, and building zero-knowledge security tools. Read more about the author.