UUID Collision Probability: Can Two UUIDs Ever Be the Same?
Every time you generate a UUID, you are drawing a random number from a pool of 5.3 undecillion possible values. The chance of picking the same number twice is so small it borders on theoretical. But "practically impossible" deserves a precise mathematical answer - and there are real scenarios where UUID uniqueness actually breaks down.
The Problem: When Does UUID Uniqueness Actually Matter?
A common interview question and developer anxiety: "If UUIDs are random, couldn't two systems generate the same UUID at the same time?" The short answer is yes - it is possible. The real question is: how probable is it, and is it worth designing around?
This matters practically in systems where a UUID collision would cause a silent data corruption bug rather than an observable error. If two database rows end up with the same UUID primary key and the database enforces uniqueness, you get an error you can handle. If UUIDs are used as file names, session tokens, or keys in a distributed key-value store without uniqueness constraints, a collision could silently overwrite data.
UUID v4: 122 Bits of Randomness
A UUID (Universally Unique Identifier) is a 128-bit value, typically displayed as:
550e8400-e29b-41d4-a716-446655440000
              ^ version nibble (4 = random)
                   ^ variant bits (top two bits of this digit, 10xx = RFC 4122)
UUID version 4 uses random or pseudo-random numbers to fill 122 of those 128 bits. The remaining 6 bits are fixed: 4 version bits (always 0100 = version 4) and 2 variant bits (always 10 = RFC 4122 variant). This leaves exactly 2^122 possible UUID v4 values.
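How large is 2^122? It is roughly 5.3 × 10^36. Let us put that in perspective:
Estimates of the number of stars in the observable universe typically range from about 10^22 to 10^24. Even at the high end of that range, there are trillions of possible UUIDs for every star. The scale is genuinely difficult to comprehend.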
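The bit layout described above can be checked directly. The following is a minimal sketch using Python's standard uuid module; the helper name fixed_bits and the constants are illustrative choices, not part of any library.

```python
import uuid

# The sample UUID shown in this section.
SAMPLE = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")


def fixed_bits(u):
    """Return (version nibble, top two variant bits) of a UUID."""
    version = (u.int >> 76) & 0xF   # bits 76..79 hold the version nibble
    variant = (u.int >> 62) & 0b11  # bits 62..63 hold the RFC 4122 variant
    return version, variant


# 128 total bits minus 4 version bits and 2 variant bits.
RANDOM_BITS = 128 - 4 - 2
SPACE_SIZE = 2 ** RANDOM_BITS

if __name__ == "__main__":
    print(fixed_bits(SAMPLE))
    print(f"{SPACE_SIZE:.3e} possible UUID v4 values")
```

The same helper applied to any value returned by uuid.uuid4() should report the same two fixed fields, since only the remaining 122 bits vary.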
The Birthday Problem Applied to UUIDs
The collision probability question is a version of the famous birthday problem: in a room of 23 people, there is about a 50% chance that two share a birthday. With 365 possible birthdays, 23 people is surprisingly few. The math generalizes: with n items drawn from a pool of N possible values, the probability of at least one collision is approximately:
P(collision) ≈ 1 - e^(-n² / (2N))
Where:
N = 2^122 ≈ 5.3 × 10^36 (UUID v4 space)
n = number of UUIDs generated
Solving for n at various collision probability thresholds:
P = 0.01% (1 in 10,000) → n ≈ 3.26 × 10^16 UUIDs
P = 0.1% (1 in 1,000) → n ≈ 1.03 × 10^17 UUIDs
P = 1% (1 in 100) → n ≈ 3.27 × 10^17 UUIDs
P = 50% (coin flip) → n ≈ 2.71 × 10^18 UUIDs
To reach a 50% collision probability, you would need to generate 2.71 × 10^18 UUIDs - that is 2.71 quintillion. At a rate of 1 million UUIDs per second (a throughput rate far exceeding most production databases), that would take approximately 86,000 years.
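The birthday approximation above is easy to evaluate and to invert. Here is a small sketch, assuming the formula P ≈ 1 - e^(-n² / (2N)) with N = 2^122; the function names are hypothetical and chosen only for this example.

```python
import math

UUID_V4_SPACE = 2 ** 122
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def collision_probability(n, space=UUID_V4_SPACE):
    """P ≈ 1 - exp(-n^2 / (2N)); expm1 keeps precision when P is tiny."""
    return -math.expm1(-(n * n) / (2 * space))


def uuids_for_probability(p, space=UUID_V4_SPACE):
    """Invert the approximation: n ≈ sqrt(2N * ln(1 / (1 - p)))."""
    return math.sqrt(2 * space * -math.log1p(-p))


def years_to_generate(n, per_second=1_000_000):
    """Time needed to generate n UUIDs at a fixed rate."""
    return n / per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for p in (1e-4, 1e-3, 1e-2, 0.5):
        n = uuids_for_probability(p)
        print(f"P = {p:<6} -> n ≈ {n:.2e} UUIDs")
    print(f"{years_to_generate(uuids_for_probability(0.5)):,.0f} years at 1M/s")
```

Using expm1 and log1p rather than exp and log matters here because the probabilities involved are far smaller than floating-point rounding error near 1.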
Concrete Risk Numbers
For practical database scale, here are the collision probabilities:
- 1,000 UUIDs: ≈ 10^-31 probability. Effectively zero.
- 1 million UUIDs: ≈ 10^-25 probability. Still effectively zero.
- 1 billion UUIDs: ≈ 10^-19 probability. Immeasurably small.
- 1 trillion UUIDs: ≈ 10^-13 probability. Still far smaller than hardware failure rates.
For context, the probability of your database server being struck by a meteorite while simultaneously corrupting a UUID is arguably higher than a UUID v4 collision in any database with fewer than 10^15 rows.
The engineering rule of thumb: a UUID v4 collision is not mathematically impossible, but its probability is so small that you can treat it as negligible for any system that could realistically be built, provided the random number generator is sound. The practical limit of UUID generation is storage and performance, not collision avoidance.
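The figures in the list above can be reproduced with the small-probability form of the birthday bound, P ≈ n(n-1) / (2N). This is a sketch under that approximation; the function name is illustrative, and exact rational arithmetic is used only to avoid rounding before the final conversion to a float.

```python
from fractions import Fraction

UUID_V4_SPACE = 2 ** 122


def approx_collision_probability(n, space=UUID_V4_SPACE):
    """Small-probability birthday approximation: P ≈ n(n-1) / (2N)."""
    return float(Fraction(n * (n - 1), 2 * space))


if __name__ == "__main__":
    for rows in (10**3, 10**6, 10**9, 10**12):
        p = approx_collision_probability(rows)
        print(f"{rows:>20,} UUIDs -> P ≈ {p:.1e}")
```

The approximation is only valid while the result stays far below 1, which holds comfortably for every dataset size discussed here.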
When UUID Collisions Actually Happen (Real Cases)
Despite the math, UUID collision incidents have occurred in production. They were never caused by random number exhaustion - they were caused by broken random number generators:
Virtualized or containerized environments
Early versions of some hypervisors and container runtimes had bugs where the VM's random number generator was seeded with the same value at startup, especially when cloning a VM image. Two cloned VMs would generate identical sequences of "random" UUIDs. This is an RNG seeding bug, not a UUID space exhaustion problem.
Low-entropy startup environments
On older Linux kernels, /dev/urandom (used by most UUID libraries) could return predictable output immediately after a fresh system boot, before the entropy pool had been seeded with enough random events. Systems that generate UUIDs within the first moments of boot, such as freshly started VMs or embedded devices running startup scripts, may generate poor-quality random values. Containers share the host kernel's entropy pool, so they are exposed mainly when the host itself has just booted.
Forked processes sharing RNG state
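In languages or libraries that use a user-space PRNG (pseudo-random number generator), if a process generates a UUID and then forks, both the parent and child start with the same RNG state and will generate the same UUID sequence. Code that forks (for example via Python's os.fork() or Ruby's fork) can trigger this when the UUID library draws from such a PRNG and the child does not re-seed it. Generators that read from the operating system on every call, such as Python's uuid.uuid4() (which uses os.urandom()), are not affected.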
Seeded deterministic UUIDs in tests
Test environments that use deterministic UUID generation (seeded with a fixed value for reproducibility) will produce identical UUIDs across test runs. This is intentional in tests but becomes a problem if test data accidentally leaks into a shared database.
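All of the failure modes above share one mechanism: two generators start from the same PRNG state. The following sketch reproduces that on purpose. It builds UUID v4 values from Python's non-cryptographic random.Random, which is an assumption made for illustration only (the standard uuid.uuid4() does not work this way), and seeded_uuid4_stream is a hypothetical helper name.

```python
import random
import uuid


def seeded_uuid4_stream(seed, count):
    """Generate v4-formatted UUIDs from a seeded, NON-cryptographic PRNG.

    This deliberately mimics a broken setup (cloned VM, forked process,
    fixed test seed): identical seeds give identical UUID sequences.
    """
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(count)]


if __name__ == "__main__":
    machine_a = seeded_uuid4_stream(seed=1234, count=3)
    machine_b = seeded_uuid4_stream(seed=1234, count=3)
    # Every "random" UUID collides because the starting state was shared.
    print(machine_a == machine_b)
```

Each value is a well-formed version 4 UUID, which is why this class of bug is hard to spot by inspecting the identifiers themselves.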
Generate Cryptographically Secure UUIDs
Our free UUID Generator uses the browser's crypto.randomUUID() API to produce cryptographically secure random UUID v4 values, backed by a cryptographically secure random number generator rather than Math.random().
UUID v4 vs Other UUID Versions
Not all UUID versions use random numbers. Understanding the differences matters for collision analysis:
- UUID v1: Based on MAC address + timestamp, plus a short clock sequence. The timestamp itself increases over time, although the field layout puts its low-order bits first, so v1 values do not sort by creation time. Theoretically no collisions if the MAC address is unique, but MAC addresses can be spoofed or shared in VMs, and the timestamp resolution is only 100 nanoseconds - high-frequency generation can repeat timestamps and must then rely on the clock sequence.
- UUID v3: MD5 hash of a namespace + name. Deterministic - the same inputs always produce the same UUID, by design. Not random; two different inputs collide only if their hashes agree on the retained bits, which is negligible for ordinary data but can be forced deliberately (MD5 collision attacks exist).
- UUID v4: Random. The standard choice for most applications. Relies entirely on RNG quality.
- UUID v5: SHA-1 hash of a namespace + name. Deterministic, same as v3 but using SHA-1.
- UUID v7: Time-ordered random UUID (RFC 9562). Combines a 48-bit millisecond timestamp prefix with 74 random bits for sortability. Two v7 values can only collide if they are generated in the same millisecond and draw the same random bits, so collision resistance remains comparable to v4 in practice, with the added benefit of database index locality.
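The difference between name-based and random versions is easy to see in code. This is a minimal sketch using Python's standard uuid module; name_based_id is a hypothetical wrapper, and the DNS namespace is just one of the predefined namespaces.

```python
import uuid


def name_based_id(name, namespace=uuid.NAMESPACE_DNS):
    """UUID v5: SHA-1 of namespace + name, so identical inputs repeat."""
    return uuid.uuid5(namespace, name)


if __name__ == "__main__":
    # Deterministic: both calls return the same UUID.
    print(name_based_id("example.com") == name_based_id("example.com"))
    # Random: two v4 calls are independent draws.
    print(uuid.uuid4() == uuid.uuid4())
```

For name-based versions, "collision" therefore means something different: repeating an input is expected to repeat the UUID, and uniqueness depends on the names being unique rather than on RNG quality.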
Should You Add a Uniqueness Check on Top of UUIDs?
Many developers add a database unique constraint on UUID columns, use INSERT ... ON CONFLICT DO NOTHING, or generate a new UUID on constraint violation. Is this necessary?
For production systems, adding a unique constraint on UUID columns is a good practice - not because UUID collisions are likely, but because it catches implementation bugs like the RNG issues described above. A unique constraint is cheap and provides a safety net against the actual sources of UUID collisions (broken RNGs, cloned VMs, forked processes) even though it does not address theoretical random space exhaustion.
Application-level collision retry logic, on the other hand, is overkill for nearly all applications. If you have a unique constraint and a collision occurs (almost certainly from an RNG bug, not random space exhaustion), the right response is to investigate the RNG issue, not to silently retry with a new UUID.
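The approach described above can be sketched with an in-memory SQLite database. The table name orders, its columns, and the helper functions are all hypothetical; the point is only that the constraint turns a duplicate into a visible error that the caller can log and investigate instead of retrying silently.

```python
import sqlite3
import uuid


def make_db():
    """Create a throwaway database with a UUID primary key (stored as text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    return conn


def insert_order(conn, order_id, payload):
    """Insert a row; a duplicate id raises sqlite3.IntegrityError."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (id, payload) VALUES (?, ?)",
            (str(order_id), payload),
        )


if __name__ == "__main__":
    conn = make_db()
    order_id = uuid.uuid4()
    insert_order(conn, order_id, "first")
    try:
        insert_order(conn, order_id, "second")  # simulated duplicate
    except sqlite3.IntegrityError as exc:
        # Surface the problem: a duplicate UUID points at an RNG or logic bug.
        print(f"duplicate UUID {order_id}: {exc}")
```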
Step-by-Step: Generating UUIDs Correctly
- Use the platform's cryptographic RNG. In Node.js: crypto.randomUUID(). In Python: import uuid; uuid.uuid4(). In Java: UUID.randomUUID(). All of these use OS-level cryptographic randomness. Avoid UUID libraries that use Math.random() or other non-cryptographic PRNGs.
- In containerized environments, ensure entropy is available. Make sure /dev/urandom is available inside the container. Consider using haveged or rng-tools on systems with low entropy.
- After fork(), re-seed the RNG. In Ruby: call Random.srand in the child process if your UUID library depends on the default generator. In Python: the random module has re-seeded itself in forked children since Python 3.7 (via os.register_at_fork), and uuid.uuid4() reads from os.urandom(), so it does not depend on user-space PRNG state.
- Add a unique database constraint. Even though collisions are extraordinarily rare, the constraint catches bugs early at no meaningful performance cost.
- Consider UUID v7 for new systems. UUID v7 provides collision resistance comparable to v4 but with lexicographic sorting, which improves B-tree index performance for primary keys.
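As a concrete instance of the first step, here is a minimal Python sketch. The wrapper name new_id is hypothetical; it simply delegates to the standard library's uuid.uuid4(), which in CPython obtains its random bytes from os.urandom().

```python
import uuid


def new_id():
    """Return a random UUID v4 backed by the OS cryptographic RNG."""
    return uuid.uuid4()


if __name__ == "__main__":
    u = new_id()
    print(str(u))        # canonical 36-character text form
    print(len(u.bytes))  # compact binary form for storage
```

Keeping generation behind one small function also makes it easy to swap in a different UUID version later without touching call sites.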
Frequently Asked Questions
Has anyone ever reported a real UUID v4 collision in production?
We are not aware of any documented case of a true random UUID v4 collision - that is, a collision between two values drawn from a healthy random number generator. Reported "UUID collision in production" incidents that have been investigated have turned out to be a broken RNG, VM cloning without re-seeding, or a bug in UUID generation code. The theoretical probability simply does not manifest at any realistic scale.
Is UUID uniqueness guaranteed across different machines and databases?
UUID v4 provides no global coordination - each machine generates independently. This is by design. The probability of two machines independently generating the same UUID v4 is the same as calculated above: negligible under normal circumstances, but dependent on both machines having quality RNG implementations. UUID v1 used MAC addresses to provide machine-specific uniqueness, but MAC addresses are not reliable identifiers in virtualized environments.
Can I use UUID as a primary key in PostgreSQL or MySQL?
Yes, and it is common practice. PostgreSQL has a native uuid type that stores values efficiently as 16 bytes. MySQL has no dedicated UUID type; a common approach there is a BINARY(16) column populated with UUID_TO_BIN() (available from MySQL 8.0). The main consideration is index fragmentation: random UUID v4 values cause random insert positions in B-tree indexes, which can degrade write performance at very high insert rates. UUID v7 (time-ordered) solves this by making new UUIDs sort after existing ones, similar to auto-increment IDs but globally unique.
What is the difference between a UUID and a ULID?
ULID (Universally Unique Lexicographically Sortable Identifier) is an alternative to UUID that encodes a 48-bit millisecond timestamp followed by 80 bits of randomness. ULIDs are sortable by creation time, URL safe, and case-insensitive. The random portion of a ULID (2^80 space) is smaller than that of UUID v4 (2^122 space), but two ULIDs can only collide if they are generated in the same millisecond, so the collision probability is still negligible for practical use. UUID v7 provides similar sortability with full UUID compatibility.
Does UUID uniqueness hold when merging data from multiple databases?
Yes - this is one of the primary reasons to use UUIDs instead of auto-increment integers as primary keys. When merging data from two databases that use auto-increment IDs, you get conflicts. With UUIDs, each row's identifier was generated independently with negligible collision probability. Database merges, distributed systems, and microservices that need to generate IDs without central coordination all benefit from UUID primary keys.
How do I choose between UUID v4 and UUID v7 for a new project?
Use UUID v7 if your database tables will grow large (millions of rows) and you use the UUID as the primary key. UUID v7's time-ordered prefix dramatically reduces B-tree index fragmentation, which improves write throughput and reduces index maintenance overhead. Use UUID v4 if you need compatibility with existing systems that expect the standard random UUID format, or if sortability is not a concern.
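To make the time-ordered layout concrete, here is a sketch of a UUID v7 builder following the field layout in RFC 9562: a 48-bit Unix millisecond timestamp, 4 version bits, 12 random bits, 2 variant bits, then 62 random bits. The function uuid7_sketch is written for illustration and is not a library API; production code should prefer a maintained implementation, which may also add monotonicity guarantees within a millisecond.

```python
import secrets
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUID v7-style value (illustrative, not a library function)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit millisecond timestamp
    value |= 0x7 << 76                         # version nibble = 7
    value |= secrets.randbits(12) << 64        # rand_a: 12 random bits
    value |= 0b10 << 62                        # RFC 4122 / RFC 9562 variant
    value |= secrets.randbits(62)              # rand_b: 62 random bits
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7_sketch(unix_ms=1_700_000_000_000)
    later = uuid7_sketch(unix_ms=1_700_000_000_001)
    # The timestamp prefix makes later values sort after earlier ones.
    print(earlier < later, str(earlier) < str(later))
```

Because the timestamp occupies the most significant bits, both the integer order and the text order follow creation time at millisecond granularity, which is what gives v7 its index locality.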
Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools.