Regex Lookahead & Lookbehind: Zero-Width Assertions Explained
Lookahead and lookbehind are among the most misunderstood features in regex. They let you match text based on what comes before or after it - without including that context in the match itself. Once you understand zero-width assertions, patterns that seemed impossible become straightforward.
The Problem: Matching Without Consuming
Imagine you have a price list and you want to extract every number that is preceded by a dollar sign. With a plain regex like \d+, you match numbers everywhere - dates, quantities, IDs. You could capture the dollar sign too with \$(\d+), but then your match includes the $ and you have to use a capture group to get just the number.
Lookbehind solves this cleanly: (?<=\$)\d+ matches digits only when preceded by $, and the $ is not included in the match. This is what "zero-width" means - the assertion checks a condition but consumes zero characters from the input string.
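To make the difference concrete, here is a minimal Python sketch comparing the three approaches. The sample string `text` is made up for illustration.

```python
import re

# Hypothetical sample text: an order ID, a quantity, a price, and a year.
text = "Order 17: 3 items at $25 each, shipped 2024"

# Plain \d+ matches every number, wanted or not.
all_numbers = re.findall(r"\d+", text)

# Capture group: the full match includes "$"; group 1 holds the digits.
with_group = [m.group(1) for m in re.finditer(r"\$(\d+)", text)]

# Lookbehind: the match itself is just the digits.
with_lookbehind = [m.group(0) for m in re.finditer(r"(?<=\$)\d+", text)]

print(all_numbers)      # ['17', '3', '25', '2024']
print(with_group)       # ['25']
print(with_lookbehind)  # ['25']
```

Both approaches extract the same value; the lookbehind version simply does not need a group to do it.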
The Four Lookaround Types
There are four lookaround assertions, organized by direction (ahead/behind) and polarity (positive/negative):
Assertion             Syntax      Meaning
----------------------------------------------------------------------
Positive lookahead    (?=...)     Match if followed by ...
Negative lookahead    (?!...)     Match if NOT followed by ...
Positive lookbehind   (?<=...)    Match if preceded by ...
Negative lookbehind   (?<!...)    Match if NOT preceded by ...
Positive Lookahead: (?=...)
A positive lookahead asserts that what immediately follows the current position matches the lookahead pattern. The lookahead pattern is checked but the characters are not consumed.
Pattern: \d+(?= dollars)
Input: "100 dollars, 50 euros, 200 dollars"
Matches: "100", "200" (the numbers followed by " dollars")
"50" does NOT match (followed by " euros")
Notice the match is just the number - " dollars" is not included, even though it was required by the pattern. This is useful when you need to find a word or number based on what follows it, but do not want to include the context in your result.
// JavaScript: find all words before a colon
const str = "name: Alice, age: 30, city: London";
const keys = str.match(/\w+(?=:)/g);
// ["name", "age", "city"]
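The same idea in Python, applied to the earlier `\d+(?= dollars)` example. This is only a sketch; the point is that the match span covers the digits and nothing else.

```python
import re

text = "100 dollars, 50 euros, 200 dollars"

# Digits that are immediately followed by " dollars".
amounts = re.findall(r"\d+(?= dollars)", text)
print(amounts)  # ['100', '200']

# The span of the first match covers only "100", not " dollars".
first = re.search(r"\d+(?= dollars)", text)
print(first.span())  # (0, 3)
```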
Negative Lookahead: (?!...)
A negative lookahead asserts that what follows does NOT match the pattern. This is very useful for exclusion patterns.
Pattern: \bfoo(?!bar)
Input: "foo foobar foobaz foo"
Matches: "foo" (first), "foo" in "foobaz", "foo" (last)
Does NOT match: "foo" in "foobar"
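Real-world use: match filenames that are not .min.js files. The lookahead has to inspect the whole name from an anchored position; if it is placed directly before \.js, the engine can simply start the match later and return "min.js":
Pattern: ^(?!.*\.min\.js$)[\w.-]+\.js$
Input: "app.js", "vendor.min.js", "utils.js" (each tested separately)
Matches: "app.js", "utils.js"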
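A minimal Python sketch of this filename filter, assuming each name is tested on its own. It also shows what happens when the negative lookahead sits directly before `\.js` instead of at an anchored start: the lookahead only guards one position, so the engine finds a match that begins later in the name.

```python
import re

filenames = ["app.js", "vendor.min.js", "utils.js"]

# Reject the whole name up front if it ends in ".min.js",
# then require an ordinary ".js" ending.
not_minified = re.compile(r"^(?!.*\.min\.js$)[\w.-]+\.js$")

kept = [name for name in filenames if not_minified.match(name)]
print(kept)  # ['app.js', 'utils.js']

# Unanchored variant: the lookahead is checked only right before "\.js",
# so a match starting at "min" still succeeds.
naive = re.search(r"\w+(?!\.min)\.js", "vendor.min.js")
print(naive.group(0))  # min.js
```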
Positive Lookbehind: (?<=...)
A positive lookbehind asserts that the text immediately before the current position matches the lookbehind pattern. Like lookahead, it consumes zero characters.
Pattern: (?<=\$)\d+(\.\d{2})?
Input: "$99.99 and €45.00 and $12"
Matches: "99.99", "12" (numbers preceded by $, not €)
# Python: extract values after "Error: "
import re
log = "Warning: low disk. Error: file not found. Error: timeout"
errors = re.findall(r'(?<=Error: )\w[\w ]+', log)
# ["file not found", "timeout"]
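Lookbehind support varies by engine. Python's built-in re module requires lookbehind patterns to be fixed-width, and PCRE and Java accept only restricted forms (PCRE has traditionally needed each alternative to be fixed-width; Java generally needs a bounded maximum length). Fully variable-width lookbehind such as (?<=a+) is supported in .NET and in JavaScript (ES2018+). This is the most common pitfall when writing lookbehinds.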
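In Python's standard-library `re` module, the restriction shows up at compile time. The following sketch only demonstrates that behavior for `re`; other engines report the problem differently, or not at all.

```python
import re

# A fixed-width lookbehind compiles without complaint.
re.compile(r"(?<=abc)\d+")

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)\d+")
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```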
Negative Lookbehind: (?<!...)
A negative lookbehind asserts that the preceding text does NOT match the pattern. This is useful for avoiding double-processing or excluding specific prefixes.
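Pattern: (?<![$\d])\d+
Input: "$100 and 200 and $300"
Matches: "200" (only the number NOT preceded by $)
The \d inside the lookbehind matters: with (?<!\$)\d+ alone, the engine would also match "00" inside "$100" and "$300", because those digits are preceded by a digit rather than by $.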
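One subtlety worth checking by hand: a negative lookbehind is tested at every position, including the middle of a number. A short Python sketch comparing a lookbehind that forbids only `$` with one that also forbids a preceding digit:

```python
import re

text = "$100 and 200 and $300"

# Forbidding only "$" lets a match start one digit into "$100" and "$300".
dollar_only = re.findall(r"(?<!\$)\d+", text)
print(dollar_only)  # ['00', '200', '00']

# Also forbidding a preceding digit means a match can only start
# at the first digit of a number.
dollar_or_digit = re.findall(r"(?<![$\d])\d+", text)
print(dollar_or_digit)  # ['200']
```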
// Match the "//host" part of "http" links but not "https"
Pattern: (?<!https:)//[a-zA-Z0-9.-]+
// Matches the "//host" after http: but skips it after https:
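As a quick check in Python (the URLs are placeholders), note that the match is the `//host` portion rather than the full URL, since the scheme is only inspected by the lookbehind:

```python
import re

text = "Visit http://a.example or https://b.example"

pattern = re.compile(r"(?<!https:)//[a-zA-Z0-9.-]+")
found = pattern.findall(text)
print(found)  # ['//a.example']
```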
Step-by-Step: Password Validation With Lookaheads
Password validation is the classic use case for multiple lookaheads. Instead of one complex pattern that matches the whole structure, you stack independent lookahead assertions that each check one requirement. The engine checks all assertions from the same starting position without advancing.
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Let us break it down assertion by assertion:
^ Start of string
(?=.*[a-z]) Must contain at least one lowercase letter
(?=.*[A-Z]) Must contain at least one uppercase letter
(?=.*\d) Must contain at least one digit
(?=.*[@$!%*?&]) Must contain at least one special character
[A-Za-z\d@$!%*?&]{8,} The actual characters: 8+ from the allowed set
$ End of string
Each (?=.*X) lookahead starts at position 0 and checks anywhere in the string for character X. The final character class [...]{8,} then consumes the actual match, restricts it to the allowed characters, and enforces the minimum length. This pattern is readable, maintainable, and easy to extend: add (?=(?:.*\d){2}) to require at least two digits, for example. (Note that (?=.*[0-9]{2}) would instead require two consecutive digits.)
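A minimal Python sketch of this pattern in use. The function name `is_valid_password` and the sample passwords are illustrative assumptions, not part of any library.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_valid_password(candidate: str) -> bool:
    """Return True if candidate satisfies every requirement in the pattern."""
    # fullmatch avoids Python's "$" quirk of matching before a trailing newline.
    return PASSWORD_RE.fullmatch(candidate) is not None

print(is_valid_password("Passw0rd!"))   # True
print(is_valid_password("password1!"))  # False: no uppercase letter
print(is_valid_password("Passw0rd"))    # False: no special character
print(is_valid_password("Pw0!"))        # False: shorter than 8 characters
```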
Combining Lookaheads and Lookbehinds
You can combine multiple lookaround assertions on the same position. This is one of the most powerful regex techniques:
// Match text that is preceded by ">" and followed by "<"
// (the text content of an HTML element, simplified)
Pattern: (?<=>)[^<]+(?=<)
Input: "<p>Hello, world</p>"
Match: "Hello, world"
// Extract numbers preceded by "$" and followed by " each"
Pattern: (?<=\$)\d+(?= each)
Input: "$5 each, $10 per dozen, $3 each"
Matches: "5", "3"
// Split a camelCase string into words using lookahead
Pattern: (?=[A-Z])
Input: "camelCaseString"
Split result: ["camel", "Case", "String"]
// JavaScript
"camelCaseString".split(/(?=[A-Z])/)
// ["camel", "Case", "String"]
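The three examples above can be reproduced in Python as a rough sketch. Splitting on a zero-width match assumes Python 3.7 or later; earlier versions do not split on empty matches.

```python
import re

# Text between ">" and "<" (a simplification, not a real HTML parser).
inner = re.findall(r"(?<=>)[^<]+(?=<)", "<p>Hello, world</p>")
print(inner)  # ['Hello, world']

# Digits preceded by "$" and followed by " each".
unit_prices = re.findall(r"(?<=\$)\d+(?= each)", "$5 each, $10 per dozen, $3 each")
print(unit_prices)  # ['5', '3']

# Split before every uppercase letter (requires Python 3.7+).
words = re.split(r"(?=[A-Z])", "camelCaseString")
print(words)  # ['camel', 'Case', 'String']
```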
Browser and Engine Compatibility
Lookaheads are universally supported. Lookbehinds have more variation:
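- JavaScript: Lookbehind added in ES2018, with variable-width patterns allowed. Supported in all modern browsers (Chrome 62+, Firefox 78+, Safari 16.4+). Not available in IE.
- Python: Both directions supported, but lookbehind must be fixed-width in the built-in re module.
- PCRE (PHP, nginx, Apache): Each lookbehind alternative must be fixed-width, although different alternatives may have different widths.
- .NET: Variable-width lookbehind supported.
- Java: Lookbehind must have a bounded maximum length, so (?<=https?://) works but unbounded quantifiers such as + and * are generally rejected.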
- Go (regexp package): No lookahead or lookbehind support at all - use a different approach.
Frequently Asked Questions
Why are they called "zero-width" assertions?
Because they match a position in the string, not a span of characters. A regular pattern like \d{3} consumes three characters when it matches, advancing the engine's position by three. A lookahead or lookbehind consumes zero characters - the engine's position after a successful assertion is unchanged. This is why you can have multiple lookaheads at the same position checking different conditions independently.
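One way to see this is to compare match spans in Python. In this small sketch, the consuming pattern reports a three-character span, while the lookahead reports an empty span at the same starting position.

```python
import re

text = "ab123"

# A consuming pattern: the span is three characters wide.
consuming = re.search(r"\d{3}", text)
print(consuming.span())  # (2, 5)

# A lookahead alone: it matches a position, so start == end.
assertion = re.search(r"(?=\d{3})", text)
print(assertion.span())          # (2, 2)
print(repr(assertion.group(0)))  # ''
```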
What is the difference between lookahead and a capture group?
A capture group (\d+) consumes its characters and stores them. A lookahead (?=\d+) checks the condition without consuming or storing anything. If you need to extract the matched text, use a capture group. If you just need to verify a condition without affecting what is matched, use lookahead. Often you use them together: (?<=prefix)(\w+)(?=suffix) captures the word between a prefix and suffix.
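A short Python sketch of the distinction, using square brackets as a stand-in prefix and suffix (the sample string is invented):

```python
import re

text = "id=[alpha] id=[beta]"

# Capture group: the brackets are consumed; the word is stored in group 1.
m1 = re.search(r"\[(\w+)\]", text)
print(m1.group(0), m1.group(1))  # [alpha] alpha

# Lookarounds: the brackets are only checked; the whole match is the word.
m2 = re.search(r"(?<=\[)\w+(?=\])", text)
print(m2.group(0))  # alpha
```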
Why does my lookbehind fail with "variable length lookbehind" error?
Some engines, including Python's re module, require lookbehind patterns to match a fixed number of characters. There, (?<=https?://) fails because https? can match 4 or 5 characters. The workaround: use two separate fixed-width lookbehinds combined with alternation inside a group, (?:(?<=https://)|(?<=http://)), followed by the rest of the pattern. Without the group, the alternation would split the entire pattern in two. In PCRE you can instead drop the lookbehind, match the prefix normally, and discard it with \K, which resets the match start: https?://\K followed by the rest of the pattern.
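A minimal Python sketch of the alternation workaround, wrapping the two fixed-width lookbehinds in a non-capturing group so the alternation does not split the rest of the pattern. The URLs are placeholders.

```python
import re

text = "See http://a.example and https://b.example"

# Each lookbehind is fixed-width on its own; the group keeps the "|"
# from applying to the host part of the pattern.
hosts = re.findall(r"(?:(?<=https://)|(?<=http://))[\w.-]+", text)
print(hosts)  # ['a.example', 'b.example']
```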
Can lookaheads match across multiple lines?
Yes, when the s (dotall) flag is set, . matches newlines inside lookaheads too. Without the flag, you can use [\s\S] instead of . inside the lookahead to match any character including newlines: (?=[\s\S]*end) asserts that "end" appears anywhere later in the string.
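In Python the dotall flag is spelled `re.DOTALL` (or `re.S`). A small sketch of the three cases:

```python
import re

text = "start\nmiddle\nend"

# Without the flag, "." stops at the first newline, so "end" is never reached.
plain = bool(re.match(r"(?=.*end)", text))
print(plain)  # False

# With DOTALL, "." crosses newlines inside the lookahead.
dotall = bool(re.match(r"(?=.*end)", text, re.DOTALL))
print(dotall)  # True

# [\s\S] matches any character regardless of flags.
any_char = bool(re.match(r"(?=[\s\S]*end)", text))
print(any_char)  # True
```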
Is there a performance cost to lookaheads?
Yes, especially complex lookaheads with .* inside them. Each (?=.*[A-Z]) scans forward through the remaining string. Stacking four such assertions (as in the password pattern above) means up to four linear scans from each position where the engine attempts a match; with a ^ anchor, that is only the start of the string. For input strings up to a few hundred characters, this is negligible. For matching against very long strings or in tight loops processing millions of records, consider splitting the checks into separate simple patterns for better performance.
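One way to split the checks, sketched in Python. The helper name `is_valid_password_split` is hypothetical, and whether this is actually faster depends on the engine and the input, so measure before relying on it.

```python
import re

# Each requirement is its own simple search instead of a stacked lookahead.
CHECKS = [
    re.compile(r"[a-z]"),       # at least one lowercase letter
    re.compile(r"[A-Z]"),       # at least one uppercase letter
    re.compile(r"\d"),          # at least one digit
    re.compile(r"[@$!%*?&]"),   # at least one special character
]
# Allowed character set and minimum length, checked against the whole string.
ALLOWED = re.compile(r"[A-Za-z\d@$!%*?&]{8,}")

def is_valid_password_split(candidate: str) -> bool:
    """Apply the same rules as the single-regex version, one check at a time."""
    return (
        ALLOWED.fullmatch(candidate) is not None
        and all(check.search(candidate) for check in CHECKS)
    )

print(is_valid_password_split("Passw0rd!"))  # True
print(is_valid_password_split("Passw0rd"))   # False
```

A side benefit of this layout is that each failed check can produce its own error message.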
The Bottom Line
Lookahead and lookbehind assertions let you add context conditions to a regex match without consuming that context. Positive variants require the condition to be present; negative variants require it to be absent. The critical practical rule: check your engine's lookbehind support, because several widely used engines (notably Python's re module) require fixed-width lookbehinds.
The most common real-world applications are: password validation (stacked positive lookaheads), extracting values with specific surrounding delimiters (lookbehind + pattern + lookahead), and exclusion patterns (negative lookahead to avoid false positives).
Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools.