Regular Expressions (Regex) Guide for Beginners — With Examples
Regular expressions are one of the most powerful tools in a developer's toolkit. They look intimidating at first, but once you understand the building blocks, they become indispensable. This guide takes you from zero to confident with practical examples.
What Is a Regular Expression?
A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. It is used to match, find, and manipulate text. Think of it as a mini programming language specifically designed for pattern matching.
Regex is supported in virtually every programming language (JavaScript, Python, Java, Go, PHP, Ruby), in command-line tools (grep, sed, awk), in text editors (VS Code, Sublime Text, Vim), and in databases (MySQL, PostgreSQL). Learning regex once gives you a skill that works everywhere.
Basic Patterns: Literal Characters
The simplest regex is just a literal string. The pattern hello matches the text "hello" anywhere it appears:
Pattern: hello
Matches: "say hello world" (matches "hello")
No match: "HELLO" (regex is case-sensitive by default)
Most characters match themselves literally. The exceptions are special characters (metacharacters) that have special meaning in regex: . * + ? ^ $ { } [ ] ( ) | \. To match these literally, you need to escape them with a backslash: \. matches a literal period.
The Dot: Match Any Character
The . (dot) matches any single character except a newline:
Pattern: h.t
Matches: "hat", "hot", "hit", "h9t", "h@t"
No match: "ht" (dot requires exactly one character)
Character Classes
Square brackets [] define a character class — a set of characters, any one of which can match:
Pattern: [aeiou] -- matches any single vowel
Pattern: [0-9] -- matches any digit (0 through 9)
Pattern: [a-zA-Z] -- matches any letter (upper or lower)
Pattern: [^0-9] -- matches any character that is NOT a digit
The ^ inside brackets negates the class. Outside brackets, ^ has a different meaning (see Anchors below).
Shorthand Character Classes
Regex provides shorthand notation for common character classes:
\d— Any digit. Equivalent to[0-9]\D— Any non-digit. Equivalent to[^0-9]\w— Any word character (letter, digit, underscore). Equivalent to[a-zA-Z0-9_]\W— Any non-word character\s— Any whitespace (space, tab, newline)\S— Any non-whitespace
Quantifiers: How Many Times
Quantifiers specify how many times the preceding element must occur:
*— Zero or more times:ab*cmatches "ac", "abc", "abbc", "abbbc"+— One or more times:ab+cmatches "abc", "abbc" but NOT "ac"?— Zero or one time (optional):colou?rmatches "color" and "colour"{n}— Exactly n times:\d{4}matches exactly 4 digits{n,}— n or more times:\d{2,}matches 2 or more digits{n,m}— Between n and m times:\d{2,4}matches 2, 3, or 4 digits
By default, quantifiers are greedy — they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible):
Greedy: <.*> on "<b>bold</b>" matches "<b>bold</b>" (entire string)
Lazy: <.*?> on "<b>bold</b>" matches "<b>" (first tag only)
Anchors: Position in the String
Anchors do not match characters — they match positions:
^— Start of string (or start of line with multiline flag)$— End of string (or end of line with multiline flag)\b— Word boundary (between a word character and a non-word character)
Pattern: ^Hello -- matches "Hello world" but not "Say Hello"
Pattern: world$ -- matches "Hello world" but not "world peace"
Pattern: \bcat\b -- matches "the cat sat" but not "caterpillar"
Groups and Capturing
Parentheses () create groups that serve two purposes: grouping elements together and capturing matched text for later use.
Pattern: (ab)+ -- matches "ab", "abab", "ababab" (group repeated)
Pattern: (cat|dog) -- matches "cat" or "dog" (alternation)
Pattern: (\d{3})-(\d{4}) -- captures area code and number separately
Non-Capturing Groups
Use (?:...) when you need grouping but do not need to capture the match:
Pattern: (?:http|https):// -- groups the protocol but does not capture it
Named Groups
Some regex flavors support naming captured groups for clarity:
# Python / JavaScript
Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Match "2026-03-18" captures: year=2026, month=03, day=18
Lookaheads and Lookbehinds
Lookaheads and lookbehinds assert that a pattern exists (or does not exist) at a position without including it in the match:
(?=...)— Positive lookahead: matches if followed by the pattern(?!...)— Negative lookahead: matches if NOT followed by the pattern(?<=...)— Positive lookbehind: matches if preceded by the pattern(?<!...)— Negative lookbehind: matches if NOT preceded by the pattern
# Match a number followed by "px" but don't include "px" in the match
Pattern: \d+(?=px)
Text: "width: 100px" -- matches "100"
# Match a dollar amount (number preceded by $)
Pattern: (?<=\$)\d+\.?\d*
Text: "Price: $19.99" -- matches "19.99"
Test Your Regex Patterns
Use our free Regex Tester to build and test patterns in real time with instant highlighting and match details.
Open Regex TesterCommon Real-World Patterns
Email Address (Basic)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This matches most common email formats. Note that fully RFC 5322-compliant email validation with regex is extremely complex — for production use, validate format loosely and confirm via verification email.
URL
https?://[^\s/$.?#].[^\s]*
Phone Number (US)
^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
Matches formats like: 555-123-4567, (555) 123-4567, +1 555.123.4567
IPv4 Address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
Date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Strong Password
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Requires at least 8 characters, one uppercase, one lowercase, one digit, and one special character. This uses multiple positive lookaheads at the start of the string.
HTML Tag
<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>
The \1 is a backreference that matches the same text captured by the first group. This ensures the closing tag matches the opening tag.
Regex in JavaScript
// Test if a string matches
const pattern = /^\d{3}-\d{4}$/;
pattern.test("555-1234"); // true
// Find matches
const text = "Call 555-1234 or 555-5678";
const matches = text.match(/\d{3}-\d{4}/g);
// ["555-1234", "555-5678"]
// Replace with regex
const cleaned = " extra spaces ".replace(/\s+/g, " ").trim();
// "extra spaces"
// Named groups (ES2018+)
const dateMatch = "2026-03-18".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(dateMatch.groups.year); // "2026"
Regex in Python
import re
# Search for a pattern
match = re.search(r'\d{3}-\d{4}', 'Call 555-1234')
if match:
print(match.group()) # "555-1234"
# Find all matches
matches = re.findall(r'\d{3}-\d{4}', 'Call 555-1234 or 555-5678')
# ['555-1234', '555-5678']
# Replace
cleaned = re.sub(r'\s+', ' ', ' extra spaces ').strip()
# "extra spaces"
# Compile for reuse (faster in loops)
pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
pattern.match('user@example.com') # Match object
Regex in grep
# Basic grep (BRE - Basic Regular Expressions)
grep "error" /var/log/syslog
# Extended regex (-E flag)
grep -E "error|warning|critical" /var/log/syslog
# Case-insensitive
grep -i "error" /var/log/syslog
# Show line numbers
grep -n "TODO" *.js
# Perl-compatible regex (-P flag, GNU grep)
grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" access.log
# Invert match (show lines that do NOT match)
grep -v "^#" config.txt # skip comment lines
# Count matches
grep -c "404" access.log
Flags / Modifiers
Flags change how the regex engine behaves:
g— Global: Find all matches, not just the firsti— Case-insensitive:/hello/imatches "Hello", "HELLO", "hElLo"m— Multiline:^and$match start/end of each line, not just the strings— Dotall:.matches newline characters toou— Unicode: Treat the pattern and subject as Unicode
// JavaScript: flags go after the closing slash
const regex = /hello world/gi;
# Python: flags passed as argument
re.findall(r'hello', text, re.IGNORECASE | re.MULTILINE)
Common Mistakes
- Forgetting to escape special characters:
.matches ANY character. Use\.for a literal period.192.168.1.1as a regex also matches "192x168y1z1". - Greedy by default:
.*matches as much as possible. Use.*?for lazy matching when extracting content between delimiters. - Not anchoring:
\d{3}matches any 3 consecutive digits anywhere, including inside "12345". Use^\d{3}$to match exactly 3 digits. - Catastrophic backtracking: Patterns like
(a+)+can cause exponential runtime on certain inputs. Avoid nested quantifiers on overlapping patterns. - Overusing regex: For complex parsing (HTML, JSON, CSV), use a proper parser. Regex is great for patterns, not for structured data formats.
The Bottom Line
Regular expressions are a fundamental skill for developers. Start with the basics — literal characters, character classes, quantifiers, and anchors — and build up to groups and lookaheads as you need them. The best way to learn is by doing: use a regex tester to experiment with patterns in real time.
For more developer tools, check out our Diff Checker, Word Counter, and JSON Formatter — all free, all running in your browser.