Regular Expressions (Regex) Guide for Beginners — With Examples

Regular expressions are one of the most powerful tools in a developer's toolkit. They look intimidating at first, but once you understand the building blocks, they become indispensable. This guide takes you from zero to confident with practical examples.

What Is a Regular Expression?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. It is used to match, find, and manipulate text. Think of it as a mini programming language specifically designed for pattern matching.

Regex is supported in virtually every programming language (JavaScript, Python, Java, Go, PHP, Ruby), in command-line tools (grep, sed, awk), in text editors (VS Code, Sublime Text, Vim), and in databases (MySQL, PostgreSQL). The core syntax carries over almost everywhere, although each engine (or "flavor") differs in details such as named-group syntax and lookbehind support.

Basic Patterns: Literal Characters

The simplest regex is just a literal string. The pattern hello matches the text "hello" anywhere it appears:

Pattern: hello
Matches: "say hello world" (matches "hello")
No match: "HELLO" (regex is case-sensitive by default)

Most characters match themselves literally. The exceptions are metacharacters, which have special meaning in regex: . * + ? ^ $ { } [ ] ( ) | \. To match one of these literally, escape it with a backslash: \. matches a literal period.
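
As a rough illustration of escaping, the Python sketch below compares an unescaped dot with an escaped one, then uses re.escape() to escape a literal string automatically. The sample strings are made up for the demo.

```python
import re

# Unescaped, the dot is a wildcard; escaped, it matches only a literal period.
unescaped = re.compile(r"3.14")
escaped = re.compile(r"3\.14")

print(bool(unescaped.fullmatch("3x14")))  # True: "." matched "x"
print(bool(escaped.fullmatch("3x14")))    # False
print(bool(escaped.fullmatch("3.14")))    # True

# re.escape() adds the backslashes for you when the literal text comes from data.
literal = "price (USD)?"
pattern = re.compile(re.escape(literal))
print(bool(pattern.search("What is the price (USD)? Unknown.")))  # True
```

Reaching for re.escape() is usually safer than escaping by hand when the text to match comes from user input.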

The Dot: Match Any Character

The . (dot) matches any single character except a newline:

Pattern: h.t
Matches: "hat", "hot", "hit", "h9t", "h@t"
No match: "ht" (dot requires exactly one character)

Character Classes

Square brackets [] define a character class — a set of characters, any one of which can match:

Pattern: [aeiou]       -- matches any single vowel
Pattern: [0-9]         -- matches any digit (0 through 9)
Pattern: [a-zA-Z]      -- matches any letter (upper or lower)
Pattern: [^0-9]        -- matches any character that is NOT a digit

The ^ inside brackets negates the class. Outside brackets, ^ has a different meaning (see Anchors below).
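
To see the four classes above in action, here is a small Python sketch run against an invented sample string.

```python
import re

text = "Order 66 shipped"

vowels = re.findall(r"[aeiou]", text)      # lowercase vowels only
digits = re.findall(r"[0-9]", text)        # every digit, one at a time
letters = re.findall(r"[a-zA-Z]", text)    # letters of either case
digits_only = re.sub(r"[^0-9]", "", text)  # delete everything that is NOT a digit

print(vowels)        # ['e', 'i', 'e']
print(digits)        # ['6', '6']
print(len(letters))  # 12
print(digits_only)   # 66
```

Note that [aeiou] skips the capital "O" in "Order": character classes are case-sensitive unless you list both cases or use a case-insensitive flag.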

Shorthand Character Classes

Regex provides shorthand notation for common character classes:

  • \d — Any digit. Equivalent to [0-9]
  • \D — Any non-digit. Equivalent to [^0-9]
  • \w — Any word character (letter, digit, underscore). Equivalent to [a-zA-Z0-9_]
  • \W — Any non-word character
  • \s — Any whitespace (space, tab, newline)
  • \S — Any non-whitespace
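
The sketch below tries the shorthands on a made-up log line. It also shows one caveat: in Python 3, patterns written as str are Unicode-aware, so \d matches digits from other scripts too, and the [0-9] equivalence only holds exactly with the re.ASCII flag. Other engines differ; JavaScript's \d, for example, is ASCII-only.

```python
import re

line = "user_42 logged in\tat 09:15"

numbers = re.findall(r"\d+", line)  # runs of digits
words = re.findall(r"\w+", line)    # runs of letters, digits, underscores
fields = re.split(r"\s+", line)     # split on any run of whitespace

print(numbers)  # ['42', '09', '15']
print(words)    # ['user_42', 'logged', 'in', 'at', '09', '15']
print(fields)   # ['user_42', 'logged', 'in', 'at', '09:15']

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
unicode_digit = bool(re.fullmatch(r"\d", "\u0663"))
ascii_digit = bool(re.fullmatch(r"\d", "\u0663", re.ASCII))
print(unicode_digit, ascii_digit)  # True False
```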

Quantifiers: How Many Times

Quantifiers specify how many times the preceding element must occur:

  • * — Zero or more times: ab*c matches "ac", "abc", "abbc", "abbbc"
  • + — One or more times: ab+c matches "abc", "abbc" but NOT "ac"
  • ? — Zero or one time (optional): colou?r matches "color" and "colour"
  • {n} — Exactly n times: \d{4} matches exactly 4 digits
  • {n,} — n or more times: \d{2,} matches 2 or more digits
  • {n,m} — Between n and m times: \d{2,4} matches 2, 3, or 4 digits

By default, quantifiers are greedy — they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible):

Greedy:  <.*>  on "<b>bold</b>" matches "<b>bold</b>" (entire string)
Lazy:    <.*?> on "<b>bold</b>" matches "<b>" (first tag only)
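
The same greedy and lazy comparison can be checked in Python. This sketch also runs a counted quantifier and an optional character over invented sample text.

```python
import re

html = "<b>bold</b>"

greedy = re.search(r"<.*>", html).group()  # grabs as much as possible
lazy = re.search(r"<.*?>", html).group()   # stops at the first ">"
print(greedy)  # <b>bold</b>
print(lazy)    # <b>

# {2,4} takes up to four digits at a time, so a long run is split into chunks.
counted = re.findall(r"\d{2,4}", "7 42 2026 123456")
print(counted)  # ['42', '2026', '1234', '56']

# "u?" makes the u optional, but allows at most one.
optional = re.findall(r"colou?r", "color colour colouur")
print(optional)  # ['color', 'colour']
```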

Anchors: Position in the String

Anchors do not match characters — they match positions:

  • ^ — Start of string (or start of line with multiline flag)
  • $ — End of string (or end of line with multiline flag)
  • \b — Word boundary (between a word character and a non-word character)

Pattern: ^Hello       -- matches "Hello world" but not "Say Hello"
Pattern: world$       -- matches "Hello world" but not "world peace"
Pattern: \bcat\b      -- matches "the cat sat" but not "caterpillar"
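
A quick Python check of the three anchors, plus the effect of the multiline flag on ^. The two-line sample text is an assumption for the demo.

```python
import re

starts = re.search(r"^Hello", "Hello world") is not None      # True
not_start = re.search(r"^Hello", "Say Hello") is not None     # False
not_end = re.search(r"world$", "world peace") is not None     # False
whole_word = re.search(r"\bcat\b", "the cat sat") is not None  # True
inside_word = re.search(r"\bcat\b", "caterpillar") is not None  # False

# Without MULTILINE, ^ anchors only at the very start of the string.
lines = "alpha one\nbeta two"
first_only = re.findall(r"^\w+", lines)
each_line = re.findall(r"^\w+", lines, re.MULTILINE)
print(first_only)  # ['alpha']
print(each_line)   # ['alpha', 'beta']
```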

Groups and Capturing

Parentheses () create groups that serve two purposes: grouping elements together and capturing matched text for later use.

Pattern: (ab)+        -- matches "ab", "abab", "ababab" (group repeated)
Pattern: (cat|dog)    -- matches "cat" or "dog" (alternation)
Pattern: (\d{3})-(\d{4})  -- captures the 3-digit prefix and the 4-digit line number separately
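
In Python, captured groups are read from the match object by number. The sketch below uses the three patterns above on made-up input.

```python
import re

m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")
whole = m.group(0)   # the entire match
first = m.group(1)   # text captured by the first pair of parentheses
second = m.group(2)  # text captured by the second pair
print(whole, first, second)  # 555-1234 555 1234

# A repeated group keeps only its last repetition.
repeated = re.fullmatch(r"(ab)+", "ababab")
print(repeated.group(1))  # ab

# With exactly one group, findall returns the captured text of each match.
animals = re.findall(r"(cat|dog)", "the cat chased the dog")
print(animals)  # ['cat', 'dog']
```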

Non-Capturing Groups

Use (?:...) when you need grouping but do not need to capture the match:

Pattern: (?:http|https)://   -- groups the protocol but does not capture it
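
The difference shows up in what the match object reports. In this Python sketch, the extra (\S+) group and the sample URL are additions for the demo.

```python
import re

text = "see https://example.com/docs"

capturing = re.search(r"(http|https)://(\S+)", text)
non_capturing = re.search(r"(?:http|https)://(\S+)", text)

print(capturing.groups())      # ('https', 'example.com/docs')
print(non_capturing.groups())  # ('example.com/docs',)
```

With the non-capturing form, the part you care about stays at group 1 even if you later add more grouping around the protocol.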

Named Groups

Some regex flavors support naming captured groups for clarity:

# JavaScript (ES2018+)
Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
# Python (note the extra P)
Pattern: (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
Match "2026-03-18" captures: year=2026, month=03, day=18
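
Python's re module spells named groups as (?P<name>...). A minimal sketch of reading them back by name:

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date_pattern.fullmatch("2026-03-18")

print(m.group("year"))  # 2026
print(m.groupdict())    # {'year': '2026', 'month': '03', 'day': '18'}
```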

Lookaheads and Lookbehinds

Lookaheads and lookbehinds assert that a pattern exists (or does not exist) at a position without including it in the match:

  • (?=...) — Positive lookahead: matches if followed by the pattern
  • (?!...) — Negative lookahead: matches if NOT followed by the pattern
  • (?<=...) — Positive lookbehind: matches if preceded by the pattern
  • (?<!...) — Negative lookbehind: matches if NOT preceded by the pattern

# Match a number followed by "px" but don't include "px" in the match
Pattern: \d+(?=px)
Text: "width: 100px" -- matches "100"

# Match a dollar amount (number preceded by $)
Pattern: (?<=\$)\d+\.?\d*
Text: "Price: $19.99" -- matches "19.99"
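
All four assertions can be tried in Python. The two negative examples and their sample strings are illustrative additions, not patterns from the list above.

```python
import re

# Positive lookahead: digits followed by "px"; "px" is not part of the match.
width = re.search(r"\d+(?=px)", "width: 100px").group()
print(width)  # 100

# Positive lookbehind: a number preceded by "$".
price = re.search(r"(?<=\$)\d+\.?\d*", "Price: $19.99").group()
print(price)  # 19.99

# Negative lookahead: numbers NOT followed by "px". The \d alternative stops
# the engine from matching just the "1" of "10px".
not_px = re.findall(r"\d+(?!\d|px)", "10px 20em 30px")
print(not_px)  # ['20']

# Negative lookbehind: whole numbers NOT preceded by "$".
not_money = re.findall(r"(?<!\$)\b\d+\b", "$5 and 7 items")
print(not_money)  # ['7']
```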

Common Real-World Patterns

Email Address (Basic)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This matches most common email formats. Note that fully RFC 5322-compliant email validation with regex is extremely complex — for production use, validate format loosely and confirm via verification email.
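
A minimal sketch of a loose format check in Python. The helper name looks_like_email is hypothetical. One detail to watch: Python's $ also matches just before a trailing newline, so the sketch uses fullmatch() rather than match().

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """Loose format check only; it does not prove the address exists."""
    return EMAIL.fullmatch(value) is not None

print(looks_like_email("user@example.com"))                # True
print(looks_like_email("first.last+tag@mail.example.org")) # True
print(looks_like_email("user@localhost"))                  # False: no dot and TLD
print(looks_like_email("user@example.com\n"))              # False with fullmatch
print(EMAIL.match("user@example.com\n") is not None)       # True: $ allows the newline
```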

URL

https?://[^\s/$.?#].[^\s]*
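
A sketch of pulling URLs out of free text with this pattern, using an invented sentence. Because the pattern keeps any non-whitespace character, it can swallow trailing punctuation, so the example strips it afterwards; that cleanup step is an assumption about what you usually want.

```python
import re

URL = re.compile(r"https?://[^\s/$.?#].[^\s]*")

text = "Docs at https://example.com/guide?page=2 and http://test.org."
found = URL.findall(text)
print(found)  # ['https://example.com/guide?page=2', 'http://test.org.']

# Drop sentence punctuation that was captured at the end of a URL.
cleaned = [url.rstrip(".,;:!?)") for url in found]
print(cleaned)  # ['https://example.com/guide?page=2', 'http://test.org']
```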

Phone Number (US)

^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Matches formats like: 555-123-4567, (555) 123-4567, +1 555.123.4567
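
Testing the pattern against a handful of samples in Python exposes one limitation: each parenthesis is optional on its own, so an unbalanced "(" is still accepted.

```python
import re

PHONE = re.compile(r"^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

samples = [
    "555-123-4567",
    "(555) 123-4567",
    "+1 555.123.4567",
    "555-1234",        # too short
    "(555 123-4567",   # unbalanced parenthesis
]
results = {s: PHONE.fullmatch(s) is not None for s in samples}

for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
# The first three print True, '555-1234' prints False,
# and '(555 123-4567' prints True despite the missing ")".
```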

IPv4 Address

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
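
A quick Python check of the octet ranges. The helper name is_ipv4 is hypothetical. Note that the pattern accepts leading zeros such as "01", which some tools reject.

```python
import re

IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

def is_ipv4(value: str) -> bool:
    return IPV4.fullmatch(value) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is out of range
print(is_ipv4("1.2.3"))            # False: only three octets
print(is_ipv4("01.2.3.4"))         # True: leading zero is allowed by the pattern
```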

Date (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
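
This pattern checks the shape of a date, not the calendar: it accepts "2026-02-31". One possible approach, sketched below in Python with a hypothetical helper is_real_date, is to use the regex as a first filter and let datetime do the calendar check.

```python
import re
from datetime import datetime

DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_date(value: str) -> bool:
    """Shape check with the regex, then a calendar check with datetime."""
    if DATE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(DATE.fullmatch("2026-13-01") is not None)  # False: month 13 fails the regex
print(DATE.fullmatch("2026-02-31") is not None)  # True: the shape is fine
print(is_real_date("2026-02-31"))                # False: February has no 31st
print(is_real_date("2026-03-18"))                # True
```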

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

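Requires at least 8 characters, including at least one lowercase letter, one uppercase letter, one digit, and one special character from the set @$!%*?&. The four positive lookaheads at the start of the string each check one requirement without consuming any text. The final character class then restricts the whole password to letters, digits, and those special characters, so a password containing any other symbol (such as #) is rejected.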
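
A small Python sketch that runs the pattern over made-up passwords, one failing for each kind of reason. The helper name is_strong is hypothetical.

```python
import re

STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password: str) -> bool:
    return STRONG.fullmatch(password) is not None

print(is_strong("Passw0rd!"))     # True
print(is_strong("password1!"))    # False: no uppercase letter
print(is_strong("Sh0rt!"))        # False: fewer than 8 characters
print(is_strong("NoSpecial123"))  # False: no special character
print(is_strong("Passw0rd#"))     # False: "#" is not in the allowed set
```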

HTML Tag

<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>

The \1 is a backreference that matches the same text captured by the first group. This ensures the closing tag matches the opening tag.
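
The backreference can be seen at work in Python on a few invented snippets. This is only a sketch for simple, non-nested tags.

```python
import re

TAG = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>")

m = TAG.search('<p class="x">hi</p>')
print(m.group(1), m.group(2))  # p hi

# Mismatched tags: \1 must repeat "b", so "</i>" does not close the match.
mismatched = TAG.search("<b>bold</i>")
print(mismatched)  # None

# With two groups, findall returns (tag, content) tuples.
pairs = TAG.findall("<em>a</em> and <h1>Title</h1>")
print(pairs)  # [('em', 'a'), ('h1', 'Title')]
```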

Regex in JavaScript

// Test if a string matches
const pattern = /^\d{3}-\d{4}$/;
pattern.test("555-1234");  // true

// Find matches
const text = "Call 555-1234 or 555-5678";
const matches = text.match(/\d{3}-\d{4}/g);
// ["555-1234", "555-5678"]

// Replace with regex
const cleaned = "  extra   spaces  ".replace(/\s+/g, " ").trim();
// "extra spaces"

// Named groups (ES2018+)
const dateMatch = "2026-03-18".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(dateMatch.groups.year);  // "2026"

Regex in Python

import re

# Search for a pattern
match = re.search(r'\d{3}-\d{4}', 'Call 555-1234')
if match:
    print(match.group())  # "555-1234"

# Find all matches
matches = re.findall(r'\d{3}-\d{4}', 'Call 555-1234 or 555-5678')
# ['555-1234', '555-5678']

# Replace
cleaned = re.sub(r'\s+', ' ', '  extra   spaces  ').strip()
# "extra spaces"

# Compile for reuse (faster in loops)
pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
pattern.match('user@example.com')  # Match object

Regex in grep

# Basic grep (BRE - Basic Regular Expressions)
grep "error" /var/log/syslog

# Extended regex (-E flag)
grep -E "error|warning|critical" /var/log/syslog

# Case-insensitive
grep -i "error" /var/log/syslog

# Show line numbers
grep -n "TODO" *.js

# Perl-compatible regex (-P flag, GNU grep)
grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" access.log

# Invert match (show lines that do NOT match)
grep -v "^#" config.txt    # skip comment lines

# Count matching lines (not individual matches)
grep -c "404" access.log

Flags / Modifiers

Flags change how the regex engine behaves:

  • g (Global): Find all matches, not just the first
  • i (Case-insensitive): /hello/i matches "Hello", "HELLO", "hElLo"
  • m (Multiline): ^ and $ match start/end of each line, not just the string
  • s (Dotall): . matches newline characters too
  • u (Unicode): Treat the pattern and subject as Unicode

// JavaScript: flags go after the closing slash
const regex = /hello world/gi;

# Python: flags passed as argument
re.findall(r'hello', text, re.IGNORECASE | re.MULTILINE)
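
To make the effect of each flag visible, this Python sketch runs the same three-line sample text (an assumption for the demo) with and without flags. Python has no g flag; re.findall() already returns every match.

```python
import re

text = "Hello\nhello world\nHELLO"

plain = re.findall(r"hello", text)
ignore_case = re.findall(r"hello", text, re.IGNORECASE)
print(plain)        # ['hello']
print(ignore_case)  # ['Hello', 'hello', 'HELLO']

# MULTILINE lets ^ and $ anchor at every line, so only whole-line hits match.
whole_lines = re.findall(r"^hello$", text, re.IGNORECASE | re.MULTILINE)
print(whole_lines)  # ['Hello', 'HELLO']

# DOTALL lets "." cross the newline between the first two lines.
no_dotall = re.search(r"Hello.hello", text)
with_dotall = re.search(r"Hello.hello", text, re.DOTALL)
print(no_dotall is None, with_dotall is not None)  # True True
```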

Common Mistakes

  • Forgetting to escape special characters: . matches ANY character. Use \. for a literal period. 192.168.1.1 as a regex also matches "192x168y1z1".
  • Greedy by default: .* matches as much as possible. Use .*? for lazy matching when extracting content between delimiters.
  • Not anchoring: \d{3} matches any 3 consecutive digits anywhere, including inside "12345". Use ^\d{3}$ to match exactly 3 digits.
  • Catastrophic backtracking: Patterns like (a+)+ can cause exponential runtime on certain inputs. Avoid nested quantifiers on overlapping patterns.
  • Overusing regex: For complex parsing (HTML, JSON, CSV), use a proper parser. Regex is great for patterns, not for structured data formats.
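
Two of these mistakes are easy to reproduce in Python. The sketch below shows the unanchored \d{3} matching inside a longer number. It then checks, on a few short invented inputs, that the nested (a+)+ accepts the same strings as the flat a+. The inputs are kept short on purpose so the nested version cannot blow up.

```python
import re

# Not anchoring: search() happily matches inside a longer number.
partial = re.search(r"\d{3}", "12345").group()
exact = re.fullmatch(r"\d{3}", "12345")
print(partial)  # 123
print(exact)    # None

# Nested quantifier vs. a flat rewrite that accepts the same inputs.
nested = re.compile(r"^(a+)+$")
flat = re.compile(r"^a+$")
samples = ["a", "aaaa", "aaab", "", "a" * 12 + "b"]
same_results = all(
    (nested.match(s) is not None) == (flat.match(s) is not None) for s in samples
)
print(same_results)  # True
```

On these inputs the flat form gives the same answers without the nested repetition that causes the backtracking.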

The Bottom Line

Regular expressions are a fundamental skill for developers. Start with the basics — literal characters, character classes, quantifiers, and anchors — and build up to groups and lookaheads as you need them. The best way to learn is by doing: use a regex tester to experiment with patterns in real time.
