Regex to Match URLs: HTTP, HTTPS and Query Strings (2026)
URLs are everywhere in code - log files, user input, scraped HTML, config files. Writing a regex that reliably matches URLs without false positives or missed matches is one of the most common and most misunderstood tasks in text processing. This guide walks through every layer of URL regex, from simple one-liners to production-grade validators.
Why URL Regex Is Harder Than It Looks
A URL looks simple: a scheme, a domain, and a path. But the actual RFC 3986 specification covers dozens of edge cases: IPv6 addresses, ports, percent-encoded characters, Unicode domains (IDN), fragments with #, query strings with nested parameters, paths with colons, and more. A naive pattern like http://.* will match garbage and miss valid edge cases.
The choice of pattern depends entirely on your use case:
- Extracting URLs from text: You need a greedy match that grabs the longest plausible URL from surrounding prose.
- Validating a URL input field: You need an anchored pattern that rejects anything that is not a well-formed URL.
- Finding URLs in log files: You need a pattern that handles encoded characters and query strings without splitting on commas or quotes.
- Checking for HTTPS specifically: You need a pattern that rejects plain HTTP as a security policy check.
Let us build these patterns from the ground up, starting with the simplest case and adding complexity as needed.
Pattern 1: Simple HTTP/HTTPS Match
The most common starting point. This pattern matches any URL beginning with http:// or https:// and continuing until whitespace:
https?:\/\/[^\s]+
How it works:
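- https? - matches http or https (the s is optional)
- :\/\/ - literal :// (the slashes are escaped because / delimits the pattern in JavaScript regex literals and in sed; inside a Python or Java string a plain :// works)
- [^\s]+ - one or more characters that are not whitespace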
Matches: https://example.com, http://foo.bar/path?q=1
Does not match: ftp://files.example.com, //example.com (protocol-relative URLs)
This is the right choice for quickly extracting URLs from plain text, logs, or Markdown. The downside is that it will include trailing punctuation like periods, commas, and closing parentheses when the URL appears mid-sentence. For those cases, use a negative lookbehind or post-process the match.
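As a minimal sketch of the post-processing route, the Python snippet below applies Pattern 1 and then trims trailing sentence punctuation. The helper name `extract_urls` and the exact set of characters to trim are assumptions made for illustration, not part of any library.

```python
import re

# Pattern 1; inside a Python raw string the slashes need no escaping.
SIMPLE_URL = re.compile(r"https?://[^\s]+")

# Characters treated as sentence punctuation when they end a match (an assumption).
TRAILING_PUNCTUATION = ".,;:!?)'\""


def extract_urls(text):
    """Return URLs found in text with trailing sentence punctuation removed."""
    return [m.rstrip(TRAILING_PUNCTUATION) for m in SIMPLE_URL.findall(text)]


sample = "Docs live at https://example.com/guide, and the API is at http://foo.bar/path?q=1."
print(extract_urls(sample))
# ['https://example.com/guide', 'http://foo.bar/path?q=1']
```

Trimming a closing parenthesis will also shorten the rare URL that legitimately ends in one, so adjust the trimmed set to suit your input.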
Pattern 2: Full URL with Path, Query, and Fragment
This is one of the most widely copied regexes for URL matching, circulated largely through Stack Overflow answers:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)
Breaking this down component by component:
- https?:\/\/ - scheme (http or https)
- (www\.)? - optional www. prefix
- [-a-zA-Z0-9@:%._+~#=]{1,256} - domain name characters (up to 256 chars)
- \.[a-zA-Z0-9()]{1,6} - top-level domain (e.g., .com, .io, .museum)
- \b - word boundary (prevents matching partial domain names)
- ([-a-zA-Z0-9()@:%_+.~#?&/=]*) - optional path, query string, and fragment
Matches: https://www.example.com/path/to/page?id=42&lang=en#section-3
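Does not match: https://localhost (no dot-separated TLD), https://example.photography (TLD longer than six characters)
Note: because the TLD class [a-zA-Z0-9()] allows digits, this pattern does match dotted IPv4 hosts such as https://192.168.1.1, although it does not check them as IP addresses. Use Pattern 3 when you need to handle IP-based URLs deliberately.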
No single regex can perfectly validate every valid URL. For strict validation in user-facing forms, prefer the browser's built-in URL constructor or a dedicated library like is-url over pure regex.
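To see where Pattern 2 draws its lines, a small Python check can help. This is only a sketch: the candidate URLs are made-up examples, and `fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Pattern 2, split across two raw strings for readability.
FULL_URL = re.compile(
    r"https?://(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}"
    r"\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)"
)

candidates = [
    "https://www.example.com/path/to/page?id=42&lang=en#section-3",
    "https://localhost",            # no dot-separated TLD
    "https://example.photography",  # TLD longer than six characters
]

for candidate in candidates:
    matched = FULL_URL.fullmatch(candidate) is not None
    print(f"{candidate}: {matched}")
```

Only the first candidate is accepted. The {1,6} limit on the TLD is the reason the third one fails, so widen it if your data contains newer, longer TLDs.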
Pattern 3: Matching IP Address URLs
When your application allows direct IP access (internal tools, API endpoints, admin panels), you need a separate pattern for IP-based URLs:
https?:\/\/(\d{1,3}\.){3}\d{1,3}(:\d{1,5})?(\/[^\s]*)?
This matches URLs like http://192.168.1.100:8080/api/v1. Note that this does not validate that each octet is between 0 and 255 - regex is not the right tool for that check. Use a secondary numeric range validation if you need strict IP validation.
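One way to add that secondary check, sketched in Python: let the regex confirm the overall shape, then hand the host to the standard library's `ipaddress` module for the numeric range check. The function name `is_valid_ip_url` and the 1 to 65535 port range are assumptions for this example.

```python
import ipaddress
import re

# Pattern 3 with the host and port captured so they can be range-checked.
IP_URL = re.compile(r"https?://((?:\d{1,3}\.){3}\d{1,3})(?::(\d{1,5}))?(/[^\s]*)?")


def is_valid_ip_url(url):
    """Shape check with regex, then numeric range checks in plain Python."""
    match = IP_URL.fullmatch(url)
    if match is None:
        return False
    host, port = match.group(1), match.group(2)
    try:
        # Raises ValueError when an octet is out of range.
        ipaddress.IPv4Address(host)
    except ValueError:
        return False
    # Assumed policy: a missing port is fine, otherwise it must be 1-65535.
    return port is None or 0 < int(port) <= 65535


print(is_valid_ip_url("http://192.168.1.100:8080/api/v1"))  # True
print(is_valid_ip_url("http://999.168.1.100/"))             # False
```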
Pattern 4: Protocol-Relative URLs
Protocol-relative URLs (//example.com/path) appear in older HTML and some CDN references. To match both absolute and protocol-relative URLs:
(https?:)?\/\/[^\s/$.?#].[^\s]*
This matches https://example.com, http://example.com, and //example.com.
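The sketch below runs Pattern 4 over a line of text in Python and reports whether each hit carried an explicit scheme. The helper name `find_urls` is hypothetical.

```python
import re

# Pattern 4: the scheme group is optional, so group 1 is None for //host/path.
ANY_OR_NO_SCHEME = re.compile(r"(https?:)?//[^\s/$.?#].[^\s]*")


def find_urls(text):
    """Return (url, has_scheme) pairs for absolute and protocol-relative URLs."""
    return [
        (m.group(0), m.group(1) is not None)
        for m in ANY_OR_NO_SCHEME.finditer(text)
    ]


print(find_urls("Load //cdn.example.com/lib.js or https://example.com/docs first"))
# [('//cdn.example.com/lib.js', False), ('https://example.com/docs', True)]
```

Because the scheme is optional, the pattern also picks up the //host part of other schemes: in ftp://files.example.com it matches //files.example.com. Filter those out separately if they can occur in your input.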
Pattern 5: Extract Just the Domain
A common task in analytics and link parsing is extracting only the hostname from a full URL. Use a capture group:
https?:\/\/([^\/\s?#]+)
The first capture group (group 1) contains the authority: the hostname plus any port (and any user:password@ prefix, if present). For example, from https://api.example.com:443/v2/users?page=1 you extract api.example.com:443.
To further isolate the registered domain without subdomains, you would need additional logic beyond what regex can cleanly handle.
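For the simpler job of separating the host from the port, a URL parser is usually less fragile than more regex. The Python sketch below compares the capture group with the standard library's `urllib.parse.urlsplit`. It does not find the registered domain, which typically needs a public suffix list.

```python
import re
from urllib.parse import urlsplit

# Pattern 5: group 1 is everything between the scheme and the first /, ?, or #.
AUTHORITY = re.compile(r"https?://([^/\s?#]+)")

url = "https://api.example.com:443/v2/users?page=1"

authority = AUTHORITY.match(url).group(1)
parts = urlsplit(url)

print(authority)       # api.example.com:443
print(parts.hostname)  # api.example.com
print(parts.port)      # 443
```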
Step-by-Step: Using URL Regex in Real Code
JavaScript: Extract All URLs from Text
const text = 'Visit https://example.com and also http://docs.site.io/guide for more info.';
const urlRegex = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)/g;
const matches = text.match(urlRegex);
// matches: ['https://example.com', 'http://docs.site.io/guide']
JavaScript: Validate a Single URL Input
function isValidUrl(str) {
  // Prefer the built-in URL constructor for strict validation
  try {
    const url = new URL(str);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

// For environments without URL constructor, use anchored regex:
function isValidUrlRegex(str) {
  const pattern = /^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)$/i;
  return pattern.test(str);
}
Python: Find All HTTPS URLs in a File
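import re

# Read the whole log; errors='replace' keeps going past undecodable bytes
with open('server.log', 'r', encoding='utf-8', errors='replace') as f:
    content = f.read()

# Stop at whitespace, quotes, angle brackets, and commas
pattern = re.compile(r'https://[^\s\'"><,]+')
urls = pattern.findall(content)

# Print each distinct URL once, in sorted order
for url in sorted(set(urls)):
    print(url)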
sed: Replace All HTTP with HTTPS in a Config File
# Replace http:// with https:// in a file (GNU sed)
sed -i 's|http://|https://|g' config.txt
# More targeted: only replace URLs pointing to example.com
sed -i 's|http://example\.com|https://example.com|g' config.txt
Nginx: Block Requests with Suspicious URL Patterns
# Block requests containing encoded path traversal in the URL
if ($request_uri ~* "(\.\./|\.\.%2F|%2E%2E%2F)") {
    return 403;
}

# Only allow specific URL patterns to reach the app
location ~* ^/(api|assets|static)/ {
    proxy_pass http://backend;
}
Common Mistakes with URL Regex
- Not anchoring for validation: Without ^ and $, a pattern like https?://.* will accept "not a url https://example.com more text" as valid. Always anchor validation patterns.
- Forgetting the global flag: In JavaScript, /pattern/ without g stops after the first match. Use /pattern/g when extracting multiple URLs.
- Greedy matching swallowing too much: The .* quantifier is greedy by default. https?://.*\.html will match from the first http to the last .html in a string, consuming everything in between. Use [^\s]+ instead.
- Missing URL encoding: Real-world URLs contain percent-encoded characters like %20 for spaces. Your character class must include % to match these.
- Case sensitivity: The scheme and host are case-insensitive per RFC 3986, so HTTP://Example.COM is valid. Use the case-insensitive flag (/i) for robustness.
- Matching ftp:// and mailto: unintentionally: A generic scheme pattern such as [a-z]+: will also match ftp:// and mailto: links. If your pattern matches only https?://, you will not accidentally match other schemes. Be explicit about what you want.
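The anchoring and case-sensitivity points are easy to check in Python, where `search` looks for a match anywhere in the string and `fullmatch` requires the whole string to match. This is a sketch using the simple pattern from earlier, not a complete validator.

```python
import re

# IGNORECASE plays the role of JavaScript's /i flag.
URL_SHAPE = re.compile(r"https?://[^\s]+", re.IGNORECASE)

noisy = "not a url https://example.com more text"

# Unanchored: succeeds because a URL appears somewhere in the string.
print(URL_SHAPE.search(noisy) is not None)     # True

# Anchored at both ends: fails because the whole string is not a URL.
print(URL_SHAPE.fullmatch(noisy) is not None)  # False

# Upper-case scheme and host are accepted thanks to IGNORECASE.
print(URL_SHAPE.fullmatch("HTTP://Example.COM") is not None)  # True
```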
URL Regex Performance
URL regex patterns with nested or overlapping quantifiers can trigger catastrophic backtracking (ReDoS) when applied to untrusted input. Patterns like (a+)+ are the classic example, but complex URL patterns with nested quantifiers can also be vulnerable. When processing user-supplied text:
- Set a timeout or use a regex engine with backtracking limits
- Limit input length before applying the pattern
- Prefer possessive quantifiers or atomic groups if your engine supports them
- For server-side validation, the URL constructor is safer than a hand-written regex and avoids regex backtracking entirely
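The input-length guard is the easiest of these to add. The Python sketch below refuses oversized input before any regex runs. The 2048-character limit and the name `find_urls_bounded` are assumptions for illustration, so pick a limit that fits your data.

```python
import re

# Assumed limit for this example; tune it to your own inputs.
MAX_INPUT_CHARS = 2048

URL_SHAPE = re.compile(r"https?://[^\s]+")


def find_urls_bounded(text, limit=MAX_INPUT_CHARS):
    """Reject oversized input up front, then extract URLs as usual."""
    if len(text) > limit:
        raise ValueError(f"input is {len(text)} characters; limit is {limit}")
    return URL_SHAPE.findall(text)


print(find_urls_bounded("see https://example.com now"))
# ['https://example.com']
```

The simple pattern used here has no nested quantifiers, so the guard matters most when it is paired with the longer patterns above.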
Quick Reference: URL Regex Patterns
- Simple HTTP/HTTPS: https?:\/\/[^\s]+
- Full URL with TLD: https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)
- HTTPS only: https:\/\/[^\s]+
- IP-based URL: https?:\/\/(\d{1,3}\.){3}\d{1,3}(:\d{1,5})?(\/[^\s]*)?
- Extract domain: https?:\/\/([^\/\s?#]+)
- Protocol-relative: (https?:)?\/\/[^\s/$.?#].[^\s]*
Frequently Asked Questions
What is the best regex to validate a URL in JavaScript?
For most cases, the browser's built-in URL constructor is the best approach: try { new URL(str); return true; } catch { return false; }. It handles RFC edge cases including IDN, IPv6, and percent encoding. Because it also accepts non-web schemes such as mailto: and ftp:, check url.protocol as well if only http and https should pass. If you must use regex (e.g., in a regex-only context like the HTML pattern attribute), use the anchored pattern: ^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)$
How do I match only HTTPS URLs and reject HTTP?
Remove the ? quantifier after the s: use https:\/\/ instead of https?:\/\/. This makes the s, and therefore HTTPS, required. You can combine this with a negative lookahead to explicitly reject plain HTTP: (?!http:\/\/)https?:\/\/ though it is cleaner to just hardcode the scheme you want.
How do I extract the query string parameters from a URL using regex?
Use a capture group targeting the query string portion: [?&]([^=&#]+)=([^&#]*). This matches each key-value pair. In JavaScript it is better to parse the query with new URLSearchParams(new URL(str).search) rather than regex.
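In Python the same comparison looks like this. The sketch runs the key-value regex and the standard library's `parse_qsl` on one made-up URL. Note that the regex leaves percent-encoding in place while the parser decodes it.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Each match is one key=value pair introduced by ? or &.
PAIR = re.compile(r"[?&]([^=&#]+)=([^&#]*)")

url = "https://example.com/search?q=url%20regex&page=2#results"

regex_pairs = PAIR.findall(url)
parsed_pairs = parse_qsl(urlsplit(url).query)

print(regex_pairs)   # [('q', 'url%20regex'), ('page', '2')]
print(parsed_pairs)  # [('q', 'url regex'), ('page', '2')]
```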
Why does my URL regex match trailing punctuation like periods and commas?
This happens when your character class includes punctuation or when you use a broad quantifier like [^\s]+. URLs in natural language text are often followed by a period or comma that is not part of the URL. Post-process the match to strip trailing punctuation: match.replace(/[.,;:!?)'"]+$/, '').
Can a regex match localhost URLs?
The pattern https?:\/\/localhost(:\d+)?(\/[^\s]*)? specifically matches localhost with an optional port and path. Standard domain-based patterns reject localhost because it has no TLD. If your app needs to match both localhost and regular domains, use an alternation: https?:\/\/(localhost|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})(:\d+)?(\/[^\s]*)?
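A quick Python check of the alternation pattern against a few made-up candidates shows which host forms it accepts. This is a sketch, and `fullmatch` is used so that the entire string must fit.

```python
import re

# The alternation pattern: localhost, dotted IPv4, or a domain with a TLD.
LOCAL_OR_DOMAIN = re.compile(
    r"https?://(localhost|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
    r"(:\d+)?(/[^\s]*)?"
)

candidates = [
    "http://localhost:3000/health",
    "http://127.0.0.1:8000",
    "https://docs.example.com/guide",
    "https://intranet",  # single-label host that is not localhost
]

for candidate in candidates:
    print(candidate, LOCAL_OR_DOMAIN.fullmatch(candidate) is not None)
```

The first three candidates match. The last does not, because a single-label host other than localhost fits none of the three alternatives.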
Is there a single regex that matches all valid URLs?
No. The full URL specification (RFC 3986) includes IPv6 addresses in brackets, internationalized domain names, unusual but valid path characters, and edge cases that cannot be handled cleanly by a single regex without extreme length and complexity. For production URL validation, always combine a regex pre-check with a proper URL parsing library.
Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.