Regular expressions (regex) are one of the most powerful tools in programming, and one of the most intimidating. That cryptic string of symbols like `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$` might look like line noise, but it's actually a precise pattern that matches typical email addresses. Once you understand the building blocks, regex becomes an indispensable skill for text processing, validation, and data extraction.
Key Takeaways
- Regex is a pattern language for matching text: master ~20 symbols and you can read most patterns
- Basic building blocks: . (any char except newline), \d (digit), \w (word char), \s (whitespace), ^ (start), $ (end)
- Quantifiers specify repetition: * (0+), + (1+), ? (0-1), {n} (exactly n), {n,m} (n to m)
- Character classes [abc] match any single character in the set; [^abc] matches any single character NOT in the set
- Lookaheads (?=...) and lookbehinds (?<=...) match positions without consuming characters
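1. What Are Regular Expressions?
A regular expression is a pattern that describes a set of strings; a regex engine checks text against that pattern. Common uses: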
- **Validation** – Check if input matches expected formats (email, phone, password)
- **Search** – Find all occurrences of a pattern in text
- **Extract** – Pull specific data from unstructured text
- **Replace** – Find-and-replace with pattern matching
- **Split** – Break text into parts based on patterns
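Here is a minimal sketch of those five operations. It uses Python's standard `re` module, which is an assumption since the article is language-agnostic, and the sample text is made up for illustration:

```python
import re

text = "Contact: ana@example.com, bob@test.org. Call 555-123-4567."
email = r"[\w.-]+@[\w.-]+\.\w{2,}"

# Validation: does the whole string fit the pattern?
is_phone = re.fullmatch(r"\d{3}-\d{3}-\d{4}", "555-123-4567") is not None

# Search: first occurrence anywhere in the text
first_email = re.search(email, text)

# Extract: every non-overlapping occurrence
emails = re.findall(email, text)

# Replace: mask every digit
masked = re.sub(r"\d", "#", "555-123-4567")

# Split: break on a comma or semicolon plus any following whitespace
parts = re.split(r"[,;]\s*", "red, green;blue")

print(is_phone)             # True
print(first_email.group())  # ana@example.com
print(emails)               # ['ana@example.com', 'bob@test.org']
print(masked)               # ###-###-####
print(parts)                # ['red', 'green', 'blue']
```

Other languages offer equivalents under different names, for example `test`, `match`, `replace`, and `split` on JavaScript strings and regexes.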
Scenario
Find all words starting with "un" in a document
Solution
Pattern: `\bun\w+` matches "under", "unless", and "unfortunately", but not "sun" or "running". The `\b` requires a word boundary right before "un", "un" itself is matched literally, and `\w+` matches the remaining word characters (so a bare "un" is not matched).
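A quick way to try this scenario is with Python's `re.findall` (Python and the sample sentences are assumptions for illustration). The second call shows that matching is case-sensitive unless you ask otherwise:

```python
import re

text = "under the sun, unless running, unfortunately"

# \b anchors the match at the start of a word, so "sun" and "running" are skipped
words = re.findall(r"\bun\w+", text)
print(words)  # ['under', 'unless', 'unfortunately']

# Case-sensitive by default; re.IGNORECASE also catches "Under"
capitalized = re.findall(r"\bun\w+", "Under the table", re.IGNORECASE)
print(capitalized)  # ['Under']
```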
2. Basic Building Blocks
| Symbol | Meaning | Example | Matches |
|---|---|---|---|
| . | Any single character (except newline) | h.t | hat, hot, hit, h9t |
| \d | Any digit (0-9) | \d\d\d | 123, 456, 007 |
| \w | Any word character (a-z, A-Z, 0-9, _) | \w+ | hello, user_123 |
| \s | Any whitespace (space, tab, newline) | hello\sworld | hello world |
| ^ | Start of string/line | ^Hello | Hello at line start |
| $ | End of string/line | end$ | the end |
| \b | Word boundary | \bcat\b | cat (not cats or scatter) |
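The rows of this table can be checked in Python's `re` module (an assumed choice of language). The sketch also shows a Python-specific detail: `^` means start of *string* by default and only start of *line* with the `re.MULTILINE` flag.

```python
import re

# . matches any character except a newline
dots = re.findall(r"h.t", "hat hot hit h9t h\nt")          # ['hat', 'hot', 'hit', 'h9t']

# \d, \w, \s
digits = re.findall(r"\d\d\d", "123 45 007")                # ['123', '007']
words = re.findall(r"\w+", "hello, user_123!")              # ['hello', 'user_123']
spaced = re.search(r"hello\sworld", "hello\tworld") is not None  # True (tab counts)

# ^ anchors to the string start unless re.MULTILINE is set
text = "Hello\nSay Hello\nHello"
starts = re.findall(r"^Hello", text)                        # ['Hello']
line_starts = re.findall(r"^Hello", text, re.MULTILINE)     # ['Hello', 'Hello']

ends = re.search(r"end$", "the end") is not None            # True

# \b keeps "cat" from matching inside "cats" or "scatter"
cats = re.findall(r"\bcat\b", "cat cats scatter cat.")      # ['cat', 'cat']
```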
3. Quantifiers: How Many?
| Quantifier | Meaning | Example | Matches |
|---|---|---|---|
| * | Zero or more | ab*c | ac, abc, abbc, abbbc |
| + | One or more | ab+c | abc, abbc, abbbc (not ac) |
| ? | Zero or one (optional) | colou?r | color, colour |
| {n} | Exactly n times | \d{3} | 123, 456 (exactly 3 digits) |
| {n,} | n or more times | \d{2,} | 12, 123, 1234 (2+ digits) |
| {n,m} | Between n and m times | \d{2,4} | 12, 123, 1234 (2-4 digits) |
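This sketch tests each quantifier against whole strings with Python's `re.fullmatch`. The language and the helper name `full` are illustrative assumptions. The last line is a reminder that, without anchors, `{n}` means "n in a row", not "n in total".

```python
import re

def full(pattern: str, s: str) -> bool:
    """True if the entire string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

candidates = ["ac", "abc", "abbbc"]
star = [s for s in candidates if full(r"ab*c", s)]      # ['ac', 'abc', 'abbbc']
plus = [s for s in candidates if full(r"ab+c", s)]      # ['abc', 'abbbc']

optional = [s for s in ["color", "colour", "colouur"] if full(r"colou?r", s)]
# ['color', 'colour']

ranged = [s for s in ["1", "12", "123", "1234", "12345"] if full(r"\d{2,4}", s)]
# ['12', '123', '1234']

# Unanchored search happily finds 3 digits inside a longer number
inside = re.search(r"\d{3}", "12345").group()           # '123'
```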
Scenario
Match US phone numbers like 555-123-4567
Solution
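Pattern: `\d{3}-\d{3}-\d{4}` matches three digits, a dash, three digits, a dash, and four digits. To allow an optional country code, use `(\+1-)?\d{3}-\d{3}-\d{4}`. When validating a whole string, anchor the pattern with `^...$`; otherwise `555-123-45678` still passes, because the pattern finds `555-123-4567` inside it.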
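This sketch, which assumes Python's `re` module and uses made-up sample numbers, contrasts whole-string validation (`fullmatch`) with a plain `search`:

```python
import re

PHONE = re.compile(r"(\+1-)?\d{3}-\d{3}-\d{4}")

samples = ["555-123-4567", "+1-555-123-4567", "5551234567", "555-123-45678"]

# fullmatch: the entire string must fit the pattern
valid = {s: PHONE.fullmatch(s) is not None for s in samples}
print(valid)
# {'555-123-4567': True, '+1-555-123-4567': True,
#  '5551234567': False, '555-123-45678': False}

# search: any substring will do, so the 5-digit ending slips through
loose = PHONE.search("555-123-45678").group()  # '555-123-4567'
```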
4. Character Classes: Matching Sets
| Pattern | Meaning | Matches |
|---|---|---|
| [abc] | Any of a, b, or c | a, b, c |
| [a-z] | Any lowercase letter | a through z |
| [A-Z] | Any uppercase letter | A through Z |
| [0-9] | Any ASCII digit (same as \d, except where \d is Unicode-aware) | 0 through 9 |
| [a-zA-Z] | Any letter | Any letter |
| [^abc] | NOT a, b, or c | Anything except a, b, c |
| [aeiou] | Any vowel | a, e, i, o, u |
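Character classes behave the same way in most engines. The sketch uses Python's `re` module, which is an assumption here, to try several rows of the table. It also shows where `[0-9]` and `\d` differ: in Python 3, `\d` on `str` patterns matches any Unicode decimal digit unless you pass `re.ASCII`.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")             # ['e', 'e']
not_vowels = re.findall(r"[^aeiou\s]", "a cat")      # ['c', 't']
capitals = re.findall(r"[A-Z]", "Hello World")       # ['H', 'W']
numbers = re.findall(r"[0-9]+", "room 42, floor 7")  # ['42', '7']

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None             # True
ascii_class = re.fullmatch(r"[0-9]", arabic_three) is not None           # False
ascii_flag = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None     # False
```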
Scenario
Match CSS hex colors like #FF5733 or #abc
Solution
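Pattern: `#[0-9A-Fa-f]{3,6}` matches a hash followed by 3 to 6 hex characters, so it also accepts 4- or 5-character runs such as `#abcd`. More precise: `#([0-9A-Fa-f]{6}|[0-9A-Fa-f]{3})\b` accepts exactly 3 or 6. The 6-digit branch is tried first, and the trailing `\b` keeps the pattern from matching just the first three characters of a longer run.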
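Here are the loose and strict versions side by side in Python (an assumed language; the CSS snippet is invented). The strict pattern uses a non-capturing group `(?:...)`. With a capturing group, `findall` would return only the group's contents instead of the whole color.

```python
import re

# Loose: any run of 3 to 6 hex digits, so 4- and 5-digit runs slip through
loose = re.compile(r"#[0-9A-Fa-f]{3,6}")
# Strict: exactly 6 or exactly 3 hex digits, then a word boundary
strict = re.compile(r"#(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{3})\b")

css = "color: #FF5733; background: #abc; border-color: #abcd;"
loose_hits = loose.findall(css)    # ['#FF5733', '#abc', '#abcd']
strict_hits = strict.findall(css)  # ['#FF5733', '#abc']
```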
5. Groups and Alternatives
| Syntax | Purpose | Example | Matches |
|---|---|---|---|
| (abc) | Capture group | (ha)+ | ha, haha, hahaha |
| (?:abc) | Non-capturing group | (?:ha)+ | Same, but doesn't capture |
| a\|b | Alternation (OR) | cat\|dog | cat or dog |
| (a\|b)c | Grouped alternation | gr(a\|e)y | gray or grey |
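A short Python sketch of these rows follows. It assumes the `re` module, and the date example is an extra illustration not taken from the table. Note that a repeated capture group remembers only its last repetition.

```python
import re

# Capture group repeated with +: group(1) holds the last repetition only
laugh = re.fullmatch(r"(ha)+", "hahaha")
whole, last = laugh.group(0), laugh.group(1)          # 'hahaha', 'ha'

# Non-capturing group: same repetition, nothing recorded
quiet = re.fullmatch(r"(?:ha)+", "hahaha")
quiet_groups = quiet.groups()                         # ()

pets = re.findall(r"cat|dog", "cat, dog, bird")       # ['cat', 'dog']
spellings = re.findall(r"gr(?:a|e)y", "gray or grey?")  # ['gray', 'grey']

# Capture groups split a match into named-by-position pieces
date = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15")
year, month, day = date.groups()                      # '2024', '03', '15'
```

The `(?:a|e)` form is used for `findall` on purpose: with `gr(a|e)y`, Python's `findall` would return `['a', 'e']`, the captured letters, rather than the full words.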
Scenario
Match common image files: photo.jpg, image.png, graphic.gif
Solution
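Pattern: `\w+\.(jpg|jpeg|png|gif|webp)\b` matches word characters, a literal dot, then one of the listed extensions; the trailing `\b` rejects names like `archive.jpgx`. Turn on case-insensitive matching (the `i` flag in JavaScript, `re.IGNORECASE` in Python) so JPG matches as well as jpg. Keep in mind that `\w+` stops at hyphens and spaces, so in `my-photo.jpg` it matches only `photo.jpg`.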
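In Python, which is assumed here along with the invented file list, the case-insensitive flag is passed to `re.compile`. `finditer` yields full matches, while `findall` returns only the captured extension because the pattern has one capturing group:

```python
import re

IMAGE = re.compile(r"\w+\.(jpg|jpeg|png|gif|webp)\b", re.IGNORECASE)

files = "photo.jpg image.PNG notes.txt graphic.gif archive.jpgx"

full_names = [m.group(0) for m in IMAGE.finditer(files)]
# ['photo.jpg', 'image.PNG', 'graphic.gif']

extensions = IMAGE.findall(files)
# ['jpg', 'PNG', 'gif']  (findall returns the captured group)
```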
6. Common Regex Patterns
| Use Case | Pattern | Notes |
|---|---|---|
| Email (simple) | `^[\w.-]+@[\w.-]+\.\w{2,}$` | Basic validation; not RFC-compliant |
| URL | `https?://[\w.-]+(/\S*)?` | Matches http/https URLs |
| Phone (US) | `\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}` | Flexible separators; parentheses need not be balanced |
| Date (YYYY-MM-DD) | `\d{4}-\d{2}-\d{2}` | Checks format only; doesn't validate month/day ranges |
| IPv4 Address | `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` | Doesn't validate 0-255 |
| Alphanumeric | `^[a-zA-Z0-9]+$` | ASCII letters and digits only |
| Strong Password | `^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$` | Uses lookaheads |
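The sketch below runs several of these table patterns through Python's `re.fullmatch` (the language, the `check` helper, and the sample inputs are assumptions). It makes the "Notes" column concrete: the date and IP patterns accept syntactically shaped but impossible values.

```python
import re

# Patterns copied from the table of common patterns
PATTERNS = {
    "email": r"^[\w.-]+@[\w.-]+\.\w{2,}$",
    "phone": r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}",
    "date": r"\d{4}-\d{2}-\d{2}",
    "ipv4": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "alnum": r"^[a-zA-Z0-9]+$",
}

def check(name: str, s: str) -> bool:
    """True if the whole string s matches the named pattern (hypothetical helper)."""
    return re.fullmatch(PATTERNS[name], s) is not None

results = {
    "email": check("email", "user.name@example.com"),  # True
    "phone": check("phone", "(555) 123-4567"),         # True
    "bad_date": check("date", "2024-13-45"),           # True: format only
    "bad_ip": check("ipv4", "999.999.999.999"),        # True: no 0-255 check
    "alnum": check("alnum", "abc-123"),                # False: '-' not allowed
}
print(results)
```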
7. Lookaheads and Lookbehinds
| Syntax | Name | Meaning | Example |
|---|---|---|---|
| `(?=...)` | Positive lookahead | Followed by... | `\d(?=px)` matches 5 in "5px" |
| `(?!...)` | Negative lookahead | NOT followed by... | `\d(?!px)` matches 5 in "5em" |
| `(?<=...)` | Positive lookbehind | Preceded by... | `(?<=\$)\d+` matches 100 in "$100" |
| `(?<!...)` | Negative lookbehind | NOT preceded by... | `(?<!\$)\d+` matches 100 in "€100" |
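The four lookaround rows can be run in Python (an assumed language; Python requires lookbehinds to be fixed-width, which all of these are). The last two lines show a common surprise with negative lookbehind: it can simply start the match one character later.

```python
import re

ahead = re.findall(r"\d(?=px)", "5px 6em")              # ['5']
not_ahead = re.findall(r"\d(?!px)", "5px 6em")          # ['6']
behind = re.findall(r"(?<=\$)\d+", "$100 and €200")     # ['100']
not_behind = re.findall(r"(?<!\$)\d+", "€100")          # ['100']

# Gotcha: only the character right before the match start is checked,
# so on "$100" the match begins at the first 0 instead
gotcha = re.findall(r"(?<!\$)\d+", "$100")              # ['00']

# Also refusing a preceding digit forces the match to cover whole numbers
fixed = re.findall(r"(?<![$\d])\d+", "$100 and €200")   # ['200']
```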
Scenario
Verify password has at least one uppercase, lowercase, and digit
Solution
Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$ — Three lookaheads check requirements without consuming text, then .{8,} ensures minimum length.
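A minimal Python version of this check follows (the language and the `is_strong` name are assumptions). It uses `fullmatch` rather than `match`: in Python, a bare `$` also matches just before a trailing newline, so `match` would accept `"Passw0rd\n"`.

```python
import re

STRONG = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")

def is_strong(pw: str) -> bool:
    """True if pw has a lowercase letter, an uppercase letter, a digit, and 8+ characters."""
    return STRONG.fullmatch(pw) is not None

checks = {pw: is_strong(pw) for pw in ["Passw0rd", "password1", "PASSWORD1", "Pass1", "Password"]}
# {'Passw0rd': True, 'password1': False (no uppercase), 'PASSWORD1': False (no lowercase),
#  'Pass1': False (too short), 'Password': False (no digit)}

# The trailing-newline difference between match and fullmatch
loose_newline = STRONG.match("Passw0rd\n") is not None  # True
strict_newline = is_strong("Passw0rd\n")                # False
```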
8. Regex Best Practices
Start simple and iterate
Begin with a basic pattern and add complexity as needed. Test frequently with real data.
Use a regex tester
Tools like our Regex Tester show matches in real time and explain what each part of a pattern does.
Comment complex patterns
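Many regex flavors support a verbose mode (Python's `re.VERBOSE` flag, or the `x` flag in Perl and PCRE) that ignores whitespace in the pattern and allows `#` comments. Where that isn't available, explain the pattern in a code comment.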
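As an illustration, here is the phone pattern from earlier written in Python's verbose mode (Python is an assumption; Perl and PCRE use the `x` flag instead). In this mode, literal spaces and `#` characters must be escaped or placed inside a character class, because unescaped ones are treated as layout and comments.

```python
import re

PHONE = re.compile(
    r"""
    ^
    (?:\+1-)?   # optional US country code
    \d{3}       # area code
    -
    \d{3}       # exchange
    -
    \d{4}       # line number
    $
    """,
    re.VERBOSE,
)

with_code = PHONE.match("+1-555-123-4567") is not None  # True
plain = PHONE.match("555-123-4567") is not None         # True
spaces = PHONE.match("555 123 4567") is not None        # False: dashes required
```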
Be specific when possible
Prefer [a-z] over . when you only want letters. Overly broad patterns match unintended text.
Consider edge cases
Test with unusual data: empty strings, very long inputs, and special characters.
Avoid catastrophic backtracking
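Nested quantifiers like `(a+)+$` can take exponential time when a match fails, for example on a long run of a's followed by a b, because the engine tries every way of splitting the a's between the inner and outer `+`. Rewrite such patterns without the nesting (here, `a+$` matches the same strings), or use atomic groups `(?>...)` or possessive quantifiers `a++` if your engine supports them.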
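This sketch, in Python (an assumed language), shows the rewrite approach. The nested and flat patterns agree on short inputs, but only the flat one is safe on a long near-miss. Python 3.11 and later also accept atomic groups and possessive quantifiers; the example sticks to the rewrite so it runs on older versions too.

```python
import re

nested = re.compile(r"(a+)+$")  # nested quantifiers: exponential work on failure
flat = re.compile(r"a+$")       # accepts exactly the same strings, no nesting

# On short inputs the nested form is still cheap, and the answers agree
agree = all(
    (nested.match(s) is None) == (flat.match(s) is None)
    for s in ["a", "aaa", "aab", "b"]
)

# A long near-miss: the flat pattern fails after a linear scan
evil = "a" * 30 + "b"
flat_result = flat.match(evil)  # None, returned quickly

# nested.match(evil) would also return None, but can take a very long time,
# so it is deliberately not run here.
```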
Test Your Regex Patterns
Use our free Regex Tester to experiment with patterns, see real-time matches, and understand what each part of your expression does.