Regex in Python

Python's re module provides Perl-style regex support. Patterns are typically written as raw strings (r'pattern') to avoid backslash escaping issues. Python 3.11+ added atomic groups and possessive quantifiers.

Code Examples

Basic match and search

import re

# search() finds the first match anywhere in the string
match = re.search(r'\d{3}-\d{4}', 'Call 555-1234 today')
if match:
    print(match.group())  # "555-1234"

# match() only matches at the beginning of the string
result = re.match(r'\d+', '123abc')
print(result.group())  # "123"

re.search() scans the string and returns the first match it finds. re.match() only checks from the start. Both return a Match object or None, so check the result before calling .group() whenever a match is not guaranteed.
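Calling .group() on None raises AttributeError, so a small guard is useful when a match is not guaranteed. A minimal sketch; the helper name first_number is hypothetical and used only for illustration:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r'\d+', text)
    return m.group() if m else None

print(first_number('order 42 shipped'))  # "42"
print(first_number('no digits here'))    # None

# match() anchors at the start, so leading letters prevent a match
print(re.match(r'\d+', 'abc123'))        # None
```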

Find all matches

import re

text = "Emails: alice@example.com, bob@test.org"
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)
print(emails)  # ['alice@example.com', 'bob@test.org']

# finditer() returns Match objects with position info
for m in re.finditer(r'\w+@\w+', text):
    print(f"{m.group()} at position {m.start()}-{m.end()}")

findall() returns a list of matched strings. finditer() returns an iterator of Match objects — use it when you need match positions or groups.
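One detail worth knowing: when the pattern contains capture groups, findall() returns the groups rather than the whole match. The sketch below reuses the sample text from above to show the difference, along with finditer() for positions:

```python
import re

text = "Emails: alice@example.com, bob@test.org"

# No groups: a list of whole-match strings
whole = re.findall(r'\w+@\w+', text)
print(whole)  # ['alice@example', 'bob@test']

# Two capture groups: a list of (user, domain) tuples
pairs = re.findall(r'(\w+)@(\w+)', text)
print(pairs)  # [('alice', 'example'), ('bob', 'test')]

# finditer() exposes both the matched text and its (start, end) span
spans = [(m.group(), m.span()) for m in re.finditer(r'\w+@\w+', text)]
print(spans)
```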

Named groups and groupdict()

import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, 'Date: 2026-03-08')

print(match.group('year'))   # "2026"
print(match.groupdict())     # {'year': '2026', 'month': '03', 'day': '08'}

Python named groups use (?P<name>...) syntax (not the (?<name>...) syntax used in JavaScript). groupdict() returns all named groups as a dictionary.
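A named group can also be referenced later in the same pattern with (?P=name). As a small illustration (the doubled-word check is just an example task, not something from the text above):

```python
import re

# (?P=word) matches the same text that the group named "word" captured
doubled = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

m = doubled.search('This is is a typo')
print(m.group())        # "is is"
print(m.group('word'))  # "is"
print(m.span('word'))   # (5, 7)
```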

Substitution with re.sub()

import re

# Simple replacement
result = re.sub(r'\bfoo\b', 'bar', 'foo is not foobar')
print(result)  # "bar is not foobar"

# Replacement with a function
def censor(match):
    return '*' * len(match.group())

print(re.sub(r'\b\w{4,}\b', censor, 'hide long words'))
# "**** **** *****"

re.sub() replaces every non-overlapping match. The replacement can be a string, which may contain backreferences such as \1 or \g<name>, or a callable that receives each Match object and returns the replacement text.
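Backreferences in a replacement string can be sketched as follows; the sample strings are made up for illustration:

```python
import re

# \1, \2, \3 refer to groups by number: reorder an ISO date to day/month/year
reordered = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', 'Due 2026-03-08')
print(reordered)  # "Due 08/03/2026"

# \g<name> refers to a named group
swapped = re.sub(r'(?P<last>\w+), (?P<first>\w+)', r'\g<first> \g<last>', 'Lovelace, Ada')
print(swapped)  # "Ada Lovelace"

# \g<1>0 is group 1 followed by a literal 0; \10 would mean group 10
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')
print(padded)  # "a10b20"
```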

Compile patterns for reuse

import re

# Compile once, reuse many times
ip_pattern = re.compile(
    r'(\d{1,3}\.){3}\d{1,3}'
)

logs = ["192.168.1.1 - GET /", "10.0.0.5 - POST /api"]
for line in logs:
    match = ip_pattern.search(line)
    if match:
        print(match.group())

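re.compile() turns a pattern into a reusable regex object. The module-level functions already cache recently used patterns, so the speed gain is usually small; the main benefits are skipping the cache lookup in hot loops and keeping a frequently used pattern in one named place.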
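A compiled pattern exposes the same operations as the module-level functions (search, findall, sub, and so on). A short sketch, using a non-capturing variant of the IP pattern so that findall() returns whole matches:

```python
import re

# (?:...) is a non-capturing group, so findall() returns whole matches
ip_pattern = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}')

log = "10.0.0.5 forwarded to 192.168.1.1"
found = ip_pattern.findall(log)
masked = ip_pattern.sub('<ip>', log)

print(found)               # ['10.0.0.5', '192.168.1.1']
print(masked)              # "<ip> forwarded to <ip>"
print(ip_pattern.pattern)  # the original pattern string
```

Note that this pattern only checks the shape of an address; it does not verify that each number is in the range 0 to 255.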

Verbose mode for readable patterns

import re

email_pattern = re.compile(r"""
    ^[\w.+-]+       # local part
    @                # @ separator
    [\w-]+          # domain name
    \.              # dot
    [\w.]+$         # TLD (may contain dots)
""", re.VERBOSE | re.IGNORECASE)

print(email_pattern.match("user@example.com"))  # Match object

The re.VERBOSE (re.X) flag lets you write multi-line patterns with comments. Whitespace is ignored unless escaped or inside a character class.
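The whitespace rule is easy to trip over, so here is a minimal sketch of the three cases: escaped, unescaped, and inside a character class. The name pattern is illustrative only:

```python
import re

name = re.compile(r"""
    (?P<first>\w+)   # first name
    \ +              # one or more literal spaces (escaped)
    (?P<last>\w+)    # last name
""", re.VERBOSE)

parts = name.fullmatch('Ada Lovelace').groupdict()
print(parts)  # {'first': 'Ada', 'last': 'Lovelace'}

# Unescaped whitespace is dropped, so this pattern is just "ab"
collapsed = re.fullmatch(r'a b', 'ab', re.VERBOSE) is not None
print(collapsed)  # True

# Inside a character class, a space and # are literal characters
literal = re.findall(r'[ #]', 'a #b', re.VERBOSE)
print(literal)  # [' ', '#']
```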

Note

Always use raw strings (r'...') for regex patterns in Python — without them, \b means backspace instead of a word boundary. Python 3.11 added atomic groups (?>...) and possessive quantifiers (++, *+, ?+). The regex module (pip install regex) provides additional features like fuzzy matching, Unicode categories, and variable-length lookbehinds.
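To see what possessive quantifiers and atomic groups change, compare them with an ordinary greedy quantifier. This is a sketch that assumes Python 3.11 or later; on older versions these patterns raise re.error, hence the version guard:

```python
import re
import sys

if sys.version_info >= (3, 11):
    # Greedy: a* gives back one "a" so the final a can match
    print(re.fullmatch(r'a*a', 'aaa'))      # Match object
    # Possessive: a*+ keeps all three, leaving nothing for the final a
    print(re.fullmatch(r'a*+a', 'aaa'))     # None
    # Atomic group: same effect, written as (?>...)
    print(re.fullmatch(r'(?>a*)a', 'aaa'))  # None
else:
    print('atomic groups and possessive quantifiers need Python 3.11+')
```

Refusing to backtrack makes failing matches fail faster, at the cost of rejecting inputs that a greedy pattern would accept.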

Frequently Asked Questions

What is the difference between re.match() and re.search()?

re.match() only checks for a match at the beginning of the string. re.search() scans the entire string and returns the first match anywhere. Use re.fullmatch() (Python 3.4+) to check if the entire string matches the pattern.
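The three functions can be compared side by side on the same pattern. A small sketch with made-up inputs:

```python
import re

pattern = r'\d{4}'

at_start = bool(re.match(pattern, '2026-03-08'))
anywhere = bool(re.search(pattern, '08-03-2026'))
whole_partial = bool(re.fullmatch(pattern, '2026-03-08'))
whole_exact = bool(re.fullmatch(pattern, '2026'))

print(at_start)       # True: "2026" is at the start
print(anywhere)       # True: "2026" is found at the end
print(whole_partial)  # False: extra text follows the four digits
print(whole_exact)    # True: the entire string is four digits
```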

Why do I need raw strings (r'...') for regex in Python?

Without raw strings, Python interprets backslash sequences before the regex engine sees them. For example, '\b' is a backspace character, but r'\b' is the literal characters \b — which the regex engine interprets as a word boundary. Raw strings pass backslashes through unchanged.
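The difference is easy to verify directly. In this sketch the same pattern is written three ways: raw, not raw, and with doubled backslashes:

```python
import re

print(len('\b'), len(r'\b'))  # 1 2: a backspace character vs. backslash + "b"

text = 'foo is not foobar'
raw = re.findall(r'\bfoo\b', text)
not_raw = re.findall('\bfoo\b', text)
doubled = re.findall('\\bfoo\\b', text)

print(raw)      # ['foo']
print(not_raw)  # []: searches for backspace, "foo", backspace
print(doubled)  # ['foo']: doubling works but is harder to read
```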

How do I match across multiple lines in Python?

Use the re.DOTALL (re.S) flag to make . match newline characters. Use the re.MULTILINE (re.M) flag to make ^ and $ also match at the start and end of each line, not only at the start and end of the whole string. You can combine flags: re.DOTALL | re.MULTILINE.
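The effect of each flag is clearest on a small multi-line string. A minimal sketch with made-up sample text:

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ matches only at the very start of the string
starts_default = re.findall(r'^\w+', text)
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)
print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second', 'third']

# Without DOTALL, . stops at a newline
print(re.search(r'first.*second', text))                        # None
print(re.search(r'first.*second', text, re.DOTALL) is not None)  # True

# Combined: ^ anchors at line two, then .* runs across the newline to the end
one_line = re.search(r'^second.*$', text, re.MULTILINE).group()
to_end = re.search(r'^second.*$', text, re.MULTILINE | re.DOTALL).group()
print(repr(one_line))  # 'second line'
print(repr(to_end))    # 'second line\nthird line'
```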

What is the difference between re and the regex module?

The built-in re module covers standard regex needs. The third-party regex module (pip install regex) adds features like fuzzy matching, Unicode category support (\p{L}), variable-length lookbehinds, and atomic groups on Python versions before 3.11. It is designed to be backwards-compatible with re, so in most code it can be used as a drop-in replacement.