Python | Machine Learning | Coding

In Python, the re module handles regular expressions (regex) for pattern matching in strings. It is vital for text processing tasks such as validating emails, extracting data from logs, or cleaning user input, all of which come up in interviews. Patterns are compiled into an internal form before matching, which keeps repeated use efficient, but the syntax can get complex, so start simple and test with tools like regex101.com.

import re

# Basic search: Find if pattern exists (returns Match object or None)
txt = "The rain in Spain"
match = re.search(r"Spain", txt)  # r"" for raw string (avoids escaping issues)
if match:
    print(match.group())  # Output: Spain (full match)
    print(match.start(), match.end())  # Output: 12 17 (positions)

# findall: Extract all matches as list (non-overlapping)
contact = "Contact: user1@example.com or user2@test.com"
emails = re.findall(r"\w+@\w+\.com", contact)
print(emails)  # Output: ['user1@example.com', 'user2@test.com']

# split: Divide string at matches (like str.split but with patterns)
words = re.split(r"\s+", "Hello   world\twith spaces")  # \s+ matches whitespace
print(words)  # Output: ['Hello', 'world', 'with', 'spaces']

# sub: Replace matches (count limits replacements; use \1 in the replacement for groups)
cleaned = re.sub(r"\d{3}-\d{3}-\d{4}", "***", "Phone: 123-456-7890 or 098-765-4321", count=1)
print(cleaned)  # Output: Phone: *** or 098-765-4321 (first number replaced)

# Metacharacters basics: . (any char except \n), ^ (start), $ (end), * (0+), + (1+), ? (0-1)
txt = "The rain in Spain"
match = re.search(r"^The.*Spain$", txt)  # ^ start, $ end, . any char, * 0+ of previous
print(match.group() if match else "No match")  # Output: The rain in Spain

# Character classes: \d (digit), \w (word char), [a-z] (range), [^0-9] (not digit)
nums = re.findall(r"\d+", "abc123def456")  # \d+ one or more digits
print(nums)  # Output: ['123', '456']

words_only = re.findall(r"\w+", "Hello123! World?")  # \w+ word chars (alphanum + _)
print(words_only)  # Output: ['Hello123', 'World']

# Groups: () capture parts; use for extraction or alternation
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Event on 2023-10-27")
if date:
    print(date.groups())  # Output: ('2023', '10', '27') (tuple of captures)
    print(date.group(1))  # Output: 2023 (first group)

# Alternation: | for OR (e.g., cat|dog)
animals = re.findall(r"cat|dog", "I have a cat and a dog")
print(animals)  # Output: ['cat', 'dog']

# Flags: re.IGNORECASE (case-insensitive), re.MULTILINE (^/$ per line)
text = "Spain\nin\nSpain"
matches = re.findall(r"^Spain", text, re.MULTILINE)  # ^ matches start of each line
print(matches)  # Output: ['Spain', 'Spain']

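# Advanced: Greedy vs non-greedy (*? or +?) to match minimal
html = "<div>A</div><div>B</div>"
greedy = re.search(r"<div>.*</div>", html)  # .* greedy (runs to the last </div>)
print(greedy.group())  # Output: <div>A</div><div>B</div>
content = re.search(r"<div>.*?</div>", html)  # .*? non-greedy (stops at first </div>)
print(content.group())  # Output: <div>A</div>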

# Edge cases: Empty string, no match
print(re.search(r"a", ""))  # Output: None
print(re.findall(r"\d", "no numbers"))  # Output: []

# Compile for reuse (faster for multiple uses)
pattern = re.compile(r"\w+@\w+\.com")
email = pattern.search("email@example.com")
print(email.group() if email else "No email")  # Output: email@example.com
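
The listing mentions \1 for groups and the re.IGNORECASE flag without showing either, and the intro mentions pulling data out of logs. A minimal sketch of all three follows; the date string, the log line format, and the group names (level, msg) are made up for illustration.

```python
import re

# Backreferences in the replacement: \1, \2, \3 refer to the captured groups
iso = "Event on 2023-10-27"
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", iso)
print(swapped)  # Output: Event on 27/10/2023

# Named groups: (?P<name>...) lets you read captures by name
log_line = "2023-10-27 ERROR Disk full"  # hypothetical log format
m = re.search(r"(?P<level>[A-Z]+) (?P<msg>.+)$", log_line)
if m:
    print(m.group("level"))  # Output: ERROR
    print(m.groupdict())  # Output: {'level': 'ERROR', 'msg': 'Disk full'}

# re.IGNORECASE: match regardless of letter case
found = re.findall(r"spain", "Spain and SPAIN", re.IGNORECASE)
print(found)  # Output: ['Spain', 'SPAIN']
```

Named groups tend to be easier to explain in an interview than numbered ones, since the pattern documents what each capture means.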

Regex tips: Escape special chars with \ (e.g., \. for a literal dot); use raw strings (r""); test incrementally to avoid frustration. Common pitfalls include forgetting anchors (^/$) or overusing .* where a narrower pattern would do. For performance, compile patterns you reuse; in interviews, explain your pattern step-by-step for clarity. #python #regex #re_module #patterns #textprocessing #interviews #stringmatching
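
Two of the pitfalls above, unescaped special characters and missing anchors, can be sketched briefly. re.escape and fullmatch are standard parts of the re module; the sample strings and variable names are illustrative only.

```python
import re

# An unescaped dot matches any character, so "1x5" slips through
loose = bool(re.search(r"1.5", "1x5"))
strict = bool(re.search(r"1\.5", "1x5"))
print(loose, strict)  # Output: True False

# re.escape turns arbitrary text into a pattern that matches it literally
needle = "a.b*c"
hits = re.findall(re.escape(needle), "a.b*c and aXbbc")
print(hits)  # Output: ['a.b*c']

# Without anchors, search accepts a match anywhere in the string;
# fullmatch requires the whole string to match (like wrapping in ^...$)
digits = re.compile(r"\d+")
anywhere = bool(digits.search("abc123"))
whole_bad = bool(digits.fullmatch("abc123"))
whole_ok = bool(digits.fullmatch("123"))
print(anywhere, whole_bad, whole_ok)  # Output: True False True
```

For validation tasks, fullmatch is usually the safer default, because a forgotten anchor cannot let extra characters through.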
