Python | Machine Learning | Coding | R

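In Python, the re module handles regular expressions (regex) for pattern matching in strings. It is essential for text processing such as validating input, extracting data from logs, or cleaning user input, and it comes up often in coding interviews. The module compiles each pattern into an internal representation before matching (and caches recently used patterns), but regex syntax gets complex quickly, so start simple and test your patterns with tools like regex101.com.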
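
As a small illustration of the validation use case: real email validation is much harder than any short regex, so this minimal sketch checks a hypothetical username rule instead (3 to 16 letters, digits, or underscores). The names USERNAME_RE and is_valid_username are made up for this example; the point is that re.fullmatch only succeeds when the pattern covers the entire string.

```python
import re

# Hypothetical rule for this sketch: 3 to 16 word characters (letters, digits, _)
USERNAME_RE = re.compile(r"\w{3,16}")


def is_valid_username(name: str) -> bool:
    # fullmatch succeeds only if the pattern covers the whole string,
    # so no explicit ^ and $ anchors are needed
    return USERNAME_RE.fullmatch(name) is not None


print(is_valid_username("alice_01"))   # True
print(is_valid_username("al"))         # False (too short)
print(is_valid_username("bob smith"))  # False (space is not a word character)
```

Note that in Python 3, \w also matches non-ASCII letters; pass re.ASCII when compiling if you want only a-z, A-Z, 0-9 and _.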

import re

# Basic search: Find if pattern exists (returns Match object or None)
txt = "The rain in Spain"
match = re.search(r"Spain", txt) # r"" for raw string (avoids escaping issues)
if match:
    print(match.group())  # Output: Spain (full match)
    print(match.start(), match.end())  # Output: 12 17 (start and end positions; end is exclusive)
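
re.search is only one of three matching functions, and mixing them up is a common source of "why is this None?" bugs. This sketch reuses the sample sentence to compare them: search scans the whole string, match only tries at the very beginning, and fullmatch requires the whole string to match.

```python
import re

txt = "The rain in Spain"

# search scans the whole string and returns the first match it finds
found = re.search(r"Spain", txt)
print(found.span())  # (12, 17)

# match only tries at position 0, so "Spain" is not found there
print(re.match(r"Spain", txt))  # None
print(re.match(r"The", txt).group())  # The

# fullmatch requires the pattern to cover the entire string
print(re.fullmatch(r"The.*Spain", txt).group())  # The rain in Spain
```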

# findall: Extract all matches as list (non-overlapping)
sentence = "The rain in Spain stays mainly in the plain"
print(re.findall(r"ai", sentence))  # Output: ['ai', 'ai', 'ai', 'ai']
emails = re.findall(r"\w+@\w+\.com", "Contact: user1@example.com or user2@test.com")
print(emails) # Output: ['user1@example.com', 'user2@test.com']
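
findall returns plain strings, so positions are lost. When you also need where each hit occurred, re.finditer yields Match objects one at a time. A minimal sketch on the same sample text (the variable name hits is just illustrative):

```python
import re

text = "Contact: user1@example.com or user2@test.com"

# finditer yields Match objects lazily instead of plain strings,
# so each hit keeps its position (and any groups)
hits = [(m.group(), m.start()) for m in re.finditer(r"\w+@\w+\.com", text)]
print(hits)  # [('user1@example.com', 9), ('user2@test.com', 30)]
```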

# split: Divide string at matches (like str.split but with patterns)
words = re.split(r"\s+", "Hello world\twith spaces") # \s+ matches whitespace
print(words) # Output: ['Hello', 'world', 'with', 'spaces']

# sub: Replace matches (count limits replacements; use \1 in the replacement to reuse group 1)
cleaned = re.sub(r"\d{3}-\d{3}-\d{4}", "***", "Phone: 123-456-7890 or 098-765-4321", count=1)
print(cleaned)  # Output: Phone: *** or 098-765-4321 (only the first number replaced)
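
The \1 mentioned in the sub comment refers to a captured group inside the replacement string. This sketch reorders a made-up date using group references, then shows that the replacement can also be a function that receives each Match object.

```python
import re

# \1, \2, \3 in the replacement refer to the 1st, 2nd, 3rd captured groups;
# the raw string keeps Python from treating \1 as an escape sequence
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "Event on 2023-10-27")
print(swapped)  # Event on 27/10/2023

# A function replacement is called once per match and returns the new text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b20 c300")
print(doubled)  # a2 b40 c600
```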

# Metacharacters basics: . (any char except \n), ^ (start), $ (end), * (0 or more), + (1 or more), ? (0 or 1)
match = re.search(r"^The.*Spain$", "The rain in Spain")  # ^ start, $ end, . any char, * 0 or more of the previous item
print(match.group() if match else "No match")  # Output: The rain in Spain

# Character classes: \d (digit), \w (word char), [a-z] (range), [^0-9] (not digit)
nums = re.findall(r"\d+", "abc123def456") # \d+ one or more digits
print(nums) # Output: ['123', '456']

words_only = re.findall(r"\w+", "Hello123! World?") # \w+ word chars (alphanum + _)
print(words_only) # Output: ['Hello123', 'World']
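
The character-class comment also lists bracket forms like [a-z] and [^0-9] that the snippets above don't exercise. A short sketch on a made-up string:

```python
import re

s = "abc123DEF_xyz"
lower = re.findall(r"[a-z]+", s)        # runs of lowercase ASCII letters
non_digits = re.findall(r"[^0-9]+", s)  # runs of anything that is NOT a digit
letters = re.findall(r"[A-Za-z]+", s)   # several ranges combined in one class
print(lower)       # ['abc', 'xyz']
print(non_digits)  # ['abc', 'DEF_xyz']
print(letters)     # ['abc', 'DEF', 'xyz']
```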

# Groups: () capture parts; use for extraction or alternation
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Event on 2023-10-27")
if date:
    print(date.groups())  # Output: ('2023', '10', '27') (tuple of captured groups)
    print(date.group(1))  # Output: 2023 (first group)

# Alternation: | for OR (e.g., cat|dog)
animals = re.findall(r"cat|dog", "I have a cat and a dog")
print(animals) # Output: ['cat', 'dog']
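
Combining groups with alternation has a classic interview trap: when the pattern contains a capturing group, findall returns the group's text instead of the whole match. A non-capturing group (?:...) keeps the alternation without changing what findall returns. The sample sentence is made up:

```python
import re

text = "I have a cat, two dogs and a catfish"
# With a capturing group, findall returns only the group's text
captured = re.findall(r"(cat|dog)s?", text)
# (?:...) groups the alternatives without capturing, so findall returns whole matches
whole = re.findall(r"(?:cat|dog)s?", text)
print(captured)  # ['cat', 'dog', 'cat']
print(whole)     # ['cat', 'dogs', 'cat']
```

Note that "catfish" still matches "cat"; add \b word boundaries if only whole words should count.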

# Flags: re.IGNORECASE (case-insensitive), re.MULTILINE (^/$ per line)
text = "Spain\nin\nSpain"
matches = re.findall(r"^Spain", text, re.MULTILINE) # ^ matches start of each line
print(matches) # Output: ['Spain', 'Spain']
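
The flags comment mentions re.IGNORECASE, but only re.MULTILINE is shown. This sketch uses a made-up three-line string to show IGNORECASE on its own, how ^ behaves without MULTILINE, and how flags combine with |:

```python
import re

text = "spain\nSPAIN\nSpain"
# IGNORECASE: letters match regardless of case
ci = re.findall(r"spain", text, re.IGNORECASE)
# Without MULTILINE, ^ only matches at the very start of the string
first_only = re.findall(r"^s\w+", text, re.IGNORECASE)
# Flags combine with |: ^ now matches at each line start, case ignored
per_line = re.findall(r"^s\w+", text, re.IGNORECASE | re.MULTILINE)
print(ci)          # ['spain', 'SPAIN', 'Spain']
print(first_only)  # ['spain']
print(per_line)    # ['spain', 'SPAIN', 'Spain']
```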

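# Advanced: Greedy vs non-greedy (*? or +?) to match as little as possible
html = "<div>A</div><div>B</div>"
greedy = re.search(r"<div>.*</div>", html)  # .* grabs as much as possible (up to the last </div>)
lazy = re.search(r"<div>.*?</div>", html)  # .*? non-greedy (stops at the first </div>)
print(greedy.group())  # Output: <div>A</div><div>B</div>
print(lazy.group())  # Output: <div>A</div>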

# Edge cases: Empty string, no match
print(re.search(r"a", "")) # Output: None
print(re.findall(r"\d", "no numbers")) # Output: []

# Compile for reuse (skips re-parsing the pattern each call; re also caches recent patterns, so the gain is often modest)
pattern = re.compile(r"\w+@\w+\.com")
email = pattern.search("email@example.com")
print(email.group() if email else "No email") # Output: email@example.com


Regex tips: Escape special characters with \ (e.g., \. for a literal dot); use raw strings (r"") so Python doesn't process backslashes before re sees them; test incrementally to avoid frustration. Common pitfalls include forgetting anchors (^/$) or overusing .* (greedy matching). For performance, compile patterns you reuse; in interviews, explain your pattern step by step for clarity. #python #regex #re_module #patterns #textprocessing #interviews #stringmatching
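
When the literal text comes from a variable (user input, a price, a filename), escaping by hand is error-prone; re.escape backslash-escapes regex metacharacters for you. A minimal sketch with made-up strings showing what goes wrong without it:

```python
import re

price = "1.5"
text = "1.5 and 105"
# Unescaped, "." matches any character, so "105" also matches
loose = re.findall(price, text)
# re.escape turns "1.5" into the pattern 1\.5 (literal dot)
strict = re.findall(re.escape(price), text)
print(loose)             # ['1.5', '105']
print(re.escape(price))  # 1\.5
print(strict)            # ['1.5']
```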

😱 https://t.me/CodeProgrammer