Beautiful Soup is a Python library for extracting data from HTML and XML files, which makes it a good fit for web scraping.
1. Installation
pip install beautifulsoup4
2. Import
from bs4 import BeautifulSoup
import requests
3. Basic parsing
html_doc = "<html><body><p class='text'>Hello, world!</p></body></html>"
soup = BeautifulSoup(html_doc, 'html.parser') # or 'lxml', 'html5lib'
print(soup.p.text) # Hello, world!
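A minimal sketch of what a parsed tag exposes, reusing the same html_doc as above. The variable names (tag_name, classes, missing) are illustrative only. Note that class is a multi-valued attribute, so Beautiful Soup returns it as a list.

```python
from bs4 import BeautifulSoup

html_doc = "<html><body><p class='text'>Hello, world!</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

p = soup.p               # shortcut for the first <p> in the document
tag_name = p.name        # tag name as a string
classes = p["class"]     # class is multi-valued, so this is a list
missing = p.get("id")    # .get() returns None instead of raising KeyError

print(tag_name, classes, missing)
```

Using .get() for optional attributes avoids a KeyError when the attribute is absent.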
4. Finding elements
# First found element
first_p = soup.find('p')
# Search by class or attribute
text_elem = soup.find('p', class_='text')
text_elem = soup.find('p', {'class': 'text'})
# All elements
all_p = soup.find_all('p')
all_text_class = soup.find_all(class_='text')
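The calls above can be tried on a small document. This is a sketch with made-up markup (the menu list and its classes are assumptions for illustration). It also shows two details worth knowing: find returns None when nothing matches, and class_ matches any one of an element's classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to exercise find / find_all
html_doc = """
<ul id="menu">
  <li class="item active"><a href="/home">Home</a></li>
  <li class="item"><a href="/docs">Docs</a></li>
  <li class="item"><a>No link</a></li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

active = soup.find("li", class_="active")    # matches class="item active"
items = soup.find_all("li", class_="item")   # every <li> carrying the class
first_two = soup.find_all("li", limit=2)     # stop after two matches
by_id = soup.find(id="menu")                 # search by any attribute
missing = soup.find("table")                 # no match: None, not an error

item_texts = [li.get_text(strip=True) for li in items]
print(item_texts)
```

Since find can return None, check the result before accessing attributes or text on it.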
5. Working with attributes and text
a_tag = soup.find('a')
print(a_tag['href']) # value of the href attribute
print(a_tag.get_text()) # text inside the tag
print(a_tag.text) # alternative
6. Navigating the tree
# Moving to parent, children, siblings
parent = soup.p.parent
children = soup.ul.children
next_sibling = soup.p.next_sibling
# Finding the previous/next element
# Call these on a tag: the search runs backward/forward from that tag in document order
prev_elem = soup.p.find_previous('p')
next_elem = soup.p.find_next('div')
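A short sketch combining attribute access with tree navigation on made-up markup (the div, its id and the link are assumptions for illustration). One common surprise: next_sibling often returns the whitespace between tags rather than the next tag, so find_next_sibling is usually what you want.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup, used only to demonstrate navigation
html_doc = """
<div id="box">
  <p>First</p>
  <p>Second <a href="/more">more</a></p>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first_p = soup.p
raw_sibling = first_p.next_sibling           # whitespace text node, not a tag
second_p = first_p.find_next_sibling("p")    # skips text nodes
parent_id = first_p.parent["id"]             # attribute of the parent <div>

# .children yields text nodes too, so keep only real tags
child_tags = [c.name for c in soup.div.children if isinstance(c, Tag)]

link = first_p.find_next("a")                # forward in document order
href = link.get("href")                      # None if the attribute is absent
prev_p = second_p.find_previous("p")         # backward in document order

print(parent_id, child_tags, href)
```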
7. Parsing a real page
response = requests.get('https://example.com')
soup = BeautifulSoup(response.text, 'html.parser')
title = soup.title.text
links = [a['href'] for a in soup.find_all('a', href=True)]
8. CSS selectors
# More powerful and concise search
items = soup.select('div.content > p.text')
first_item = soup.select_one('a.button')
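A sketch that applies the selectors to a page-like document and collects absolute links. The markup and BASE_URL are made up for illustration. In a real scraper the html string would come from response.text, and urljoin from the standard library resolves relative hrefs against the page address.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://example.com/blog/"   # hypothetical page address

# Hypothetical markup standing in for response.text
html = """
<html><head><title>Demo page</title></head>
<body>
  <div class="content">
    <p class="text">Intro</p>
    <section><p class="text">Nested</p></section>
    <p>Plain</p>
  </div>
  <a class="button" href="/signup">Sign up</a>
  <a href="post-1.html">Post 1</a>
  <a>No href</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.title is None when the page has no <title>
title = soup.title.get_text(strip=True) if soup.title else None

direct = [p.get_text() for p in soup.select("div.content > p.text")]  # direct children only
nested = [p.get_text() for p in soup.select("div.content p.text")]    # any depth
button = soup.select_one("a.button")        # first match
no_match = soup.select_one("a.missing")     # None when nothing matches

# a[href] skips anchors without an href; urljoin makes the links absolute
links = [urljoin(BASE_URL, a["href"]) for a in soup.select("a[href]")]
print(title, direct, nested, links)
```

The > combinator selects direct children only, while a space selects descendants at any depth.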
tags: #cheat_sheet #useful
Code With Python
This channel delivers clear, practical content for developers, covering Python, Django, Data Structures, Algorithms, and DSA – perfect for learning, coding, and mastering key programming skills.
Admin: @HusseinSheikho || @Hussein_Sheikho