Data Science Jupyter Notebooks

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "myDynamicElement"))
)

• Get the page source after JavaScript has executed.

dynamic_html = driver.page_source

• Close the browser window.

driver.quit()

VII. Common Tasks & Best Practices

• Handle pagination by finding the "Next" link.

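next_link = soup.find('a', string='Next')  # 'string' replaces the deprecated 'text' argument
next_page_url = next_link['href'] if next_link else None  # None on the last page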
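
The href on a "Next" link is often relative, so it usually has to be resolved against the current page URL before it can be requested. A minimal sketch using only the standard library; the helper name resolve_next_url and the example URLs are assumptions for illustration, not part of the original cheat sheet.

```python
from urllib.parse import urljoin


def resolve_next_url(current_url, href):
    """Return an absolute URL for a 'Next' link, or None when there is no link."""
    if not href:
        return None
    # urljoin handles relative paths, root-relative paths and absolute URLs.
    return urljoin(current_url, href)


# Hypothetical values: the page being scraped and the href found on it.
current = "http://example.com/catalog/page-1.html"
next_url = resolve_next_url(current, "page-2.html")
last_page = resolve_next_url(current, None)
```

In a scraping loop, keep requesting the resolved URL until the helper returns None.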

• Save data to a CSV file.

import csv
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Link'])
    # writer.writerow([title, url]) in a loop
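
When scraped records are kept as dictionaries, csv.DictWriter can write the header and all rows in one go. A small sketch under that assumption; the write_rows helper and the sample records are hypothetical, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def write_rows(fileobj, rows, fieldnames=("Title", "Link")):
    """Write a header line plus one CSV line per scraped record (a dict)."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical scraped records, used only for illustration.
rows = [
    {"Title": "First post", "Link": "http://example.com/1"},
    {"Title": "Second, with a comma", "Link": "http://example.com/2"},
]

# Swap the buffer for open('data.csv', 'w', newline='', encoding='utf-8').
buffer = io.StringIO()
write_rows(buffer, rows)
csv_text = buffer.getvalue()
```

Fields that contain commas are quoted automatically, which is the main reason to prefer the csv module over joining strings by hand.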

• Save data to CSV using pandas.

import pandas as pd
df = pd.DataFrame(data, columns=['Title', 'Link'])  # data: a list of [title, link] rows
df.to_csv('data.csv', index=False)

• Use a proxy with requests.

proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}
requests.get('http://example.com', proxies=proxies)

• Pause between requests to be polite.

import time
time.sleep(2) # Pause for 2 seconds
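
A fixed delay can be varied slightly so requests do not arrive at a perfectly regular interval. This is one possible sketch, not something the original post prescribes; the polite_sleep name, its defaults, and the injectable sleep parameter are assumptions made for illustration.

```python
import random
import time


def polite_sleep(base=2.0, jitter=1.0, sleep=time.sleep):
    """Pause for base seconds plus a random extra of up to jitter seconds.

    The sleep callable is injectable so the delay can be checked without waiting.
    """
    delay = base + random.uniform(0, jitter)
    sleep(delay)
    return delay


# Demonstration with a stand-in for time.sleep that only records the delay.
recorded = []
delay = polite_sleep(base=2.0, jitter=1.0, sleep=recorded.append)
```

Call polite_sleep() with no arguments between requests to pause for 2 to 3 seconds.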

• Handle JSON data from an API.

json_response = requests.get('https://api.example.com/data').json()

• Download a file (like an image).

img_url = 'http://example.com/image.jpg'
response = requests.get(img_url)
response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
img_data = response.content
with open('image.jpg', 'wb') as handler:
    handler.write(img_data)

• Parse a sitemap.xml to find all URLs.

# Get the sitemap.xml file and parse it like any other XML/HTML to extract <loc> tags.
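
The comment above can be sketched with the standard library's ElementTree. Sitemaps that follow the sitemaps.org protocol declare a default XML namespace, so the <loc> lookup has to include it. The function name extract_sitemap_urls and the inline sample are assumptions; in practice the XML would come from something like requests.get(sitemap_url).content.

```python
import xml.etree.ElementTree as ET

# Namespace declared by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_data):
    """Return the text of every <loc> element in a sitemap (str or bytes)."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap used only for illustration.
sample = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>
"""

urls = extract_sitemap_urls(sample)
```

The same function also collects the child sitemap URLs from a sitemap index file, since those are stored in <loc> tags as well.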

VIII. Advanced Frameworks (Scrapy)

• Create a Scrapy spider (conceptual command).

scrapy genspider example example.com

• Define a parse method to process the response.

# In your spider class:
def parse(self, response):
    # parsing logic here
    pass

• Extract data using Scrapy's CSS selectors.

titles = response.css('h1::text').getall()

• Extract data using Scrapy's XPath selectors.

links = response.xpath('//a/@href').getall()

• Yield a dictionary of scraped data.

yield {'title': response.css('title::text').get()}

• Follow a link to parse the next page.

next_page = response.css('li.next a::attr(href)').get()
if next_page is not None:
    yield response.follow(next_page, callback=self.parse)

• Run a spider from the command line.

scrapy crawl example -o output.json

• Pass arguments to a spider.

scrapy crawl example -a category=books
# Inside the spider, the value is available as self.category (a string)

• Create a Scrapy Item for structured data.

import scrapy
class ProductItem(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()

• Use an Item Loader to populate Items.

from scrapy.loader import ItemLoader
loader = ItemLoader(item=ProductItem(), response=response)
loader.add_css('name', 'h1.product-name::text')
item = loader.load_item()  # build the populated ProductItem

#Python #WebScraping #BeautifulSoup #Selenium #Requests

━━━━━━━━━━━━━━━
By: @DataScienceN ✨


🔥 Trending Repository: localstack

📝 Description: 💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline

🔗 Repository URL: https://github.com/localstack/localstack

🌐 Website: https://localstack.cloud

📖 Readme: https://github.com/localstack/localstack#readme

📊 Statistics:
🌟 Stars: 61.1K stars
👀 Watchers: 514
🍴 Forks: 4.3K forks

💻 Programming Languages: Python - Shell - Makefile - ANTLR - JavaScript - Java

🏷️ Related Topics:

#python #testing #aws #cloud #continuous_integration #developer_tools #localstack

==================================
🧠 By: https://t.me/DataScienceM


🔥 Trending Repository: TrendRadar

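📝 Description: 🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. A multi-platform hot-topic aggregator plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.) with smart filtering, automatic push, and AI conversational analysis (dig into the news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/Feishu/DingTalk/Telegram/email/ntfy, 30-second web deployment, 1-minute mobile notifications, no programming required. Supports Docker deployment ⭐ Let the algorithm work for you, and use AI to understand trending topics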

🔗 Repository URL: https://github.com/sansan0/TrendRadar

🌐 Website: https://github.com/sansan0

📖 Readme: https://github.com/sansan0/TrendRadar#readme

📊 Statistics:
🌟 Stars: 6K stars
👀 Watchers: 21
🍴 Forks: 4.5K forks

💻 Programming Languages: Python - HTML - Batchfile - Shell - Dockerfile

🏷️ Related Topics:

#python #docker #mail #news #telegram_bot #mcp #data_analysis #trending_topics #wechat_robot #dingtalk_robot #ntfy #hot_news #feishu_robot #mcp_server

==================================
🧠 By: https://t.me/DataScienceM
