Data Science Jupyter Notebooks
11.9K subscribers
290 photos
44 videos
9 files
876 links
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
Download Telegram
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "myDynamicElement"))
)

• Get the page source after JavaScript has executed.
dynamic_html = driver.page_source

• Close the browser window.
driver.quit()


VII. Common Tasks & Best Practices

• Handle pagination by finding the "Next" link.
next_page_url = soup.find('a', text='Next')['href']

• Save data to a CSV file.
import csv
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerow(['Title', 'Link'])
# writer.writerow([title, url]) in a loop

• Save data to CSV using pandas.
import pandas as pd
df = pd.DataFrame(data, columns=['Title', 'Link'])
df.to_csv('data.csv', index=False)

• Use a proxy with requests.
proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}
requests.get('http://example.com', proxies=proxies)

• Pause between requests to be polite.
import time
time.sleep(2) # Pause for 2 seconds

• Handle JSON data from an API.
json_response = requests.get('https://api.example.com/data').json()

• Download a file (like an image).
img_url = 'http://example.com/image.jpg'
img_data = requests.get(img_url).content
with open('image.jpg', 'wb') as handler:
handler.write(img_data)

• Parse a sitemap.xml to find all URLs.
# Get the sitemap.xml file and parse it like any other XML/HTML to extract <loc> tags.


VIII. Advanced Frameworks (Scrapy)

• Create a Scrapy spider (conceptual command).
scrapy genspider example example.com

• Define a parse method to process the response.
# In your spider class:
def parse(self, response):
# parsing logic here
pass

• Extract data using Scrapy's CSS selectors.
titles = response.css('h1::text').getall()

• Extract data using Scrapy's XPath selectors.
links = response.xpath('//a/@href').getall()

• Yield a dictionary of scraped data.
yield {'title': response.css('title::text').get()}

• Follow a link to parse the next page.
next_page = response.css('li.next a::attr(href)').get()
if next_page is not None:
yield response.follow(next_page, callback=self.parse)

• Run a spider from the command line.
scrapy crawl example -o output.json

• Pass arguments to a spider.
scrapy crawl example -a category=books

• Create a Scrapy Item for structured data.
import scrapy
class ProductItem(scrapy.Item):
name = scrapy.Field()
price = scrapy.Field()

• Use an Item Loader to populate Items.
from scrapy.loader import ItemLoader
loader = ItemLoader(item=ProductItem(), response=response)
loader.add_css('name', 'h1.product-name::text')


#Python #WebScraping #BeautifulSoup #Selenium #Requests

━━━━━━━━━━━━━━━
By: @DataScienceN
3
🔥 Trending Repository: localstack

📝 Description: 💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline

🔗 Repository URL: https://github.com/localstack/localstack

🌐 Website: https://localstack.cloud

📖 Readme: https://github.com/localstack/localstack#readme

📊 Statistics:
🌟 Stars: 61.1K stars
👀 Watchers: 514
🍴 Forks: 4.3K forks

💻 Programming Languages: Python - Shell - Makefile - ANTLR - JavaScript - Java

🏷️ Related Topics:
#python #testing #aws #cloud #continuous_integration #developer_tools #localstack


==================================
🧠 By: https://t.me/DataScienceM
🔥 Trending Repository: TrendRadar

📝 Description: 🎯 告别信息过载,AI 助你看懂新闻资讯热点,简单的舆情监控分析 - 多平台热点聚合+基于 MCP 的AI分析工具。监控35个平台(抖音、知乎、B站、华尔街见闻、财联社等),智能筛选+自动推送+AI对话分析(用自然语言深度挖掘新闻:趋势追踪、情感分析、相似检索等13种工具)。支持企业微信/飞书/钉钉/Telegram/邮件/ntfy推送,30秒网页部署,1分钟手机通知,无需编程。支持Docker部署 让算法为你服务,用AI理解热点

🔗 Repository URL: https://github.com/sansan0/TrendRadar

🌐 Website: https://github.com/sansan0

📖 Readme: https://github.com/sansan0/TrendRadar#readme

📊 Statistics:
🌟 Stars: 6K stars
👀 Watchers: 21
🍴 Forks: 4.5K forks

💻 Programming Languages: Python - HTML - Batchfile - Shell - Dockerfile

🏷️ Related Topics:
#python #docker #mail #news #telegram_bot #mcp #data_analysis #trending_topics #wechat_robot #dingtalk_robot #ntfy #hot_news #feishu_robot #mcp_server


==================================
🧠 By: https://t.me/DataScienceM