Data Science Jupyter Notebooks
• Wait for a dynamic element to appear (explicit wait).
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "myDynamicElement"))
)

• Get the page source after JavaScript has executed.
dynamic_html = driver.page_source

• Close the browser window.
driver.quit()
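
• Putting it together: a minimal end-to-end sketch (assumes Chrome with a matching driver available; the URL and element ID are placeholders).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")
    # Block until the dynamic element is present, up to 10 seconds.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
    dynamic_html = driver.page_source
finally:
    driver.quit()  # always release the browser, even on failure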


VII. Common Tasks & Best Practices

• Handle pagination by finding the "Next" link.
next_link = soup.find('a', string='Next')  # bs4 deprecated 'text=' in favor of 'string='
next_page_url = next_link['href'] if next_link else None
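
• A sketch of a full pagination loop (the start URL is a placeholder; urljoin resolves relative links).
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://example.com/page/1'
while url:
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    # ... extract the rows you need from soup here ...
    next_link = soup.find('a', string='Next')
    url = urljoin(url, next_link['href']) if next_link else None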

• Save data to a CSV file.
import csv
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Link'])
    # writer.writerow([title, url]) in a loop

• Save data to CSV using pandas.
import pandas as pd
df = pd.DataFrame(data, columns=['Title', 'Link'])
df.to_csv('data.csv', index=False)

• Use a proxy with requests.
proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}
response = requests.get('http://example.com', proxies=proxies)

• Pause between requests to be polite.
import time
time.sleep(2) # Pause for 2 seconds
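
• A common refinement (sketch): randomize the delay so request timing looks less mechanical.
import random
import time

time.sleep(random.uniform(1, 3))  # pause 1-3 seconds, varying per request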

• Handle JSON data from an API.
json_response = requests.get('https://api.example.com/data').json()
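
• A more defensive variant (sketch; placeholder URL): set a timeout and fail loudly on HTTP errors before decoding.
import requests

response = requests.get('https://api.example.com/data', timeout=10)
response.raise_for_status()  # raises HTTPError on 4xx/5xx status codes
data = response.json()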

• Download a file (like an image).
img_url = 'http://example.com/image.jpg'
img_data = requests.get(img_url).content
with open('image.jpg', 'wb') as handler:
handler.write(img_data)

• Parse a sitemap.xml to find all URLs.
# Get the sitemap.xml file and parse it like any other XML/HTML to extract <loc> tags.
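# A minimal sketch, assuming a standard sitemap location and that lxml is installed (required for the 'xml' parser):
import requests
from bs4 import BeautifulSoup

sitemap = requests.get('https://example.com/sitemap.xml').text
soup = BeautifulSoup(sitemap, 'xml')  # 'xml' mode needs the lxml package
urls = [loc.text for loc in soup.find_all('loc')]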


VIII. Advanced Frameworks (Scrapy)

• Create a Scrapy spider from the command line.
scrapy genspider example example.com

• Define a parse method to process the response.
# In your spider class:
def parse(self, response):
    # parsing logic here
    pass

• Extract data using Scrapy's CSS selectors.
titles = response.css('h1::text').getall()

• Extract data using Scrapy's XPath selectors.
links = response.xpath('//a/@href').getall()

• Yield a dictionary of scraped data.
yield {'title': response.css('title::text').get()}

• Follow a link to parse the next page.
next_page = response.css('li.next a::attr(href)').get()
if next_page is not None:
    yield response.follow(next_page, callback=self.parse)
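
• Putting the pieces together: a minimal complete spider (uses quotes.toscrape.com, a public practice site, as an assumed target).
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # One dictionary per quote block on the page.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Keep following the "Next" link until it disappears.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)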

• Run a spider from the command line.
scrapy crawl example -o output.json

• Pass arguments to a spider.
scrapy crawl example -a category=books
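
• Inside the spider, such arguments arrive as constructor keyword arguments (sketch; the URL pattern is an assumption).
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def __init__(self, category=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # -a category=books ends up here as category="books"
        self.start_urls = [f"https://example.com/{category}"]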

• Create a Scrapy Item for structured data.
import scrapy
class ProductItem(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()

• Use an Item Loader to populate Items.
from scrapy.loader import ItemLoader
loader = ItemLoader(item=ProductItem(), response=response)
loader.add_css('name', 'h1.product-name::text')
item = loader.load_item()  # apply processors and build the Item
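
• Loaders pay off with processors; a sketch, assuming a recent Scrapy (processors live in the itemloaders package).
from itemloaders.processors import MapCompose, TakeFirst
from scrapy.loader import ItemLoader

class ProductLoader(ItemLoader):
    default_input_processor = MapCompose(str.strip)  # strip whitespace on the way in
    default_output_processor = TakeFirst()           # keep only the first match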


#Python #WebScraping #BeautifulSoup #Selenium #Requests

━━━━━━━━━━━━━━━
By: @DataScienceN
🔥 Trending Repository: localstack

📝 Description: 💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline

🔗 Repository URL: https://github.com/localstack/localstack

🌐 Website: https://localstack.cloud

📖 Readme: https://github.com/localstack/localstack#readme

📊 Statistics:
🌟 Stars: 61.1K stars
👀 Watchers: 514
🍴 Forks: 4.3K forks

💻 Programming Languages: Python - Shell - Makefile - ANTLR - JavaScript - Java

🏷️ Related Topics:
#python #testing #aws #cloud #continuous_integration #developer_tools #localstack


==================================
🧠 By: https://t.me/DataScienceM
🔥 Trending Repository: TrendRadar

📝 Description: 🎯 Say goodbye to information overload: AI helps you make sense of trending news, with lightweight public-opinion monitoring and analysis - multi-platform hot-topic aggregation plus MCP-based AI analysis tools. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, and more) with smart filtering, automatic push notifications, and conversational AI analysis (mine the news in natural language: trend tracking, sentiment analysis, similarity search, and 13 tools in all). Pushes via WeCom/Feishu/DingTalk/Telegram/email/ntfy; 30-second web deployment, phone notifications within 1 minute, no coding required. Docker deployment supported. Let the algorithms work for you and use AI to understand what's trending.

🔗 Repository URL: https://github.com/sansan0/TrendRadar

🌐 Website: https://github.com/sansan0

📖 Readme: https://github.com/sansan0/TrendRadar#readme

📊 Statistics:
🌟 Stars: 6K stars
👀 Watchers: 21
🍴 Forks: 4.5K forks

💻 Programming Languages: Python - HTML - Batchfile - Shell - Dockerfile

🏷️ Related Topics:
#python #docker #mail #news #telegram_bot #mcp #data_analysis #trending_topics #wechat_robot #dingtalk_robot #ntfy #hot_news #feishu_robot #mcp_server


==================================
🧠 By: https://t.me/DataScienceM
🔥 Trending Repository: LEANN

📝 Description: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

🔗 Repository URL: https://github.com/yichuan-w/LEANN

📖 Readme: https://github.com/yichuan-w/LEANN#readme

📊 Statistics:
🌟 Stars: 3.9K stars
👀 Watchers: 34
🍴 Forks: 403 forks

💻 Programming Languages: Python

🏷️ Related Topics:
#python #privacy #ai #offline_first #localstorage #vectors #faiss #rag #vector_search #vector_database #llm #langchain #llama_index #retrieval_augmented_generation #ollama #gpt_oss


==================================
🧠 By: https://t.me/DataScienceM
🔥 Trending Repository: PythonRobotics

📝 Description: Python sample codes and textbook for robotics algorithms.

🔗 Repository URL: https://github.com/AtsushiSakai/PythonRobotics

🌐 Website: https://atsushisakai.github.io/PythonRobotics/

📖 Readme: https://github.com/AtsushiSakai/PythonRobotics#readme

📊 Statistics:
🌟 Stars: 26.3K stars
👀 Watchers: 509
🍴 Forks: 7K forks

💻 Programming Languages: Python

🏷️ Related Topics:
#python #algorithm #control #robot #localization #robotics #mapping #animation #path_planning #slam #autonomous_driving #autonomous_vehicles #ekf #hacktoberfest #cvxpy #autonomous_navigation


==================================
🧠 By: https://t.me/DataScienceM
Error Handling: Always wrap dispatch logic in try-except blocks to handle network issues, authentication failures, or invalid recipient addresses gracefully.
Security: Never hardcode credentials in scripts. Read them from environment variables (os.environ.get()) or a secure configuration store, and call starttls() so the connection is encrypted.
Rate Limits: SMTP servers cap how many messages you can send per hour or day. Insert pauses (time.sleep()) between dispatches to stay within these limits and avoid being flagged as a spammer.
Opt-Outs: For promotional dispatches, comply with regulations such as GDPR and CAN-SPAM by including a clear unsubscribe option. These practices are combined in the sketch below.
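
A minimal sketch tying these practices together (the SMTP host, credentials, and recipient list are placeholders):

import os
import smtplib
import time
from email.message import EmailMessage

SMTP_HOST = "smtp.example.com"  # placeholder: your provider's SMTP server
SMTP_PORT = 587
USER = os.environ.get("SMTP_USER")          # credentials come from the environment
PASSWORD = os.environ.get("SMTP_PASSWORD")  # never hardcoded in the script

recipients = ["alice@example.com", "bob@example.com"]  # placeholder addresses

for to_addr in recipients:
    msg = EmailMessage()
    msg["Subject"] = "Automated update"
    msg["From"] = USER
    msg["To"] = to_addr
    msg.set_content("Hello! This is an automated update.")
    try:
        with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
            server.starttls()  # encrypt the connection before logging in
            server.login(USER, PASSWORD)
            server.send_message(msg)
    except smtplib.SMTPException as exc:
        print(f"Failed to send to {to_addr}: {exc}")
    time.sleep(2)  # pause between dispatches to respect rate limits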

Concluding Thoughts

Automating email dispatch lets you scale your communication efforts with remarkable efficiency. Using little more than Python's standard library, anyone can build a flexible system for sending anything from routine updates to large promotional campaigns, and streamline day-to-day operations along the way.

#python #automation #email #smtplib #emailautomation #programming #scripting #communication #developer #efficiency

━━━━━━━━━━━━━━━
By: @DataScienceN