Web Scraping & Data Extraction – Telegram

934 subscribers

10 photos

2 videos

872 links

The ultimate web scraping hub.
A shortcut for your learning journey.
Image credit: Clubhouse data extraction by x.com/rashiq

Web Scraping & Data Extraction

Hi Scrapers! 🕷

If you're diving into the world of data extraction, using a proxy is essential to keep your machine's IP address hidden and secure.
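A minimal sketch of routing requests through a proxy with only Python's standard library. The proxy URL, credentials, and host below are placeholders, not a real provider; many providers hand out a URL in this `http://user:pass@host:port` shape, but check their docs.

```python
import urllib.request


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends both HTTP and HTTPS traffic through one proxy."""
    # ProxyHandler maps each URL scheme to the proxy that should carry it.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener skips its default ProxyHandler because we supply our own.
    return urllib.request.build_opener(handler)


# Placeholder proxy: replace with your provider's host, port and credentials.
opener = make_proxied_opener("http://user:pass@proxy.example.com:8080")
```

Calling `opener.open("https://httpbin.org/ip", timeout=10)` would then report the proxy's IP instead of yours.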

But let's be real: navigating the pricing structures of different proxy providers can be a real headache when you're trying to budget! 🤕

Each provider has its own unique billing methods and credit systems, which can make comparisons tricky. 💸
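One way to compare offers billed in different units is to normalize everything to a single price, such as USD per GB of traffic. The sketch below assumes traffic-based billing only (no per-request or per-IP plans), and the three plans and their numbers are invented for illustration, not any real provider's prices.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    price_usd: float    # what you pay for the package
    units: float        # units in the package: GB, credits, etc.
    gb_per_unit: float  # GB of traffic one unit buys

    def usd_per_gb(self) -> float:
        # Total traffic the package buys, then price divided by that volume.
        total_gb = self.units * self.gb_per_unit
        return self.price_usd / total_gb


# Invented example plans, not real provider prices.
plans = [
    Plan("A (pay-as-you-go)", price_usd=8.0, units=1, gb_per_unit=1.0),
    Plan("B (50 GB bundle)", price_usd=300.0, units=50, gb_per_unit=1.0),
    # 100,000 credits where 10,000 credits buy 1 GB, i.e. 10 GB in total.
    Plan("C (credits)", price_usd=49.0, units=100_000, gb_per_unit=1 / 10_000),
]

# Cheapest per GB first.
for plan in sorted(plans, key=Plan.usd_per_gb):
    print(f"{plan.name}: ${plan.usd_per_gb():.2f}/GB")
# C (credits): $4.90/GB
# B (50 GB bundle): $6.00/GB
# A (pay-as-you-go): $8.00/GB
```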

Thanks to Pierluigi from TheWebScrapingClub!

He has simplified the comparison across various proxy providers, giving you a clearer picture of their pricing structures.

Check out the worksheet below! 📊

✨ Google Sheets: Proxy Provider Pricebook

📝 Pierluigi's full explanation on TheWebScrapingClub's blog

956 views · 04:13

Web Scraping & Data Extraction

This is how your scraping frameworks and libraries get detected while real browsers don't 😏
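One of the simplest signals a server can check (not necessarily the one the video covers) is the User-Agent header: HTTP libraries send their own default string, while browsers send a long "Mozilla/5.0 ..." one. A hypothetical server-side check, sketched under that assumption; the token list and function name are illustrative:

```python
from typing import Optional

# Substrings seen in the default User-Agent of popular HTTP clients, e.g.
# "python-requests/2.31.0", "Python-urllib/3.12", "curl/8.4.0",
# "Scrapy/2.11 (+https://scrapy.org)".
LIBRARY_UA_TOKENS = (
    "python-requests",
    "python-urllib",
    "curl/",
    "wget/",
    "go-http-client",
    "scrapy",
)


def looks_like_library(user_agent: Optional[str]) -> bool:
    """Hypothetical check: flag a missing or library-default User-Agent."""
    if not user_agent:
        # Mainstream browsers always send a User-Agent, so its absence is suspicious.
        return True
    ua = user_agent.lower()
    return any(token in ua for token in LIBRARY_UA_TOKENS)
```

Real anti-bot systems combine this with many other signals (header order, TLS fingerprint, JavaScript checks), so swapping the User-Agent alone is rarely enough.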

A video from John Watson Rooney

This Simple String Blocks Your Web Scrapers

1.08K views · 00:12

Web Scraping & Data Extraction

Which of the following is NOT a valid locator strategy in Selenium?

Anonymous Quiz

54 voters · 1.05K views · 05:40
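For the Selenium quiz above: Selenium's Python bindings expose eight built-in locator strategies as constants on `selenium.webdriver.common.by.By`. The sketch below mirrors their string values so it runs without Selenium installed; the dictionary and the `is_valid_locator` helper are hypothetical, not part of Selenium.

```python
# String values behind the `By` constants in selenium.webdriver.common.by,
# reproduced here so the sketch runs without Selenium installed.
SELENIUM_LOCATOR_STRATEGIES = {
    "ID": "id",
    "NAME": "name",
    "XPATH": "xpath",
    "CSS_SELECTOR": "css selector",
    "CLASS_NAME": "class name",
    "TAG_NAME": "tag name",
    "LINK_TEXT": "link text",
    "PARTIAL_LINK_TEXT": "partial link text",
}


def is_valid_locator(strategy: str) -> bool:
    """Return True if `strategy` is one of Selenium's built-in locator strategies."""
    return strategy in SELENIUM_LOCATOR_STRATEGIES.values()
```

With Selenium installed you would pass the constant directly, e.g. `driver.find_element(By.CSS_SELECTOR, "h1")`.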

Web Scraping & Data Extraction

🐙 GitHub repo: OxyMouse

Mouse Movement Algorithms

https://github.com/oxylabs/OxyMouse
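Human mouse movement rarely follows a straight line. One common technique for generating more natural-looking paths (shown here as a generic illustration, not OxyMouse's actual algorithm or API) is to sample points along a cubic Bezier curve whose control points are placed at random. The function name and parameters are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_mouse_path(start: Point, end: Point, steps: int = 50,
                      spread: float = 100.0, seed: Optional[int] = None) -> List[Point]:
    """Return `steps` points along a randomly curved cubic Bezier path from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    rng = random.Random(seed)
    (x0, y0), (x3, y3) = start, end
    # Two control points near the 1/3 and 2/3 marks of the straight line,
    # nudged by a random offset so every path bends a little differently.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)

    points = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0 inclusive
        u = 1 - t
        # Cubic Bezier: B(t) = u^3 P0 + 3 u^2 t P1 + 3 u t^2 P2 + t^3 P3
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points
```

The curve always starts and ends exactly at the given points; only the bend in between varies. Real human-like movement also varies speed along the path, which this sketch leaves out.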

1.13K views · 10:49

Web Scraping & Data Extraction

🎉 New mitmproxy release (mitmproxy 11) now fully supports HTTP/3 🔥

https://mitmproxy.org/posts/releases/mitmproxy-11/
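mitmproxy can be extended with Python addons: classes whose methods are named after events such as `response`, loaded with `mitmdump -s <script>`. A minimal sketch, assuming a script file you name yourself, that tallies which HTTP version each intercepted response used (handy for spotting HTTP/3 traffic after upgrading):

```python
"""Minimal mitmproxy addon (hypothetical file name: count_versions.py).

Load it with:  mitmdump -s count_versions.py
"""
import logging
from collections import Counter


class CountHttpVersions:
    def __init__(self) -> None:
        self.versions: Counter = Counter()

    def response(self, flow) -> None:
        # mitmproxy calls the `response` hook once a server response is complete.
        version = flow.request.http_version  # e.g. "HTTP/1.1" or "HTTP/2.0"
        self.versions[version] += 1
        logging.info("%s %s %s", version, flow.response.status_code, flow.request.pretty_url)


# mitmproxy looks for a module-level `addons` list when loading a script.
addons = [CountHttpVersions()]
```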

1.08K views · 00:18

Web Scraping & Data Extraction

[Video post, viewable in Telegram]

This is insane! 😱

1.34K views · 10:27

Web Scraping & Data Extraction

In Playwright, which method is used to navigate to a specific URL?

Anonymous Quiz

page.url() (28%)

36 voters · 1.26K views · 03:50
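On the Playwright quiz above: navigation is done with `page.goto(url)`. In the Python bindings `page.url` is a read-only property that returns the current URL; it does not navigate. A minimal sketch, assuming `pip install playwright` and `playwright install chromium` have been run; the function name is hypothetical:

```python
from typing import Tuple


def fetch_url_and_title(url: str) -> Tuple[str, str]:
    """Open `url` in headless Chromium and return (final URL, page title)."""
    # Imported here so this module can be loaded even where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # goto() navigates and, by default, waits for the page's "load" event.
        page.goto(url)
        # page.url is a property (no parentheses): it only reports the current URL.
        final_url = page.url
        title = page.title()
        browser.close()
    return final_url, title
```

Usage: `print(fetch_url_and_title("https://example.com"))`. The final URL can differ from the one you passed if the site redirects.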

Web Scraping & Data Extraction

🐙 GitHub repo: TikTokLive

Python library to receive live stream events (comments, gifts, etc.) in real time from TikTok LIVE.

https://github.com/isaackogan/TikTokLive

1.88K views · 02:49

Web Scraping & Data Extraction

🗓 Update on the Scraping Universe

Did you know that TikTok's parent company scrapes the web massively? It runs one of the most aggressive scrapers on the internet! 🤯

ByteDance, the company behind TikTok, has been scraping data from websites at an insane rate.

According to Sam Crowther, the CEO of Kasada, the bot called Bytespider is blowing away the competition, hoovering up data 25 times faster than GPTBot, which scrapes data for ChatGPT, and 3,000 times faster than ClaudeBot, the scraper bot used by Anthropic. 🤖

What do you think guys? 😀

1.51K views · 09:46

Web Scraping & Data Extraction

What is the purpose of the "Network" tab in Chrome DevTools?

Anonymous Quiz

To view and edit HTML

To inspect and analyze network requests and responses

To manage cookies and storage

To debug JavaScript code

58 voters · 1.34K views · 00:55
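A typical scraping use of the Network tab is to spot the XHR/fetch request that returns the data as JSON and replay it directly instead of parsing HTML (Chrome also offers right-click > Copy > Copy as cURL on any request). A minimal sketch of rebuilding such a request with Python's standard library; the endpoint, headers, payload, and helper name are hypothetical, and the request is only built here, not sent:

```python
import json
import urllib.request
from typing import Optional


def build_api_request(url: str, headers: dict, payload: Optional[dict] = None) -> urllib.request.Request:
    """Rebuild a request found in DevTools' Network tab (built, not sent)."""
    data = None
    if payload is not None:
        # JSON bodies are shown in the request's "Payload" panel in DevTools.
        data = json.dumps(payload).encode("utf-8")
        headers = {**headers, "Content-Type": "application/json"}
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical endpoint and headers, as they might be copied from the Network tab.
req = build_api_request(
    "https://example.com/api/search",
    {"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    {"q": "shoes", "page": 1},
)
# urllib.request.urlopen(req, timeout=10) would send it and return the JSON response.
```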

Web Scraping & Data Extraction

In case you missed Zyte's Extract Summit 2024, you can watch the full days of talks on YouTube.

Extract Summit 2024 Talks

Enjoy every session from Extract Summit 2024 in Austin, Texas, featuring leaders from Walmart, Apify, PartsAsap, Harvard, Zyte, Massive, Rayobyte, Serversfac...

1.44K views · 21:43

Web Scraping & Data Extraction

💼 Job market requirement insight for Scrapers

Zyte is hiring a Principal Reverse Engineer, and these are the skills they require from candidates:

• Hacker mindset
• Understand techniques and tools for crawling, extracting, and processing data
• Proficiency in programming languages: JavaScript/Node.js, Python, Java
• Reverse engineering skills: static, dynamic, and concolic analysis
• Understand operating systems and computer networking concepts
• Can use tools like Wireshark, Burp Suite, etc., to intercept and debug network traffic
• Understand browser engines, browser fingerprinting, and ad-blocker mechanisms

Nice to have:
• Experience with decompilers and tools such as IDA Pro, Ghidra, Frida, Jadx, and Babel
• Experience with C/C++
• Core contributions to Mozilla or Chromium projects

1.56K views · 09:10

Web Scraping & Data Extraction

🐙 GitHub repo: google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.

https://github.com/gosom/google-maps-scraper

1.91K views · 01:37

Web Scraping & Data Extraction

⚙ Tech Stack at Apify

Frontend: React.js, styled-components, Storybook, Cypress

Backend: TypeScript/Node.js, Next.js, Nest.js, Docusaurus, Jest

Infra: AWS, Kubernetes, Helm, MongoDB, Redis, DynamoDB, S3, GitHub Actions

Monitoring: New Relic, LogDNA, Sentry, PagerDuty

Tools: GitHub, ZenHub, Notion, GSuite

AI Tools: Langchain, LlamaIndex, Pinecone, OpenAI API, Web agents

2.11K views · 21:27

Web Scraping & Data Extraction

In XML, which of the following is a valid format?

Anonymous Quiz

<element value="attribute">

<element attribute="value">

<element attribute=value>

<element attribute: value>

64 voters · 2.14K views · 01:33
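A quick way to check the XML quiz options is to feed each one to a parser. The sketch below uses Python's built-in `xml.etree.ElementTree`; each option is closed with `/>` so it forms a complete document. Note that the first option is also well-formed: it just declares an attribute named `value`. The actual rule is `name="value"` with a quoted value, so only the unquoted and colon-separated forms are rejected.

```python
import xml.etree.ElementTree as ET

OPTIONS = [
    '<element value="attribute"/>',
    '<element attribute="value"/>',
    '<element attribute=value/>',
    '<element attribute: value/>',
]


def is_well_formed(xml_text: str) -> bool:
    """Return True if the XML parser accepts `xml_text`."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


for option in OPTIONS:
    status = "well-formed" if is_well_formed(option) else "parse error"
    print(f"{option!r}: {status}")
```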

Web Scraping & Data Extraction

What is the primary purpose of the TCP/IP protocol suite in software development?

Anonymous Quiz

To provide a standardized way for devices to communicate over the internet

To manage user authentication and authorization

To handle database transactions

To optimize website performance

45 voters · 2.28K views · 14:09
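To make the quiz answer concrete: the sketch below has a client and a server on the same machine exchange bytes over TCP/IP using Python's `socket` module. HTTP, which every scraper speaks, is layered on top of exactly this kind of TCP connection. The function names are hypothetical; the server listens on the loopback interface with an OS-assigned port.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one TCP connection and send back whatever bytes arrive."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def tcp_round_trip(message: bytes) -> bytes:
    """Send `message` to a local echo server over TCP and return the reply."""
    # Port 0 asks the OS for any free port on the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # IP handles addressing (127.0.0.1); TCP gives an ordered, reliable byte stream.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply
```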

Web Scraping & Data Extraction

🐙 GitHub repo: google-play-scraper

Google Play scraper for Python, inspired by facundoolano/google-play-scraper

https://github.com/JoMingyu/google-play-scraper

2.75K views · 01:51

Web Scraping & Data Extraction

What is the primary advantage of using Playwright over Selenium?

Anonymous Quiz

Playwright is slower than Selenium

Supports only one browser

Can handle multiple browser contexts in a single test

Does not support headless mode

93 voters · 2.89K views · 10:49
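Browser contexts are Playwright's isolated sessions: each one has its own cookies and storage while sharing a single browser process, so one test or scraper can juggle several identities at once. A minimal sketch, assuming Playwright and Chromium are installed; the cookie name, value, and function name are hypothetical:

```python
def cookies_are_isolated(url: str) -> bool:
    """Set a cookie in one context and confirm a second context cannot see it."""
    # Imported here so this module can be loaded even where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # Two contexts in the same browser, like two separate incognito windows.
        ctx_a = browser.new_context()
        ctx_b = browser.new_context()
        # Illustrative cookie added only to context A.
        ctx_a.add_cookies([{"name": "session", "value": "abc123", "url": url}])
        a_names = {c["name"] for c in ctx_a.cookies()}
        b_names = {c["name"] for c in ctx_b.cookies()}
        browser.close()
    return "session" in a_names and "session" not in b_names
```

Usage: `cookies_are_isolated("https://example.com")` should return `True`, since context B never receives A's cookie.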

Web Scraping & Data Extraction

🐙 GitHub repo: Scrapegraph-ai

Python scraper based on AI

https://github.com/ScrapeGraphAI/Scrapegraph-ai

3.46K views · 10:16

Web Scraping & Data Extraction

🐙 GitHub repo: Google-Maps-Scraper

Google Maps scraper with a GUI

https://github.com/Zubdata/Google-Maps-Scraper

3.91K views · 07:19

Web Scraping & Data Extraction

🐙 GitHub repo: scrape

CLI utility to scrape emails from websites

https://github.com/lawzava/scrape
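The core idea behind email scraping can be sketched with a regular expression over page HTML. This is a generic illustration, not how lawzava/scrape is implemented; the pattern is a pragmatic approximation rather than a full RFC 5322 validator, and the function name is hypothetical.

```python
import re
from typing import List

# Pragmatic pattern: catches common addresses like "info@example.com",
# but not every exotic form the email RFCs allow.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> List[str]:
    """Return unique email addresses found in `html`, in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(html):
        # Lowercased only for de-duplication; dict keys keep insertion order.
        seen.setdefault(match.lower(), None)
    return list(seen)


page = '<p>Contact: info@example.com</p><a href="mailto:Sales@Example.org">Sales</a>'
print(extract_emails(page))  # ['info@example.com', 'sales@example.org']
```

Addresses inside `mailto:` links are picked up too, because the colon is not part of the pattern.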

4.51K views · 12:16