Web Scraping & Data Extraction
Ultimate web scraping related hub.
A shortcut for your learning journey.
Image credit: Clubhouse data extraction by x.com/rashiq
⚙ Tech Stack at Apify

Frontend: React.js, styled-components, Storybook, Cypress

Backend: TypeScript/Node.js, Next.js, Nest.js, Docusaurus, Jest

Infra: AWS, Kubernetes, Helm, MongoDB, Redis, DynamoDB, S3, GitHub Actions

Monitoring: New Relic, LogDNA, Sentry, PagerDuty

Tools: GitHub, ZenHub, Notion, G Suite

AI Tools: LangChain, LlamaIndex, Pinecone, OpenAI API, web agents
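A JA3 fingerprint is a way to identify a client's TLS configuration. It is built from the TLS ClientHello message: the TLS version, the list of supported cipher suites, the extensions sent, the supported elliptic curves, and the elliptic-curve point formats.

This fingerprint can be used to recognize the browser or device making the connection even though the traffic is encrypted, because the ClientHello is sent before the encrypted session is established.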

Fingerprinting is a common method for detecting automated bots and crawlers.

How can JA3 fingerprints be impersonated? 🤔
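Because a JA3 fingerprint is a deterministic function of a few ClientHello fields, impersonating one comes down to sending those fields with the same values, in the same order, as the target browser. The sketch below computes a JA3 string and hash following the publicly documented JA3 recipe: five comma-separated fields (TLS version, cipher suites, extensions, elliptic curves, point formats), each list written as dash-separated decimal values, GREASE values (RFC 8701) skipped, and the whole string hashed with MD5. It assumes the ClientHello has already been captured and parsed into integers; the `ClientHello` class and the helper names are hypothetical, and packet capture and parsing are out of scope.

```python
import hashlib
from dataclasses import dataclass, field


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x?A?A with equal bytes: 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


@dataclass
class ClientHello:
    """Hypothetical container for the already-parsed ClientHello fields that JA3 reads."""
    tls_version: int
    cipher_suites: list[int]
    extensions: list[int]
    elliptic_curves: list[int] = field(default_factory=list)
    ec_point_formats: list[int] = field(default_factory=list)


def _join(values: list[int]) -> str:
    # Keep the order the client sent, drop GREASE values, dash-separate the decimals.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(hello: ClientHello) -> str:
    """Build the five comma-separated JA3 fields."""
    return ",".join([
        str(hello.tls_version),
        _join(hello.cipher_suites),
        _join(hello.extensions),
        _join(hello.elliptic_curves),
        _join(hello.ec_point_formats),
    ])


def ja3_hash(hello: ClientHello) -> str:
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(hello).encode("ascii")).hexdigest()


# Example input: 769 is 0x0301 (TLS 1.0); 0x0A0A is a GREASE cipher value that gets skipped.
hello = ClientHello(
    tls_version=769,
    cipher_suites=[0x0A0A, 47, 53, 5, 10, 49161, 49162, 49171, 49172, 50, 56, 19, 4],
    extensions=[0, 10, 11],
    elliptic_curves=[23, 24, 25],
    ec_point_formats=[0],
)
print(ja3_string(hello))
# prints 769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0
print(ja3_hash(hello))
```

Reordering a single cipher or extension changes the hash, so a client has to reproduce a browser's ClientHello exactly, not just copy its User-Agent header, to share that browser's JA3.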
Camoufox is also fully compatible with the Playwright API, so the code will be similar to any Playwright code that you already have, with only a change in the way the browser is initialized.


An article from ScrapingBee 🐝
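To illustrate the "only the initialization changes" point, here is a minimal Python sketch in which the scraping logic uses only the Playwright `Browser`/`Page` API, and two launchers differ only in how the browser is created. It assumes the `playwright` and `camoufox` packages are installed with their browsers fetched; `scrape_title`, `with_playwright` and `with_camoufox` are hypothetical helper names. `Camoufox(headless=True)` is used as a context manager that yields a Playwright browser, following the usage shown in Camoufox's documentation.

```python
def scrape_title(browser, url: str) -> str:
    """Shared scraping logic: it only calls the Playwright Browser/Page API,
    so it does not care which launcher created the browser."""
    page = browser.new_page()
    try:
        page.goto(url)
        return page.title()
    finally:
        page.close()


def with_playwright(url: str) -> str:
    # Imported lazily so each launcher works without the other package installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True)
        try:
            return scrape_title(browser, url)
        finally:
            browser.close()


def with_camoufox(url: str) -> str:
    from camoufox.sync_api import Camoufox

    # Camoufox is Firefox-based; the context manager yields a Playwright browser object
    # and closes it on exit.
    with Camoufox(headless=True) as browser:
        return scrape_title(browser, url)
```

Switching from `with_playwright("https://example.com")` to `with_camoufox("https://example.com")` is the only change at the call site; `scrape_title` stays the same.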
Perplexity's response to Cloudflare's argument: "automated crawling and user-driven fetching are different!"

https://x.com/perplexity_ai/status/1952531537385456019
πŸ‘1