Crawling Pages with Infinite Scroll using Scrapy and Playwright
This post provides a detailed guide on how to scrape infinite scroll websites using Scrapy and Playwright in Python. It covers the setup process, explains how to implement a custom downloader middleware to handle JavaScript rendering, and demonstrates how to extract data from dynamically loaded content, offering a practical solution for web scraping challenges posed by modern web applica...
https://www.xiegerts.com/post/infinite-scroll-scrapy-playwright/
Stephen Siegert
Setting up a Scrapy spider to crawl pages with infinite scroll using Playwright and Chromium.
Optimizing PyTorch Docker images: how to cut size by 60%
https://mveg.es/posts/optimizing-pytorch-docker-images-cut-size-by-60percent/
Marton Veges
Docker images containing CUDA and PyTorch can be quite large. I show some tips to significantly decrease image sizes, up to 60%.
Django: create sub-commands within a management command
https://adamj.eu/tech/2024/08/14/django-management-command-sub-commands/
adamj.eu
Django: create sub-commands within a management command - Adam Johnson
argparse, the standard library module that Django uses for parsing command line options, supports sub-commands. These are pretty neat for providing an expansive API without hundreds of individual commands. Here’s an example of using sub-commands in a Django…
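As a rough illustration of argparse sub-commands, here is a minimal standalone sketch. The program name `manage-demo` and the sub-commands `export` and `purge` are hypothetical and not taken from the linked post; in a Django management command, the same `add_subparsers` calls would presumably be made on the parser passed to the command's `add_arguments` method.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "manage-demo", "export" and "purge" are hypothetical names for illustration.
    parser = argparse.ArgumentParser(prog="manage-demo")
    # dest stores the chosen sub-command name; required=True rejects a bare call.
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    # Each sub-command gets its own parser with its own arguments.
    export = subparsers.add_parser("export", help="Write records to a file")
    export.add_argument("path")
    export.add_argument("--limit", type=int, default=100)

    purge = subparsers.add_parser("purge", help="Delete old records")
    purge.add_argument("--days", type=int, default=30)
    return parser


def handle(argv: list[str]) -> str:
    # Dispatch on the sub-command name recorded by argparse.
    args = build_parser().parse_args(argv)
    if args.subcommand == "export":
        return f"exporting up to {args.limit} records to {args.path}"
    return f"purging records older than {args.days} days"
```

Calling `handle(["export", "out.csv", "--limit", "5"])` routes to the export branch, while `handle(["purge"])` falls back to the default of 30 days.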
Unstract
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents.
https://github.com/Zipstack/unstract
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention
https://pytorch.org/blog/flexattention/
Some more batteries to do stuff with Mapping related data structures
This library provides utility functions for manipulating and transforming data structures which have or include Mapping-like characteristics.
https://github.com/erivlis/mappingtools
CSVs Are Kinda Bad. DSVs Are Kinda Good.
The article argues that CSVs (Comma-Separated Values) are problematic due to various edge cases involving delimiters, quotes, and newlines, and proposes using Delimiter-Separated Values (DSV) with ASCII control characters as a more robust alternative. It demonstrates how DSVs can handle complex data without escaping or quoting issues, but acknowledges that the lack of widespread tool sup...
https://matthodges.com/posts/2024-08-12-csv-bad-dsv-good/
Matt Hodges
I would simply use delimiter characters to delimit data.
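To make the DSV idea concrete, here is a minimal sketch, assuming the ASCII unit separator (0x1F) delimits fields and the record separator (0x1E) delimits records. The helper names `dumps` and `loads` are hypothetical, and the sketch assumes field values never contain either control character.

```python
UNIT_SEP = "\x1f"    # ASCII 31: separates fields within a record
RECORD_SEP = "\x1e"  # ASCII 30: separates records


def dumps(rows):
    # Join fields, then records; no quoting or escaping is applied.
    # Assumption: no field contains UNIT_SEP or RECORD_SEP.
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in rows)


def loads(text):
    # Split on the record separator first, then on the unit separator.
    if not text:
        return []
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


rows = [
    ["name", "note"],
    ["Ada", 'says "hi", then\nleaves'],  # comma, quotes and newline in one field
]
round_tripped = loads(dumps(rows))
```

Commas, double quotes, and newlines pass through unchanged because none of them is a delimiter here. The trade-off the article mentions still applies: most editors and spreadsheet tools do not display or type these control characters conveniently.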
Deep-Live-Cam
Real time face swap and one-click video deepfake with only a single image.
https://github.com/hacksider/Deep-Live-Cam
DeltaDB
DeltaDB is a lightweight, fast, and scalable database built on polars and deltalake.
https://github.com/uname-n/deltadb
GitHub
GitHub - uname-n/deltabase: a lightweight, comprehensive solution for managing delta tables built on polars and deltalake
Automating Boolean-Based SQL Injection with Python
How to be efficiently lazy at finding hidden gems in predictable places – Database Edition.
https://trebledj.me/posts/automating-boolean-sql-injection-with-python/
TrebledJ's Pages
When performing a penetration test, we occasionally come across SQL injection (SQLi) vulnerabilities. One particular class of SQLi is particularly tedious to exploit…
llm_aided_ocr
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
https://github.com/Dicklesworthstone/llm_aided_ocr
From Boring Object-Oriented to INSANE Functional Code
This video demonstrates that there's a place for both object-oriented and functional code. In Python, these two approaches can be combined effectively, allowing you to leverage the strengths of each for the best results.
https://www.youtube.com/watch?v=DvdZv_DD0DY
YouTube
Refactoring Object-Oriented to AWESOME Functional Code
In this video, I want to show you that there is a place for object-oriented code and a place for more functional code. In Python, you can combine the two, which is…
EDSL
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
https://github.com/expectedparrot/edsl
Cloudflare R2 x Django. Static Files. User uploads, css, images, js and more. Production-ready.
The video covers setting up and managing Django files, including static and user-uploaded files, using Cloudflare's R2 object storage. It emphasizes best practices for configuring environment variables, securing API keys, and managing static and media files in Django with advanced validation and customization options.
https://www.youtube.com/watch?v=VU3MAN1gs1s
YouTube
Cloudflare for Devs Part 1: R2 with Django for Static files, user uploads, css, images, js and more.
Learn how to implement production-ready hosting for your static files and user file uploads.
In this one we only use:
- Cloudflare R2
- Django 5 (4.2+)
- django-storages
- python-decouple
Thank you to Cloudflare for sponsoring this one! Be sure to sign…
Gemma for Streaming ML with Dataflow
The article demonstrates how to integrate Google's Gemma 2 language model into a Dataflow pipeline for real-time sentiment analysis and response generation in customer support chats. It provides a practical example of using Gemma to process streaming data, including code snippets for creating prompts, running inference, and handling model outputs within a scalable data processing framework.
https://developers.googleblog.com/en/gemma-for-streaming-ml-with-dataflow/
Googleblog
Google for Developers Blog - News about Web, Mobile, AI and Cloud
Use the Gemma language model to gauge customer sentiment, summarize conversations, and assist with crafting responses in near real-time with minimal latency.