oLLM
oLLM is a lightweight Python library for large-context LLM inference, built on top of Huggingface Transformers and PyTorch. It enables running models like Llama-3.1-8B-Instruct on 100k context using ~$200 consumer GPU with 8GB VRAM. Example performance: ~20 min for the first token, ~17s per subsequent token.
https://github.com/Mega4alik/ollm
oLLM is a lightweight Python library for large-context LLM inference, built on top of Huggingface Transformers and PyTorch. It enables running models like Llama-3.1-8B-Instruct on 100k context using ~$200 consumer GPU with 8GB VRAM. Example performance: ~20 min for the first token, ~17s per subsequent token.
https://github.com/Mega4alik/ollm
GitHub
GitHub - Mega4alik/ollm
Contribute to Mega4alik/ollm development by creating an account on GitHub.
TIL: Using SQLModel Asynchronously with FastAPI (and Air) with PostgreSQL
This post explains how to leverage SQLModel with FastAPI and PostgreSQL to enable fully asynchronous database operations, improving scalability and efficiency for concurrent web applications. Key steps include setting up async database engines and sessions, using dependency injection in FastAPI, and aligning everything with non-blocking patterns.
https://daniel.feldroy.com/posts/til-2025-08-using-sqlmodel-asynchronously-with-fastapi-and-air-with-postgresql
This post explains how to leverage SQLModel with FastAPI and PostgreSQL to enable fully asynchronous database operations, improving scalability and efficiency for concurrent web applications. Key steps include setting up async database engines and sessions, using dependency injection in FastAPI, and aligning everything with non-blocking patterns.
https://daniel.feldroy.com/posts/til-2025-08-using-sqlmodel-asynchronously-with-fastapi-and-air-with-postgresql
https://daniel.feldroy.com
TIL: Using SQLModel Asynchronously with FastAPI (and Air) with PostgreSQL
SQLModel is a really useful library for working with SQL databases in Python, built on top of SQLAlchemy and Pydantic. However, AFAIK there's no documentation supporting asynchronous operations for PostgreSQL, which can be a limitation when building high…
Scheduling Background Tasks in Python with Celery and RabbitMQ
We'll build background tasks using Celery and RabbitMQ to create a weather notification service.
https://blog.appsignal.com/2025/08/27/scheduling-background-tasks-in-python-with-celery-and-rabbitmq.html
We'll build background tasks using Celery and RabbitMQ to create a weather notification service.
https://blog.appsignal.com/2025/08/27/scheduling-background-tasks-in-python-with-celery-and-rabbitmq.html
Appsignal
Scheduling Background Tasks in Python with Celery and RabbitMQ | AppSignal Blog
We'll build background tasks using Celery and RabbitMQ to create a weather notification service.
Elysia
Elysia is an agentic platform designed to use tools in a decision tree. A decision agent decides which tools to use dynamically based on its environment and context.
https://github.com/weaviate/elysia
Elysia is an agentic platform designed to use tools in a decision tree. A decision agent decides which tools to use dynamically based on its environment and context.
https://github.com/weaviate/elysia
GitHub
GitHub - weaviate/elysia: Python package and backend for the Elysia platform app.
Python package and backend for the Elysia platform app. - weaviate/elysia
Build an AI Coding Agent in Python
This tutorial teaches how to build a functional agentic AI coding assistant in Python using the free Gemini Flash API, covering agentic loops, tool-calling, file manipulation, and autonomous debugging. By constructing an agent that can read, modify, and execute code, viewers gain practical skills and deep insight into how modern coding agents operate beneath the surface.
https://www.youtube.com/watch?v=YtHdaXuOAks
This tutorial teaches how to build a functional agentic AI coding assistant in Python using the free Gemini Flash API, covering agentic loops, tool-calling, file manipulation, and autonomous debugging. By constructing an agent that can read, modify, and execute code, viewers gain practical skills and deep insight into how modern coding agents operate beneath the surface.
https://www.youtube.com/watch?v=YtHdaXuOAks
YouTube
Guide to Agentic AI – Build a Python Coding Agent with Gemini
Build your own functional AI coding agent from the ground up using Python and the free Gemini Flash API. This project-based tutorial provides a deep understanding of how powerful AI tools work by guiding you through the creation of an agentic loop powered…
playwright-use
playwright-use turns natural-language UI test goals into executable Playwright steps using AI, then produces human-friendly and machine-readable reports with screenshots, video, and traces.
https://pypi.org/project/playwright-use/
playwright-use turns natural-language UI test goals into executable Playwright steps using AI, then produces human-friendly and machine-readable reports with screenshots, video, and traces.
https://pypi.org/project/playwright-use/
PyPI
playwright-use
Natural-language UI test runner
Python: capture stdout and stderr in unittest
The article explains how to capture stdout and stderr during Python unittest runs using contextlib.redirectstdout and redirectstderr, enabling tests to programmatically access console output. It also provides examples and custom context managers to simplify capturing both streams simultaneously, improving test logging and debugging capabilities.
https://adamj.eu/tech/2025/08/29/python-unittest-capture-stdout-stderr/
The article explains how to capture stdout and stderr during Python unittest runs using contextlib.redirectstdout and redirectstderr, enabling tests to programmatically access console output. It also provides examples and custom context managers to simplify capturing both streams simultaneously, improving test logging and debugging capabilities.
https://adamj.eu/tech/2025/08/29/python-unittest-capture-stdout-stderr/
adamj.eu
Python: capture stdout and stderr in unittest - Adam Johnson
When testing code that outputs to the terminal through either standard out (stdout) or standard error (stderr), you might want to capture that output and make assertions on it. To do so, use contextlib.redirect_stdout() and contextlib.redirect_stderr() to…
When You No Longer Need That Object • Dealing With Garbage in Python
Let's explore reference counting and cyclic garbage collection in Python.
https://www.thepythoncodingstack.com/p/python-garbage-collection-reference-counting-and-cyclic
Let's explore reference counting and cyclic garbage collection in Python.
https://www.thepythoncodingstack.com/p/python-garbage-collection-reference-counting-and-cyclic
Thepythoncodingstack
When You No Longer Need That Object • Dealing With Garbage in Python
Let's explore reference counting and cyclic garbage collection in Python
Speeding up PyTorch inference by 87% on Apple devices with AI-generated Metal kernels
The post describes how AI models can automatically generate optimized Metal GPU kernels that speed up PyTorch inference on Apple devices by an average of 87% across 215 modules, with some kernels running hundreds of times faster than baseline. Using an agentic swarm approach and adding context like CUDA references and profiling data, the system outperforms standalone models, making kerne...
https://gimletlabs.ai/blog/ai-generated-metal-kernels
The post describes how AI models can automatically generate optimized Metal GPU kernels that speed up PyTorch inference on Apple devices by an average of 87% across 215 modules, with some kernels running hundreds of times faster than baseline. Using an agentic swarm approach and adding context like CUDA references and profiling data, the system outperforms standalone models, making kerne...
https://gimletlabs.ai/blog/ai-generated-metal-kernels
Gimlet Blog
A blog about research on high performance AI systems.
PageIndex
PageIndex is a reasoning-based RAG system that simulates how human experts navigate and extract knowledge from long documents through tree search, enabling LLMs to think and reason their way to the most relevant document sections.
https://github.com/VectifyAI/PageIndex
PageIndex is a reasoning-based RAG system that simulates how human experts navigate and extract knowledge from long documents through tree search, enabling LLMs to think and reason their way to the most relevant document sections.
https://github.com/VectifyAI/PageIndex
GitHub
GitHub - VectifyAI/PageIndex: 📄🧠 PageIndex: Document Index for Reasoning-based RAG
📄🧠 PageIndex: Document Index for Reasoning-based RAG - VectifyAI/PageIndex
❤1
Niche Python tools, libraries and features - whats your favourite?
https://www.reddit.com/r/Python/comments/1n7r4xb/niche_python_tools_libraries_and_features_whats/
https://www.reddit.com/r/Python/comments/1n7r4xb/niche_python_tools_libraries_and_features_whats/
Reddit
From the Python community on Reddit
Explore this post and more from the Python community
How I write Django views
The author advocates using Django's base View class over generic class-based or function-based views for simplicity and flexibility in handling HTTP requests. By avoiding complex mixins and leveraging straightforward helper methods, developers can write clearer, more maintainable view code with minimal cognitive overhead.
https://www.loopwerk.io/articles/2025/django-views/
The author advocates using Django's base View class over generic class-based or function-based views for simplicity and flexibility in handling HTTP requests. By avoiding complex mixins and leveraging straightforward helper methods, developers can write clearer, more maintainable view code with minimal cognitive overhead.
https://www.loopwerk.io/articles/2025/django-views/
Loopwerk
How I write Django views
Why I only use Django's base View class instead of generic class-based views or function-based views.
❤1
toolfront
Simple data retrieval for AI with unmatched control, precision, and speed.
https://github.com/kruskal-labs/toolfront
Simple data retrieval for AI with unmatched control, precision, and speed.
https://github.com/kruskal-labs/toolfront
GitHub
GitHub - statespace-tech/toolfront: Data environments for AI agents
Data environments for AI agents. Contribute to statespace-tech/toolfront development by creating an account on GitHub.
❤1
Sharing a mutable reference between Rust and Python
https://blog.lilyf.org/posts/python-mutable-reference/
https://blog.lilyf.org/posts/python-mutable-reference/
Lily's Blog
Sharing a mutable reference with Python
Background
As part of my ongoing project to reimplement Django’s templating language in Rust, I have been adding support for custom template tags.
Simple tags
The simplest custom tag will look something like:
# time_tags.py
from datetime import datetime…
As part of my ongoing project to reimplement Django’s templating language in Rust, I have been adding support for custom template tags.
Simple tags
The simplest custom tag will look something like:
# time_tags.py
from datetime import datetime…
Youtu-agent
A simple yet powerful agent framework that delivers with open-source models.
https://github.com/Tencent/Youtu-agent
A simple yet powerful agent framework that delivers with open-source models.
https://github.com/Tencent/Youtu-agent
GitHub
GitHub - TencentCloudADP/youtu-agent: A simple yet powerful agent framework that delivers with open-source models
A simple yet powerful agent framework that delivers with open-source models - TencentCloudADP/youtu-agent
Zuban
Zuban is a high-performance Python Language Server and type checker implemented in Rust, by the author of Jedi. Zuban is 20–200× faster than Mypy, while using roughly half the memory and CPU compared to Ty and Pyrefly. It offers both a PyRight-like mode and a Mypy-compatible mode, which behaves just like Mypy; supporting the same config files, command-line flags, and error messages.
https://github.com/zubanls/zuban
Zuban is a high-performance Python Language Server and type checker implemented in Rust, by the author of Jedi. Zuban is 20–200× faster than Mypy, while using roughly half the memory and CPU compared to Ty and Pyrefly. It offers both a PyRight-like mode and a Mypy-compatible mode, which behaves just like Mypy; supporting the same config files, command-line flags, and error messages.
https://github.com/zubanls/zuban
GitHub
GitHub - zubanls/zuban: Python Type Checker / Language Server
Python Type Checker / Language Server. Contribute to zubanls/zuban development by creating an account on GitHub.
WhisperLiveKit
Real-time & local speech-to-text, translation, and speaker diarization. With server & web UI.
https://github.com/QuentinFuxa/WhisperLiveKit
Real-time & local speech-to-text, translation, and speaker diarization. With server & web UI.
https://github.com/QuentinFuxa/WhisperLiveKit
GitHub
GitHub - QuentinFuxa/WhisperLiveKit: Real-time & local speech-to-text server.
Real-time & local speech-to-text server. Contribute to QuentinFuxa/WhisperLiveKit development by creating an account on GitHub.
👏1
yesglot
LLM-powered Django translations. Just call me "python manage.py translatemessages"
https://github.com/efe/yesglot
LLM-powered Django translations. Just call me "python manage.py translatemessages"
https://github.com/efe/yesglot
GitHub
GitHub - efe/yesglot: LLM-powered Django translations ✨ Just call me "python manage.py translatemessages"
LLM-powered Django translations ✨ Just call me "python manage.py translatemessages" - efe/yesglot
Polars GPU Execution. (70% speed up)
Polars' new GPU engine, powered by NVIDIA RAPIDS cuDF, accelerates data processing up to 70% compared to CPU-based execution, enabling faster handling of large datasets. The beta release supports common operations, leveraging GPU parallel processing for significant performance gains in data analytics workflows.
https://dataengineeringcentral.substack.com/p/polars-gpu-execution-70-speed-up
Polars' new GPU engine, powered by NVIDIA RAPIDS cuDF, accelerates data processing up to 70% compared to CPU-based execution, enabling faster handling of large datasets. The beta release supports common operations, leveraging GPU parallel processing for significant performance gains in data analytics workflows.
https://dataengineeringcentral.substack.com/p/polars-gpu-execution-70-speed-up
Substack
Polars GPU Execution. (70% speed up)
to the moon
vLLM with torch.compile: Efficient LLM inference on PyTorch
Learn how to optimize PyTorch code with minimal effort using torch.compile, a just-in-time compiler that generates optimized kernels automatically.
https://developers.redhat.com/articles/2025/09/03/vllm-torchcompile-efficient-llm-inference-pytorch
Learn how to optimize PyTorch code with minimal effort using torch.compile, a just-in-time compiler that generates optimized kernels automatically.
https://developers.redhat.com/articles/2025/09/03/vllm-torchcompile-efficient-llm-inference-pytorch
Red Hat Developer
vLLM with torch.compile: Efficient LLM inference on PyTorch | Red Hat Developer
NoteThis post was originally published on the vLLM blog
sync-with-uv
The sync-with-uv package automates version synchronization between uv.lock and .pre-commit-config.yaml, ensuring consistent dependency management for tools like black, ruff, and mypy. It integrates as a pre-commit hook, streamlining workflows by aligning versions from a single source while leaving unspecified tools unchanged.
https://github.com/tsvikas/sync-with-uv
The sync-with-uv package automates version synchronization between uv.lock and .pre-commit-config.yaml, ensuring consistent dependency management for tools like black, ruff, and mypy. It integrates as a pre-commit hook, streamlining workflows by aligning versions from a single source while leaving unspecified tools unchanged.
https://github.com/tsvikas/sync-with-uv
GitHub
GitHub - tsvikas/sync-with-uv: Sync .pre-commit-config.yaml from uv.lock
Sync .pre-commit-config.yaml from uv.lock. Contribute to tsvikas/sync-with-uv development by creating an account on GitHub.