PythonHub
News & links about Python programming.
https://pythonhub.dev/
Tiny LLM - LLM Serving in a Week

A course on LLM inference and serving on Apple Silicon for systems engineers: build a tiny vLLM-style engine serving Qwen.

https://skyzh.github.io/tiny-llm/
Context Engineering - Short-Term Memory Management with Sessions from OpenAI Agents SDK

The guide demonstrates how to use the OpenAI Agents SDK’s Session object to manage short-term memory in AI agents, enabling context trimming and compression for efficient, coherent, and cost-effective multi-turn conversations. Effective session memory ensures agents maintain relevant history across turns while reducing noise, latency, and error risk in longer interactions.
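As a minimal sketch of session-backed memory (assuming the `openai-agents` package and an OpenAI API key in the environment; the session id and database filename below are illustrative), a `SQLiteSession` persists turns so a follow-up question can resolve references from earlier context:

```python
import asyncio
from agents import Agent, Runner, SQLiteSession

agent = Agent(name="Assistant", instructions="Reply concisely.")
# Session memory keyed by a conversation id, stored in a local SQLite file.
session = SQLiteSession("conversation_123", "conversations.db")

async def main():
    # First turn establishes context.
    await Runner.run(agent, "What city is the Golden Gate Bridge in?", session=session)
    # Second turn relies on the stored history ("it" refers to the bridge).
    result = await Runner.run(agent, "What state is it in?", session=session)
    print(result.final_output)

asyncio.run(main())
```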

https://cookbook.openai.com/examples/agents_sdk/session_memory
Python Tutorial: Build an AI-assisted Reddit Scraping Pipeline

The video provides an in-depth, hands-on tutorial for building a resilient, AI-assisted Reddit scraping pipeline in Python, covering everything from Jupyter prototyping and LangChain agents to a Django-based background worker architecture. It teaches viewers to automate web scraping, integrate Google’s Gemini LLM for query refinement, and store structured results in PostgreSQL, suitable ...
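As one illustrative piece of such a pipeline (not the video's exact code; assumes the `langchain-google-genai` package, a Google API key in the environment, and a Gemini model name chosen for the example), the query-refinement step might look like:

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate

# LLM used only to rewrite a raw user request into better Reddit search terms.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Rewrite the user's request as a concise Reddit search query. "
               "Return only the query string."),
    ("human", "{raw_query}"),
])

refine = prompt | llm

refined = refine.invoke({"raw_query": "best budget mechanical keyboards people actually like"})
print(refined.content)  # cleaned-up search string to hand to the scraper
```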

https://www.youtube.com/watch?v=XI-iP-qk_Vk
Defeating Nondeterminism in LLM Inference

LLM inference is often nondeterministic even with temperature set to zero, primarily due to batch-size-dependent kernel behaviors that change results based on server load rather than randomness or floating-point issues. The solution is to use batch-invariant kernels, ensuring reproducible outputs even in high-concurrency environments, which is now possible but may come with some efficiency cost.
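The effect is easy to reproduce outside any serving stack. This is not the article's code, but the same style of demonstration (run on a GPU; the exact delta varies by hardware and library version): the same row computed alone versus inside a larger batch can dispatch to different kernels with different reduction orders, so the outputs need not match bit-for-bit.

```python
import torch

torch.manual_seed(0)
a = torch.randn(2048, 2048, dtype=torch.bfloat16, device="cuda")
b = torch.randn(2048, 2048, dtype=torch.bfloat16, device="cuda")

row_alone = a[:1] @ b          # "batch size 1" request
row_in_batch = (a @ b)[:1]     # same row, computed as part of a full batch

# Mathematically identical; numerically often not.
print((row_alone - row_in_batch).abs().max())
```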

https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference
riffq

A toolkit for building PostgreSQL wire-compatible databases in Python, powered by Rust for performance and concurrency.

https://github.com/ybrs/riffq
Tricks from OpenAI gpt-oss YOU can use with transformers

The post details major upgrades in transformers that let models like OpenAI's gpt-oss run, fine-tune, and scale efficiently, including zero-build kernels, 4-bit MXFP4 quantization, tensor and expert parallelism, dynamic layerwise caching, and continuous batching. These improvements cut memory usage, boost speed, and let larger models run on affordable hardware, making cutting-edge serving techniques broadly accessible.
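As a minimal sketch (assuming the `openai/gpt-oss-20b` checkpoint from the Hub, a recent transformers release with MXFP4 support, and enough GPU memory), loading the quantized model looks like an ordinary `from_pretrained` call:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Quantization config and dtype come from the checkpoint; the MXFP4 weights
# keep memory low enough for a single high-end GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```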

https://huggingface.co/blog/faster-transformers
Avoid Messy Code: Design Patterns for AI Agents in Python

The video demonstrates how to keep Python code for AI agents clean and maintainable by applying design patterns like Chain of Responsibility (for modular pipelines), Observer (for agent logging and context), and Strategy (for pluggable agent personalities). These patterns help decompose logic, improve scalability, and ensure testability for complex AI workflows.
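For instance, the Strategy idea can be sketched in plain Python (names here are illustrative, not the video's code): the agent delegates response styling to an interchangeable personality object, so new personalities plug in without touching the agent itself.

```python
from dataclasses import dataclass
from typing import Protocol


class Personality(Protocol):
    """Strategy interface: how the agent phrases a reply."""
    def style(self, answer: str) -> str: ...


class Formal:
    def style(self, answer: str) -> str:
        return f"Certainly. {answer}"


class Casual:
    def style(self, answer: str) -> str:
        return f"Sure thing! {answer} :)"


@dataclass
class Agent:
    personality: Personality  # swapped freely without changing Agent's logic

    def respond(self, question: str) -> str:
        answer = f"Here is what I found about {question!r}."  # placeholder reasoning
        return self.personality.style(answer)


print(Agent(Formal()).respond("design patterns"))
print(Agent(Casual()).respond("design patterns"))
```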

https://www.youtube.com/watch?v=8_liatgLkLc
Hyperparameter Tuning Tips that 99% of Data Scientists Overlook

This video shows how to tune XGBoost models with Optuna while cutting training time 5–15x using XGBoost 3.0's GPU acceleration. The presenter explains why cross-validation is crucial, recommends smart tuning practices, and demonstrates how Optuna's visualizations help identify impactful hyperparameters in real-world tabular data workflows.
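A minimal shape of that workflow (not the video's code; the dataset and search ranges are illustrative, and `device="cuda"` needs a GPU-enabled XGBoost build) might look like:

```python
import optuna
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial: optuna.Trial) -> float:
    model = xgb.XGBClassifier(
        n_estimators=trial.suggest_int("n_estimators", 100, 600),
        max_depth=trial.suggest_int("max_depth", 2, 8),
        learning_rate=trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        subsample=trial.suggest_float("subsample", 0.5, 1.0),
        tree_method="hist",
        device="cpu",          # set to "cuda" for GPU-accelerated training
        eval_metric="logloss",
    )
    # Cross-validate so the tuner optimizes generalization, not one lucky split.
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```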

https://www.youtube.com/watch?v=D9xPjkOwpNk