Minions
Minions is a communication protocol that enables small on-device models to collaborate with frontier models in the cloud. By only reading long contexts locally, we can reduce cloud costs with minimal or no quality degradation.
https://github.com/HazyResearch/minions
AIBrix
Cost-efficient and pluggable Infrastructure components for GenAI inference.
https://github.com/vllm-project/aibrix
Craw4LLM
CRAW4LLM is an efficient web crawling method that prioritizes webpages based on their potential influence on LLM pretraining, replacing traditional graph-connectivity-based priorities. By crawling only 21% of URLs, it achieves the same downstream performance as previous methods, significantly reducing data waste and website burden.
https://github.com/cxcscmu/Craw4LLM
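To make the Craw4LLM idea concrete, here is a minimal sketch of a crawl frontier ordered by a per-page score instead of graph connectivity. It is not the repository's code: the function `crawl`, the in-memory `links` graph, and the `scores` table are hypothetical stand-ins, and it assumes some scorer can rate each discovered page for pretraining usefulness.

```python
import heapq


def crawl(seed_urls, links, score, budget):
    """Visit up to `budget` pages, always taking the highest-scored page next.

    links:  dict mapping a URL to its outlinks (a toy stand-in for the web)
    score:  callable URL -> float, a hypothetical pretraining-influence score
    """
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-score(url), url) for url in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    crawled = []
    while frontier and len(crawled) < budget:
        _, url = heapq.heappop(frontier)
        crawled.append(url)
        for outlink in links.get(url, []):
            if outlink not in seen:
                seen.add(outlink)
                heapq.heappush(frontier, (-score(outlink), outlink))
    return crawled


# Toy data: every URL and score here is made up for illustration.
links = {"seed": ["a", "b", "c"], "a": ["d"], "c": ["e"]}
scores = {"seed": 0.5, "a": 0.9, "b": 0.1, "c": 0.4, "d": 0.8, "e": 0.2}
order = crawl(["seed"], links, scores.get, budget=4)
print(order)
```

With a budget of 4 out of 6 pages, the low-scored pages are never fetched, which is the kind of saving the project describes.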
Part 2 - Building an LLM RAG with PyFlyde & LangChain
https://blog.kodigy.com/post/visual-ai-engineering-with-pyflyde-pt2-rag/
A system brought to life
AI Engineering Goes Visual part 2: Building an LLM RAG with PyFlyde & LangChain
This part builds on the Scraper app that we created in part 1 and uses our visual programming skills to dive deeper into the LLM engineering world. This time we turn our article database into a RAG app, making use of LangChain, Vector Store and local…
How Python Evolves: From PEP to Feature
Explore how Python gets new functionality and the major role PEPs play in that process, by looking at PEP 484 and its interesting backstory.
https://www.youtube.com/watch?v=TzpOdpdX7pE
CCTV_YOLO
Fast Real-time Object Detection with High-Res Output
https://github.com/SanshruthR/CCTV_YOLO
AI Engineering Goes Visual: Web Scraping & Data Prep with PyFlyde
In part one of this two-part tutorial, we explore how to use Flyde, a visual programming tool, to build a web scraper that feeds into a Retrieval-Augmented Generation (RAG) system. We cover scraping web content and storing it locally, setting the stage for more advanced AI engineering tasks.
https://blog.kodigy.com/post/visual-ai-engineering-with-pyflyde-pt1-scraper/
Python micro event loop library (~250 LOC)
https://gist.github.com/tarruda/5b8c19779c8ff4e8100f0b37eb5981ea
Micro event loop library (micro_events.py) to teach the basic concepts of Python coroutines and how event loop libraries might be implemented.
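For a feel of what such a library does, here is a minimal round-robin scheduler for coroutines. This is not the gist's code: `MicroLoop`, `Pause`, and `worker` are hypothetical names, and the sketch leaves out timers and I/O, which a real event loop would need.

```python
from collections import deque


class Pause:
    """Awaitable that hands control back to the loop exactly once."""

    def __await__(self):
        yield  # suspends the awaiting coroutine; the loop resumes it later


class MicroLoop:
    """Round-robin scheduler: runs each coroutine until its next pause."""

    def __init__(self):
        self._ready = deque()

    def spawn(self, coro):
        self._ready.append(coro)

    def run(self):
        while self._ready:
            coro = self._ready.popleft()
            try:
                coro.send(None)  # resume until the next `await Pause()`
            except StopIteration:
                continue  # coroutine finished; do not reschedule it
            self._ready.append(coro)


async def worker(name, steps, log):
    for step in range(steps):
        log.append((name, step))
        await Pause()


log = []
loop = MicroLoop()
loop.spawn(worker("a", 2, log))
loop.spawn(worker("b", 2, log))
loop.run()
print(log)
```

The two workers interleave because each `await Pause()` returns control to the loop, which moves on to the next ready coroutine.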
NotaGen
NotaGen is a symbolic music generation model leveraging Large Language Models (LLMs) through pre-training on 1.6M musical pieces, fine-tuning on classical compositions, and reinforcement learning using a novel CLaMP-DPO method.
https://github.com/ElectricAlexis/NotaGen
The features of Python's help() function
Python has a built-in help function for getting help... but what can you do with help?
https://www.pythonmorsels.com/help-features/
Python's help() function accepts more than functions, modules, and objects. The help() function can look up help for symbols, keywords, and topics!
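A small sketch of those string lookups follows. It calls `pydoc.help`, the object the built-in help() delegates to, and captures the text instead of paging it; `help_text` is a hypothetical helper written for this example. The strings "keywords", "symbols", and "topics" list what is available, while full entries such as "for" assume the standard topic data ships with your Python install.

```python
import contextlib
import io
import pydoc


def help_text(request):
    """Return what help(request) would print, captured as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        pydoc.help(request)
    return buffer.getvalue()


keyword_list = help_text("keywords")  # every keyword help() knows about
symbol_list = help_text("symbols")    # operators and punctuation, e.g. "**"
topic_list = help_text("topics")      # uppercase topic names, e.g. "LISTS"
for_entry = help_text("for")          # the entry for a single keyword

print("for" in keyword_list, "**" in symbol_list, "LISTS" in topic_list)
```

In an interactive session the same lookups are simply help("keywords"), help("**"), or help("LISTS").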
olmOCR
A toolkit for training language models to work with PDF documents in the wild.
https://github.com/allenai/olmocr
How I Automated My Podcast Transcript Production With Local AI
The author automated podcast transcription using roboscribe, a Python tool that combines WhisperX for diarized transcription and a local Large Language Model (LLM) for cleaning up the transcript, significantly improving readability. By leveraging local AI models, the author maintains control and optimizes the transcription process on their own hardware, achieving high-quality results in ...
https://den.dev/blog/how-i-automated-podcast-transcription-with-local-ai/
Discover my journey creating an open-source tool that uses WhisperX and local LLMs to automatically generate polished podcast transcripts. See how this practical AI application helps me maintain quality and accessibility for my listeners while saving countless…
MLX-Audio
A text-to-speech (TTS) and Speech-to-Speech (STS) library built on Apple's MLX framework, providing efficient speech synthesis on Apple Silicon.
https://github.com/Blaizzy/mlx-audio
VoiceRestore
A cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings.
https://github.com/skirdey/voicerestore
academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources.
https://github.com/apoorvkh/academic-pretraining
Performance of the Python 3.14 tail-call interpreter
https://blog.nelhage.com/post/cpython-tail-call/
Made of Bugs
A deep dive into the performance of Python 3.14's tail-call interpreter: How the performance results were confounded by an LLVM regression, the surprising complexity of compiling interpreter loops, and some reflections on performance work, software engineering…