Github Top Repositories
Top GitHub repositories in one place 🚀
Explore the best projects in programming, AI, data science, and more.
🔥 OpenBMB/VoxCPM is trending, and it deserves your attention.

🔗 https://github.com/OpenBMB/VoxCPM
📝 VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
──────────────────────────────

VoxCPM2 is a tokenizer-free text-to-speech (TTS) system that generates multilingual speech with a diffusion autoregressive architecture. It covers speech synthesis, voice design, and controllable voice cloning, supports 30 languages, and outputs 48kHz audio.

To get started, you can install VoxCPM2 using pip install voxcpm and then use the Python API or CLI to generate speech. For example, you can use the following Python code to generate speech:
from voxcpm import VoxCPM
import soundfile as sf

# Load the pretrained VoxCPM2 weights without the denoiser.
model = VoxCPM.from_pretrained(
    "openbmb/VoxCPM2",
    load_denoiser=False,
)

# Synthesize speech for the given text.
wav = model.generate(
    text="VoxCPM2 is the current recommended release for realistic multilingual speech synthesis.",
    cfg_value=2.0,
    inference_timesteps=10,
)
# Save the waveform at the model's sample rate.
sf.write("demo.wav", wav, model.tts_model.sample_rate)

The project is open-source, with weights and code released under the Apache-2.0 license, which permits commercial use. It is worth a look for developers and researchers working on speech synthesis.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
โค1
🔍 Deep-diving into FareedKhan-dev/train-llm-from-scratch, fresh off the trending list.

🔗 https://github.com/FareedKhan-dev/train-llm-from-scratch
📝 A straightforward method for training your LLM, from downloading data to generating text.
──────────────────────────────

The FareedKhan-dev/train-llm-from-scratch repository trains a large language model (LLM) from scratch in PyTorch, following the transformer architecture from the paper "Attention Is All You Need". It provides scripts for training models on a single GPU, from about 13 million parameters up to the billion-parameter range.

The train-llm-from-scratch repository is structured into several directories, including src for model definitions, config for default configurations, data_loader for data loading functions, and scripts for training, data preprocessing, and text generation.

To use the repository, you need to clone it, install the required dependencies, and download the training data using the provided scripts. The training data is from the Pile dataset, a diverse and large-scale dataset for training language models.

You can modify the transformer architecture and training configurations according to your needs. The repository also provides a step-by-step code explanation to help you understand the implementation.
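To get a feel for how architecture settings change model size, here is a rough parameter estimate for a decoder-only transformer. This is a sketch under stated assumptions, not the repository's own code: the function name and argument names are hypothetical, the MLP is assumed to use a 4x expansion, the output head is assumed to share weights with the token embedding, and biases and layer-norm weights are ignored.

```python
def estimate_parameters(vocab_size, d_model, n_layers, context_length):
    """Rough parameter count for a decoder-only transformer.

    Assumptions (not taken from the repository): 4x MLP expansion,
    learned positional embeddings, tied output head, no biases or
    layer-norm weights counted.
    """
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    mlp = 8 * d_model * d_model         # two linear layers: d -> 4d -> d
    blocks = n_layers * (attention + mlp)
    embeddings = vocab_size * d_model + context_length * d_model
    return blocks + embeddings


# Two illustrative configurations with made-up values.
small = estimate_parameters(vocab_size=50_000, d_model=128, n_layers=4, context_length=256)
large = estimate_parameters(vocab_size=50_000, d_model=2048, n_layers=24, context_length=1024)
print(f"small: {small / 1e6:.1f}M parameters, large: {large / 1e9:.2f}B parameters")
```

In this estimate the per-block cost grows with the square of the model width, so widening the model is what moves a configuration from the million range into the billion range.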

The key technical highlights include the implementation of transformer blocks, multi-head attention, and multi-layer perceptron (MLP) modules. The repository is suitable for researchers and developers interested in natural language processing and large language models.
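To make the attention step concrete, here is a minimal single-head, causally masked scaled dot-product attention in pure Python. It is an illustrative sketch only: the repository implements multi-head attention in PyTorch, and the function names below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of vectors, one per token.
    Token i may only attend to tokens 0..i.
    """
    dim = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Scaled scores against the current token and everything before it.
        scores = [
            sum(qx * kx for qx, kx in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_attention(tokens, tokens, tokens)
for row in attended:
    print([round(x, 3) for x in row])
```

The first token can only see itself, so its output equals its own value vector. Later tokens mix the values of everything up to and including their own position.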

One-liner takeaway: Train your own LLM from scratch on a single GPU, from downloading data to generating text.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
โค1
🚀 Meet stefan-jansen/machine-learning-for-trading: a gem from today's GitHub trending list.

🔗 https://github.com/stefan-jansen/machine-learning-for-trading
📝 Code for Machine Learning for Algorithmic Trading, 2nd edition.
──────────────────────────────

The stefan-jansen/machine-learning-for-trading GitHub repository is a comprehensive resource for learning about machine learning in trading. It's based on a book that covers a broad range of machine learning techniques, from linear regression to deep reinforcement learning, and demonstrates how to build, backtest, and evaluate a trading strategy driven by model predictions.

The repository contains over 150 notebooks that put the concepts, algorithms, and use cases discussed in the book into action. These notebooks provide numerous examples that show how to work with and extract signals from market, fundamental, and alternative text and image data, how to train and tune models that predict returns for different asset classes and investment horizons, and how to design, backtest, and evaluate trading strategies.
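As a toy illustration of backtesting a strategy driven by model predictions, the sketch below goes long when the predicted return is positive and stays flat otherwise. It is not taken from the book or its notebooks: the function name and all numbers are made up, and transaction costs and slippage are ignored.

```python
def backtest_long_flat(predicted_returns, realized_returns):
    """Go long when the model predicts a positive return, else stay flat.

    Returns the per-period strategy returns and the cumulative return.
    Transaction costs and slippage are ignored.
    """
    strategy_returns = [
        realized if predicted > 0 else 0.0
        for predicted, realized in zip(predicted_returns, realized_returns)
    ]

    # Compound the per-period returns into a single cumulative figure.
    growth = 1.0
    for r in strategy_returns:
        growth *= 1.0 + r
    return strategy_returns, growth - 1.0


# Made-up numbers for illustration only.
predictions = [0.01, -0.02, 0.03, -0.01]
realized = [0.02, -0.01, -0.01, 0.03]
per_period, total = backtest_long_flat(predictions, realized)
print(per_period, round(total, 4))
```

A realistic backtest would also account for costs, position sizing, and look-ahead bias, which the notebooks handle with dedicated libraries such as zipline.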

The ML4T workflow is a key concept in the repository. It starts with generating ideas for a well-defined investment universe, collecting relevant data, and extracting informative features, then moves on to designing, tuning, and evaluating machine learning models suited to the predictive task.

The repository is suitable for traders, data scientists, and machine learning enthusiasts who want to learn about machine learning in trading. The code examples rely on a wide range of Python libraries from the data science and finance domains, including pandas, TensorFlow, and zipline.

To get started, install the required libraries and run the notebooks. Most notebooks are saved in an executed state and often contain additional information that the book omits due to space constraints. The repository also provides detailed instructions for setting up and using a Docker image to run the notebooks.

In summary, the stefan-jansen/machine-learning-for-trading repository is a practical starting point for anyone who wants to learn about machine learning in trading and build their own trading strategies.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
โค1
Github Top Repositories
Photo
🎯 dmtrKovalenko/fff landed on trending. Worth a proper look.

🔗 https://github.com/dmtrKovalenko/fff
📝 The fastest and the most accurate file search toolkit for AI agents, Neovim, Rust, C, and NodeJS
──────────────────────────────

fff is a file search toolkit designed for both humans and AI agents, with an emphasis on speed. Its key features include typo-resistant path and content search, frecency-ranked file access (ranking that combines how frequently and how recently a file was used), a background watcher, and a lightweight in-memory content index.
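To illustrate what frecency ranking means, here is one common way to score it: every access counts for 1.0 when fresh and decays exponentially with age. This is a generic sketch, not fff's actual scoring. The function names, the one-week half-life, and the sample history are all assumptions made for the example.

```python
import math

DAY = 24 * 3600


def frecency_score(access_times, now, half_life_seconds=7 * DAY):
    """Sum of exponentially decayed accesses.

    Each access contributes 1.0 when fresh and halves every half-life,
    so both frequent and recent use raise the score.
    """
    decay = math.log(2) / half_life_seconds
    return sum(math.exp(-decay * (now - t)) for t in access_times)


def rank_files(history, now):
    """Order file paths from highest to lowest frecency score."""
    return sorted(history, key=lambda path: frecency_score(history[path], now), reverse=True)


now = 100 * DAY
history = {
    "src/main.rs": [now - 1 * DAY, now - 2 * DAY],  # recent, opened twice
    "README.md": [now - 60 * DAY] * 5,              # opened often, but long ago
    "Cargo.toml": [now - 10 * DAY],                 # a single older access
}
print(rank_files(history, now))
```

With this decay, five accesses from two months ago count for less than a single access from ten days ago, which is the behavior a frecency ranking is meant to produce.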

According to the project, the toolkit is faster than CLIs like ripgrep and fzf, especially in long-running processes that search more than once. fff started as a Neovim plugin and has evolved into a library that provides accurate, fast file search for various applications, including AI harnesses and code editors.

fff offers several components, including an MCP server and a Pi agent extension, each with its own set of features and installation instructions. The MCP server works with various AI clients, reducing the number of grep roundtrips and providing faster answers. The Pi extension, on the other hand, swaps the native tools for fff implementations and feeds the interactive editor's autocomplete from the frecency-ranked index.

For Neovim users, fff.nvim provides a public API with functions like find_files, live_grep, and scan_files, allowing for programmatic search and integration with other plugins. The plugin also offers customizable configuration options, commands, and keymaps.

fff is a good fit for developers, AI researchers, and power users who want faster, configurable file search in their workflow.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
🔥 codecrafters-io/build-your-own-x is trending, and it deserves your attention.

🔗 https://github.com/codecrafters-io/build-your-own-x
📝 Master programming by recreating your favorite technologies from scratch.
──────────────────────────────

The codecrafters-io/build-your-own-x GitHub repository is a comprehensive collection of guides for building various technologies from scratch. The purpose of this repository is to provide a hands-on learning experience for developers, helping them understand complex systems by recreating them.

Key features include step-by-step tutorials for building a wide range of technologies, such as 3D renderers, AI models, augmented reality systems, blockchains, bots, command-line tools, databases, and more. The repository covers various programming languages, including C, C++, Java, Python, JavaScript, and many others.

To get started, users can browse the repository's table of contents and choose a technology to build. Each guide provides a detailed, step-by-step approach to building the technology, often accompanied by code examples and explanations.

From a technical standpoint, the guides cover various aspects of building these technologies, such as architecture, algorithms, data structures, and implementation details. The repository is suitable for developers of all levels, from beginners looking to learn new concepts to experienced developers seeking to deepen their understanding of complex systems.

In short, the codecrafters-io/build-your-own-x repository is a valuable resource for anyone who learns by doing. Building technologies from scratch gives developers a deeper understanding of how they work, along with practical skills to apply in their own projects.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
💡 chopratejas/headroom just hit the trending charts. Here's why it matters.

🔗 https://github.com/chopratejas/headroom
📝 Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
──────────────────────────────

Headroom is a context compression layer for AI agents that, according to the project, reduces token counts by 60-95%, which matters most to those who run AI coding agents daily. Available as a library, proxy, and MCP server, it offers local-first, reversible compression: originals are never deleted and can be retrieved on demand. With headroom, you can compress everything your AI agent reads, including tool outputs, logs, RAG chunks, files, and conversation history, without changing your code.

The technical highlights of headroom include its ability to work with multiple agents, providing shared memory and cross-agent compatibility. It also features a ContentRouter that detects content type and selects the right compressor, as well as SmartCrusher, CodeCompressor, and Kompress-base for compressing JSON, code (via its AST), and prose, respectively.
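The routing idea can be sketched in a few lines: detect the content type, then dispatch to a matching compressor. This is not headroom's API or implementation. Every function below is a hypothetical stand-in with deliberately simple heuristics, meant only to show the dispatch pattern.

```python
import json


def detect_content_type(text):
    """Classify text as 'json', 'code', or 'prose' with simple heuristics."""
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    code_markers = ("def ", "class ", "import ", "function ")
    if any(line.lstrip().startswith(code_markers) for line in stripped.splitlines()):
        return "code"
    return "prose"


def compress_json(text):
    """Re-serialize without whitespace; a stand-in for a real JSON compressor."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_code(text):
    """Drop blank lines and full-line '#' comments."""
    kept = [ln for ln in text.splitlines() if ln.strip() and not ln.lstrip().startswith("#")]
    return "\n".join(kept)


def compress_prose(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


COMPRESSORS = {"json": compress_json, "code": compress_code, "prose": compress_prose}


def route_and_compress(text):
    """Pick a compressor based on the detected content type."""
    kind = detect_content_type(text)
    return kind, COMPRESSORS[kind](text)


print(route_and_compress('{ "status": "ok",  "items": [1, 2, 3] }'))
```

Keeping detection separate from compression means a new content type only needs one more detector branch and one more entry in the dispatch table.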

To get started with headroom, you can install it using pip install "headroom-ai[all]" or npm install headroom-ai, then pick your mode by wrapping an agent, using the proxy, or importing the library. The target audience for headroom includes developers who work with AI coding agents and want to reduce costs without compromising performance.

One-liner takeaway: headroom cuts the number of tokens your AI agent processes, and the project claims this lowers cost without sacrificing answer accuracy.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
💡 microsoft/markitdown just hit the trending charts. Here's why it matters.

🔗 https://github.com/microsoft/markitdown
📝 Python tool for converting files and office documents to Markdown.
──────────────────────────────

MarkItDown is a Python utility for converting various files to Markdown, ideal for text analysis pipelines and large language models. It supports a wide range of file formats, including PDF, PowerPoint, Word, Excel, Images, Audio, and more.

Key features include:
- File format conversion to Markdown
- Preservation of important document structure and content
- Support for large language models and text analysis tools
- Optional dependencies for specific file formats
- Plugin support for extending functionality

Usage is straightforward, with both command-line and Python API interfaces available. For example, you can use the command-line interface like this:
markitdown path-to-file.pdf > document.md
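The same conversion can be done from the Python API. The sketch below wraps it in a small helper; the helper name is hypothetical, and it assumes markitdown is installed along with the optional dependencies for the file format being converted.

```python
def convert_to_markdown(path):
    """Convert a file to Markdown text using MarkItDown's Python API."""
    # Imported lazily so this module still loads where markitdown is not installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content


# Usage (requires markitdown and an existing input file):
# markdown = convert_to_markdown("path-to-file.pdf")
# with open("document.md", "w", encoding="utf-8") as handle:
#     handle.write(markdown)
```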

Technical highlights include the use of Azure Content Understanding for higher-quality conversion and structured field extraction, as well as support for Azure Document Intelligence for document conversion.

The target audience for MarkItDown includes developers and data scientists working with text analysis pipelines and large language models.

One-liner takeaway: MarkItDown simplifies file format conversion to Markdown, making it easier to work with text analysis tools and large language models.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
📌 Spotted on GitHub Trending: affaan-m/ECC. Let's break it down.

🔗 https://github.com/affaan-m/ECC
📝 The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
──────────────────────────────

The ECC project is a harness-native operator system designed for agentic work, built from real-world multi-harness engineering workflows. It's not just about configurations, but a complete system that includes skills, instincts, memory optimization, continuous learning, security scanning, and research-first development. With 182K+ stars, 28K+ forks, and 170+ contributors, ECC supports 12+ language ecosystems and enables cross-harness agent workflows.

The system is production-ready and works across various AI agent harnesses, including Codex, Claude Code, Cursor, OpenCode, Gemini, Zed, and GitHub Copilot. ECC provides a range of features, including token optimization, memory persistence, continuous learning, and security scanning.

To get started, users can follow the Shorthand Guide, Longform Guide, or Security Guide, which cover setup, foundations, philosophy, and security best practices. The project also offers a dashboard GUI and a range of operator workflows, including brand-voice, social-graph-ranker, and customer-billing-ops.

The ECC community is active, with discussions, sponsorship, and pro subscriptions available. The project is MIT-licensed and is pledged to remain free and open-source.

In short, ECC is an actively developed system for agentic work across multiple harnesses.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe
โค1
Github Top Repositories
Photo
🎯 D4Vinci/Scrapling landed on trending. Worth a proper look.

🔗 https://github.com/D4Vinci/Scrapling
📝 🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
──────────────────────────────

Scrapling is an adaptive web scraping framework that streamlines the process of extracting data from websites. Its key features include an intelligent parser that learns from website changes, fetchers that bypass anti-bot systems, and a spider framework for concurrent, multi-session crawls.

Key highlights of Scrapling include:
- Selection methods for precise data extraction
- Fetchers for bypassing anti-bot systems like Cloudflare Turnstile
- Spiders for scalable, concurrent crawls
- Proxy Rotation for automatic rotation of proxies
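
As a rough picture of what proxy rotation involves, here is a minimal round-robin rotator that skips proxies marked as failed. It is a generic sketch and does not use or describe Scrapling's own API: the class name, method names, and proxy URLs are all hypothetical.

```python
from itertools import cycle


class RoundRobinProxies:
    """Hand out proxies in a fixed rotation, skipping ones marked as failed."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._failed = set()
        self._rotation = cycle(self._proxies)

    def mark_failed(self, proxy):
        """Exclude a proxy from future rotation."""
        self._failed.add(proxy)

    def next_proxy(self):
        """Return the next healthy proxy in the rotation."""
        # Try each proxy at most once per call before giving up.
        for _ in range(len(self._proxies)):
            candidate = next(self._rotation)
            if candidate not in self._failed:
                return candidate
        raise RuntimeError("all proxies have been marked as failed")


pool = RoundRobinProxies(["http://proxy-a:8080", "http://proxy-b:8080", "http://proxy-c:8080"])
first, second = pool.next_proxy(), pool.next_proxy()
pool.mark_failed("http://proxy-c:8080")
third = pool.next_proxy()
print(first, second, third)
```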

Technical highlights include:
- Blazing fast crawls with real-time stats and streaming
- StealthyFetcher for fetching websites under the radar
- DynamicFetcher for handling dynamic content

Usage examples include:
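from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
# Enable adaptive element tracking for this fetcher.
StealthyFetcher.adaptive = True
# Fetch the page in a headless browser and wait for network activity to settle.
p = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
# First run: select the elements and save their properties for later.
products = p.css('.product', auto_save=True)
# Later, if the site structure changes: relocate the saved elements.
products = p.css('.product', adaptive=True)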


Audience: Web scrapers, data extraction professionals, and anyone looking to extract data from websites.

Scrapling handles everything from single requests to full-scale crawls, making it an essential tool for anyone looking to extract data from the web.

Scrapling in a nutshell: Scrape smarter, not harder.

──────────────────────────────
🧠 Channel: https://t.me/GithubRe