$500 FOR THE FIRST 500 WHO JOIN THE CHANNEL!
Join our channel today for free! Tomorrow it will cost $500!
https://t.me/+-WZeIeP8YI8wM2E6
You can join at this link!
https://t.me/+-WZeIeP8YI8wM2E6
Demystifying Activation Functions!
Ever wondered why activation functions are so critical in neural networks?
They're the secret sauce that allows models to capture complex, nonlinear relationships!
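A minimal NumPy sketch of why the nonlinearity matters. The shapes, the random weights, and the choice of ReLU are assumptions made for illustration and are not taken from the linked guide. Without an activation, two stacked linear layers collapse into a single linear map; with ReLU in between, they do not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 samples, 3 features (illustrative shapes)
W1 = rng.normal(size=(3, 4))   # first layer weights
W2 = rng.normal(size=(4, 2))   # second layer weights


def relu(z):
    """Rectified linear unit: keeps positive values, zeroes the rest."""
    return np.maximum(z, 0.0)


# Two linear layers with no activation in between...
linear_stack = (x @ W1) @ W2
# ...are exactly one linear layer whose weight matrix is W1 @ W2.
collapsed = x @ (W1 @ W2)

# With ReLU between the layers, no single weight matrix reproduces the output.
with_relu = relu(x @ W1) @ W2

same_without_activation = bool(np.allclose(linear_stack, collapsed))
same_with_activation = bool(np.allclose(with_relu, collapsed))
```

Here `same_without_activation` is true and `same_with_activation` is false, which is the whole argument for putting a nonlinearity between layers.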
Do you want to learn how to implement an artificial neural network from scratch in Python using NumPy?
Learn more in this super-detailed guide: https://lnkd.in/e4CydTtB
#NeuralNetworks #DeepLearning #ActivationFunctions #Python #NumPy #AI
reader3
When you want an AI like Gemini to help you analyze books or other content, copying text out of a reader is usually a hassle.
This is especially true if you want to discuss a book chapter by chapter: highlighting and copying text manually breaks the flow and wastes time.
Yesterday, Andrej Karpathy, a well-known AI expert, released a new project to the public: reader3, which solves this problem very neatly. It's a lightweight EPUB reader that lets you read a book together with AI.
Its interface is as minimalist as possible: only the necessary reading and navigation functions. You can also manage your library through folders.
The key feature is that it breaks an EPUB into chapters and displays the content one chapter at a time.
This makes it easy to copy the needed part of the book and pass it to a large model for analysis or discussion. It significantly improves the reading experience when paired with AI.
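To illustrate what "breaking an EPUB into chapters" can involve, here is a standard-library sketch. It is not reader3's actual code. It relies only on the EPUB container layout (`META-INF/container.xml` points to a package file whose spine lists content documents in reading order). The names `chapter_paths` and `build_demo_epub` are hypothetical, and spine items only approximate chapters.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def chapter_paths(epub_file):
    """Return archive paths of the content documents in reading (spine) order."""
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))
    # The manifest maps item ids to file names; the spine orders those ids.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)
    return [
        posixpath.join(base, manifest[ref.attrib["idref"]])
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]


def build_demo_epub():
    """Build a tiny in-memory archive with just enough structure for the demo."""
    container_xml = (
        '<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    package_xml = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
        '<manifest>'
        '<item id="a" href="ch1.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="b" href="ch2.xhtml" media-type="application/xhtml+xml"/>'
        '</manifest>'
        '<spine><itemref idref="a"/><itemref idref="b"/></spine>'
        '</package>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/container.xml", container_xml)
        zf.writestr("OEBPS/content.opf", package_xml)
        zf.writestr("OEBPS/ch1.xhtml", "<html><body><p>One</p></body></html>")
        zf.writestr("OEBPS/ch2.xhtml", "<html><body><p>Two</p></body></html>")
    buf.seek(0)
    return buf


demo_chapters = chapter_paths(build_demo_epub())
```

Each returned path can then be read from the archive and shown, or copied to a model, one document at a time.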
And it's very easy to get started: just run two commands via uv. As a result, it's an excellent tool for those who love reading and want to use AI as a companion for text analysis.
Language: #Python 61.0%
Stars: 1.5k
Link to GitHub: https://github.com/karpathy/reader3
#AI #Python #Reader3 #Tech #BookLovers #Github
https://t.me/CodeProgrammer
Forwarded from Learn Python Coding
Cheat sheet on the basics of Python:
basic syntax and language rules
scalar types: basic data types (int, float, bool, str, NoneType)
datetime: working with date and time
data structures: Python data structures (list, tuple, dict, set)
list: mutable lists for storing data collections
tuple: immutable sequences of values
dict (hash map): storing data in a key-value format
set: unique elements without order
slicing: obtaining parts of sequences through indices and step
module/library: connecting modules and libraries
help functions: using help() and dir() to explore the Python API
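A few of the items above in a short, runnable sketch. All variable names and values are made up for illustration and do not come from the cheat sheet itself.

```python
from datetime import date, timedelta   # module/library: importing from the stdlib

# scalar types
count, ratio, ok, name, nothing = 3, 0.5, True, "py", None

# list (mutable) vs tuple (immutable)
nums = [10, 20, 30, 40, 50]
nums.append(60)          # lists can grow in place
point = (1, 2)           # tuples cannot be modified after creation

# dict (key-value) and set (unique, unordered)
ages = {"ann": 31}
ages["bob"] = 27
unique = set([1, 1, 2, 3, 3])

# slicing: sequence[start:stop:step]
first_two = nums[:2]
every_other = nums[::2]
reversed_nums = nums[::-1]

# datetime: date arithmetic with timedelta
next_week = date(2024, 1, 1) + timedelta(days=7)

# help functions: dir() lists attributes; help(list.append) prints its docs
list_methods = [m for m in dir(list) if not m.startswith("_")]
```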
#Python #Coding #DataScience #Programming #Tech #DevCommunity
Forwarded from Machine Learning
Rust Interview Deep Dive
A repository for systematic preparation for Rust interviews at the middle, senior, and staff levels.
Inside: 100 real interview questions from product and infrastructure companies, with detailed analyses, code examples, and scenarios of tasks that occur in production. Not "guess the program's output", but the mechanics on which real services are built.
Topics include lock-free structures, self-referential types in async code, FFI with tensor libraries, keeping futures Send when guards are held across await points, memory ordering under loom, and the soundness of custom collections. And it all starts with the basics: ownership, borrowing, lifetimes. You can start from scratch or jump in at the staff level.
https://github.com/Develp10/rustinterviewquiestions
#Rust #Programming #InterviewPrep #SoftwareEngineering #SystemsProgramming #CareerGrowth
AI is moving fast. Accountability is not.
That is why we built the open source core of Forkit Dev.
Forkit Dev introduces Model Passports and Agent Passports so AI systems can be tracked, verified, and understood across their lifecycle.
Open source repo:
https://github.com/arpitasarker01/Forkit_Dev
If you care about trustworthy AI, open source infrastructure, model lineage, or compliance-ready deployment, check it out and share your thoughts.
Forwarded from Machine Learning
Master Binary Classification with Neural Networks!
Ever wondered how to build a neural network from scratch in Python using NumPy?
Binary classification is at the heart of many machine learning applications.
Our super-detailed guide walks you through the entire process step by step.
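For a sense of what "from scratch" looks like, here is a small NumPy sketch of a one-hidden-layer binary classifier. It is not the code from the linked guide. The synthetic dataset, layer size, learning rate, and the function names `train` and `predict` are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """Full-batch gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    eps = 1e-12  # avoids log(0)
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output.
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2).ravel()
        losses.append(float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))
        # Backward pass: for sigmoid + cross-entropy, dL/dz2 = (p - y) / n.
        dz2 = ((p - y) / n)[:, None]
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)   # derivative of tanh is 1 - tanh^2
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)
        # Parameter update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
    return (p >= 0.5).astype(int)


# Synthetic, linearly separable toy data (an assumption for this demo).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

params, losses = train(X, y)
accuracy = float((predict(params, X) == y).mean())
```

The accuracy here is measured on the training data only, so it shows that the gradients work, not how well the model generalizes.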
Dive in and start building your own neural network today!
https://tinztwinshub.com/data-science/a-beginners-guide-to-developing-an-artificial-neural-network-from-zero/
#MachineLearning #NeuralNetworks #Python #DataScience #AI #Tech
"Dive into Deep Learning" is an open-source book that covers the mathematical foundations behind large language models.
It covers linear algebra, calculus, probability theory, optimization methods, backpropagation, attention mechanisms, and transformer architectures.
The book progressively moves from classical neural networks and convolutional neural networks to modern transformers and practical techniques used in large language models.
It contains over 1,000 pages and provides clear explanations, practical examples, and exercises, making it one of the most comprehensive free resources for understanding the mathematical structure of modern artificial intelligence systems and language models.
arxiv.org/pdf/2106.11342
#DeepLearning #AI #MachineLearning #NeuralNetworks #Transformers #OpenSource
Join Best TG Channels https://t.me/addlist/0f6vfFbEMdAwODBk
Join Our WhatsApp Channel https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Forwarded from Machine Learning
Awesome open-source project to learn more about Transformer Models!
We found this interactive website that shows you visually how transformer models work.
Transformer Explainer:
https://poloclub.github.io/transformer-explainer/
#TransformerModels #OpenSource #AI #MachineLearning #DataScience #Tech
Forwarded from Data Analytics
Pandas vs Polars vs DuckDB: Which Library Should You Choose?
pandas remains the default choice for notebooks, exploratory analysis, visualization, and machine learning workflows. Polars focuses on fast, memory-efficient DataFrame processing, while DuckDB brings a SQL-first approach for querying local files and embedded analytics.
Each tool fits a different kind of local data workflow. In this article, we compare pandas, Polars, and DuckDB across performance, architecture, interoperability, and real-world use cases.
More: https://www.analyticsvidhya.com/blog/2026/05/pandas-vs-polars-vs-duckdb/
#DataScience #Pandas #Polars #DuckDB #Python #Analytics
Found an easy way to learn math for ML: Mathematics for Machine Learning
This is a curated collection on GitHub of books, research papers, video lectures, and introductory materials for studying and reviewing the mathematical foundations of machine learning.
It helps build a stronger knowledge base by bringing together trusted resources on topics that machine learning engineers constantly encounter: linear algebra, calculus, probability theory, statistics, information theory, matrix calculus, and deep learning mathematics.
Free public repository on GitHub.
https://github.com/dair-ai/Mathematics-for-ML
#MachineLearning #Mathematics #DataScience #Learning #GitHub #AI
Forwarded from Machine Learning
A huge open-source course on AI Engineering from scratch
In the repository, we've collected:
- 435 lessons;
- 320+ hours of content;
- Python, TypeScript, and Rust;
- AI agents, MCP servers, prompts, and AI skills.
Moreover, almost every lesson includes practical tasks, so this isn't just theory but a full-fledged roadmap for AI Engineering.
Link to the repository:
https://github.com/rohitg00/ai-engineering-from-scratch
#AI #MachineLearning #Python #Rust #OpenSource #Tech
Autonomous AI research on Apple Silicon
A port of Karpathy's autoresearch project to Apple Silicon, built on MLX. It implements autonomous research cycles controlled via program.md.
What's interesting:
- native support for Apple Silicon without PyTorch/CUDA
- fixed training budget (~5 minutes)
- logging of results in results.tsv
- simple structure for autonomous experiments
- optimization of models for more efficient operation
https://github.com/trevin-creator/autoresearch-mlx
#AppleSilicon #AIResearch #MLX #AutonomousAI #MachineLearning #OpenSource
Transformer implementations for vision, audio, and AI agents
Repo: https://github.com/Nicolepcx/transformers-the-definitive-guide
#AI #MachineLearning #Vision #Audio #Agents #Tech
HelloEncyclo Presale is LIVE!
Master the skills that matter (Gen-AI, Data Science, Machine Learning, and more), all in one place.
First 250 members get a flat 40% OFF
Use code: PRESALE-BOOK-WAVE-2GFG
- 13 full courses live right now
- 40+ more dropping in the next 2-3 weeks
- Complete library within 2 months, built and refined by industry experts
- 15-day money-back guarantee: don't love it? Get a full refund.
Coupon works only after you log in with Gmail, and it's valid once per member.
Log in now and start learning:
https://helloencyclo.com
Don't wait: the 40% deal disappears after the first 250 seats.
Stop discovering ML Python libraries one random tutorial at a time
Best-of Machine Learning with Python is a curated GitHub index of open-source machine learning Python libraries for builders who need a faster way to compare the ecosystem.
It helps you shortlist tools by grouping projects into categories and ranking them with a project-quality score based on metrics collected from GitHub and package managers.
Key features:
- 920-project index: a large scan-friendly map of open-source ML Python projects
- 34 categories: browse by area like ML frameworks, NLP, image data, AutoML, deployment, interpretability, and more
- Quality-score ranking: projects are ordered using an automated score from repo and package-manager signals
- Rich project metadata: entries show signals like stars, forks, issues, contributors, activity, downloads, and dependencies
- Weekly updates + contributions: the list is updated regularly and can be improved via issues, PRs, or projects.yaml edits
It's open-source (CC BY-SA 4.0 license).
https://github.com/lukasmasuch/best-of-ml-python
#MachineLearning #Python #ML #OpenSource #DataScience #TechStack
Forwarded from Machine Learning
Data leakage is one of the main reasons why ML demos look impressive... and then fail in production.
The model didn't become smarter.
It just happened to see the correct answers in advance.
In 4 minutes, you'll understand where data leaks hide.
Let's break it down below:
1. Data Leakage
Data leakage occurs when information that won't be available at the time of actual prediction is used during the model training process.
Because of this, metrics on the validation stage can look much better than the actual quality of the model on new, previously unseen data.
2. Model Evaluation
The test set isn't just "additional data".
It's a simulation of the future.
Only train the model on the information that would have been available to you at the time of prediction.
Evaluate it on examples that the model couldn't have influenced during training.
3. Direct Leakage
This is the most obvious type of leakage.
Examples:
- a field with information from the future;
- an ID that encodes the target variable;
- a variable that appears only after an event has occurred;
- duplicate records in both the training and test sets.
If a feature doesn't exist at the time of inference (prediction), then it's likely a source of data leakage.
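Two of the checks above can be automated with plain Python. This is a sketch with hypothetical helper names (`overlapping_rows`, `features_missing_at_inference`) and made-up rows and feature names. It only catches exact duplicates and missing columns, not subtler leaks.

```python
def overlapping_rows(train_rows, test_rows):
    """Return test rows that also appear verbatim in the training set."""
    seen = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in seen]


def features_missing_at_inference(training_features, inference_features):
    """Return training features that will not exist when predictions are made."""
    return sorted(set(training_features) - set(inference_features))


# Illustrative data: one record was copied into both splits.
train = [(1, "a"), (2, "b"), (3, "c")]
test = [(3, "c"), (4, "d")]
duplicates = overlapping_rows(train, test)

# Illustrative feature lists: "refund_issued" only exists after the event.
suspicious = features_missing_at_inference(
    ["amount", "country", "refund_issued"],
    ["amount", "country"],
)
```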
4. Indirect Leakage
This is the type of leakage that most often traps teams.
You perform normalization, imputation, feature selection, outlier removal, or dimensionality reduction before splitting the data into a training and test set.
The model didn't directly see the data from the test set.
But your preprocessing pipeline already saw it.
5. Train/Test Split
Wrong:
fit the scaler on all data → split the data → evaluate
Right:
split the data → fit the scaler only on the training set → apply it to both the training and test sets
The same idea applies to imputers, encoders, feature selection, PCA, and any preprocessing step that is trained on the data.
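The "right" order can be sketched in NumPy. The data, the 80/20 split, and the helper names `fit_scaler` and `apply_scaler` are assumptions for illustration. The point is that the mean and standard deviation come from the training rows only.

```python
import numpy as np


def fit_scaler(X_train):
    """Learn scaling statistics from the training rows only."""
    return X_train.mean(axis=0), X_train.std(axis=0)


def apply_scaler(X, mean, std):
    """Apply previously learned statistics to any split."""
    return (X - mean) / std


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))   # synthetic features

# 1) Split first.
idx = rng.permutation(len(X))
X_train, X_test = X[idx[:80]], X[idx[80:]]

# 2) Fit the scaler on the training set only.
mean, std = fit_scaler(X_train)

# 3) Apply the same statistics to both splits.
X_train_scaled = apply_scaler(X_train, mean, std)
X_test_scaled = apply_scaler(X_test, mean, std)
```

The scaled test set will generally not have exactly zero mean and unit variance. That is expected: it is being treated like unseen data.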
6. Cross-Validation
Each fold is a mini-experiment with a training and test set.
Therefore, preprocessing should be performed within each fold.
If you prepared the entire dataset once and then ran cross-validation, each fold would already have had access to its held-out data.
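A hand-rolled sketch of preprocessing inside each fold, using NumPy only. The nearest-centroid classifier, the synthetic data, and the names `kfold_indices` and `cross_validate_centroid` are assumptions for illustration. The scaling statistics are recomputed from each fold's training part and never from its held-out part.

```python
import numpy as np


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]


def cross_validate_centroid(X, y, k=5):
    """Accuracy per fold for a nearest-centroid classifier with per-fold scaling."""
    scores = []
    for train_idx, val_idx in kfold_indices(len(X), k):
        # Fit the scaler on this fold's training part only.
        mean = X[train_idx].mean(axis=0)
        std = X[train_idx].std(axis=0)
        X_tr = (X[train_idx] - mean) / std
        X_va = (X[val_idx] - mean) / std
        # "Train" the model: one centroid per class.
        c0 = X_tr[y[train_idx] == 0].mean(axis=0)
        c1 = X_tr[y[train_idx] == 1].mean(axis=0)
        # Predict the class whose centroid is closer.
        closer_to_1 = np.linalg.norm(X_va - c1, axis=1) < np.linalg.norm(X_va - c0, axis=1)
        scores.append(float((closer_to_1.astype(int) == y[val_idx]).mean()))
    return scores


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
fold_scores = cross_validate_centroid(X, y, k=5)
```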
7. Pipelines
A pipeline isn't just a way to make the code cleaner.
It's also a defense against data leakage.
Combine preprocessing, feature selection, and the model into a single pipeline, and then pass this pipeline to cross-validation or hyperparameter search (grid search).
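One way to do this is with scikit-learn, which the post does not name, so treat the library choice as an assumption. The dataset is synthetic and the step names and parameter grid are illustrative. Because the scaler and feature selector live inside the pipeline, they are refit on the training part of every fold.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for the demo.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # fit on training folds only
    ("select", SelectKBest(f_classif, k=5)),     # feature selection inside the fold
    ("model", LogisticRegression(max_iter=1000)),
])

# Cross-validation sees the whole pipeline, not pre-processed data.
scores = cross_val_score(pipe, X, y, cv=5)

# Hyperparameter search over pipeline steps uses "<step>__<param>" names.
search = GridSearchCV(
    pipe,
    param_grid={"select__k": [3, 5, 10], "model__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X, y)
best_params = search.best_params_
```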
8. AI Engineering Version
Data leaks also occur in RAG systems and when evaluating LLMs.
Leakage occurs when you tune chunks, prompts, re-rankers, thresholds, or examples on the same evaluation dataset that you later present as "held-out".
As a result, your benchmark turns into training data.
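A simple guard is to split the evaluation examples once, tune only on the development part, and touch the held-out part only for the final number. Below is a standard-library sketch. The function name `split_eval_set`, the 30% fraction, and the integer example IDs are assumptions for illustration.

```python
import random


def split_eval_set(example_ids, holdout_fraction=0.3, seed=0):
    """Deterministically split example IDs into (dev, holdout) lists."""
    ids = sorted(example_ids)
    random.Random(seed).shuffle(ids)     # fixed seed keeps the split stable
    n_holdout = max(1, int(len(ids) * holdout_fraction))
    return ids[n_holdout:], ids[:n_holdout]


# Tune chunking, prompts, re-rankers, and thresholds against `dev` only;
# report the final metric on `holdout` once.
dev, holdout = split_eval_set(range(10))
```

Storing the split (or the seed) alongside the results makes it possible to verify later that the held-out examples were never used for tuning.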
9. Leakage Checklist
Before trusting the obtained metric, ask yourself:
- Would this feature be unavailable at the time of prediction?
- Was any transformation step fit on the test data?
- Was any preprocessing done outside the cross-validated pipeline?
- Were we tuning parameters on the final evaluation dataset?
If the answer to any of these is "yes", then the metric likely doesn't reflect the actual quality of the model.
#MachineLearning #DataScience #MLOps #DataLeakage #ArtificialIntelligence #TechTips
Forwarded from Github Top Repositories
DataTalksClub/data-engineering-zoomcamp caught my eye on GitHub Trending today.
https://github.com/DataTalksClub/data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here.
------------------------------
The Data Engineering Zoomcamp is a free 9-week course that covers the fundamentals of data engineering. It's designed to help you build an end-to-end data pipeline from scratch, with hands-on experience using industry-standard tools and best practices.
Key features of the course include structured modules, hands-on workshops, and a final project to reinforce your learning. You'll learn about containerization, infrastructure as code, workflow orchestration, data warehousing, and analytics engineering.
The course is suitable for anyone with basic coding experience and familiarity with SQL. No prior data engineering experience is necessary. You can enroll in the course by registering for the next cohort or following the self-paced learning path.
The course has a strong community and support system, with a dedicated #course-data-engineering channel on Slack for discussions and troubleshooting.
The course is taught by experienced instructors, including Alexey Grigorev and Michael Shoemaker, and is sponsored by companies like Kestra and Bruin.
Overall, the Data Engineering Zoomcamp is a great resource for anyone looking to learn data engineering fundamentals and build a career in the field.
So, what are you waiting for? Join the course and start building your skills today: it's a free 9-week course that can change your career!
------------------------------
Channel: https://t.me/GithubRe
Interactive Explainer
The Anatomy of an LLM
A visual walk through the machinery inside a large language model: from raw text, to tokens, to vectors, to attention, to the next token.
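The same chain can be traced in a toy NumPy sketch. Everything here is an assumption for illustration and not taken from the explainer: whitespace tokenization, random untrained weights, a single attention head, and reuse of the embedding matrix to produce logits. The "prediction" is therefore meaningless; only the data flow is the point.

```python
import numpy as np

# Raw text -> tokens -> token ids (whitespace tokenizer as a stand-in).
text = "the cat sat on the mat"
tokens = text.split()
vocab = sorted(set(tokens))
ids = np.array([vocab.index(t) for t in tokens])

# Token ids -> vectors (random, untrained embedding table).
rng = np.random.default_rng(0)
d = 8
E = rng.normal(size=(len(vocab), d))
x = E[ids]                                   # shape: (sequence length, d)

# Vectors -> attention (one head, causal mask so tokens only look backwards).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)
future = np.triu(np.ones_like(scores, dtype=bool), k=1)
scores[future] = -np.inf
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
context = weights @ v

# Attention output -> next-token scores over the vocabulary.
logits = context[-1] @ E.T
next_token = vocab[int(np.argmax(logits))]
```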
Link: https://www.royvanrijn.com/anatomy-of-an-llm/
#LLM #AI #Tech #NeuralNetworks #MachineLearning #DeepLearning