Machine Learning with Python
Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Forwarded from Machine Learning
Awesome open-source project to learn more about Transformer Models!

We found this interactive website that shows you visually how transformer models work.

Transformer Explainer:
https://poloclub.github.io/transformer-explainer/

#TransformerModels #OpenSource #AI #MachineLearning #DataScience #Tech
โค7๐Ÿ‘Ž1๐Ÿ‘1
Forwarded from Data Analytics
Pandas vs Polars vs DuckDB: Which Library Should You Choose?

pandas remains the default choice for notebooks, exploratory analysis, visualization, and machine learning workflows. Polars focuses on fast, memory-efficient DataFrame processing, while DuckDB brings a SQL-first approach for querying local files and embedded analytics.

Each tool fits a different kind of local data workflow. In this article, we compare pandas, Polars, and DuckDB across performance, architecture, interoperability, and real-world use cases.

More: https://www.analyticsvidhya.com/blog/2026/05/pandas-vs-polars-vs-duckdb/

#DataScience #Pandas #Polars #DuckDB #Python #Analytics
โค5๐Ÿ‘Ž1
Found an easy way to learn math for ML: Mathematics for Machine Learning

This is a curated collection on GitHub, including books, research papers, video lectures, and basic materials on math for studying and reviewing the mathematical foundations of machine learning.

It helps build a stronger knowledge base by bringing together trusted resources around topics that machine learning engineers constantly encounter: linear algebra, mathematical analysis, probability theory, statistics, information theory, matrix calculus, and deep learning mathematics.

Free public repository on GitHub.

https://github.com/dair-ai/Mathematics-for-ML

#MachineLearning #Mathematics #DataScience #Learning #GitHub #AI

Join Best TG Channels
https://t.me/addlist/0f6vfFbEMdAwODBk

โญ๏ธ Join Our WhatsApp Channel
https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
โค8๐Ÿ‘Ž1
Forwarded from Machine Learning
A huge open-source course on AI Engineering from scratch

In the repository, we've collected:
- 435 lessons;
- 320+ hours of content;
- Python, TypeScript, and Rust;
- AI agents, MCP servers, prompts, and AI skills.

Moreover, almost every lesson includes practical tasks, so this isn't just theory, but a full-fledged roadmap for AI Engineering.

Link to the repository
https://github.com/rohitg00/ai-engineering-from-scratch

#AI #MachineLearning #Python #Rust #OpenSource #Tech

Autonomous AI research on Apple Silicon

An MLX-based port of Karpathy's autoresearch project for Apple Silicon, which implements autonomous research cycles controlled via program.md.

What's interesting:
• native support for Apple Silicon without PyTorch/CUDA
• fixed training budget (~5 minutes)
• logging of results in results.tsv
• simple structure for autonomous experiments
• optimization of models for more efficient operation

https://github.com/trevin-creator/autoresearch-mlx

#AppleSilicon #AIResearch #MLX #AutonomousAI #MachineLearning #OpenSource

Transformer implementations for vision, audio, and AI agents

Repo: https://github.com/Nicolepcx/transformers-the-definitive-guide

#AI #MachineLearning #Vision #Audio #Agents #Tech

HelloEncyclo Presale is LIVE!

Master the skills that matter (Gen-AI, Data Science, Machine Learning, and more), all in one place.

First 250 members get a flat 40% OFF

Use code: PRESALE-BOOK-WAVE-2GFG

- 13 full courses live right now
- 40+ more dropping in the next 2-3 weeks
- Complete library within 2 months, built and refined by industry experts
- 15-day money-back guarantee: don't love it? Get a full refund.

Note: the coupon works only after you log in with Gmail, and it's valid once per member.

Log in now and start learning:

https://helloencyclo.com

Don't wait: the 40% deal disappears after the first 250 seats.
โค3๐Ÿ‘2๐Ÿ’ฏ1
Stop discovering ML Python libraries one random tutorial at a time

Best-of Machine Learning with Python is a curated GitHub index of open-source machine learning Python libraries for builders who need a faster way to compare the ecosystem.

It helps you shortlist tools by grouping projects into categories and ranking them with a project-quality score based on metrics collected from GitHub and package managers.

Key features:

• 920-project index: a large scan-friendly map of open-source ML Python projects
• 34 categories: browse by area like ML frameworks, NLP, image data, AutoML, deployment, interpretability, and more
• Quality-score ranking: projects are ordered using an automated score from repo and package-manager signals
• Rich project metadata: entries show signals like stars, forks, issues, contributors, activity, downloads, and dependencies
• Weekly updates + contributions: the list is updated regularly and can be improved via issues, PRs, or projects.yaml edits

It's open-source (CC BY-SA 4.0 license).

https://github.com/lukasmasuch/best-of-ml-python

#MachineLearning #Python #ML #OpenSource #DataScience #TechStack

Forwarded from Machine Learning
Data leakage is one of the main reasons why ML demos look impressive... and then fail in production.

The model didn't become smarter.
It just happened to see the correct answers in advance.

In 4 minutes, you'll understand where data leaks hide.

Let's break it down below:

1. Data Leakage

Data leakage occurs when information that won't be available at the time of actual prediction is used during the model training process.

Because of this, validation metrics can look much better than the model's actual quality on new, previously unseen data.

2. Model Evaluation

The test set isn't just "additional data".
It's a simulation of the future.

Train the model only on information that would have been available to you at the time of prediction.
Evaluate it on examples that could not have influenced the model during training.

3. Direct Leakage

This is the most obvious type of leakage.

Examples:
- a field with information from the future;
- an ID that encodes the target variable;
- a variable that appears only after an event has occurred;
- duplicate records in both the training and test sets.

If a feature doesn't exist at the time of inference (prediction), then it's likely a source of data leakage.
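
One item from the list above, duplicate records in both sets, is easy to check automatically. Here is a minimal sketch using only the standard library; the helper names and the toy rows are hypothetical, and it only catches exact duplicates, not near-duplicates:

```python
import hashlib


def row_fingerprint(row):
    """Return a stable hash of a row's feature values (order-sensitive)."""
    joined = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_cross_split_duplicates(train_rows, test_rows):
    """Return indices of test rows whose values also appear in the training set."""
    train_hashes = {row_fingerprint(row) for row in train_rows}
    return [i for i, row in enumerate(test_rows) if row_fingerprint(row) in train_hashes]


# Hypothetical toy data: each row is a tuple of feature values.
train_rows = [(5.1, "red", 3), (4.9, "blue", 1), (6.0, "red", 2)]
test_rows = [(4.9, "blue", 1), (7.2, "green", 5)]

duplicate_indices = find_cross_split_duplicates(train_rows, test_rows)
print(duplicate_indices)  # prints [0]: the first test row also exists in train
```

If the list is not empty, those test rows are worth dropping or moving before you evaluate.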

4. Indirect Leakage

This is the type of leakage that most often traps teams.

You perform normalization, imputation, feature selection, outlier removal, or dimensionality reduction before splitting the data into a training and test set.

The model didn't directly see the data from the test set.
But your preprocessing pipeline already saw it.

5. Train/Test Split

Wrong:
fit the scaler on all data → split the data → evaluate

Right:
split the data → fit the scaler only on the training set → apply it to both the training and test sets

The same idea applies to imputers, encoders, feature selection, PCA, and any preprocessing step that is trained on the data.
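
The "right" order can be sketched as follows, assuming scikit-learn and NumPy are installed. The synthetic data and variable names are made up for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(200, 4))
y = (X[:, 0] > 10.0).astype(int)

# 1. Split first.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Fit the scaler on the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Apply the already-fitted scaler to the test set (transform, never fit).
X_test_scaled = scaler.transform(X_test)
```

The scaler's mean and variance come from `X_train` alone, so the test set contributes nothing to the preprocessing statistics.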

6. Cross-Validation

Each fold is a mini-experiment with a training and test set.
Therefore, preprocessing should be performed within each fold.

If you prepared the entire dataset once and then ran cross-validation, each fold would already have had access to its held-out data.
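
To make "preprocessing within each fold" concrete, here is a hand-written loop, assuming scikit-learn is available. The data is synthetic, and the model choice (logistic regression) is just an example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

fold_scores = []
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for train_idx, val_idx in splitter.split(X, y):
    # The scaler is re-fitted from scratch on this fold's training part only.
    scaler = StandardScaler().fit(X[train_idx])

    model = LogisticRegression(max_iter=1000)
    model.fit(scaler.transform(X[train_idx]), y[train_idx])

    # The held-out part is only transformed and scored, never used for fitting.
    fold_scores.append(model.score(scaler.transform(X[val_idx]), y[val_idx]))

mean_accuracy = float(np.mean(fold_scores))
print(f"mean accuracy over {len(fold_scores)} folds: {mean_accuracy:.3f}")
```

In practice this loop is rarely written by hand; it is shown here to make visible what happens inside each fold.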

7. Pipelines

A pipeline isn't just a way to make the code cleaner.
It's also a defense against data leakage.

Combine preprocessing, feature selection, and the model into a single pipeline, and then pass this pipeline to cross-validation or hyperparameter search (grid search).
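
A minimal sketch of that setup with scikit-learn's `Pipeline` and `GridSearchCV`. The step names, the parameter grid, and the synthetic data are assumptions chosen for the example:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Preprocessing, feature selection, and the model live in one object.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "select__k": [2, 5, 10],
    "model__C": [0.1, 1.0, 10.0],
}

# For every fold, the whole pipeline is re-fitted on that fold's training part.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because the scaler and the feature selector are steps of the pipeline, the search never fits them on a fold's held-out data.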

8. AI Engineering Version

Data leaks also occur in RAG systems and when evaluating LLMs.

Leakage occurs when you tune chunks, prompts, re-rankers, thresholds, or examples on the same evaluation dataset that you later present as "held-out".

As a result, your benchmark turns into training data.
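
One way to avoid this, sketched under assumptions: freeze a held-out part of the evaluation set up front, tune only on the rest, and score the held-out part once. The record format, the threshold search, and all names below are hypothetical:

```python
import hashlib


def assign_split(example_id, held_out_fraction=0.3):
    """Deterministically assign an example to 'dev' or 'held_out' by hashing its ID."""
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "held_out" if bucket < held_out_fraction else "dev"


def accuracy(examples, threshold):
    """Fraction of examples where (score >= threshold) matches the relevance label."""
    hits = sum((ex["score"] >= threshold) == ex["relevant"] for ex in examples)
    return hits / len(examples)


# Hypothetical eval records: a retriever score plus a human relevance label.
examples = []
for i in range(200):
    score_pct = (i * 37) % 100  # spreads scores over 0..99
    examples.append({"id": f"q{i}", "score": score_pct / 100, "relevant": score_pct >= 55})

dev = [ex for ex in examples if assign_split(ex["id"]) == "dev"]
held_out = [ex for ex in examples if assign_split(ex["id"]) == "held_out"]

# Tune the threshold on the dev part only.
candidate_thresholds = [t / 20 for t in range(21)]
best_threshold = max(candidate_thresholds, key=lambda t: accuracy(dev, t))

# Score the held-out part once, with the frozen threshold.
final_score = accuracy(held_out, best_threshold)
print(best_threshold, final_score)
```

Hashing the ID keeps the assignment stable across runs and across new examples, so nothing drifts between the two parts while you iterate on prompts or chunking.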

9. Leakage Checklist

Before trusting the obtained metric, ask yourself:

- Does any feature rely on information that won't exist at the time of prediction?
- Was any transformation step fit on the test data?
- Was any preprocessing done outside the cross-validation folds?
- Were parameters tuned on the final evaluation dataset?

If the answer to any of these is "yes", then the metric likely doesn't reflect the actual quality of the model.

#MachineLearning #DataScience #MLOps #DataLeakage #ArtificialIntelligence #TechTips

Forwarded from Github Top Repositories
DataTalksClub/data-engineering-zoomcamp caught my eye on GitHub Trending today.

https://github.com/DataTalksClub/data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here.
──────────────────────────────

The Data Engineering Zoomcamp is a free 9-week course that covers the fundamentals of data engineering. It's designed to help you build an end-to-end data pipeline from scratch, with hands-on experience using industry-standard tools and best practices.

Key features of the course include structured modules, hands-on workshops, and a final project to reinforce your learning. You'll learn about containerization, infrastructure as code, workflow orchestration, data warehousing, and analytics engineering.

The course is suitable for anyone with basic coding experience and familiarity with SQL. No prior data engineering experience is necessary. You can enroll in the course by registering for the next cohort or following the self-paced learning path.

The course has a strong community and support system, with a dedicated #course-data-engineering channel on Slack for discussions and troubleshooting.

The course is taught by experienced instructors, including Alexey Grigorev and Michael Shoemaker, and is sponsored by companies like Kestra and Bruin.

Overall, the Data Engineering Zoomcamp is a great resource for anyone looking to learn data engineering fundamentals and build a career in the field.
So, what are you waiting for? Join the course and start building your skills today - it's a free 9-week course that can change your career!

──────────────────────────────
Channel: https://t.me/GithubRe
โค5
Interactive Explainer

The Anatomy of an LLM
A visual walk through the machinery inside a large language model: from raw text, to tokens, to vectors, to attention, to the next token.

Link: https://www.royvanrijn.com/anatomy-of-an-llm/

#LLM #AI #Tech #NeuralNetworks #MachineLearning #DeepLearning

Forwarded from Machine Learning
FREE MIT books on AI and Machine Learning:

1. Foundations of Machine Learning cs.nyu.edu/~mohri/mlbook/
2. Understanding Deep Learning udlbook.github.io/udlbook/
3. Introduction to Machine Learning Systems
❯ Vol 1: mlsysbook.ai/vol1/assets/do
❯ Vol 2: mlsysbook.ai/vol2/assets/do
4. Algorithms for ML algorithmsbook.com
5. Deep Learning deeplearningbook.org
6. Reinforcement Learning andrew.cmu.edu/course/10-703/
7. Distributional Reinforcement Learning direct.mit.edu/books/oa-monog
8. Multi Agent Reinforcement Learning marl-book.com
9. Agents in the Long Game of AI direct.mit.edu/books/oa-monog
10. Fairness and Machine Learning fairmlbook.org
11. Probabilistic Machine Learning
❯ Part 1: probml.github.io/pml-book/book1
❯ Part 2: probml.github.io/pml-book/book2

#MIT #AI #MachineLearning #DeepLearning #ReinforcementLearning #FreeBooks

HelloEncyclo Presale: Now Open

40% OFF · first 250 members only

Code: PRESALE-BOOK-WAVE-2GFG

Inside: Gen-AI, LLMs, RAG, Data Science, ML, Deep Learning, SQL, Advanced Java, Math for AI

13 live now · 40+ in 2-3 weeks · full library in ~2 months

15-day money-back guarantee
Log in with Gmail to apply (valid once).

https://helloencyclo.com/?ref=HUSSEINSHEIKHO

The discount disappears with the last seat. Don't sleep on it.
โค5๐Ÿ”ฅ1
Machine Learning with Python
Photo
Don't miss this opportunity!

Once you register, you will receive future courses for free.
โค2๐Ÿ‘1