Machine Learning with Python
67.8K subscribers
1.44K photos
124 videos
193 files
1.16K links
Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
@Codeprogrammer Cheat Sheet Numpy.pdf
213.7 KB
This checklist covers the essentials of NumPy in one place, helping you:

- Create and initialize arrays
- Perform element-wise computations
- Stack and split arrays
- Apply linear algebra functions
- Efficiently index, slice, and manipulate arrays

…and much more!

Feel free to share if you found this useful, and let me know in the comments if I missed anything!

⚑️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟

#NumPy #Python #DataScience #MachineLearning #Automation #DeepLearning #Programming #Tech #DataAnalysis #SoftwareDevelopment #Coding #TechTips #PythonForDataScience
Please open Telegram to view this post
VIEW IN TELEGRAM
❀9πŸ‘8
from SQL to pandas.pdf
1.3 MB
🐼 "Comparison Between SQL and pandas" – A Handy Reference Guide

⚑️ As a data scientist, I often found myself switching back and forth between SQL and pandas during technical interviews. I was confident answering questions in SQL but sometimes struggled to translate the same logic into pandas – and vice versa.

πŸ”Έ To bridge this gap, I created a concise booklet in the form of a comparison table. It maps SQL queries directly to their equivalent pandas implementations, making it easy to understand and switch between both tools.

⚑ This reference guide has become an essential part of my interview prep. Before any interview, I quickly review it to ensure I’m ready to tackle data manipulation tasks using either SQL or pandas, depending on what’s required.

πŸ“• Whether you're preparing for interviews or just want to solidify your understanding of both tools, this comparison guide is a great way to stay sharp and efficient.

#DataScience #SQL #pandas #InterviewPrep #Python #DataAnalysis #CareerGrowth #TechTips #Analytics

βœ‰οΈ Our Telegram channels: https://t.me/addlist/0f6vfFbEMdAwODBk

πŸ“± Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Please open Telegram to view this post
VIEW IN TELEGRAM
πŸ‘14❀2
In Python, image processing unlocks powerful capabilities for computer vision, data augmentation, and automationβ€”master these techniques to excel in ML engineering interviews and real-world applications! πŸ–Ό 

# PIL/Pillow Basics - The essential image library
from PIL import Image

# Open and display image
img = Image.open("input.jpg")
img.show()

# Convert formats
img.save("output.png")
img.convert("L").save("grayscale.jpg")  # RGB to grayscale

# Basic transformations
img.rotate(90).save("rotated.jpg")
img.resize((300, 300)).save("resized.jpg")
img.transpose(Image.FLIP_LEFT_RIGHT).save("mirrored.jpg")


more explain: https://hackmd.io/@husseinsheikho/imageprocessing

#Python #ImageProcessing #ComputerVision #Pillow #OpenCV #MachineLearning #CodingInterview #DataScience #Programming #TechJobs #DeveloperTips #AI #DeepLearning #CloudComputing #Docker #BackendDevelopment #SoftwareEngineering #CareerGrowth #TechTips #Python3
❀6πŸ‘1
Forwarded from Machine Learning
Data leakage is one of the main reasons why ML demos look impressive... and then fail in production. πŸ“‰

The model didn't become smarter.
It just happened to see the correct answers in advance.

In 4 minutes, you'll understand where data leaks hide. πŸ”

Let's break it down below: πŸ‘‡

1. Data Leakage πŸ•³οΈ

Data leakage occurs when information that won't be available at the time of actual prediction is used during the model training process.

Because of this, metrics on the validation stage can look much better than the actual quality of the model on new, previously unseen data.

2. Model Evaluation βš–οΈ

The test set isn't just "additional data".
It's a simulation of the future.

Only train the model on the information that would have been available to you at the time of prediction.
Evaluate it on examples that the model couldn't have influenced during training.

3. Direct Leakage 🚨

This is the most obvious type of leakage.

Examples:
- a field with information from the future;
- an ID that encodes the target variable;
- a variable that appears only after an event has occurred;
- duplicate records in both the training and test sets.

If a feature doesn't exist at the time of inference (prediction), then it's likely a source of data leakage.

4. Indirect Leakage πŸ•΅οΈ

This is the type of leakage that most often traps teams.

You perform normalization, imputation, feature selection, outlier removal, or dimensionality reduction before splitting the data into a training and test set.

The model didn't directly see the data from the test set.
But your preprocessing pipeline already saw it.

5. Train/Test Split βœ‚οΈ

Wrong:
fit the scaler on all data β†’ split the data β†’ evaluate

Right:
split the data β†’ fit the scaler only on the training set β†’ apply it to both the training and test sets

The same idea applies to imputers, encoders, feature selection, PCA, and any preprocessing step that is trained on the data.

6. Cross-Validation πŸ”„

Each fold is a mini-experiment with a training and test set.
Therefore, preprocessing should be performed within each fold.

If you prepared the entire dataset once and then ran cross-validation, each fold would already have had access to its held-out data.

7. Pipelines πŸ› οΈ

A pipeline isn't just a way to make the code cleaner.
It's also a defense against data leakage.

Combine preprocessing, feature selection, and the model into a single pipeline, and then pass this pipeline to cross-validation or hyperparameter search (grid search).

8. AI Engineering Version πŸ€–

Data leaks also occur in RAG systems and when evaluating LLMs.

Leakage occurs when you tune chunks, prompts, re-rankers, thresholds, or examples on the same evaluation dataset that you later present as "held-out".

As a result, your benchmark turns into training data.

9. Leakage Checklist βœ…

Before trusting the obtained metric, ask yourself:

- Could this feature exist at the time of prediction?
- Was any transformation (transform) step trained (fit) on the test data?
- Did cross-validation include the entire pipeline?
- Were we tuning parameters on the final evaluation dataset?

If the answer is "yes", then the metric likely doesn't reflect the actual quality of the model.

#MachineLearning #DataScience #MLOps #DataLeakage #ArtificialIntelligence #TechTips

✨ Join Best TG Channels https://t.me/addlist/0f6vfFbEMdAwODBk

⭐️ Join Our WhatsApp Channel https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
❀5