Data science/ML/AI
13.7K subscribers
561 photos
2 videos
145 files
320 links
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
📚 Data Science Riddle

During EDA (Exploratory Data Analysis), what's the main reason we use box plots?
Anonymous Quiz
- To visualize distributions (22%)
- To detect outliers (64%)
- To see correlations (9%)
- To test normality (5%)
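A box plot marks outliers with Tukey's rule: points more than 1.5 × IQR below Q1 or above Q3 fall outside the whiskers and are drawn individually. A minimal sketch of that rule in plain Python, assuming the standard-library `statistics.quantiles` quartile method (other tools may interpolate quartiles slightly differently); `tukey_outliers` is a hypothetical helper and the sample data is made up:

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return (outliers, (low_fence, high_fence)) using the box-plot rule:
    anything below Q1 - k*IQR or above Q3 + k*IQR is flagged."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    # (default 'exclusive' method; other libraries may interpolate differently).
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high], (low, high)


# Made-up sample: mostly 10-15, plus one extreme value.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]
outliers, fences = tukey_outliers(data)
print(outliers, fences)  # [100] (5.0, 19.0)
```

Matplotlib's `boxplot` uses the same 1.5 × IQR whisker rule by default (`whis=1.5`), so the points flagged here are the ones it would draw as fliers, up to quartile-method differences.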
Hey everyone 👋

Some time ago, I asked if I should start a Data Science educational series and since 96% of you said yes, I began creating it.

But many of you also asked for real, hands-on experience with projects, not just lessons. So I decided to shift gears. It’s now becoming a full practical coding course! 💻

My goal is to help you build skills that get you job-ready, not just teach theory. It’s taking a bit longer, but I promise it’ll be worth it.

Thank you all for your support and patience ❤️
I’ll let you know as soon as we’re ready to start!
Pandas Cheatsheet For Data Analysis
📚 Data Science Riddle

Your batch ETL job runs slower each week despite no code change. What's your first suspect?
Anonymous Quiz
- Code inefficiency (12%)
- Schema mismatch (20%)
- Data volume growth (61%)
- Resource throttling (7%)
🚨 When & How Jupyter Notebooks Fail (And What To Use Instead)

Hey Data Folks! 👩‍💻👨‍💻
Let’s talk about Jupyter Notebooks: powerful for exploration, but risky in production. Here’s why:

❌ Problems with Notebooks:
1. Out-of-order execution → hidden bugs.
2. Cells edited after running → displayed results no longer match the code.
3. Data leakage → sensitive data saved in cell outputs.
4. Security risks → tokens/keys exposed.
5. Hard to apply engineering practices → no modular code, testing, CI/CD.
6. Collaboration pain → merge conflicts, noisy JSON diffs.
7. Reproducibility issues → missing dependencies, unpinned versions.
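A quick way to catch problem 1 before sharing a notebook is to inspect the `execution_count` fields stored in the `.ipynb` JSON. The sketch below is a heuristic written for illustration (`out_of_order_cells` is a hypothetical helper, not part of Jupyter): it assumes the nbformat v4 layout with a top-level `cells` list and flags any code cell that breaks the 1, 2, 3, ... sequence a fresh restart-and-run-all would produce.

```python
def out_of_order_cells(nb):
    """Return indices of code cells whose execution_count breaks the 1, 2, 3, ...
    sequence that a fresh top-to-bottom run would produce (nbformat v4 layout)."""
    flagged = []
    expected = 1
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        if cell.get("execution_count") != expected:
            flagged.append(index)
        expected += 1
    return flagged


# Made-up notebook: cell 2 was run, then cell 3, then cell 2 was re-run.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# EDA"},
        {"cell_type": "code", "execution_count": 1, "source": "import pandas as pd"},
        {"cell_type": "code", "execution_count": 4, "source": "df = pd.read_csv(path)"},
        {"cell_type": "code", "execution_count": 3, "source": "df.describe()"},
    ]
}
print(out_of_order_cells(notebook))  # [2]
```

To check a real file, load it with `json.load` and pass the resulting dict. Some frontends may leave empty code cells unnumbered, so treat a hit as a prompt to restart and re-run, not as proof of a bug.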

✅ When They’re Useful:
- Quick data exploration & prototyping.
- Knowledge sharing (clean, runnable from top to bottom).
- Teaching / hands-on tutorials (with solution notebooks).

🔧 What to Use Instead:
- For production code → .py files + IDEs.
- For workflows → template repos & reproducible setups.
- For deployment → MLOps tools, pipelines, automation.

💡 Key Takeaways:
- Use notebooks for exploration & teaching.
- Use structured code + pipelines for production & deployment.
- Always document dependencies, keep notebooks clean, never commit secrets!
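For the "never commit secrets" point, one common habit is clearing outputs before committing. A minimal sketch, assuming the nbformat v4 JSON layout; `strip_outputs` is a hypothetical helper, and dedicated tools such as nbstripout do this more thoroughly (for example as a git filter). It only clears outputs: a key hard-coded in a cell's source is still there.

```python
import copy
import json


def strip_outputs(nb):
    """Return a copy of a notebook dict (nbformat v4 layout) with every code
    cell's outputs and execution_count cleared. The original is not modified."""
    clean = copy.deepcopy(nb)
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


# Made-up notebook whose output accidentally printed a (fake) token.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "source": "print(token)",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "demo-token-123\n"}],
        }
    ]
}
cleaned = strip_outputs(notebook)
print(json.dumps(cleaned, indent=1))  # no "demo-token-123" left in the JSON
```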
List of AI Project Ideas 👨🏻‍💻

Beginner Projects

🔹 Sentiment Analyzer
🔹 Image Classifier
🔹 Spam Detection System
🔹 Face Detection
🔹 Chatbot (Rule-based)
🔹 Movie Recommendation System
🔹 Handwritten Digit Recognition
🔹 Speech-to-Text Converter
🔹 AI-Powered Calculator
🔹 AI Hangman Game

Intermediate Projects

🔸 AI Virtual Assistant
🔸 Fake News Detector
🔸 Music Genre Classification
🔸 AI Resume Screener
🔸 Style Transfer App
🔸 Real-Time Object Detection
🔸 Chatbot with Memory
🔸 Autocorrect Tool
🔸 Face Recognition Attendance System
🔸 AI Sudoku Solver

Advanced Projects

🔺 AI Stock Predictor
🔺 AI Writer (GPT-based)
🔺 AI-powered Resume Builder
🔺 Deepfake Generator
🔺 AI Lawyer Assistant
🔺 AI-Powered Medical Diagnosis
🔺 AI-based Game Bot
🔺 Custom Voice Cloning
🔺 Multi-modal AI App
🔺 AI Research Paper Summarizer
📚 Data Science Riddle

You discover your regression model performs poorly on recent data. The relationships between variables have shifted. What's this called?
Anonymous Quiz
- Model Overfitting (39%)
- Concept Drift (39%)
- Sampling Error (11%)
- Data Leakage (11%)
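Concept drift means the relationship between inputs and target (P(y | x)) changes after training, which the model has no way of knowing. A toy sketch in plain Python with made-up data: the inputs are identical in both periods and only the relationship flips, so a check on input distributions alone would see nothing, while the error on recent labeled data jumps. Comparing recent-window error against training-period error is one simple monitoring signal, assumed here for illustration; `fit_line` and `mean_abs_error` are hypothetical helpers.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def mean_abs_error(model, xs, ys):
    """Mean absolute error of a (slope, intercept) model on the given data."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


xs = list(range(10))              # identical inputs in both periods
old_ys = [2 * x + 1 for x in xs]  # made-up relationship at training time
new_ys = [-x + 10 for x in xs]    # made-up relationship in recent data

model = fit_line(xs, old_ys)
old_error = mean_abs_error(model, xs, old_ys)
new_error = mean_abs_error(model, xs, new_ys)
print(f"MAE training period: {old_error:.2f}, recent period: {new_error:.2f}")
# MAE training period: 0.00, recent period: 8.10
```

In practice the window size and alert threshold are choices you would tune, and retraining on recent data is the usual response once drift is confirmed.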
Regularization: The Art of Keeping Models Humble

Overfitting is the “ego problem” of models. They memorize training data and forget how to generalize.
Regularization is how we humble them.

➡️ L1 (Lasso): Shrinks some weights to exactly zero → performs feature selection.
➡️ L2 (Ridge): Shrinks all weights toward zero, but rarely to exactly zero → smoother, more stable learning.
➡️ Dropout: Randomly zeroes out neurons during training → prevents co-adaptation (neurons relying on each other).
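For linear regression with weights $w$, design matrix $X$, targets $y$ and penalty strength $\lambda \ge 0$ (standard notation, assumed here rather than taken from the post), the L1 and L2 objectives are below. In the simplified case of orthonormal features they have closed forms that show why only L1 produces exact zeros: the $\max(\cdot, 0)$ clips small coefficients to zero, while the ridge formula only rescales.

```latex
\begin{aligned}
\hat{w}^{\text{lasso}} &= \arg\min_{w}\ \tfrac{1}{2}\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1 \\
\hat{w}^{\text{ridge}} &= \arg\min_{w}\ \tfrac{1}{2}\lVert y - Xw \rVert_2^2 + \tfrac{\lambda}{2} \lVert w \rVert_2^2 \\[4pt]
\text{if } X^\top X = I:\quad
\hat{w}^{\text{lasso}}_j &= \operatorname{sign}\big(\hat{w}^{\text{OLS}}_j\big)\,\max\big(|\hat{w}^{\text{OLS}}_j| - \lambda,\ 0\big) \\
\hat{w}^{\text{ridge}}_j &= \frac{\hat{w}^{\text{OLS}}_j}{1 + \lambda}
\end{aligned}
```

Dropout has no simple penalty term like these; it regularizes by randomly altering the network during training.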

It’s not about punishment; it’s about discipline.
Regularization teaches models to focus on patterns, not exceptions.
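The three techniques can be sketched in a few lines of plain Python. Assuming orthonormal features, with loss ½‖y − Xw‖² plus penalty λ‖w‖₁ (L1) or (λ/2)‖w‖² (L2), lasso reduces to soft-thresholding each unregularized coefficient by λ and ridge to dividing it by 1 + λ. The dropout function uses the common "inverted dropout" formulation (survivors scaled by 1/(1 − p) during training); it illustrates the idea, not any specific framework's internals. Function names and coefficient values are made up.

```python
import random


def lasso_shrink(w_ols, lam):
    """L1 (Lasso) under an orthonormal design, loss 0.5*||y - Xw||^2 + lam*||w||_1:
    soft-thresholding. Any coefficient with |w_ols| <= lam becomes exactly zero."""
    if abs(w_ols) <= lam:
        return 0.0
    return w_ols - lam if w_ols > 0 else w_ols + lam


def ridge_shrink(w_ols, lam):
    """L2 (Ridge) under the same assumption, with penalty (lam / 2) * ||w||^2:
    every coefficient is divided by (1 + lam), so it shrinks but never hits zero."""
    return w_ols / (1.0 + lam)


def inverted_dropout(activations, p, rng, training=True):
    """Zero each activation with probability p during training and scale the
    survivors by 1 / (1 - p) so the expected value is unchanged.
    At inference (training=False) activations pass through untouched."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


w_ols = [3.0, 0.5, -0.2, -2.0]  # made-up unregularized coefficients
lam = 1.0
print("lasso:", [lasso_shrink(w, lam) for w in w_ols])  # [2.0, 0.0, 0.0, -1.0]
print("ridge:", [ridge_shrink(w, lam) for w in w_ols])  # [1.5, 0.25, -0.1, -1.0]
print("dropout:", inverted_dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=random.Random(0)))
```

Note how the two small coefficients (0.5 and -0.2) vanish under L1 but merely shrink under L2: that is the "feature selection" versus "smoothing" difference in the list above.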

💭 Remember: The best models don’t just fit data. They respect uncertainty.
Explaining LLMs By BigData Specialist.pdf
4.3 MB
This is our latest post from our Instagram page, saved as a PDF.

If you want a comprehensive breakdown of what LLMs are and how they actually work, you might want to check it out.

Here's our Instagram post: Explaining LLMs
Skills Needed to Become a Data Analyst
📚 Data Science Riddle

Why might your SQL join explode the number of rows unexpectedly?
Anonymous Quiz
- Index missing (21%)
- Wrong join key (38%)
- Duplicate keys (34%)
- Slow query optimizer (7%)
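Row explosion typically comes from a key that is not unique on one side of the join: every matching pair is returned, so a customer with two rows doubles every one of their orders. A self-contained sketch with Python's built-in `sqlite3` and made-up tables (`orders`, `customers` and `join_row_counts` are illustrative names) shows the count jump, a duplicate-key check, and one possible fix. Deduplicating before the join is only one option, and `MIN(email)` is a placeholder for whatever rule should pick the surviving row.

```python
import sqlite3


def join_row_counts():
    """Build two tiny made-up tables where customer 10 appears twice, then
    compare row counts before and after joining."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE customers (customer_id INTEGER, email TEXT);
        INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20);
        -- customer 10 has two rows, so customer_id is NOT unique here
        INSERT INTO customers VALUES (10, 'a@example.com'), (10, 'a.alt@example.com'), (20, 'b@example.com');
        """
    )
    n_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    # Every order of customer 10 matches both customer rows: 2 x 2 + 1 x 1 = 5.
    n_joined = conn.execute(
        "SELECT COUNT(*) FROM orders o JOIN customers c ON o.customer_id = c.customer_id"
    ).fetchone()[0]
    # Diagnostic: which join keys are duplicated on the side expected to be unique?
    dupes = conn.execute(
        "SELECT customer_id, COUNT(*) FROM customers GROUP BY customer_id HAVING COUNT(*) > 1"
    ).fetchall()
    # One possible fix: reduce customers to one row per key before joining.
    n_dedup = conn.execute(
        """
        SELECT COUNT(*) FROM orders o
        JOIN (SELECT customer_id, MIN(email) AS email FROM customers GROUP BY customer_id) c
          ON o.customer_id = c.customer_id
        """
    ).fetchone()[0]
    conn.close()
    return n_orders, n_joined, dupes, n_dedup


print(join_row_counts())  # (3, 5, [(10, 2)], 3)
```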
Top 6 Types of AI Models
Database Querying Using SQL.pdf
136.4 KB
Notes on SQL for data management and analysis, including queries and integration with R, from the University of South Carolina.
📚 Data Science Riddle

A business team wants interpretable insights, not just predictions. What's the best model to start with?
Anonymous Quiz
- Random Forest (30%)
- Logistic Regression (38%)
- XGBoost (14%)
- Deep Neural Net (18%)
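Part of what makes logistic regression easy to explain to a business team is that each coefficient has a direct reading on the log-odds scale. With $p$ the predicted probability of the positive class and $\beta_j$ the coefficient of feature $x_j$ (standard notation, not taken from the quiz):

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
\frac{\operatorname{odds}(x_j + 1)}{\operatorname{odds}(x_j)} = e^{\beta_j}
\quad \text{(other features held fixed)}
```

For example, $\beta_j = 0.7$ gives $e^{0.7} \approx 2.01$: one more unit of $x_j$ roughly doubles the odds (not the probability) of the positive class.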
Top Data Science Tools By Function
Forwarded from Cool GitHub repositories
lerobot

This is an end-to-end library for robot learning. It handles the entire pipeline from loading and processing robotics datasets to training policies and deploying them in simulation or on real hardware.

Creator:   huggingface
Stars ⭐️:  19,000
Forked by: 3,000

Github Repo:
https://github.com/huggingface/lerobot

#robotics #AI
➖➖➖➖➖➖➖➖➖➖➖➖➖➖
Join @github_repositories_bds for more cool repositories. This channel belongs to @bigdataspecialist group
Descriptive Statistics and Exploratory Data Analysis.pdf
1 MB
Covers basic numerical and graphical summaries with practical examples, from the University of Washington.