Data science/ML/AI
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
📚 Data Science Riddle

Two team members run the same notebook but get different results. What's the culprit?
Anonymous Quiz
5%
Loss Curves
13%
Batch shapes
62%
Random seeds
21%
Metric choice
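
The most-voted answer, random seeds, is easy to demonstrate. A minimal sketch, assuming the notebook's randomness comes from Python's `random` module (NumPy or a deep learning framework would need its own seed call); `sample_split` is a hypothetical helper, not something from the quiz:

```python
import random

def sample_split(seed=None, n=10, k=3):
    """Pick k 'test' row indices out of n rows (hypothetical stand-in for a train/test split)."""
    rng = random.Random(seed)  # seed=None lets Python pick a fresh seed on every run
    return sorted(rng.sample(range(n), k))

# With a fixed seed, two runs pick exactly the same rows.
run_a = sample_split(seed=42)
run_b = sample_split(seed=42)
print(run_a == run_b)  # True

# Without a seed, each teammate's run may pick different rows.
print(sample_split(), sample_split())
```

Agreeing on one seed and setting it at the top of the notebook is usually the first thing to check when two runs disagree.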
The Simplest Machine Learning Cheatsheet
📚 Data Science Riddle

A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
53%
Add indexes
18%
Use aliases
19%
Add DISTINCT
10%
Increase RAM
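
To see what "add indexes" changes, here is a small sketch using SQLite from Python's standard library. The `orders` table, its columns, and the index name are made up for illustration, and the exact plan wording varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

query = "SELECT amount FROM orders WHERE customer_id = 7"

plan_before = query_plan(query)  # no index yet: SQLite has to scan the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(query)   # with the index: SQLite can search instead of scan

print(plan_before)
print(plan_after)
```

The first plan reports a scan of `orders`; the second mentions `idx_orders_customer`, which means only matching rows are read.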
Everything You Need to Know About Databricks
📚 Data Science Riddle

You want to detect extreme values visually in one plot. Which one is best?
Anonymous Quiz
54%
Box plot
30%
Heatmap
9%
Line chart
7%
Area plot
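
A box plot flags extreme values with a simple rule: anything beyond 1.5 × IQR from the quartiles is drawn as a separate point. A minimal sketch of that rule with the standard library; the data and the `boxplot_outliers` helper are illustrative assumptions, and plotting libraries may compute quartiles slightly differently:

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return the values a box plot would draw as outliers (beyond whisker * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
print(boxplot_outliers(data))  # [95]
```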
Mining of Massive Datasets (Leskovec, Stanford).pdf
2.9 MB
The Big Data bible from Stanford: MapReduce, Spark, recommendation systems, PageRank, locality-sensitive hashing, large-scale machine learning, and mining of social networks and streams, all explained clearly with real algorithms you can code today. 500 pages of pure gold.
If you want to become a Data Scientist, this is the path to follow.
📚 Data Science Riddle

You want to prevent inconsistent data across environments. What helps most?
Anonymous Quiz
29%
Checkpoints
17%
Contracts
41%
Indexes
13%
Sharding
🛠️ Running Code in Jupyter Notebooks

Jupyter Notebooks let you write & run code interactively.
Here’s a quick guide to make your workflow smoother:

▶️ Kernel & Code Cells
- Each notebook is tied to a single kernel (e.g. IPython).
- Code cells are where you write and execute code.

⌨️ Useful Shortcuts
- Shift + Enter → run current cell, move to next
- Alt + Enter → run current cell, insert new one below
- Ctrl + Enter → run current cell, stay in place

🔄 Kernel Management
- Interrupt the kernel if code hangs.
- Restart kernel to reset memory & variables.

🖥️ Output Handling
- Results & errors appear directly under the cell.
- Long-running code outputs appear as they’re generated.
- Large outputs can be scrolled or collapsed for clarity.

💡 Pro Tip:
Always “Restart & Run All” before sharing or saving a notebook.
This confirms the notebook runs top to bottom from a clean kernel, which catches hidden-state bugs and keeps results reproducible.

👉 Explore
📚 Data Science Riddle

You need fast reads of small files. Which storage option fits best?
Anonymous Quiz
21%
Distributed FS
10%
Cold storage
24%
Object Storage
46%
Local SSD
6 Must-Know Data Engineering Tools For Beginners
📚 Data Science Riddle

A feature has low importance but domain experts insist it matters. What do you do?
Anonymous Quiz
25%
Encode it differently
20%
Scale it
10%
Drop the feature
45%
Check interaction effects
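
Why interaction effects are worth checking: a feature can look useless on its own and still drive the target together with another feature. A toy sketch on synthetic data (the features `x1`, `x2` and the target are invented for illustration), where the target depends only on the product of the two:

```python
import itertools
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# All four sign combinations of two features, repeated 25 times (100 rows).
rows = list(itertools.product([-1, 1], repeat=2)) * 25
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a * b for a, b in rows]                 # target = pure interaction
x1_times_x2 = [a * b for a, b in rows]       # engineered interaction feature

print(pearson(x1, y))           # 0.0 -> x1 alone looks unimportant
print(pearson(x1_times_x2, y))  # 1.0 -> the interaction explains everything
```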
Advanced Data Science on Spark.pdf
1.8 MB
Stanford University material covering Spark for ML, graph processing (GraphFrames), and integration with Hadoop.
📚 Data Science Riddle

Your estimate has high variance. Best fix?
Anonymous Quiz
54%
Increase sample size
27%
Change confidence level
9%
Reduce bin count
11%
Switch to bootstrap
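
Why a larger sample helps: the spread of a sample mean shrinks roughly like 1/√n. A small simulation sketch, assuming standard normal data and a made-up helper `spread_of_sample_means`; the trial counts and sample sizes are arbitrary:

```python
import random
import statistics

def spread_of_sample_means(sample_size, trials=200, seed=0):
    """Standard deviation of the sample mean across repeated samples of one size."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0, 1) for _ in range(sample_size)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(10)    # theory: about 1 / sqrt(10)
large = spread_of_sample_means(400)   # theory: about 1 / sqrt(400)
print(f"n=10:  {small:.3f}")
print(f"n=400: {large:.3f}")
```

With 40× more data per sample, the estimate should be several times more stable.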
The Difference Between Model Accuracy and Business Accuracy

A model can be 95% accurate…
yet deliver 0% business value.

Why❔
Because data science metrics ≠ business metrics.

📌 Examples:
- A fraud model catches small frauds but misses the large ones
- A churn model predicts already obvious churners
- A recommendation model boosts clicks but reduces revenue

Always align ML metrics with business KPIs.
Otherwise, your “great model” is just a great illusion.
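
The fraud example can be put in numbers. A sketch on invented transactions (the amounts, counts, and the `Txn` class are all assumptions for illustration): accuracy is 95%, yet almost none of the fraud value is caught.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    amount: float
    is_fraud: bool
    flagged: bool  # what the model predicted

txns = (
    [Txn(40.0, False, False)] * 90     # legitimate, correctly passed
    + [Txn(10.0, True, True)] * 5      # small frauds, caught
    + [Txn(5000.0, True, False)] * 5   # large frauds, missed
)

# ML metric: share of transactions classified correctly.
accuracy = sum(t.is_fraud == t.flagged for t in txns) / len(txns)

# Business metric: share of fraudulent money the model actually stopped.
fraud_total = sum(t.amount for t in txns if t.is_fraud)
fraud_caught = sum(t.amount for t in txns if t.is_fraud and t.flagged)
value_recall = fraud_caught / fraud_total

print(f"accuracy: {accuracy:.0%}")
print(f"fraud value caught: {value_recall:.1%}")
```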
📚 Data Science Riddle

Your model's loss fluctuates but doesn't decrease overall. What's the most likely issue?
Anonymous Quiz
28%
Gradient exploding
40%
Weak regularization
21%
Small batch size
11%
Slow optimizer
✅ Complete AI (Artificial Intelligence) Roadmap 🤖🚀

1️⃣ Basics of AI
🔹 What is AI?
🔹 Types: Narrow AI vs General AI
🔹 AI vs ML vs DL
🔹 Real-world applications

2️⃣ Python for AI
🔹 Python syntax & libraries
🔹 NumPy, Pandas for data handling
🔹 Matplotlib, Seaborn for visualization

3️⃣ Math Foundation
🔹 Linear Algebra: Vectors, Matrices
🔹 Probability & Statistics
🔹 Calculus basics
🔹 Optimization techniques

4️⃣ Machine Learning (ML)
🔹 Supervised vs Unsupervised
🔹 Regression, Classification, Clustering
🔹 Scikit-learn for ML
🔹 Model evaluation metrics

5️⃣ Deep Learning (DL)
🔹 Neural Networks basics
🔹 Activation functions, backpropagation
🔹 TensorFlow / PyTorch
🔹 CNNs, RNNs, LSTMs

6️⃣ NLP (Natural Language Processing)
🔹 Text cleaning & tokenization
🔹 Word embeddings (Word2Vec, GloVe)
🔹 Transformers & BERT
🔹 Chatbots & summarization

7️⃣ Computer Vision
🔹 Image processing basics
🔹 OpenCV for CV tasks
🔹 Object detection, image classification
🔹 CNN architectures (ResNet, YOLO)

8️⃣ Model Deployment
🔹 Streamlit / Flask APIs
🔹 Docker for containerization
🔹 Deploy on cloud: Render, Hugging Face, AWS

9️⃣ Tools & Ecosystem
🔹 Git & GitHub
🔹 Jupyter Notebooks
🔹 DVC, MLflow (for tracking models)

🔟 Build AI Projects
🔹 Chatbot, Face recognition
🔹 Spam classifier, Stock prediction
🔹 Language translator, Object detector
📚 Data Science Riddle - CNN Kernels

Which convolution increases channel depth but not spatial size?
Anonymous Quiz
10%
1x1 convolution
32%
3x3 convolution
42%
Depthwise convolution
16%
Transposed convolution
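
The option that matches the description is the 1x1 convolution: it mixes channels at each pixel, so it can change the channel count while height and width stay the same. A pure-Python sketch with nested lists (a framework would use tensors; the `conv1x1` helper and the weights here are made up for illustration, with no bias or stride):

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: nested list shaped [C_in][H][W]
    weights:     nested list shaped [C_out][C_in]
    returns:     nested list shaped [C_out][H][W]
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [
                # Each output pixel is a weighted sum over input channels only.
                sum(w * feature_map[c][i][j] for c, w in enumerate(kernel))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for kernel in weights
    ]

x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 2 channels, 2x2 spatial
w = [[1, 0], [0, 1], [1, 1], [1, -1]]      # 4 output channels
y = conv1x1(x, w)

shape = (len(y), len(y[0]), len(y[0][0]))
print(shape)  # (4, 2, 2): depth went 2 -> 4, spatial size unchanged
```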
Normalization vs Standardization: Why They’re Not the Same

People treat these two as interchangeable. They’re not.

👉 Normalization (Min-Max scaling):
Rescales values to the 0–1 range.
Useful when magnitude matters (pixel values, distances).

👉 Standardization (Z-score):
Rescales data to mean = 0 and std = 1.
Useful when distribution shape matters (linear/logistic regression, PCA).

🔑 Key idea:
Normalization preserves relative proportions.
Standardization preserves statistical structure.

Pick the wrong one, and your model’s geometry becomes distorted.
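
Both transforms side by side, as a minimal standard-library sketch. The sample heights are invented, and using the population standard deviation (`pstdev`) is an assumption; some tools divide by n - 1 instead:

```python
import statistics

def min_max_scale(values):
    """Normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption of this sketch
    return [(v - mean) / std for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

normalized = min_max_scale(heights_cm)
z_scores = standardize(heights_cm)

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 2) for z in z_scores])
```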
Hey everyone 👋

Tomorrow we are kicking off a new short & free series called:

📊 Data Importing Series 📊

We’ll go through all the real ways to pull data into Python:
→ CSV, Excel, JSON and more
→ Databases & SQL
→ APIs, Google Sheets, even PDFs

Short lessons, ready-to-copy code, zero boring theory.

First part drops tomorrow.
Turn on notifications so you don’t miss it 🔔

Who’s excited? React with a 🔥 if you are.
Loading a CSV file in Python

CSV stands for Comma-Separated Values, one of the most common formats for tabular data.
With pandas, turning a CSV into a queryable DataFrame takes just a few lines.

# Import the pandas library
import pandas as pd

# Specify the path to your CSV file
filename = "data.csv"

# Read the CSV file into a DataFrame
df = pd.read_csv(filename)

# Check the first five rows
df.head()
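
For contrast, here is a sketch of the same load using only Python's built-in `csv` module. It shows the work `pd.read_csv` saves you, such as type inference. The file contents are made up and inlined so the example runs without a file on disk:

```python
import csv
import io

# Hypothetical CSV contents; in practice you would open("data.csv", newline="").
raw = "name,age,city\nAda,36,London\nLinus,28,Helsinki\n"

with io.StringIO(raw) as handle:
    rows = list(csv.DictReader(handle))  # first line is used as the header

# The csv module returns every field as a string, so convert types by hand.
for row in rows:
    row["age"] = int(row["age"])

print(rows[0])    # {'name': 'Ada', 'age': 36, 'city': 'London'}
print(len(rows))  # 2
```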


Next up ➡️ Loading an Excel file in Python

👉 Join @datascience_bds for more
Part of the @bigdataspecialist family