Data Science Riddle
Two team members run the same notebook but get different results. What's the culprit?
Anonymous Quiz
- Loss Curves: 5%
- Batch shapes: 13%
- Random seeds: 62%
- Metric choice: 21%
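If unseeded randomness is the culprit, pinning the seeds is a common first check. Below is a minimal sketch, assuming the notebook draws from Python's `random` module and NumPy; the helper name `seeded_sample` and the seed value 42 are illustrative and not from the original post.

```python
import random

import numpy as np


def seeded_sample(seed: int, n: int = 5) -> tuple[list[float], np.ndarray]:
    """Draw n values from both generators after fixing their seeds."""
    random.seed(seed)                    # seed Python's built-in generator
    rng = np.random.default_rng(seed)    # NumPy generator with its own seed
    py_values = [random.random() for _ in range(n)]
    np_values = rng.random(n)
    return py_values, np_values


# Two "team members" running the same cell with the same seed
first_py, first_np = seeded_sample(42)
second_py, second_np = seeded_sample(42)

print(first_py == second_py)                  # identical Python draws
print(np.array_equal(first_np, second_np))    # identical NumPy draws
```

Seeding does not cover every source of drift: differing library versions or nondeterministic GPU operations can still produce different results.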
Data Science Riddle
A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
- Add indexes: 53%
- Use aliases: 18%
- Add DISTINCT: 19%
- Increase RAM: 10%
Data Science Riddle
You want to detect extreme values visually in one plot. Which one is best?
Anonymous Quiz
- Box plot: 54%
- Heatmap: 30%
- Line chart: 9%
- Area plot: 7%
Mining of Massive Datasets (Leskovec, Stanford).pdf
2.9 MB
The Big Data bible from Stanford: MapReduce, Spark, recommendation systems, PageRank, locality-sensitive hashing, large-scale machine learning, and mining of social networks and streams, all explained clearly with real algorithms you can code today. 500 pages of pure gold.
Data Science Riddle
You want to prevent inconsistent data across environments. What helps most?
Anonymous Quiz
- Checkpoints: 29%
- Contracts: 17%
- Indexes: 41%
- Sharding: 13%
Running Code in Jupyter Notebooks
Jupyter Notebooks let you write & run code interactively.
Here's a quick guide to make your workflow smoother:
Kernel & Code Cells
- Each notebook is tied to a single kernel (e.g. IPython).
- Code cells are where you write and execute code.
Useful Shortcuts
- Shift + Enter: run current cell, move to next
- Alt + Enter: run current cell, insert new one below
- Ctrl + Enter: run current cell, stay in place
Kernel Management
- Interrupt the kernel if code hangs.
- Restart the kernel to reset memory & variables.
Output Handling
- Results & errors appear directly under the cell.
- Output from long-running code appears as it's generated.
- Large outputs can be scrolled or collapsed for clarity.
Pro Tip:
Always "Restart & Run All" before sharing or saving a notebook.
This confirms the notebook runs cleanly from top to bottom, which helps reproducibility.
Data Science Riddle
You need fast reads of small files. Which storage option fits best?
Anonymous Quiz
- Distributed FS: 21%
- Cold storage: 10%
- Object Storage: 24%
- Local SSD: 46%
Data Science Riddle
A feature has low importance but domain experts insist it matters. What do you do?
Anonymous Quiz
- Encode it differently: 25%
- Scale it: 20%
- Drop the feature: 10%
- Check interaction effects: 45%
Advanced Data Science on Spark.pdf
1.8 MB
From Stanford University. Covers Spark for ML, graph processing (GraphFrames), and integration with Hadoop.
Data Science Riddle
Your estimate has high variance. Best fix?
Anonymous Quiz
- Increase sample size: 54%
- Change confidence level: 27%
- Reduce bin count: 9%
- Switch to bootstrap: 11%
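The most-voted option can be checked empirically: for independent draws, the variance of a sample mean shrinks roughly in proportion to 1/n. Below is a rough simulation sketch, assuming a standard normal population; the function name `variance_of_sample_mean` and the trial count are illustrative choices.

```python
import numpy as np


def variance_of_sample_mean(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Empirical variance of the mean of n standard-normal draws."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    return float(samples.mean(axis=1).var())


small = variance_of_sample_mean(n=10)      # theory: about 1/10
large = variance_of_sample_mean(n=1000)    # theory: about 1/1000

print(f"n=10:   {small:.4f}")
print(f"n=1000: {large:.5f}")
```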
The Difference Between Model Accuracy and Business Accuracy
A model can be 95% accurate…
yet deliver 0% business value.
Why?
Because data science metrics ≠ business metrics.
Examples:
- A fraud model catches small frauds but misses large ones
- A churn model predicts already obvious churners
- A recommendation model boosts clicks but reduces revenue
Always align ML metrics with business KPIs.
Otherwise, your "great model" is just a great illusion.
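The fraud example can be made concrete with a toy calculation. The sketch below uses made-up transaction amounts and predictions (all values are illustrative) to compare plain accuracy with the share of fraud value actually caught.

```python
import numpy as np

# Hypothetical data: 20 transactions, the last 4 are fraudulent (label 1).
amounts = np.array([20.0] * 16 + [10.0, 15.0, 5000.0, 8000.0])
is_fraud = np.array([0] * 16 + [1, 1, 1, 1])

# A hypothetical model that flags the two small frauds and misses the two large ones.
predicted = np.array([0] * 16 + [1, 1, 0, 0])

# Standard ML metric: fraction of correct predictions.
accuracy = float((predicted == is_fraud).mean())

# Business-oriented metric: fraction of fraudulent money that was caught.
fraud_value = amounts[is_fraud == 1].sum()
caught_value = amounts[(is_fraud == 1) & (predicted == 1)].sum()
value_recall = float(caught_value / fraud_value)

print(f"accuracy: {accuracy:.2%}")
print(f"share of fraud value caught: {value_recall:.2%}")
```

On this toy data the model is right on 18 of 20 transactions, yet it recovers well under 1% of the money lost to fraud.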
Data Science Riddle
Your model's loss fluctuates but doesn't decrease overall. What's the most likely issue?
Anonymous Quiz
- Gradient exploding: 28%
- Weak regularization: 40%
- Small batch size: 21%
- Slow optimizer: 11%
Complete AI (Artificial Intelligence) Roadmap
1. Basics of AI
- What is AI?
- Types: Narrow AI vs General AI
- AI vs ML vs DL
- Real-world applications
2. Python for AI
- Python syntax & libraries
- NumPy, Pandas for data handling
- Matplotlib, Seaborn for visualization
3. Math Foundation
- Linear Algebra: Vectors, Matrices
- Probability & Statistics
- Calculus basics
- Optimization techniques
4. Machine Learning (ML)
- Supervised vs Unsupervised
- Regression, Classification, Clustering
- Scikit-learn for ML
- Model evaluation metrics
5. Deep Learning (DL)
- Neural Networks basics
- Activation functions, backpropagation
- TensorFlow / PyTorch
- CNNs, RNNs, LSTMs
6. NLP (Natural Language Processing)
- Text cleaning & tokenization
- Word embeddings (Word2Vec, GloVe)
- Transformers & BERT
- Chatbots & summarization
7. Computer Vision
- Image processing basics
- OpenCV for CV tasks
- Object detection, image classification
- CNN architectures (ResNet, YOLO)
8. Model Deployment
- Streamlit / Flask APIs
- Docker for containerization
- Deploy on cloud: Render, Hugging Face, AWS
9. Tools & Ecosystem
- Git & GitHub
- Jupyter Notebooks
- DVC, MLflow (for tracking models)
10. Build AI Projects
- Chatbot, Face recognition
- Spam classifier, Stock prediction
- Language translator, Object detector
Data Science Riddle - CNN Kernels
Which convolution increases channel depth but not spatial size?
Anonymous Quiz
- 1x1 convolution: 10%
- 3x3 convolution: 32%
- Depthwise convolution: 42%
- Transposed convolution: 16%
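A 1x1 convolution with stride 1 applies the same channel-mixing matrix at every pixel, so it can change the channel depth while height and width stay the same. Below is a NumPy sketch of that idea; the function name `conv1x1` and the shapes (32x32 image, 3 to 16 channels) are illustrative assumptions, and bias and activation are left out.

```python
import numpy as np


def conv1x1(image: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply a 1x1 convolution (stride 1, no bias).

    image:   (height, width, in_channels)
    weights: (in_channels, out_channels)
    """
    # Each pixel's channel vector is multiplied by the same weight matrix.
    return np.einsum("hwc,co->hwo", image, weights)


rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))    # 3 input channels
weights = rng.normal(size=(3, 16))      # expand to 16 output channels

output = conv1x1(image, weights)
print(image.shape, "->", output.shape)
```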
Normalization vs Standardization: Why They're Not the Same
People treat these two as interchangeable. They're not.
Normalization (Min-Max scaling):
Rescales values to the range 0 to 1.
Useful when bounded magnitudes matter (pixel values, distances).
Standardization (Z-score):
Rescales data to mean 0 and standard deviation 1.
Useful when features must be comparable in spread (linear/logistic regression, PCA).
Key idea:
Normalization preserves each value's relative position within the observed range.
Standardization preserves each value's distance from the mean, measured in standard deviations.
Pick the wrong one, and your model's geometry becomes distorted.
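The two rescalings can be written in a few lines of NumPy. This is a minimal sketch, assuming a one-dimensional, non-constant array (a constant array would divide by zero); the function names and the sample values are illustrative.

```python
import numpy as np


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    return (x - x.min()) / (x.max() - x.min())


def standardize(x: np.ndarray) -> np.ndarray:
    """Rescale to mean 0 and (population) standard deviation 1."""
    return (x - x.mean()) / x.std()


values = np.array([2.0, 4.0, 6.0, 8.0, 100.0])  # illustrative data with one outlier

normalized = min_max_normalize(values)
standardized = standardize(values)

print("normalized:  ", np.round(normalized, 3))
print("standardized:", np.round(standardized, 3))
```

With the outlier present, min-max scaling squeezes the four small values close to 0, which is one reason the choice of scaler matters.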
Hey everyone
Tomorrow we are kicking off a new short & free series called:
Data Importing Series
We'll go through all the real ways to pull data into Python:
- CSV, Excel, JSON and more
- Databases & SQL
- APIs, Google Sheets, even PDFs
Short lessons, ready-to-copy code, zero boring theory.
First part drops tomorrow.
Turn on notifications so you don't miss it
Who's excited? React with a 🔥 if you are.
Loading a CSV file in Python
CSV stands for Comma-Separated Values, one of the most common formats for tabular data.
With pandas, turning a CSV into a queryable DataFrame takes just a few lines.
# Import the pandas library
import pandas as pd
# Specify the path to your CSV file
filename = "data.csv"
# Read the CSV file into a DataFrame
df = pd.read_csv(filename)
# Show the first five rows (displayed automatically in a notebook; use print(df.head()) in a script)
df.head()
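To try the snippet without a file on disk, `pd.read_csv` also accepts a file-like object. The sketch below uses made-up in-memory CSV text (the column names and values are illustrative) and shows two commonly used parameters, `usecols` and `dtype`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """id,name,score
1,Ada,91.5
2,Linus,78.0
3,Grace,88.25
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# Load only selected columns and set explicit dtypes
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "score"],
    dtype={"id": "int64", "score": "float64"},
)
print(subset.dtypes)
```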
Next up: Loading an Excel file in Python
Join @datascience_bds for more
Part of the @bigdataspecialist family