Data science/ML/AI
13.7K subscribers
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
Great Packages for R
Big Data 5V
📚 Data Science Riddle

Why does bagging reduce variance?
Anonymous Quiz
* Uses deeper trees (13%)
* Averages multiple models (50%)
* Penalizes weights (28%)
* Learns sequentially (10%)
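
The "averages multiple models" option can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the toy data (y = 2x plus Gaussian noise), the 1-nearest-neighbour base model, and every function name are made up for this example and do not come from the quiz.

```python
import random
import statistics


def one_nn_predict(train, x0):
    """Predict with the label of the single nearest training point (a high-variance model)."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def bagged_predict(train, x0, n_models, rng):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # sample with replacement
        preds.append(one_nn_predict(boot, x0))
    return statistics.fmean(preds)


def make_dataset(n, rng):
    """Hypothetical toy data: noisy samples of y = 2x on [0, 1]."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 2 * x + rng.gauss(0, 1)))
    return data


rng = random.Random(0)
x0 = 0.5
single_preds, bagged_preds = [], []
for _ in range(200):  # 200 independent training sets
    train = make_dataset(30, rng)
    single_preds.append(one_nn_predict(train, x0))
    bagged_preds.append(bagged_predict(train, x0, 25, rng))

var_single = statistics.variance(single_preds)
var_bagged = statistics.variance(bagged_preds)
print(f"single-model variance: {var_single:.3f}")
print(f"bagged variance:       {var_bagged:.3f}")
```

On this toy setup the bagged variance should come out below the single-model variance, because averaging over resamples smooths out the noise of any one training point. Exact numbers depend on the seed.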
📊 Infographic Elements That Every Data Person Should Master 🚀

After years of working with data, I can tell you one thing:
👉 The chart you choose is as important as the data itself.

Here’s your quick visual toolkit 👇

🔹 Timelines

* Sequential ⏩ great for processes
* Scaled ⏳ best for real dates/events

🔹 Circular Charts

* Donut 🍩 & Pie 🥧 for proportions
* Radial 🌌 for progress or cycles
* Venn 🎯 when you want to show overlaps

🔹 Creative Comparisons

* Bubble 🫧 & Area 🔵 for impact by size
* Dot Matrix 🔴 for colorful distributions
* Pictogram 👥 when storytelling matters most

🔹 Classic Must-Haves

* Bar 📊 & Histogram 📏 (clear, reliable)
* Line 📈 for trends
* Area 🌊 & Stacked Area for the “big picture”

🔹 Advanced Tricks

* Stacked Bar 🏗 when categories add up
* Span 📏 for ranges
* Arc 🌈 for relationships

💡 Pro tip from experience:
If your audience doesn’t “get it” in 3 seconds, change the chart. The best visualizations speak louder than numbers.
Most Common Data Science Skills in Job Postings
Machine Learning Cheatsheet
📚 Data Science Riddle

Which metric is best for imbalanced classification?
Anonymous Quiz
* Accuracy (19%)
* Precision (18%)
* Recall (18%)
* F1-Score (45%)
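
To see why accuracy can mislead on imbalanced data, here is a minimal sketch. The labels and both "models" are invented for illustration (5 positives out of 100), and the helper names are hypothetical. F1 is a common choice here, though the right metric still depends on what errors cost in your problem.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up data: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100                     # never flags anything
imperfect = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89  # finds 4 of 5, with 6 false alarms

for name, pred in [("always_negative", always_negative), ("imperfect", imperfect)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f} f1={f1_score(y_true, pred):.2f}")
# always_negative: accuracy=0.95 f1=0.00
# imperfect: accuracy=0.93 f1=0.53
```

Accuracy ranks the useless always-negative model higher, while F1 ranks the model that actually finds positives higher.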
SQL JOINS
Introduction To Linear Regression
📚 Data Science Riddle

A dataset has 20% missing values in a critical column. What's the most practical choice?
Anonymous Quiz
* Drop all rows (7%)
* Fill with mean/median (48%)
* Use model-based imputation (40%)
* Ignore missing data (5%)
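
For reference, the "fill with mean/median" option looks roughly like this. This is a minimal sketch only: the `ages` column and the `impute_median` helper are hypothetical, and it says nothing about whether simple or model-based imputation is the better call for a given dataset.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values.

    Also returns a missing-value flag per row, so a downstream model
    can still tell which entries were filled in.
    """
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing, fill


# Made-up column with 2 of 10 values missing (20%).
ages = [34, None, 29, 41, None, 38, 52, 30, 45, 27]
filled, was_missing, fill = impute_median(ages)

print(fill)    # 36.0
print(filled)  # [34, 36.0, 29, 41, 36.0, 38, 52, 30, 45, 27]
```

The median is used instead of the mean here because it is less sensitive to outliers. Keeping the `was_missing` flags is a design choice that avoids silently hiding the gaps.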
❀3
ML models don’t all think alike 🤖

❇️ Naive Bayes = probability
❇️ KNN = proximity
❇️ Discriminant Analysis = decision boundaries

Different paths, same goal: accurate classification.

Which one do you reach for first?
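
To make "probability" versus "proximity" concrete, here is a rough sketch of two of the three approaches on a tiny invented dataset. The data, the function names, and the small variance floor (`1e-9`) are assumptions for illustration. Discriminant analysis is left out to keep the example short.

```python
import math
from collections import Counter


def knn_predict(train, x, k=3):
    """Proximity: vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def gaussian_nb_predict(train, x):
    """Probability: pick the class maximizing log prior + sum of per-feature Gaussian log likelihoods."""
    best_label, best_log_prob = None, -math.inf
    for label in sorted({lbl for _, lbl in train}):
        rows = [feat for feat, lbl in train if lbl == label]
        log_prob = math.log(len(rows) / len(train))  # class prior
        for j, value in enumerate(x):
            column = [r[j] for r in rows]
            mean = sum(column) / len(column)
            # Small floor keeps the variance strictly positive (an assumption of this sketch).
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            log_prob += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Made-up 2-feature training set with two well-separated classes.
train = [
    ((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.2, 0.9), "a"),
    ((3.0, 3.1), "b"), ((3.2, 2.9), "b"), ((2.8, 3.3), "b"),
]

print(knn_predict(train, (1.1, 1.0)))          # a
print(gaussian_nb_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (3.0, 3.0)))          # b
print(gaussian_nb_predict(train, (3.0, 3.0)))  # b
```

On clean, well-separated data like this, both models agree. They tend to differ on noisy or overlapping classes.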
❀4
📚 Data Science Riddle

In a medical diagnosis project, what's more important?
Anonymous Quiz
* High precision (34%)
* High recall (15%)
* High accuracy (37%)
* High F1-score (14%)
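
A common argument is that in screening-style diagnosis a missed case (false negative) costs more than a false alarm, which favors recall, though this depends on the disease and the follow-up test. The sketch below shows the trade-off with invented scores and labels: lowering the decision threshold raises recall at the expense of precision. All names and numbers are hypothetical.

```python
def precision_recall(y_true, scores, threshold):
    """Compute (precision, recall) when predicting positive for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up data: 4 sick patients (1) and 6 healthy ones (0) with model risk scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.3):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
# threshold=0.5: precision=0.67 recall=0.50
# threshold=0.3: precision=0.57 recall=1.00
```

At the lower threshold no sick patient is missed, but more healthy patients get flagged for follow-up.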
❀1
Important LLM Terms

🔹 Transformer Architecture
🔹 Attention Mechanism
🔹 Pre-training
🔹 Fine-tuning
🔹 Parameters
🔹 Self-Attention
🔹 Embeddings
🔹 Context Window
🔹 Masked Language Modeling (MLM)
🔹 Causal Language Modeling (CLM)
🔹 Multi-Head Attention
🔹 Tokenization
🔹 Zero-Shot Learning
🔹 Few-Shot Learning
🔹 Transfer Learning
🔹 Overfitting
🔹 Inference
🔹 Language Model Decoding
🔹 Hallucination
🔹 Latency
Cheatsheet: Bayes Theorem and Classifier
Why is Kafka Called Kafka❔

Here’s a fun fact that surprises a lot of people.

The “Kafka” you use for real-time data pipelines is… named after the novelist Franz Kafka.

Why? Jay Kreps (one of its co-creators) once explained it simply:

- He liked the name.
- It sounded mysterious.
- And Kafka (the author) wrote a lot.

That last part is key.
Because Apache Kafka is all about writing: streams of events, logs, and data in motion.
So the name stuck.

Today, millions of engineers across the globe talk about “Kafka” every single day… and most don’t realize they’re also invoking a 20th-century novelist.

It's funny how small choices like naming your project can shape how the world remembers it.
📚 Data Science Riddle

Why do CNNs use pooling layers?
Anonymous Quiz
* Reduce dimensionality (50%)
* Increase non-linearity (16%)
* Normalize activations (13%)
* Improve learning rate (22%)
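
The "reduce dimensionality" option is easy to see in code. Below is a minimal sketch of 2x2 max pooling with stride 2 in plain Python. The feature map values and the function name are made up, and real frameworks do this on tensors with many channels.

```python
def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window (stride 2)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 input shrinks to 2x2, so later layers process a quarter of the values while the strongest activation in each region is kept.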
❀4
Data Analyst 🆚 Data Engineer: Key Differences

Confused about the roles of a Data Analyst and Data Engineer? 🤔 Here's a breakdown:

👨‍💻 Data Analyst:

🎯 Role: Analyzes, interprets, & visualizes data to extract insights for business decisions.

👍 Best For: Those who enjoy finding patterns, trends, & actionable insights.

🔑 Responsibilities:
  🧹 Cleaning & organizing data.
  📊 Using tools like Excel, Power BI, Tableau & SQL.
  📝 Creating reports & dashboards.
  🤝 Collaborating with business teams.

Skills: Analytical skills, SQL, Excel, reporting tools, statistical analysis, business intelligence.

✅ Outcome: Guides decision-making in business, marketing, finance, etc.

⚙️ Data Engineer:

🏗️ Role: Designs, builds, & maintains data infrastructure.

👍 Best For: Those who enjoy technical data management & architecture for large-scale analysis.

🔑 Responsibilities:
  🗄️ Managing databases & data pipelines.
  🔄 Developing ETL processes.
  🔒 Ensuring data quality & security.
  ☁️ Working with big data technologies like Hadoop, Spark, AWS, Azure & Google Cloud.

Skills: Python, Java, Scala, database management, big data tools, data architecture, cloud technologies.

✅ Outcome: Creates infrastructure & pipelines for efficient data flow for analysis.

In short: Data Analysts extract insights, while Data Engineers build the systems for data storage, processing, & analysis. Data Analysts focus on business outcomes, while Data Engineers focus on the technical foundation.
Data Visualization Cheatsheet