Data Science Portfolio - Kaggle Datasets & AI Projects | Artificial Intelligence
Free Datasets For Data Science Projects & Portfolio

🔰 Python program to convert text to speech
❀8
⚠️ Mistakes Beginners Repeat for Years

❌ Ignoring fundamentals
❌ Copy-pasting without understanding
❌ Overusing frameworks
❌ Avoiding debugging
❌ Skipping tests
❌ Fear of refactoring

React 🧑 if you want more of this type of content

#techinfo
❀15πŸ”₯2
✅ GitHub Profile Tips for Data Analysts 💼

Your GitHub is more than code: it’s your digital resume. Here’s how to make it stand out:

1️⃣ Clean README (Profile)
• Add your name, title & tools
• Short about section
• Include: skills, top projects, certificates, contact
✅ Example:
“Hi, I’m Rahul – a Data Analyst skilled in SQL, Python & Power BI.”

2️⃣ Pin Your Best Projects
• Show 3–6 strong repos
• Add a clear README for each project:
- What it does
- Tools used
- Screenshots or demo links
✅ Bonus: Include real data or visuals

3️⃣ Use Commits & Contributions
• Contribute regularly
• Avoid empty profiles
✅ Small daily commits > 1 big push once a month

4️⃣ Upload Resume Projects
• Excel dashboards
• SQL queries
• Python notebooks (Jupyter)
• BI project links (Power BI / Tableau Public)

5️⃣ Add Descriptions & Tags
• Use repo topics (tags): sql, python, eda, dashboard
• Write a short project summary in the repo description

🧠 Tips:
• Push only clean, working code
• Use folders, not messy files
• Add your LinkedIn link to your profile bio

📌 Practice Task:
Upload your latest project → Write a README → Pin it to your profile

💬 Tap ❤️ for more!
❀13
🚨 Anthropic dropped a FREE 33-page playbook revealing Claude's very own cheat code:

The 'Skills' folder.

Spend 30 minutes building it,
and you’ll never have to explain your process again.

Top-tier users don't just type commands, they build systems.

Grab your free copy of Anthropic's official guide to building Claude skills right here: https://resources.anthropic.com/hubfs/The-Complete-Guide-to-Building-Skill-for-Claude.pdf
❀9
πŸ“’ Advertising in this channel

You can place an ad via Telegaβ€€io. It takes just a few minutes.

Formats and current rates: View details
✅ Useful Platforms to Practice SQL Programming 🧠🖥️

Learning SQL is just the first step; practice is what builds real skill. Here are the best platforms for hands-on SQL:

1️⃣ LeetCode – For Interview-Oriented SQL Practice
• Focus: Real interview-style problems
• Levels: Easy to Hard
• Schema + Sample Data Provided
• Great for: Data Analyst, Data Engineer, FAANG roles
✔ Tip: Start with Easy → filter by “Database” tag
✔ Popular Section: Database → Top 50 SQL Questions
Example Problem: “Find duplicate emails in a user table” → Practice filtering, GROUP BY, HAVING
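
A minimal sketch of the "duplicate emails" pattern, run against an in-memory SQLite database from Python. The users table, its columns, and the sample rows are made up for illustration; the real LeetCode schema may differ.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [
        ("a@example.com",),
        ("b@example.com",),
        ("a@example.com",),
        ("c@example.com",),
        ("b@example.com",),
        ("a@example.com",),
    ],
)

# GROUP BY collapses the rows per email; HAVING filters the groups.
# WHERE cannot do this because it runs before aggregation.
duplicates = conn.execute(
    """
    SELECT email, COUNT(*) AS occurrences
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY email
    """
).fetchall()
conn.close()

print(duplicates)
```

Only emails that appear more than once come back, each with its count.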

2️⃣ HackerRank – Structured & Beginner-Friendly
• Focus: Step-by-step SQL track
• Has certification tests (SQL Basic, Intermediate)
• Problem sets by topic: SELECT, JOINs, Aggregations, etc.
✔ Tip: Follow the full SQL track
✔ Bonus: Company-specific challenges
Try: “Revising Aggregations – The Count Function” → Build confidence with small wins

3️⃣ Mode Analytics – Real-World SQL in Business Context
• Focus: Business intelligence + SQL
• Uses real-world datasets (e.g., e-commerce, finance)
• Has an in-browser SQL editor with live data
✔ Best for: Practicing dashboard-level queries
✔ Tip: Try the SQL case studies & tutorials

4️⃣ StrataScratch – Interview Questions from Real Companies
• 500+ problems from companies like Uber, Netflix, Google
• Split by company, difficulty, and topic
✔ Best for: Intermediate to advanced level
✔ Tip: Try “Hard” questions after doing 30–50 easy/medium

5️⃣ DataLemur – Short, Practical SQL Problems
• Crisp and to the point
• Good UI, fast learning
• Real interview-style logic
✔ Use when: You want fast, smart SQL drills

📌 How to Practice Effectively:
• Spend 20–30 mins/day
• Focus on JOINs, GROUP BY, HAVING, Subqueries
• Analyze problem → write → debug → re-write
• After solving, explain your logic out loud
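
A small practice query that touches all four focus areas at once (JOIN, GROUP BY, HAVING, subquery). This is only a sketch: the customers and orders tables and their rows are hypothetical, and SQLite is used just because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, invented for this drill.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders (customer_id, amount)
    VALUES (1, 50.0), (1, 150.0), (2, 20.0), (3, 300.0);
    """
)

# Customers whose total spend is above the average order amount.
# JOIN links the tables, GROUP BY totals per customer,
# HAVING filters on the total, and the subquery supplies the average.
big_spenders = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY total DESC
    """
).fetchall()
conn.close()

print(big_spenders)
```

Explaining out loud why Ben is missing from the result (his total of 20 is below the average order of 130) is a good way to test your own logic.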

🧪 Practice Task:
Try solving 5 SQL questions from LeetCode or HackerRank this week. Start with SELECT, WHERE, and GROUP BY.

💬 Tap ❤️ for more!
❀11
Here is a list of a few projects (found on Kaggle). They cover the basics of Python, advanced statistics, supervised learning (regression and classification problems) and data science.

Try each one yourself first, then check the discussions and notebook submissions for different approaches and solutions.

1. Basic python and statistics

Pima Indians :- https://www.kaggle.com/uciml/pima-indians-diabetes-database
Cardio Good Fitness :- https://www.kaggle.com/saurav9786/cardiogoodfitness
Automobile :- https://www.kaggle.com/toramky/automobile-dataset

2. Advanced Statistics

Game of Thrones :- https://www.kaggle.com/mylesoneill/game-of-thrones
World University Ranking :- https://www.kaggle.com/mylesoneill/world-university-rankings
IMDB Movie Dataset :- https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset

3. Supervised Learning

a) Regression Problems

How much did it rain :- https://www.kaggle.com/c/how-much-did-it-rain-ii/overview
Inventory Demand:- https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Property Inspection prediction :- https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Restaurant Revenue prediction :- https://www.kaggle.com/c/restaurant-revenue-prediction/data
TMDB Box office Prediction :- https://www.kaggle.com/c/tmdb-box-office-prediction/overview

b) Classification problems

Employee Access challenge :- https://www.kaggle.com/c/amazon-employee-access-challenge/overview
Titanic :- https://www.kaggle.com/c/titanic
San Francisco crime:- https://www.kaggle.com/c/sf-crime
Customer satisfaction :- https://www.kaggle.com/c/santander-customer-satisfaction
Trip type classification :- https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
Categorize cuisine :- https://www.kaggle.com/c/whats-cooking

4. Some helpful Data science projects for beginners

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

https://www.kaggle.com/c/digit-recognizer

https://www.kaggle.com/c/titanic

5. Intermediate Level Data science Projects

Black Friday Data : https://www.kaggle.com/sdolezel/black-friday

Human Activity Recognition Data : https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones

Trip History Data : https://www.kaggle.com/pronto/cycle-share-dataset

Million Song Data : https://www.kaggle.com/c/msdchallenge

Census Income Data : https://www.kaggle.com/c/census-income/data

Movie Lens Data : https://www.kaggle.com/grouplens/movielens-20m-dataset

Twitter Classification Data : https://www.kaggle.com/c/twitter-sentiment-analysis2

Share with credits: https://t.me/sqlproject

ENJOY LEARNING 👍👍
❀6πŸ‘2
🔹 DATA SCIENCE – INTERVIEW REVISION SHEET

1️⃣ What is Data Science?
> “Data science is the process of using data, statistics, and machine learning to extract insights and build predictive or decision-making models.”

Difference from Data Analytics:
• Data Analytics → past & present (what/why)
• Data Science → future & automation (what will happen)

2️⃣ Data Science Lifecycle (Very Important)
1. Business problem understanding
2. Data collection
3. Data cleaning & preprocessing
4. Exploratory Data Analysis (EDA)
5. Feature engineering
6. Model building
7. Model evaluation
8. Deployment & monitoring
Interview line:
> “I always start from business understanding, not the model.”

3️⃣ Data Types
• Structured → tables, SQL
• Semi-structured → JSON, logs
• Unstructured → text, images

4️⃣ Statistics You MUST Know
• Central tendency: Mean, Median (prefer the median when outliers exist)
• Spread: Variance, Standard deviation
• Correlation ≠ causation
• Normal distribution
• Skewness (income → right-skewed)
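
A quick way to see why the median is preferred when outliers exist. The income figures below are invented for illustration; only Python's built-in statistics module is used.

```python
import statistics

# Hypothetical incomes; the last value added is a deliberate outlier.
incomes = [30_000, 32_000, 35_000, 38_000, 40_000]
with_outlier = incomes + [1_000_000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# One extreme value drags the mean far to the right,
# while the median barely moves (right-skewed data: mean > median).
print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")
```

This is the intuition behind "income is right-skewed": a few very large values pull the mean above the median.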

5️⃣ Data Cleaning & Preprocessing
Steps you should say in interviews:
1. Handle missing values
2. Remove duplicates
3. Treat outliers
4. Encode categorical variables
5. Scale numerical data
Scaling:
• Min-Max → bounded range (usually 0 to 1)
• Standardization → zero mean, unit variance (it does not make the data normally distributed)
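
The two scaling options can be sketched in plain Python as below. The helper names and the sample ages are assumptions for illustration; in a real project you would more likely reach for scikit-learn's MinMaxScaler and StandardScaler.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to spread out
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]  # hypothetical sample
scaled = min_max_scale(ages)
z_scores = standardize(ages)

print(scaled)
print([round(z, 3) for z in z_scores])
```

Note that standardizing only shifts and rescales; the shape of the distribution stays the same.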

6️⃣ Feature Engineering (Interview Favorite)
> “Feature engineering is creating meaningful input variables that improve model performance.”
Examples:
• Extract month from date
• Create customer lifetime value
• Binning age groups
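
The three examples above, sketched on a few made-up customer rows. The field names, the age bins, and the lifetime-value formula (monthly spend × months active) are simplifying assumptions, not a standard definition.

```python
from datetime import date

# Hypothetical customer rows, invented for illustration.
customers = [
    {"signup": date(2023, 1, 15), "age": 23, "monthly_spend": 40.0, "months_active": 10},
    {"signup": date(2023, 7, 2), "age": 37, "monthly_spend": 25.0, "months_active": 4},
    {"signup": date(2022, 11, 30), "age": 58, "monthly_spend": 60.0, "months_active": 18},
]

def age_group(age):
    """Bin a numeric age into a coarse category (bin edges are arbitrary)."""
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    return "45+"

def add_features(row):
    """Return a copy of the row with three engineered features added."""
    return {
        **row,
        "signup_month": row["signup"].month,           # month from date
        "age_group": age_group(row["age"]),            # binning
        "lifetime_value": row["monthly_spend"] * row["months_active"],  # naive CLV proxy
    }

featured = [add_features(row) for row in customers]
for row in featured:
    print(row["signup_month"], row["age_group"], row["lifetime_value"])
```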

7️⃣ Machine Learning Basics
• Supervised learning: Regression, Classification
• Unsupervised learning: Clustering, Dimensionality reduction

8️⃣ Common Algorithms (Know WHEN to use)
• Regression: Linear regression → continuous output
• Classification: Logistic regression, Decision tree, Random forest, SVM
• Unsupervised: K-Means → segmentation, PCA → dimensionality reduction

9️⃣ Overfitting vs Underfitting
• Overfitting → model memorizes training data
• Underfitting → model too simple
Fixes for overfitting:
• Regularization
• More data
• Cross-validation (to detect it and tune the model)
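
To make cross-validation concrete, here is a minimal sketch of how k-fold splitting works: every sample is used for validation exactly once. The function name is hypothetical and there is no shuffling, which real data usually needs; scikit-learn's KFold does the same job in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds.

    The first n_samples % k folds get one extra sample so that
    all samples are covered.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

A model that scores well on its training folds but poorly on the held-out fold each time is overfitting.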

🔟 Model Evaluation Metrics
• Classification: Accuracy, Precision, Recall, F1 score, ROC-AUC
• Regression: MAE, RMSE
Interview line:
> “Metric selection depends on the business problem.”
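
The metrics above can be computed by hand from predictions, which helps when you have to explain them in an interview. This is a sketch for binary labels (1 = positive); the function names and sample values are made up.

```python
import math

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def regression_metrics(y_true, y_pred):
    """Mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}

# Hypothetical predictions: 2 positives out of 10, one caught, one false alarm.
clf = classification_metrics(
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
)
reg = regression_metrics([3, 5, 7], [2, 5, 10])
print(clf)
print(reg)
```

Here accuracy looks fine at 0.8 while recall is only 0.5, which is why the metric has to follow the business problem. RMSE is larger than MAE because squaring punishes the single big miss.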

1️⃣1️⃣ Imbalanced Data Techniques
• Class weighting
• Oversampling / undersampling
• SMOTE
• Metric preference: Precision, Recall, F1, ROC-AUC (not accuracy alone)
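
A sketch of the first two techniques in plain Python. The weight formula n_samples / (n_classes × class_count) is the "balanced" heuristic used by scikit-learn; the function names and data are hypothetical. The second helper is plain random oversampling (duplicating minority rows), not SMOTE, which synthesizes new points instead.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical data: 8 negatives, 2 positives.
labels = [0] * 8 + [1] * 2
rows = list(range(10))

weights = balanced_class_weights(labels)
new_rows, new_labels = random_oversample(rows, labels)
print(weights)
print(Counter(new_labels))
```

Resample only the training split; the validation and test data should keep the real class balance.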

1️⃣2️⃣ Python for Data Science
Core libraries:
• NumPy
• Pandas
• Matplotlib / Seaborn
• Scikit-learn
Must know:
• loc (label-based) vs iloc (position-based)
• Groupby
• Vectorization

1️⃣3️⃣ Model Deployment (Basic Understanding)
• Batch prediction
• Real-time prediction
• Model monitoring
• Model drift
Interview line:
> “Models must be monitored because data changes over time.”

1️⃣4️⃣ Explain Your Project (Template)
> “The goal was ___. I cleaned the data using ___. I performed EDA to identify ___. I built a ___ model and evaluated it using ___. The final outcome was ___.”

1️⃣5️⃣ HR-Style Data Science Answers
Why data science?
> “I enjoy solving complex problems using data and building models that automate decisions.”
Biggest challenge:
> “Handling messy real-world data.”
Strength:
> “Strong foundation in statistics and ML.”

🔥 LAST-DAY INTERVIEW TIPS
• Explain intuition, not math
• Don’t jump to algorithms immediately
• Always connect model → business value
• State your assumptions clearly

Double Tap ♥️ For More
❀9πŸ”₯1
If I had to teach someone data analytics from the basics, here is my strategy:

1. First, I remove the person's fear of tools.

2. I start with Excel because it looks familiar and is easy to use.

3. I put the emphasis on projects, at least 5 to 6 in Excel, because in industry you learn by doing.

4. I get the person out of tutorial hell and make them more action-oriented.

5. Then I move to SQL because every job wants it. Even with AI tools, you need a strong understanding of SQL if you are going to use it daily.

6. Once the understanding is strong, I push the person to solve 100 to 150 SQL problems, from basic to advanced.

7. This helps the person develop analytical thinking.

8. Then I push the person to solve 3 case studies, which show how data is pulled in real life.

9. Then I move the person to Power BI for another 5 projects, using either SQL or Excel files as the data source.

10. By now the fear is gone.

11. Now I push the person to solve unguided challenges and present them in recorded videos, which builds problem-solving, communication, and data storytelling skills.

12. It also helps you clear the case study round that most companies give.

13. Next, I show the person how to present these projects on a resume and how the tools are used in the real world.

14. The interesting fact: all of the above is available for free on YouTube, and I also mentor people using existing YouTube videos.

15. But people get stuck in tutorial hell, lose motivation, and stay unsure whether they are heading in the right direction.

16. As a personal mentor, I help them get out of tutorial hell and set them in the right direction, and they stay motivated once they see the difference before and after mentorship.

I have curated 80+ top-notch Data Analytics resources 👇👇
https://topmate.io/analyst/861634

Hope this helps you 😊
❀9
Real-world Data Science project ideas: 💡📈

1. Credit Card Fraud Detection

Tools: Python (Pandas, Scikit-learn)

Use a real credit card transactions dataset to detect fraudulent activity using classification models.

Skills you build: Data preprocessing, class imbalance handling, logistic regression, confusion matrix, model evaluation.

2. Predictive Housing Price Model

Tools: Python (Scikit-learn, XGBoost)

Build a regression model to predict house prices based on various features like size, location, and amenities.

Skills you build: Feature engineering, EDA, regression algorithms, RMSE evaluation.

3. Sentiment Analysis on Tweets or Reviews

Tools: Python (NLTK / TextBlob / Hugging Face)

Analyze customer reviews or Twitter data to classify sentiment as positive, negative, or neutral.

Skills you build: Text preprocessing, NLP basics, vectorization (TF-IDF), classification.

4. Stock Price Prediction

Tools: Python (LSTM / Prophet / ARIMA)

Use time series models to predict future stock prices based on historical data.

Skills you build: Time series forecasting, data visualization, recurrent neural networks, trend/seasonality analysis.

5. Image Classification with CNN

Tools: Python (TensorFlow / PyTorch)

Train a Convolutional Neural Network to classify images (e.g., cats vs dogs, handwritten digits).

Skills you build: Deep learning, image preprocessing, CNN layers, model tuning.

6. Customer Segmentation with Clustering

Tools: Python (K-Means, PCA)

Use unsupervised learning to group customers based on purchasing behavior.

Skills you build: Clustering, dimensionality reduction, data visualization, customer profiling.

7. Recommendation System

Tools: Python (Surprise / Scikit-learn / Pandas)

Build a recommender system (e.g., movies, products) using collaborative or content-based filtering.

Skills you build: Similarity metrics, matrix factorization, cold start problem, evaluation (RMSE, MAE).
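
To get a feel for the similarity-metric part of project 7, here is a tiny user-based collaborative filtering sketch in plain Python. The users, movies, and ratings are invented, and unrated items are treated as zeros, which is a simplification; a real project would use a library such as Surprise and a proper dataset.

```python
import math

# Hypothetical ratings (user -> {item: rating}), invented for illustration.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob": {"Movie A": 4, "Movie B": 5, "Movie D": 2},
    "cara": {"Movie A": 1, "Movie C": 5, "Movie D": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; missing ratings count as 0."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def most_similar_user(user):
    """Return the other user whose ratings point in the most similar direction."""
    others = (u for u in ratings if u != user)
    return max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))

def recommend(user):
    """Suggest items the nearest neighbour rated that the user has not seen."""
    neighbour = most_similar_user(user)
    unseen = {i: r for i, r in ratings[neighbour].items() if i not in ratings[user]}
    return sorted(unseen, key=unseen.get, reverse=True)

print(most_similar_user("alice"), recommend("alice"))
```

A brand-new user with no ratings gets a similarity of 0 with everyone, which is the cold start problem in miniature.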


👉 Pick 2–3 projects aligned with your interests.
👉 Document everything on GitHub, and post about your learnings on LinkedIn.

Here you can find the project datasets: https://whatsapp.com/channel/0029VbAbnvPLSmbeFYNdNA29

React ❤️ for more
❀5