Data Science Portfolio - Kaggle Datasets & AI Projects | Artificial Intelligence
Free Datasets For Data Science Projects & Portfolio

🚨30 FREE Dataset Sources for Data Science Projects🔥

Data Simplifier: https://datasimplifier.com/best-data-analyst-projects-for-freshers/

US Government Dataset: https://www.data.gov/

Open Government Data (OGD) Platform India: https://data.gov.in/

The World Bank Open Data: https://data.worldbank.org/

Data World: https://data.world/

BFI - Industry Data and Insights: https://www.bfi.org.uk/data-statistics

The Humanitarian Data Exchange (HDX): https://data.humdata.org/

Data at World Health Organization (WHO): https://www.who.int/data

FBI's Crime Data Explorer: https://crime-data-explorer.fr.cloud.gov/

AWS Open Data Registry: https://registry.opendata.aws/

FiveThirtyEight: https://data.fivethirtyeight.com/

IMDb Datasets: https://www.imdb.com/interfaces/

Kaggle: https://www.kaggle.com/datasets

UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php

Google Dataset Search: https://datasetsearch.research.google.com/

Nasdaq Data Link: https://data.nasdaq.com/

Recommender Systems and Personalization Datasets: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Reddit - Datasets: https://www.reddit.com/r/datasets/

Open Data Network by Socrata: https://www.opendatanetwork.com/

Climate Data Online by NOAA: https://www.ncdc.noaa.gov/cdo-web/

Azure Open Datasets: https://azure.microsoft.com/en-us/services/open-datasets/

IEEE Data Port: https://ieee-dataport.org/

Wikipedia Database Dumps: https://dumps.wikimedia.org/

BuzzFeed News: https://github.com/BuzzFeedNews/everything

Academic Torrents: https://academictorrents.com/

Yelp Open Dataset: https://www.yelp.com/dataset

The NLP Index by Quantum Stat: https://index.quantumstat.com/

Computer Vision Online: http://www.computervisiononline.com/dataset

Visual Data Discovery: https://www.visualdata.io/

Roboflow Public Datasets: https://public.roboflow.com/

Computer Vision Group, TUM: https://vision.in.tum.de/data/datasets
Top 10 Datasets for your next SQL Project 👇👇
https://medium.com/mr-plan-publication/top-10-datasets-for-your-next-sql-project-b3925f471147

To transition from Data Analyst ➡️ Data Scientist, you will have to focus on building relevant projects! 🎯

✅ Predictive Analytics Project
→ Built a model to predict customer behaviour by analyzing past purchase patterns and used time series forecasting to predict future trends.

✅ Sentiment Analysis using NLP
→ Developed a sentiment analysis model that categorized customer feedback into positive, neutral, and negative sentiments to improve products.

✅ Personalized Recommendation Engine
→ Created a recommendation engine using collaborative and content-based filtering to give personalized suggestions based on each user's browsing history and preferences.

Tailor every project to focus on business impact and user experience, which can help you stand out to recruiters. 💪🏻
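
To make the recommendation engine idea a bit more concrete, here is a minimal sketch of the collaborative filtering half, in plain Python. The ratings, user names, and the `cosine` and `recommend` helpers are made up for illustration; a real project would more likely use one of the datasets listed in this channel and a library such as scikit-learn.

```python
import math

# Hypothetical toy data: user -> {item: rating}. Not from any real dataset.
RATINGS = {
    "alice": {"laptop": 5, "mouse": 4, "keyboard": 4},
    "bob": {"laptop": 4, "mouse": 5, "monitor": 3},
    "carol": {"keyboard": 5, "monitor": 4, "mouse": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """User-based collaborative filtering: score items the user has not
    rated by the similarity-weighted ratings of the other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only suggest items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


print(recommend("alice", RATINGS))
```

Content-based filtering would instead compare item attributes (category, description, price band) with what the user already liked; combining both signals is one common way to build a hybrid recommender.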
Knowing the tools won't be enough to become a master of data analytics!

See if your soft skills are worthy of the rank of master:

1. Communication: Can you translate your findings into easily digestible insights for non-technical stakeholders?

2. Problem-Solving: Is your work focused on solving actual business problems, and are you able to pick the most efficient approach to solve them?

3. Stakeholder Management: Are you building strong relationships with your stakeholders, understanding their needs, and providing them with regular updates?

4. Continuous Learning: The data landscape is constantly changing. Are you keeping up with new tools and trends?

5. Product/Project Management: Are you aware of the life cycle of your data products? Do you have a structured approach to plan, prioritize, and track your work?

6. Business Acumen: Can you understand the language and needs of the business and put your data work into context?

7. Domain Knowledge: Do you know the processes, products, and challenges of your domain?


If you want to earn the rank of master in the data field, start working on your soft skills now.
A few ways to optimise SQL queries 👇👇

Use Indexing: Properly indexing your database tables can significantly speed up query performance by allowing the database to quickly locate the rows needed for a query.

Optimize Joins: Minimize the number of joins, join on indexed columns where possible, and pick the join type that matches the rows you need (e.g., INNER JOIN when unmatched rows can be dropped, LEFT JOIN when they must be kept).

Avoid SELECT * : Instead of selecting all columns using SELECT *, explicitly specify only the columns needed for the query to reduce unnecessary data transfer and processing overhead.

Use the WHERE Clause Wisely: Filter rows as early as possible with the WHERE clause to reduce the dataset size before joining or aggregating data.

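Avoid Unnecessary Subqueries: Correlated subqueries can be evaluated once per outer row, so rewrite them as JOINs where possible. Common Table Expressions (CTEs) mainly improve readability; whether they also improve performance depends on the database engine.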

Limit the Use of DISTINCT: Minimize the use of DISTINCT, as it requires sorting or hashing to remove duplicates, which can be resource-intensive for large datasets.

Optimize GROUP BY and ORDER BY: Use GROUP BY and ORDER BY clauses judiciously, and ensure that they are using indexed columns whenever possible to avoid unnecessary sorting.

Consider Partitioning: Partition large tables on a commonly filtered column (for example, a date) so that queries scan only the relevant partitions, which can improve performance by reducing I/O operations.

Monitor Query Performance: Regularly review query execution plans and use database profilers and performance monitoring tools to identify and address bottlenecks.
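
The indexing and monitoring tips above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `orders` table, its columns, and the `idx_orders_customer` index name are invented for the demo, and other databases expose query plans differently (for example `EXPLAIN` in PostgreSQL and MySQL).

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (demo data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Select only the columns we need instead of SELECT *.
QUERY = "SELECT id, amount FROM orders WHERE customer_id = ?"


def plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


before = plan(QUERY, (42,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY, (42,))  # with the index: expect an index search

print("before:", before)
print("after: ", after)
```

The exact wording of the plan differs between SQLite versions, but the first plan should mention a SCAN of `orders` and the second a SEARCH using `idx_orders_customer`. Checking the plan before and after a change is a cheap way to confirm that an index is actually being used.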

Hope it helps :)
Here is a list of a few projects (found on Kaggle). They cover basic Python, advanced statistics, supervised learning (regression and classification problems), and data science.
Please also check the discussions and notebook submissions for different approaches and solutions after you have tried them yourself.

1. Basic Python and statistics

Pima Indians :- https://www.kaggle.com/uciml/pima-indians-diabetes-database
Cardio Good Fitness :- https://www.kaggle.com/saurav9786/cardiogoodfitness
Automobile :- https://www.kaggle.com/toramky/automobile-dataset

2. Advanced Statistics

Game of Thrones :- https://www.kaggle.com/mylesoneill/game-of-thrones
World University Ranking :- https://www.kaggle.com/mylesoneill/world-university-rankings
IMDB Movie Dataset :- https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset

3. Supervised Learning

a) Regression Problems

How much did it rain :- https://www.kaggle.com/c/how-much-did-it-rain-ii/overview
Inventory Demand :- https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Property Inspection prediction :- https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Restaurant Revenue prediction :- https://www.kaggle.com/c/restaurant-revenue-prediction/data
TMDB Box Office Prediction :- https://www.kaggle.com/c/tmdb-box-office-prediction/overview

b) Classification problems

Employee Access challenge :- https://www.kaggle.com/c/amazon-employee-access-challenge/overview
Titanic :- https://www.kaggle.com/c/titanic
San Francisco crime :- https://www.kaggle.com/c/sf-crime
Customer satisfaction :- https://www.kaggle.com/c/santander-customer-satisfaction
Trip type classification :- https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
Categorize cuisine :- https://www.kaggle.com/c/whats-cooking

4. Some helpful data science projects for beginners

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

https://www.kaggle.com/c/digit-recognizer

https://www.kaggle.com/c/titanic

5. Intermediate-level data science projects

Black Friday Data : https://www.kaggle.com/sdolezel/black-friday

Human Activity Recognition Data : https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones

Trip History Data : https://www.kaggle.com/pronto/cycle-share-dataset

Million Song Data : https://www.kaggle.com/c/msdchallenge

Census Income Data : https://www.kaggle.com/c/census-income/data

Movie Lens Data : https://www.kaggle.com/grouplens/movielens-20m-dataset

Twitter Classification Data : https://www.kaggle.com/c/twitter-sentiment-analysis2

ENJOY LEARNING ✅✅


#datascienceprojects