Sign Language Detection using Images.zip
267.8 MB
Dataset Name: Sign Language Detection using Images
Amazon Reviews Data 2023.zip
270.8 MB
Dataset Name: Amazon Reviews Data 2023
archive.zip.002
2.6 GB
Clothing dataset (full, high resolution)
Health and sleep statistics.zip
1.2 KB
Dataset Name: Health and sleep statistics
30 FREE Dataset Sources for Data Science Projects
Data Simplifier: https://datasimplifier.com/best-data-analyst-projects-for-freshers/
US Government Dataset: https://www.data.gov/
Open Government Data (OGD) Platform India: https://data.gov.in/
The World Bank Open Data: https://data.worldbank.org/
Data World: https://data.world/
BFI - Industry Data and Insights: https://www.bfi.org.uk/data-statistics
The Humanitarian Data Exchange (HDX): https://data.humdata.org/
Data at World Health Organization (WHO): https://www.who.int/data
FBI's Crime Data Explorer: https://crime-data-explorer.fr.cloud.gov/
AWS Open Data Registry: https://registry.opendata.aws/
FiveThirtyEight: https://data.fivethirtyeight.com/
IMDb Datasets: https://www.imdb.com/interfaces/
Kaggle: https://www.kaggle.com/datasets
UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
Google Dataset Search: https://datasetsearch.research.google.com/
Nasdaq Data Link: https://data.nasdaq.com/
Recommender Systems and Personalization Datasets: https://cseweb.ucsd.edu/~jmcauley/datasets.html
Reddit - Datasets: https://www.reddit.com/r/datasets/
Open Data Network by Socrata: https://www.opendatanetwork.com/
Climate Data Online by NOAA: https://www.ncdc.noaa.gov/cdo-web/
Azure Open Datasets: https://azure.microsoft.com/en-us/services/open-datasets/
IEEE Data Port: https://ieee-dataport.org/
Wikipedia: Database: https://dumps.wikimedia.org/
BuzzFeed News: https://github.com/BuzzFeedNews/everything
Academic Torrents: https://academictorrents.com/
Yelp Open Dataset: https://www.yelp.com/dataset
The NLP Index by Quantum Stat: https://index.quantumstat.com/
Computer Vision Online: http://www.computervisiononline.com/dataset
Visual Data Discovery: https://www.visualdata.io/
Roboflow Public Datasets: https://public.roboflow.com/
Computer Vision Group, TUM: https://vision.in.tum.de/data/datasets
Data Analytics Projects for Beginners
Web Scraping
https://github.com/shreyaswankhede/IMDb-Web-Scraping-and-Sentiment-Analysis
Product Price Scraping and Analysis
https://github.com/CodesdaLu/Web-Scrapping
News Scraping
https://github.com/rohit-yadav/scraping-news-arti Les
IPL Analysis
https://github.com/Yashmenaria1/IPL-Data-Exploration
Customer Churn Prediction
https://github.com/Pradnya1208/Telecom-Customer-Churn-prediction
Employee's Performance for HR Analytics
https://www.kaggle.com/code/rajatraj0502/employee-s-performance-for-hr-analytics
Food Price Prediction
https://github.com/VectorInstitute/foodprice-forecasting
Join for more: https://t.me/sqlproject
Top 10 Datasets for your next SQL Project
https://medium.com/mr-plan-publication/top-10-datasets-for-your-next-sql-project-b3925f471147
Like for more
To transition from Data Analyst to Data Scientist, you will have to focus on building relevant projects!
Predictive Analytics Project
Built a model to predict customer behaviour by analyzing past purchase patterns and used time series forecasting to predict future trends.
Sentiment Analysis using NLP
Developed a sentiment analysis model that categorized customer feedback into positive, neutral, and negative sentiments to improve products.
Personalized Recommendation Engine
Created a recommendation engine using collaborative and content-based filtering to give personalized suggestions based on user's browsing history and preferences.
Tailor every project to focus on business impact and user experience, which can help you stand out to recruiters.
Knowing the tools won't be enough to become a master of data analytics!
See if your soft skills are worthy of the rank of master:
1. Communication: Can you translate your findings into easily digestible insights for non-technical stakeholders?
2. Problem-Solving: Is your work focused on solving actual business problems, and are you able to pick the most efficient approach to solve them?
3. Stakeholder Management: Are you building strong relationships with your stakeholders, understanding their needs, and providing them with regular updates?
4. Continuous Learning: The data landscape is constantly changing. Are you keeping up with new tools and trends?
5. Product/Project Management: Are you aware of the life cycle of your data products? Do you have a structured approach to plan, prioritize, and track your work?
6. Business Acumen: Can you understand the language and needs of the business and put your data work into context?
7. Domain Knowledge: Do you know the processes, products, and challenges of your domain?
If you want to earn the rank of master in the data field, start working on your soft skills now.
7 Free Datasets to create your next data analytics project
https://medium.com/@data_analyst/free-data-sources-to-create-data-analytics-projects-2fd8fd6eadd3
13 Best Data Analytics Projects for Final Year Students
https://datasimplifier.com/data-analytics-projects-for-final-year-students/
A few ways to optimise SQL queries
Use Indexing: Properly indexing your database tables can significantly speed up query performance by allowing the database to quickly locate the rows needed for a query.
Optimize Joins: Minimize the number of joins and use appropriate join types (e.g., INNER JOIN, LEFT JOIN) to ensure efficient data retrieval.
Avoid SELECT *: Instead of selecting all columns using SELECT *, explicitly specify only the columns needed for the query to reduce unnecessary data transfer and processing overhead.
Use WHERE Clause Wisely: Filter rows early in the query using a WHERE clause to reduce the dataset size before joining or aggregating data.
Avoid Unnecessary Subqueries: Where it helps, rewrite subqueries (especially correlated ones) as JOINs. Common Table Expressions (CTEs) mainly improve readability and are not always faster, so compare the execution plans.
Limit the Use of DISTINCT: Minimize the use of DISTINCT, as it requires sorting or hashing to remove duplicates, which can be resource-intensive for large datasets.
Optimize GROUP BY and ORDER BY: Use GROUP BY and ORDER BY clauses judiciously, and ensure that they use indexed columns whenever possible to avoid unnecessary sorting.
Consider Partitioning: Partition large tables into smaller segments (for example, by date range) so that queries read only the relevant partitions, which can improve performance by reducing I/O operations.
Monitor Query Performance: Regularly monitor query performance using query execution plans, database profilers, and performance monitoring tools to identify and address bottlenecks.
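A minimal sketch of a few of the tips above, assuming Python's built-in sqlite3 module and a made-up customers/orders schema (the table, column, and index names are hypothetical). It selects only the needed columns, checks the plan with EXPLAIN QUERY PLAN before and after adding an index, and rewrites an IN subquery as a JOIN. Other databases expose plans differently, so treat the output as SQLite-specific.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
""")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(i, f"customer_{i}", "Pune" if i % 2 else "Delhi") for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1 + i % 200, float(i % 50)) for i in range(5000)],
)


def plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Name only the needed columns instead of SELECT *, and filter with WHERE.
query = "SELECT id, amount FROM orders WHERE customer_id = ?"

plan_before = plan(query, (42,))  # no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query, (42,))   # the planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)

# A subquery and its JOIN rewrite that return the same rows.
subquery_sql = """
    SELECT id, amount FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE city = 'Delhi')
    ORDER BY id
"""
join_sql = """
    SELECT o.id, o.amount
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    WHERE c.city = 'Delhi'
    ORDER BY o.id
"""
subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print("same rows:", subquery_rows == join_rows)
```

Whether the JOIN form is actually faster depends on the database and the data, so it is worth comparing the two plans rather than assuming.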
Hope it helps :)
Here is a list of a few projects (found on Kaggle). They cover Basics of Python, Advanced Statistics, Supervised Learning (Regression and Classification problems) and Data Science. Please also check the discussions and notebook submissions for different approaches and solutions after you have tried them yourself.
1. Basic Python and statistics
Pima Indians: https://www.kaggle.com/uciml/pima-indians-diabetes-database
Cardio Good Fitness: https://www.kaggle.com/saurav9786/cardiogoodfitness
Automobile: https://www.kaggle.com/toramky/automobile-dataset
2. Advanced Statistics
Game of Thrones: https://www.kaggle.com/mylesoneill/game-of-thrones
World University Rankings: https://www.kaggle.com/mylesoneill/world-university-rankings
IMDB Movie Dataset: https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset
3. Supervised Learning
a) Regression problems
How much did it rain: https://www.kaggle.com/c/how-much-did-it-rain-ii/overview
Inventory Demand: https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Property Inspection prediction: https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Restaurant Revenue prediction: https://www.kaggle.com/c/restaurant-revenue-prediction/data
TMDB Box Office Prediction: https://www.kaggle.com/c/tmdb-box-office-prediction/overview
b) Classification problems
Employee Access challenge: https://www.kaggle.com/c/amazon-employee-access-challenge/overview
Titanic: https://www.kaggle.com/c/titanic
San Francisco crime: https://www.kaggle.com/c/sf-crime
Customer satisfaction: https://www.kaggle.com/c/santander-customer-satisfaction
Trip type classification: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
Categorize cuisine: https://www.kaggle.com/c/whats-cooking
4. Some helpful data science projects for beginners
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
https://www.kaggle.com/c/digit-recognizer
https://www.kaggle.com/c/titanic
5. Intermediate-level data science projects
Black Friday Data: https://www.kaggle.com/sdolezel/black-friday
Human Activity Recognition Data: https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones
Trip History Data: https://www.kaggle.com/pronto/cycle-share-dataset
Million Song Data: https://www.kaggle.com/c/msdchallenge
Census Income Data: https://www.kaggle.com/c/census-income/data
Movie Lens Data: https://www.kaggle.com/grouplens/movielens-20m-dataset
Twitter Classification Data: https://www.kaggle.com/c/twitter-sentiment-analysis2
ENJOY LEARNING
#datascienceprojects