Data Science Portfolio - Kaggle Datasets & AI Projects | Artificial Intelligence
Free Datasets For Data Science Projects & Portfolio
Twitter Sentiment Analysis.zip
2 MB
📦 Dataset name: Twitter Sentiment Analysis

🌹 This is an entity-level sentiment analysis dataset built from Twitter messages. Given a message and an entity, the task is to judge the sentiment of the message toward that entity. The dataset has three classes: Positive, Negative, and Neutral. Messages that are not relevant to the entity (i.e., Irrelevant) are treated as Neutral.
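
The label rule described above (Irrelevant folded into Neutral) could be sketched as follows. This is a minimal illustration, not the dataset author's code: the helper name `normalize_label` and the sample labels are hypothetical, and nothing is assumed about the real file's column layout.

```python
# Hypothetical mapping based on the description above:
# four raw labels collapse into three classes.
LABEL_MAP = {
    "Positive": "Positive",
    "Negative": "Negative",
    "Neutral": "Neutral",
    "Irrelevant": "Neutral",  # irrelevant messages are treated as Neutral
}


def normalize_label(raw_label: str) -> str:
    """Collapse a raw label into one of the three classes."""
    label = raw_label.strip().capitalize()
    if label not in LABEL_MAP:
        raise ValueError(f"unexpected label: {raw_label!r}")
    return LABEL_MAP[label]


# Made-up sample labels, only to show the mapping in action.
sample_labels = ["Positive", "Irrelevant", "negative", "Neutral"]
normalized = [normalize_label(label) for label in sample_labels]
print(normalized)
```

Raising on unknown labels, rather than silently mapping them to Neutral, makes unexpected values in the file visible early.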
Movie Rating DataSet.zip
1.6 MB
📦 Dataset name: Movie Rating DataSet


🌹 This dataset covers movie votes and ratings.
It has 20 columns and 4,804 rows, covering each movie's popularity, characters, release date, revenue, status, title, language, average vote, ID, and more.
Forwarded from Data Science Projects
Sharing 20+ diverse datasets 📊 for data science and analytics practice!


1. How Much Did It Rain: https://www.kaggle.com/c/how-much-did-it-rain-ii/overview

2. Inventory Demand: https://www.kaggle.com/c/grupo-bimbo-inventory-demand

3. Property Inspection Prediction: https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction

4. Restaurant Revenue Prediction: https://www.kaggle.com/c/restaurant-revenue-prediction/data

5. Customer Satisfaction: https://www.kaggle.com/c/santander-customer-satisfaction

6. Iris Dataset: https://archive.ics.uci.edu/ml/datasets/iris

7. Titanic Dataset: https://www.kaggle.com/c/titanic

8. Wine Quality Dataset: https://archive.ics.uci.edu/ml/datasets/Wine+Quality

9. Heart Disease Dataset: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

10. Bengaluru House Price Dataset: https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data

11. Breast Cancer Dataset: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

12. Credit Card Fraud Detection: https://www.kaggle.com/mlg-ulb/creditcardfraud

13. Netflix Movies and TV Shows: https://www.kaggle.com/shivamb/netflix-shows

14. Trending YouTube Video Statistics: https://www.kaggle.com/datasnaek/youtube-new

15. Walmart Store Sales Forecasting: https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting

16. FIFA 19 Complete Player Dataset: https://www.kaggle.com/karangadiya/fifa19

17. World Happiness Report: https://www.kaggle.com/unsdsn/world-happiness

18. TMDB 5000 Movie Dataset: https://www.kaggle.com/tmdb/tmdb-movie-metadata

19. Students Performance in Exams: https://www.kaggle.com/spscientist/students-performance-in-exams

20. Twitter Sentiment Analysis Dataset: https://www.kaggle.com/kazanova/sentiment140

21. Digit Recognizer: https://www.kaggle.com/c/digit-recognizer


💻 Don't miss out on these valuable resources for advancing your data science journey!
🔥 Top 10 Computer Vision Project Ideas 🔥

1. Edge Detection
2. Photo Sketching
3. Detecting Contours
4. Collage Mosaic Generator
5. Barcode and QR Code Scanner
6. Face Detection
7. Blur the Face
8. Image Segmentation
9. Human Counting with OpenCV
10. Colour Detection
Free Datasets to work on Power BI + SQL projects 👇👇

1. AdventureWorks Sample Database:
- Link: [AdventureWorks Sample Database](https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver15)
- Description: A sample database provided by Microsoft, containing sales, products, customers, and other related data.

2. Online Retail Dataset:
- Link: [UCI Machine Learning Repository - Online Retail Dataset](https://archive.ics.uci.edu/ml/datasets/online+retail)
- Description: Transactional data from an online retail store, suitable for customer segmentation and sales analysis.

3. Supermarket Sales Dataset:
- Link: [Supermarket Sales Dataset](https://www.kaggle.com/aungpyaeap/supermarket-sales)
- Description: Sales data from a supermarket, useful for inventory management and sales performance analysis.

4. Yahoo Finance (Historical Stock Data):
- Link: [Yahoo Finance](https://finance.yahoo.com/)
- Description: Historical stock data for various companies, suitable for financial analysis and visualization.

5. Human Resources Analytics: Employee Attrition and Performance:
- Link: [Kaggle HR Analytics Dataset](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset)
- Description: Employee data including demographics, performance, and attrition information, suitable for employee performance analysis.

Bonus open-source resources: https://t.me/DataPortfolio/16

These datasets are freely available for practicing Power BI and SQL skills. You can download them from the links above and import them into your SQL database management system (e.g., MySQL, SQL Server, PostgreSQL) for hands-on practice ☺️💪
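
As a rough sketch of the import step, the snippet below loads CSV rows into a database and runs one aggregate query. It uses Python's built-in `sqlite3` as a stand-in for MySQL, SQL Server, or PostgreSQL, and the table name, column names, and sample rows are made up for illustration; they are not taken from any of the datasets listed above.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a downloaded sales file.
SAMPLE_CSV = """invoice_id,product_line,total
A-1,Health and beauty,52.50
A-2,Electronics,120.00
A-3,Health and beauty,30.25
"""

# In-memory SQLite database; swap in a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "invoice_id TEXT PRIMARY KEY, product_line TEXT, total REAL)"
)

# DictReader yields one dict per row, which matches the named placeholders.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
conn.executemany(
    "INSERT INTO sales VALUES (:invoice_id, :product_line, :total)", rows
)
conn.commit()

# A typical practice query: total sales per product line.
totals = conn.execute(
    "SELECT product_line, SUM(total) FROM sales "
    "GROUP BY product_line ORDER BY product_line"
).fetchall()
print(totals)
conn.close()
```

For a real file, replacing `io.StringIO(SAMPLE_CSV)` with `open(path, newline="")` should be enough, provided the CSV headers match the table's column names.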
FitbitFitness Tracker Data.zip
4.2 MB
📦 Dataset name: Fitbit Fitness Tracker Data: Capstone Project



🌸 This dataset contains personal fitness tracker data from thirty-three eligible Fitbit users. It was generated by respondents to a survey distributed via Amazon Mechanical Turk between April 12, 2016 and May 12, 2016.
This dataset has been cleaned and formatted, with the date & time column split into two columns (one for the date and one for the time in 24-hour format), to prepare it for analysis in SQL and visualisation in Tableau.


Format: CSV file

From: Kaggle
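
The date/time split mentioned in the description could look roughly like this. It is a sketch under an assumption: the raw timestamp format used here (`4/12/2016 2:47:30 PM`, 12-hour clock) is a guess for illustration, and the helper name `split_timestamp` is hypothetical; check the actual CSV before relying on it.

```python
from datetime import datetime


def split_timestamp(
    raw: str, fmt: str = "%m/%d/%Y %I:%M:%S %p"
) -> tuple[str, str]:
    """Split one combined timestamp into (ISO date, 24-hour time)."""
    parsed = datetime.strptime(raw, fmt)
    return parsed.strftime("%Y-%m-%d"), parsed.strftime("%H:%M:%S")


# Assumed raw value; the real column format may differ.
date_part, time_part = split_timestamp("4/12/2016 2:47:30 PM")
print(date_part, time_part)
```

Keeping the date in ISO form (`YYYY-MM-DD`) means the two columns sort correctly as plain text in SQL.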
Metaverse Financial Transactions.zip
5.2 MB
📦 Dataset name: Metaverse Financial Transactions


🌸 This dataset provides blockchain financial transactions within the Open Metaverse, aiming to provide a rich, diverse, and realistic set of data for developing and testing anomaly detection models, fraud analysis, and predictive analytics in virtual environments. With a focus on applicability, this dataset captures various transaction types, user behaviors, and risk profiles across a global network.


Format: CSV file

From: Kaggle
Don't forget to check these 10 SQL projects with corresponding datasets that you could use to practice your SQL skills:

1. Analysis of Sales Data:

(https://www.kaggle.com/kyanyoga/sample-sales-data)

2. HR Analytics:

(https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset)

3. Social Media Analytics:

(https://www.kaggle.com/datasets/ramjasmaurya/top-1000-social-media-channels)

4. Financial Data Analysis:

(https://www.kaggle.com/datasets/nitindatta/finance-data)

5. Healthcare Data Analysis:

(https://www.kaggle.com/cdc/mortality)

6. Customer Relationship Management:

(https://www.kaggle.com/pankajjsh06/ibm-watson-marketing-customer-value-data)

7. Web Analytics:

(https://www.kaggle.com/zynicide/wine-reviews)

8. E-commerce Analysis:

(https://www.kaggle.com/olistbr/brazilian-ecommerce)

9. Supply Chain Management:

(https://www.kaggle.com/datasets/harshsingh2209/supply-chain-analysis)

10. Inventory Management:

(https://www.kaggle.com/datasets?search=inventory+management)

Share this channel with your friends 🤩

Join for more -> https://t.me/addlist/ID95piZJZa0wYzk5

ENJOY LEARNING
Free Python certification course from Google that you should not miss in 2024.

Link: https://www.kaggle.com/learn/python
Free Datasets to practice data science projects

1. Enron Email Dataset

Data Link: https://www.cs.cmu.edu/~enron/

2. Chatbot Intents Dataset

Data Link: https://github.com/katanaml/katana-assistant/blob/master/mlbackend/intents.json

3. Flickr 30k Dataset

Data Link: https://www.kaggle.com/hsankesara/flickr-image-dataset

4. Parkinson Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/parkinsons

5. Iris Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/Iris

6. ImageNet dataset

Data Link: http://www.image-net.org/

7. Mall Customers Dataset

Data Link: https://www.kaggle.com/shwetabh123/mall-customers

8. Google Trends Data Portal

Data Link: https://trends.google.com/trends/

9. The Boston Housing Dataset

Data Link: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

10. Uber Pickups Dataset

Data Link: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city

11. Recommender Systems Dataset

Data Link: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Source Code: https://bit.ly/37iBDEp

12. UCI Spambase Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/Spambase

13. GTSRB (German traffic sign recognition benchmark) Dataset

Data Link: http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset

Source Code: https://bit.ly/39taSyH

14. Cityscapes Dataset

Data Link: https://www.cityscapes-dataset.com/

15. Kinetics Dataset

Data Link: https://deepmind.com/research/open-source/kinetics

16. IMDB-Wiki dataset

Data Link: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/


17. Color Detection Dataset

Data Link: https://github.com/codebrainz/color-names/blob/master/output/colors.csv


18. Urban Sound 8K dataset

Data Link: https://urbansounddataset.weebly.com/urbansound8k.html

19. Librispeech Dataset

Data Link: http://www.openslr.org/12

20. Breast Histopathology Images Dataset

Data Link: https://www.kaggle.com/paultimothymooney/breast-histopathology-images

21. Youtube 8M Dataset

Data Link: https://research.google.com/youtube8m/

Data Cleaning Checklist:

If you're just starting out in the world of data analytics, hopefully this checklist helps demystify the concept of "data cleaning"...

☑ Missing data - Decide whether you're going to omit the data point, estimate the missing value using statistical methods, or use an external source to fill it in.

☑ Duplicate data - Identify duplicate data and what it means in context. Is the duplicate an error that needs to be deleted? Or is it possible that you could legitimately have two of the same data point?

☑ Formatting errors - Ensure all data is rounded to the correct decimal place, all data is aligned correctly, and the data format is consistent within columns.

☑ Incorrect data types - Ensure all of your data is pulled as the correct data type (e.g., making sure that integers are not used for money values).

☑ Outliers - Identify data points that are more than 2 standard deviations from the mean, and double-check that these values are correct. If they are correct, they may require further investigation.
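
Three of the checklist items (duplicates, missing data, outliers) can be sketched in plain Python. This is one possible approach under stated assumptions, not a prescribed method: the sample rows are invented, median imputation is just one of the options the checklist mentions, and the sample standard deviation is used for the 2-standard-deviation rule.

```python
from statistics import mean, median, stdev

# Invented sample data: one exact duplicate, one missing value,
# and one suspiciously large amount.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 11.0},
    {"id": 5, "amount": 9.0},
    {"id": 6, "amount": 10.5},
    {"id": 7, "amount": 95.0},
]


def drop_duplicates(records):
    """Keep the first occurrence of each fully identical record."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def impute_missing(records, field):
    """Fill missing values with the median of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    fill = median(known)
    return [
        {**r, field: fill if r[field] is None else r[field]} for r in records
    ]


def flag_outliers(records, field, k=2.0):
    """Return records more than k standard deviations from the mean."""
    values = [r[field] for r in records]
    mu, sigma = mean(values), stdev(values)
    return [r for r in records if abs(r[field] - mu) > k * sigma]


deduped = drop_duplicates(rows)
cleaned = impute_missing(deduped, "amount")
outliers = flag_outliers(cleaned, "amount")
print(len(deduped), [r["id"] for r in outliers])
```

Flagged rows are returned rather than deleted, in line with the checklist's advice to verify outliers before deciding what to do with them.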
5 Handy Tips to master Data Science ⬇️


1️⃣ Begin with introductory projects that cover the fundamental concepts of data science, such as data exploration, cleaning, and visualization. These projects will help you get familiar with common data science tools and libraries like Python (Pandas, NumPy, Matplotlib), R, SQL, and Excel.

2️⃣ Look for publicly available datasets from sources like Kaggle and the UCI Machine Learning Repository. Working with real-world data will expose you to the challenges of messy, incomplete, and heterogeneous data, which is common in practical scenarios.

3️⃣ Explore various data science techniques like regression, classification, clustering, and time series analysis. Apply these techniques to different datasets and domains to gain a broader understanding of their strengths, weaknesses, and appropriate use cases.

4️⃣ Work on projects that involve the entire data science lifecycle, from data collection and cleaning to model building, evaluation, and deployment. This will help you understand how different components of the data science process fit together.

5️⃣ Consistent practice is key to mastering any skill. Set aside dedicated time to work on data science projects, and gradually increase the complexity and scope of your projects as you gain more experience.
🚀 Here are 5 fresh project ideas for Data Analysts 👇

🎯 Airbnb Open Data 🏠
https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata

💡 This dataset describes the listing activity of homestays in New York City

🎯 Top Spotify songs from 2010-2019 🎵

https://www.kaggle.com/datasets/leonardopena/top-spotify-songs-from-20102019-by-year

🎯 Walmart Store Sales Forecasting 📈

https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data
💡 Use historical markdown data to predict store sales

🎯 Netflix Movies and TV Shows 📺

https://www.kaggle.com/datasets/shivamb/netflix-shows
💡 Listings of movies and TV shows on Netflix - Regularly Updated

🎯 LinkedIn Data Analyst jobs listings 💼

https://www.kaggle.com/datasets/cedricaubin/linkedin-data-analyst-jobs-listings
💡 More than 8,400 rows of data analyst jobs from the USA, Canada, and Africa.
Python-2.pdf
5 MB
Python Tutorial in Jupyter Notebook
🔒 Dataset Name: Spotify Songs Album

This dataset provides concise details about music tracks and their performance across various platforms. It includes essential information like track name, artist(s), release date, and presence in popular playlists and charts on platforms like Spotify, Apple Music, Deezer, and Shazam. Additionally, it features metrics such as BPM, key, mode, danceability, valence, energy, acousticness, instrumentalness, liveness, and speechiness, which offer insights into the musical characteristics and appeal of each track.

💡 With this data, analysts can evaluate the popularity, genre, and audience engagement of different music offerings across multiple streaming services.

🤌 From: Kaggle

🤖 Size: 47.1 kB