Kaggle Data Hub
29.1K subscribers
895 photos
14 videos
309 files
1.16K links
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.

Admin: @HusseinSheikho || @Hussein_Sheikho
Colorectal Cancer Global Dataset & Predictions

Predicting Colorectal Cancer Outcomes Based on Global Health Trends

This dataset contains real-world information about colorectal cancer cases from different countries. It includes patient demographics, lifestyle risks, medical history, cancer stage, treatment types, survival chances, and healthcare costs. The dataset follows global trends in colorectal cancer incidence, mortality, and prevention.

Dataset Structure
Each row represents an individual case, and the columns include:

Patient_ID (Unique identifier)
Country (Based on incidence distribution)
Age (Following colorectal cancer age trends)
Gender (M/F, considering men have 30-40% higher risk)
Cancer_Stage (Localized, Regional, Metastatic)
Tumor_Size_mm (Randomized within medical limits)
Family_History (Yes/No)
Smoking_History (Yes/No)
Alcohol_Consumption (Yes/No)
Obesity_BMI (Normal/Overweight/Obese)
Diet_Risk (Low/Moderate/High)
Physical_Activity (Low/Moderate/High)
Diabetes (Yes/No)
Inflammatory_Bowel_Disease (Yes/No)
Genetic_Mutation (Yes/No)
Screening_History (Regular/Irregular/Never)
Early_Detection (Yes/No)
Treatment_Type (Surgery/Chemotherapy/Radiotherapy/Combination)
Survival_5_years (Yes/No)
Mortality (Yes/No)
Healthcare_Costs (Country-dependent, $25K-$100K+)
Incidence_Rate_per_100K (Country-level prevalence)
Mortality_Rate_per_100K (Country-level mortality)
Urban_or_Rural (Urban/Rural)
Economic_Classification (Developed/Developing)
Healthcare_Access (Low/Moderate/High)
Insurance_Status (Insured/Uninsured)
Survival_Prediction (Yes/No, based on factors)
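
As a rough sketch of how these columns might be used, the snippet below computes the 5-year survival rate per cancer stage with Python's standard csv module. The three sample rows are made-up placeholders, not records from the dataset. Only the column names Cancer_Stage and Survival_5_years come from the list above.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; they are NOT taken from the dataset.
SAMPLE_CSV = """Patient_ID,Cancer_Stage,Survival_5_years
1,Localized,Yes
2,Localized,No
3,Metastatic,No
"""

def survival_rate_by_stage(csv_file):
    """Return {stage: share of rows where Survival_5_years is 'Yes'}."""
    survived = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(csv_file):
        stage = row["Cancer_Stage"]
        total[stage] += 1
        if row["Survival_5_years"] == "Yes":
            survived[stage] += 1
    return {stage: survived[stage] / total[stage] for stage in total}

rates = survival_rate_by_stage(io.StringIO(SAMPLE_CSV))
print(rates)
```

To run this on the real file, pass an open file handle instead of the in-memory sample. The file name is not given in the post, so it is left out here.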
Car Number Plate Dataset (YOLO Format)

Car Number Plate Dataset with labels in #YOLO format (Label, Xc, Yc, W, H)

Dataset: Car License Plate Detection

This dataset consists of images of car license plates, paired with their corresponding annotations in YOLO format. It is designed for training and evaluating models focused on detecting car license plates in images. The dataset was derived from the Car License Plate Detection dataset on Kaggle and has been split into training and testing subsets.

Dataset Overview:
Total Images: 433 car license plate images
Image Format: .png
Annotation Format: YOLO (Label, Xc, Yc, Width, Height)
Image Resolution: Varies
Annotations: Bounding box coordinates for car license plates, normalized to image dimensions

Dataset Structure:
The dataset is divided into two main directories: train and test. Each directory contains two subdirectories: images and labels.
train: Contains 346 images and corresponding YOLO annotation files for training
test: Contains 87 images and corresponding YOLO annotation files for testing

Each image file (e.g., Cars0.png) is paired with a corresponding annotation file (e.g., Cars0.txt).
The annotation files contain the following information in YOLO format:
Label: The class of the object (for this dataset, it will always be 0, representing car license plates).
Xc, Yc: Center coordinates of the bounding box, normalized to the width and height of the image.
W, H: Width and height of the bounding box, also normalized.

File Information:
train/images/: 346 .png image files of car license plates.
train/labels/: 346 .txt annotation files in YOLO format.
test/images/: 87 .png image files for testing.
test/labels/: 87 .txt annotation files in YOLO format.
data.yaml: Configuration file with dataset details.
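
Since every image is supposed to have a matching annotation file, a quick consistency check can be useful before training. The sketch below assumes the images/ and labels/ layout described above. The helper name unpaired_images is hypothetical, and the demo runs on a throwaway directory with made-up file names rather than the real dataset.

```python
import tempfile
from pathlib import Path

def unpaired_images(split_dir):
    """Return names of .png files in split_dir/images that lack a .txt in split_dir/labels."""
    split_dir = Path(split_dir)
    label_stems = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return sorted(
        p.name
        for p in (split_dir / "images").glob("*.png")
        if p.stem not in label_stems
    )

# Demo on a temporary directory that mimics the train/ layout (files are empty stand-ins).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "labels").mkdir()
    (root / "images" / "Cars0.png").touch()
    (root / "images" / "Cars1.png").touch()
    (root / "labels" / "Cars0.txt").touch()
    missing = unpaired_images(root)

print(missing)
```

On the real dataset, calling unpaired_images("train") and unpaired_images("test") should both return an empty list if the pairing described above holds.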

Dataset Splitting:
Training Set: 346 images (80% of total dataset)
Test Set: 87 images (20% of total dataset)

Example:
An example annotation for a license plate might look like this:
0 0.548 0.612 0.432 0.075

Where:
0: Class label (always 0 for license plates).
0.548: X-center (normalized to image width).
0.612: Y-center (normalized to image height).
0.432: Width of the bounding box (normalized to image width).
0.075: Height of the bounding box (normalized to image height).
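
To draw or crop a box, the normalized values usually have to be converted back to pixel coordinates. The sketch below does this for the example line above. The function name and the 1000x500 image size are assumptions made for illustration, since the post says the real image resolution varies.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert 'label xc yc w h' (normalized) to (label, x_min, y_min, x_max, y_max) in pixels."""
    label, xc, yc, w, h = line.split()
    # Scale horizontal values by image width and vertical values by image height.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # The center/size form becomes a corner form by moving half the size each way.
    return int(label), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

# Hypothetical 1000x500 image; substitute the real width and height of each image.
box = yolo_to_pixel_box("0 0.548 0.612 0.432 0.075", 1000, 500)
print(box)
```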
ASL Alphabet

Image dataset for the American Sign Language alphabet

Content
The training set contains 87,000 images of 200x200 pixels each. There are 29 classes: 26 for the letters A-Z and 3 for SPACE, DELETE, and NOTHING.
These 3 extra classes are useful in real-time applications and classification.
The test set contains only 29 images, to encourage the use of real-world test images.
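
A minimal sketch of the 29-class label space follows. The ordering (letters first, then the three extra classes) and the exact class strings are assumptions. The even split of images across classes is also an assumption, used only to show the arithmetic.

```python
import string

# Assumed ordering: the 26 letters, then the three extra classes named in the post.
CLASS_NAMES = list(string.ascii_uppercase) + ["SPACE", "DELETE", "NOTHING"]
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}

# If the 87,000 training images were split evenly (an assumption),
# each class would hold this many images.
images_per_class = 87_000 // len(CLASS_NAMES)

print(len(CLASS_NAMES), images_per_class)
```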

https://www.nidcd.nih.gov/sites/default/files/Content%20Images/NIDCD-ASL-hands-2014.jpg
pothole, cracks and openmanholes (Road Hazards)

The dataset includes train and valid sets with annotations

This dataset contains 2,700 images focused on detecting potholes, cracks, and open manholes on roads. It has been augmented to enhance the variety and robustness of the data. The images are organized into training and validation sets, with three distinct categories:

Potholes: class 0
Cracks: class 1
Open Manholes: class 2

The dataset includes bounding box annotations in .txt files formatted for YOLOv8s, ensuring compatibility for model training. It is structured into separate folders for each class and contains train, valid, and all classes folders, allowing for easy access and custom augmentation. The dataset is designed for further model training, testing, and custom augmentation tasks related to road safety and infrastructure detection.
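
One way to inspect class balance is to count the class ids at the start of each annotation line. The sketch below assumes the usual YOLO text layout (class id first, then four normalized values) and uses the class ids listed above. The sample annotation text is made up for illustration.

```python
from collections import Counter

# Class ids as listed in the post.
CLASS_NAMES = {0: "Potholes", 1: "Cracks", 2: "Open Manholes"}

def count_boxes(label_text):
    """Count bounding boxes per class in the contents of one YOLO .txt file."""
    counts = Counter()
    for line in label_text.splitlines():
        parts = line.split()
        if parts:  # skip blank lines
            counts[CLASS_NAMES[int(parts[0])]] += 1
    return counts

# Made-up annotation contents, not taken from the dataset.
example = "0 0.5 0.5 0.2 0.1\n2 0.3 0.7 0.1 0.1\n0 0.8 0.2 0.05 0.05\n"
counts = count_boxes(example)
print(counts)
```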
Signature

Detects human signatures in legal and general documents

Dataset Structure
The signature detection dataset is split into three subsets:
Training set: Contains 143 images, each with corresponding annotations.
Validation set: Includes 35 images, each with paired annotations.
Alzheimer MRI Disease Classification Dataset

Dataset focuses on the classification of Alzheimer's disease based on MRI scans.

Introduction

The Alzheimer MRI Disease Classification dataset is a resource for researchers and for medical applications. It focuses on the classification of Alzheimer's disease based on MRI scans. The dataset consists of brain MRI images labeled into four categories:

'0': Mild_Demented

'1': Moderate_Demented

'2': Non_Demented

'3': Very_Mild_Demented
Dataset Information

Train split:

Name: train

Number of bytes: 22,560,791.2

Number of examples: 5,120

Test split:

Name: test

Number of bytes: 5,637,447.08

Number of examples: 1,280

Download size: 28,289,848 bytes

Dataset size: 28,198,238.28 bytes
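
The label ids and split sizes above can be captured in a few lines. This is only a sketch built from the numbers in the post; the variable names are hypothetical.

```python
# Label ids exactly as listed in the post.
ID2LABEL = {
    0: "Mild_Demented",
    1: "Moderate_Demented",
    2: "Non_Demented",
    3: "Very_Mild_Demented",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

# Split sizes from the post.
train_examples, test_examples = 5_120, 1_280
train_share = train_examples / (train_examples + test_examples)

print(f"train share: {train_share:.0%}")
```

Note that the ids follow alphabetical order of the label names, not disease severity, so they should not be treated as an ordinal scale.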
A Large Scale Fish Dataset

A Large-Scale Dataset for Fish Segmentation and Classification

The dataset contains 9 different seafood types. For each class, there are 1000 augmented images and their pair-wise augmented ground truths.
Each class can be found in the "Fish_Dataset" file with their ground truth labels. All images for each class are ordered from "00000.png" to "01000.png".

For example, to access the ground truth images of the shrimp class, follow the path "Fish->Shrimp->Shrimp GT".
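
The folder convention can be sketched as a small path helper. The root folder name "Fish_Dataset", the "<class> GT" subfolder pattern, and the five-digit file names are inferred from the description above and should be checked against the downloaded archive. The helper name is hypothetical.

```python
from pathlib import PurePosixPath

def ground_truth_path(class_name, index, root="Fish_Dataset"):
    """Build the assumed path to one ground-truth image, e.g. .../Shrimp/Shrimp GT/00001.png."""
    return PurePosixPath(root) / class_name / f"{class_name} GT" / f"{index:05d}.png"

print(ground_truth_path("Shrimp", 1))
```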
Alzheimer's Disease Multiclass Images Dataset

Alzheimer's Disease dataset split into 4 classes

About Dataset

The Alzheimer's Disease Multiclass Dataset contains approximately 44,000 MRI images categorized into four distinct classes based on the severity of Alzheimer's disease. This dataset is intended for use in machine learning model training and testing. All images are skull-stripped and clean of non-brain tissue.

Dataset Structure
The dataset is organized into the following four directories, each representing a different class of disease severity:
NonDemented: Contains 12,800 MRI images of subjects with no signs of dementia.
VeryMildDemented: Contains 11,200 MRI images of subjects with very mild symptoms of dementia.
MildDemented: Contains 10,000 MRI images of subjects with mild dementia.
ModerateDemented: Contains 10,000 MRI images of subjects with moderate dementia.

Image Details
Total Number of Images: 44,000
Image Format: MRI scans as .JPG files
Image Usage: Suitable for training and testing machine learning models focused on classifying Alzheimer's disease stages.
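
Because the four classes are not equally sized, it can help to check the totals and, optionally, derive class weights. The sketch below uses the counts from the post and inverse-frequency weighting (total / (number of classes x class count)), which is one common choice and not something the dataset authors prescribe.

```python
# Per-class image counts as stated in the post.
CLASS_COUNTS = {
    "NonDemented": 12_800,
    "VeryMildDemented": 11_200,
    "MildDemented": 10_000,
    "ModerateDemented": 10_000,
}

total = sum(CLASS_COUNTS.values())

# Inverse-frequency weights: smaller classes get weights above 1.
class_weights = {
    name: total / (len(CLASS_COUNTS) * count)
    for name, count in CLASS_COUNTS.items()
}

print(total)
for name, weight in class_weights.items():
    print(f"{name}: {weight:.3f}")
```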

Disease Severity Classification
The dataset follows a severity ranking system for Alzheimer's disease:
NonDemented: No dementia.
Very Mild Demented: Early signs of dementia, very mild symptoms.
Mild Demented: Clear signs of dementia, but still mild.
Moderate Demented: More pronounced symptoms of dementia, moderate severity.
Mammogram Mass Analyzer Desktop App

A free desktop breast cancer detection app that accepts DICOM files.

Mammogram Mass Analyzer

This is a free desktop computer-aided diagnosis (CAD) tool that uses computer vision to detect and localize masses on full-field digital mammograms. It is a Flask app that runs on the desktop. Internally, it uses an ensemble of two YOLOv5L models trained on data from the VinDr-Mammo dataset. The ensemble has a validation accuracy of 0.65 and a validation recall of 0.63.

My aim was to create a proof of concept for a free desktop computer aided diagnosis (CAD) system that could be used as an aid when diagnosing breast cancer. Unlike a web app, this tool does not need an internet connection and there are no monthly costs for hosting and web server rental. I think a desktop tool could be helpful to radiologists in private practice and to medical non-profits that work in remote areas.

The complete project folder, including the trained models, is stored in this Kaggle dataset.