Kaggle Data Hub
29.2K subscribers
915 photos
14 videos
309 files
1.18K links
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.

Admin: @HusseinSheikho || @Hussein_Sheikho
FitBit dataset

Fitness tracker data from smart watch device usage

About Dataset:
This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users, all of whom consented to the submission of their tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users' habits.
ASL Alphabet

Image data set for the American Sign Language alphabet

About
The data set is a collection of images of letters from the American Sign Language alphabet, separated into 29 folders that represent the classes.

Content
The training data set contains 87,000 images of 200x200 pixels. There are 29 classes: 26 for the letters A-Z and 3 for SPACE, DELETE, and NOTHING.
These 3 extra classes are helpful in real-time applications and in classification.
The test data set contains only 29 images, to encourage the use of real-world test images.
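The folder-per-class layout described above can be indexed with a few lines of standard-library Python. This is a minimal sketch, not the dataset author's loader: the function name, the root path, and the accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path


def index_image_folders(root):
    """Map each class folder under `root` to a label id and list its images.

    Assumes (hypothetically) one sub-folder per class, as the post describes.
    """
    root = Path(root)
    # Sort folder names so label ids are stable across runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            # The extension filter is an assumption; adjust to the real files.
            if path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
                samples.append((path, label_of[name]))
    return label_of, samples
```

Calling index_image_folders on the training directory should yield 29 entries in label_of if the folders match the description. The resulting (path, label) pairs can then be fed to any image-loading pipeline.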
This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

https://t.me/addlist/8_rRW2scgfRhOTc0

https://t.me/Codeprogrammer
Fundus Image Dataset for Vessel Segmentation

High-resolution manually annotated fundus images for vessel segmentation

About Dataset
The FIVES (Fundus Image Vessel Segmentation) dataset comprises 800 high-resolution color fundus photographs manually annotated at the pixel level for retinal vessel segmentation. The images represent a wide range of ages (4–83 years) and include various ocular conditions such as diabetic retinopathy, age-related macular degeneration, and glaucoma. The annotations were standardized via expert crowdsourcing. Each image was further assessed for three quality aspects (illumination and color distortion, blur, and low contrast) using published automatic algorithms. This dataset is currently the largest publicly available collection for retinal vessel segmentation and is designed to facilitate the development and evaluation of AI-based segmentation models.
The dataset supports automated analysis of retinal vasculature for ophthalmological and systemic disease assessment and contributes significantly to advancing the field of AI in medical imaging.
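Segmentation models trained on pixel-level vessel annotations are often scored by their overlap with the reference mask. The sketch below computes the Dice coefficient, a common choice that the post itself does not name. Masks are plain nested lists of 0/1 here for illustration, and the function name is an assumption.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as vessel in both masks.
    intersection = sum(1 for p, t in zip(pred_flat, target_flat) if p and t)
    total = sum(1 for p in pred_flat if p) + sum(1 for t in target_flat if t)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For real 2048-pixel-scale images an array library would be faster, but the arithmetic is the same.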
Fundus Image Dataset for Vessel Segmentation.zip
1.6 GB
Pavement Dataset
Synthetic Dataset on Road Pavements (Educational Purposes)

🏗 Pavement Condition Monitoring and Maintenance Prediction
📘 Scenario

You are a data analyst for a city engineering office tasked with identifying which road segments require urgent maintenance. The office has collected inspection data on various roads, including surface conditions, traffic volume, and environmental factors.

Your goal is to analyze this data and build a binary classification model to predict whether a given road segment needs maintenance, based on pavement and environmental indicators.
🔍 Target Variable: Needs_Maintenance

This binary label indicates whether the road segment requires immediate maintenance, defined by the following rule:

Needs_Maintenance = 1
Needs_Maintenance = 0 otherwise
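For a binary target such as Needs_Maintenance, accuracy alone can hide missed repairs, so predictions are usually also checked with precision and recall. The following is a minimal sketch with a hypothetical function name and made-up labels. It does not use the dataset's columns or its labeling rule.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for 0/1 labels, treating 1 as 'needs maintenance'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels only, not taken from the dataset.
example_true = [1, 0, 1, 1, 0]
example_pred = [1, 1, 0, 1, 0]
example_scores = precision_recall(example_true, example_pred)
```

Low recall means segments that need maintenance are being missed, which is typically the costlier error in this scenario.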
🟢 Name Of Dataset: BAH (Behavioural Ambivalence/Hesitancy)

🟢 Description Of Dataset:
Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions, and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time and resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H.

This paper introduces a first Behavioural Ambivalence/Hesitancy (BAH) dataset collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants captured across 9 provinces in Canada, of different ages and ethnicities. Through our web platform, we recruited participants to answer 7 questions, some of which were designed to elicit A/H, while recording themselves via webcam with microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours, with 1.5 hours of A/H. Our behavioural team annotated timestamp segments to indicate where A/H occurs, and provides frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces in each frame, and a variety of participant metadata.

Additionally, this paper provides preliminary benchmarking results of baseline models for BAH at frame- and video-level recognition with mono- and multi-modal setups. It also includes results on models for zero-shot prediction, and for personalization using unsupervised domain adaptation. The limited performance of baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.
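The relation between timestamp segments and frame- or video-level labels can be illustrated with a short sketch. The segment format (start and end in seconds), the frame rate, and the function names are assumptions. The actual annotation files in the BAH repository may be organized differently.

```python
def frame_labels(segments, num_frames, fps):
    """Return one 0/1 label per frame: 1 if the frame time lies in any segment.

    `segments` is a list of (start_seconds, end_seconds) pairs, an assumed format.
    """
    labels = []
    for i in range(num_frames):
        t = i / fps  # timestamp of frame i in seconds
        inside = any(start <= t <= end for start, end in segments)
        labels.append(1 if inside else 0)
    return labels


def video_label(labels):
    """A video counts as positive if at least one frame is positive."""
    return 1 if any(labels) else 0
```

For example, a single segment from 1.0 s to 2.0 s in an 8-frame clip at 2 frames per second marks frames 2, 3, and 4 as positive.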

🟢 Official Homepage: https://github.com/sbelharbi/bah-dataset

🟢 Number of articles that used this dataset: 1

🟢 Dataset Loaders:
Not found

🟢 Articles related to the dataset:
📝 BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change

==================================
🔴 For more data science resources:
https://t.me/DataScienceT
🟢 Name Of Dataset: ITDD (Industrial Textile Defect Detection)

🟢 Description Of Dataset:
The Industrial Textile Defect Detection (ITDD) dataset includes 1885 industrial textile images categorized into 4 categories: cotton fabric, dyed fabric, hemp fabric, and plaid fabric. These classes are collected from the industrial production sites of WEIQIAO Textile. ITDD is an upgraded version of WFDD that reorganizes three original classes and adds one new class.

🟢 Official Homepage: https://github.com/cqylunlun/CRAS?tab=readme-ov-file#dataset-release

🟢 Number of articles that used this dataset: 1

🟢 Dataset Loaders:
Not found

🟢 Articles related to the dataset:
📝 Center-aware Residual Anomaly Synthesis for Multi-class Industrial Anomaly Detection

🟢 Name Of Dataset: EGC-FPHFS (Early Gastric Cancer Data from First People's Hospital of Foshan)

🟢 Description Of Dataset:
High-resolution early gastric cancer (EGC) detection and analysis.
Patient data: Images come from patients diagnosed with gastric cancer, distinguishing between early gastric cancer (EGC) and non-pathogenic gastric cancer (NGC). The study used data from 341 patients, with 124 classified as EGC and 217 as NGC.
Image types: High-resolution images, typically obtained from endoscopy.
Data volume: 1120 images for EGC detection and 2150 images for NGC.

🟢 Official Homepage: https://github.com/liu37972/Fuzzy-Seg-Deep-DuS-KFCM-.git

🟢 Number of articles that used this dataset: 1

🟢 Dataset Loaders:
Not found

🟢 Articles related to the dataset:
📝 Gastric histopathology image segmentation using a hierarchical conditional random field

🟢 Name Of Dataset: MuJoCo

🟢 Description Of Dataset:
MuJoCo (Multi-Joint dynamics with Contact) is a physics engine used to implement environments for benchmarking reinforcement learning methods.

🟢 Official Homepage: https://www.mujoco.org/

🟢 Number of articles that used this dataset: 1613

🟢 Dataset Loaders:
deepmind/mujoco:
https://github.com/deepmind/mujoco

🟢 Articles related to the dataset:
📝 Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

📝 Proximal Policy Optimization Algorithms

📝 Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

📝 Model-Based Reinforcement Learning via Meta-Policy Optimization

📝 Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

📝 Primal Wasserstein Imitation Learning

📝 Unity: A General Platform for Intelligent Agents

📝 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

📝 Physically Embedded Planning Problems: New Challenges for Reinforcement Learning

📝 Trust Region Policy Optimization

🟢 Name Of Dataset: OpenAI Gym

🟢 Description Of Dataset:
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It includes environments such as Algorithmic, Atari, Box2D, Classic Control, MuJoCo, Robotics, and Toy Text. Source: https://github.com/openai/gym
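Gym environments are driven through a reset/step loop. The sketch below mimics that loop with a toy, hand-written environment so it runs without installing anything. CountdownEnv and run_episode are hypothetical names, not part of Gym. The four-value step return (observation, reward, done, info) follows the convention of earlier Gym releases, and later releases changed it.

```python
class CountdownEnv:
    """Toy stand-in for a Gym-style environment (not a real Gym class).

    The agent must bring a counter down to zero; action 1 decrements it.
    """

    def __init__(self, start=3):
        self.start = start
        self.state = start

    def reset(self):
        # Start a new episode and return the first observation.
        self.state = self.start
        return self.state

    def step(self, action):
        if action == 1:
            self.state -= 1
        done = self.state == 0
        reward = 1.0 if done else 0.0
        # Older Gym convention: (observation, reward, done, info).
        return self.state, reward, done, {}


def run_episode(env, policy, max_steps=10):
    """Run one episode and return the total reward collected."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

A policy that always returns action 1 finishes the episode in three steps. A learning algorithm would replace that fixed policy with one it updates from the rewards.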

🟢 Official Homepage: https://gym.openai.com/

🟢 Number of articles that used this dataset: 1296

🟢 Dataset Loaders:
openai/gym:
https://github.com/openai/gym/blob/master/docs/environments.md

🟢 Articles related to the dataset:
📝 Trust-PCL: An Off-Policy Trust Region Method for Continuous Control

📝 Parameter Space Noise for Exploration

📝 Proximal Policy Optimization Algorithms

📝 Continuous control with deep reinforcement learning

📝 OpenAI Gym

📝 NoRML: No-Reward Meta Learning

📝 SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models

📝 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

📝 FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning

📝 Dynamic Datasets and Market Environments for Financial Reinforcement Learning

🟢 Name Of Dataset: InBreast

🟢 Description Of Dataset:
Rationale and objectives: Computer-aided detection and diagnosis (CAD) systems have been developed in the past two decades to assist radiologists in the detection and diagnosis of lesions seen on breast imaging exams, thus providing a second opinion. Mammographic databases play an important role in the development of algorithms aiming at the detection and diagnosis of mammary lesions. However, available databases often do not take into consideration all the requirements needed for research and study purposes. This article aims to present and detail a new mammographic database.

Materials and methods: Images were acquired at a breast center located in a university hospital (Centro Hospitalar de S. João [CHSJ], Breast Centre, Porto) with the permission of the Portuguese National Committee of Data Protection and the Hospital's Ethics Committee. MammoNovation Siemens full-field digital mammography, with a solid-state detector of amorphous selenium, was used.

Results: The new database, INbreast, has a total of 115 cases (410 images), of which 90 cases are from women with both breasts affected (four images per case) and 25 cases are from mastectomy patients (two images per case). Several types of lesions (masses, calcifications, asymmetries, and distortions) were included. Accurate contours made by specialists are also provided in XML format.

Conclusion: The strengths of the presented database, INbreast, are that it was built with full-field digital mammograms (as opposed to digitized mammograms), it presents a wide variability of cases, and it is made publicly available together with precise annotations. We believe that this database can be a reference for future works centered on or related to breast cancer imaging.

🟢 Official Homepage: Not found

🟢 Number of articles that used this dataset: 106

🟢 Dataset Loaders:
ngohongthong1832004/inBreast:
https://github.com/ngohongthong1832004/inBreast

🟢 Articles related to the dataset:
📝 MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis

📝 Deep Learning to Improve Breast Cancer Early Detection on Screening Mammography

📝 End-to-end Training for Whole Image Breast Cancer Diagnosis using An All Convolutional Design

📝 Multi-view Local Co-occurrence and Global Consistency Learning Improve Mammogram Classification Generalisation

📝 medigan: a Python library of pretrained generative models for medical image synthesis

📝 High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks

📝 Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification

📝 Detecting and classifying lesions in mammograms with Deep Learning

📝 Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

📝 Unbiased Mean Teacher for Cross-domain Object Detection

==================================
🔴 For more dataset resources:
https://t.me/Datasets1