Kaggle Data Hub
29.2K subscribers
933 photos
14 videos
309 files
1.2K links
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.

Admin: @HusseinSheikho || @Hussein_Sheikho
🟒 Name Of Dataset: ETT (Electricity Transformer Temperature)

🟒 Description Of Dataset:
The Electricity Transformer Temperature (ETT) is a crucial indicator in the long-term deployment of electric power. This dataset consists of 2 years of data from two separate counties in China. To explore granularity in the long sequence time-series forecasting (LSTF) problem, different subsets are created: {ETTh1, ETTh2} at the 1-hour level and ETTm1 at the 15-minute level. Each data point consists of the target value "oil temperature" and 6 power load features. The train/val/test split is 12/4/4 months. Source: https://arxiv.org/pdf/2012.07436.pdf

🟒 Official Homepage: https://github.com/zhouhaoyi/ETDataset

🟒 Number of articles that used this dataset: 318

🟒 Dataset Loaders:
zhouhaoyi/ETDataset:
https://github.com/zhouhaoyi/ETDataset

🟒 Articles related to the dataset:
πŸ“ TSMixer: An All-MLP Architecture for Time Series Forecasting

πŸ“ A decoder-only foundation model for time-series forecasting

πŸ“ Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting

πŸ“ Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

πŸ“ Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

πŸ“ A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

πŸ“ iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

πŸ“ TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis

πŸ“ TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

πŸ“ FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

==================================
🔴 For more datasets resources:
✓ https://t.me/Datasets1
🟒 Name Of Dataset: OoDIS (Anomaly Instance Segmentation Benchmark)

🟒 Description Of Dataset:
OoDIS is a benchmark dataset for anomaly instance segmentation, crucial for autonomous vehicle safety. It extends existing anomaly segmentation benchmarks to focus on the segmentation of individual out-of-distribution (OOD) objects. The dataset addresses the need to identify and segment unknown objects, which is critical for avoiding accidents. It includes diverse scenes with various anomalies, pushing the boundaries of current segmentation capabilities. The benchmark focuses on evaluating detection and instance segmentation of unexpected obstacles on roads. For more details, refer to the OoDIS paper.

🟒 Official Homepage: https://kumuji.github.io/oodis_website/

🟒 Number of articles that used this dataset: 5

🟒 Dataset Loaders:
kumuji/ugains:
https://github.com/kumuji/ugains

🟒 Articles related to the dataset:
📝 Unmasking Anomalies in Road-Scene Segmentation

📝 UGainS: Uncertainty Guided Anomaly Instance Segmentation

📝 OoDIS: Anomaly Instance Segmentation Benchmark

📝 Segmenting Known Objects and Unseen Unknowns without Prior Knowledge

📝 On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes

🟒 Name Of Dataset: InfoSeek (Visual Information Seeking)

🟒 Description Of Dataset:
In this project, we introduce InfoSeek, a visual question answering dataset tailored for information-seeking questions that cannot be answered with only common sense knowledge. Using InfoSeek, we analyze various pre-trained visual question answering models and gain insights into their characteristics. Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset elicits models to use fine-grained knowledge that was learned during their pre-training.

🟒 Official Homepage: https://open-vision-language.github.io/infoseek/

🟒 Number of articles that used this dataset: 35

🟒 Dataset Loaders:
Not found

🟒 Articles related to the dataset:
📝 BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

📝 LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

📝 Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

📝 Ming-Omni: A Unified Multimodal Model for Perception and Generation

📝 Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

📝 PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

📝 Safety of Multimodal Large Language Models on Images and Texts

📝 PaLI-X: On Scaling up a Multilingual Vision and Language Model

📝 MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

📝 Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models

🟒 Name Of Dataset: UIIS10K (General Underwater Image Instance Segmentation dataset 10K)

🟒 Description Of Dataset:
We propose a large-scale underwater instance segmentation dataset, UIIS10K, which includes 10,048 images with pixel-level annotations for 10 categories. As far as we know, this is the largest underwater instance segmentation dataset available and can be used as a benchmark for evaluating underwater segmentation methods.

🟒 Official Homepage: https://github.com/LiamLian0727/UIIS10K

🟒 Number of articles that used this dataset: 3

🟒 Dataset Loaders:
Not found

🟒 Articles related to the dataset:
📝 WaterMask: Instance Segmentation for Underwater Imagery

📝 A Unified Image-Dense Annotation Generation Model for Underwater Scenes

📝 UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset

🟒 Name Of Dataset: 1

🟒 Description Of Dataset:
111

🟒 Official Homepage: Not found

🟒 Number of articles that used this dataset: 28

🟒 Dataset Loaders:
Not found

🟒 Articles related to the dataset:
📝 NeMo Inverse Text Normalization: From Development To Production

📝 Open Deep Search: Democratizing Search with Open-source Reasoning Agents

📝 Deep Learning in Single-Cell Analysis

📝 Enhancing Fine-grained Sentiment Classification Exploiting Local Context Embedding

📝 UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer

📝 Representation Learning with Large Language Models for Recommendation

📝 Short-Term Aggregated Residential Load Forecasting using BiLSTM and CNN-BiLSTM

📝 K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce

📝 Semi-supervised Sequence Modeling for Elastic Impedance Inversion

📝 CholecTrack20: A Dataset for Multi-Class Multiple Tool Tracking in Laparoscopic Surgery

This channel is for Programmers, Coders, and Software Engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.me/addlist/8_rRW2scgfRhOTc0

✅ https://t.me/Codeprogrammer
🟒 Name Of Dataset: MIT-BIH Arrhythmia Database

🟒 Description Of Dataset:
The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample. The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range. Two or more cardiologists independently annotated each record; disagreements were resolved to obtain the computer-readable reference annotations for each beat (approximately 110,000 annotations in all) included with the database. This directory contains the entire MIT-BIH Arrhythmia Database. About half (25 of 48 complete records, and reference annotation files for all 48 records) of this database has been freely available here since PhysioNet's inception in September 1999. The 23 remaining signal files, which had been available only on the MIT-BIH Arrhythmia Database CD-ROM, were posted here in February 2005. Much more information about this database may be found in the MIT-BIH Arrhythmia Database Directory.

🟒 Official Homepage: https://physionet.org/content/mitdb/1.0.0/

🟒 Number of articles that used this dataset: 31

🟒 Dataset Loaders:
Not found

🟒 Articles related to the dataset:
📝 Inter- and intra- patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach

📝 ECG Heartbeat Classification: A Deep Transferable Representation

📝 Multi-module Recurrent Convolutional Neural Network with Transformer Encoder for ECG Arrhythmia Classification

📝 Subject-Aware Contrastive Learning for Biosignals

📝 Classification of Arrhythmia by Using Deep Learning with 2-D ECG Spectral Image Representation

📝 A Personalized Zero-Shot ECG Arrhythmia Monitoring System: From Sparse Representation Based Domain Adaption to Energy Efficient Abnormal Beat Detection for Practical ECG Surveillance

📝 AQuA: A Benchmarking Tool for Label Quality Assessment

📝 Spot The Odd One Out: Regularized Complete Cycle Consistent Anomaly Detector GAN

📝 Arrhythmia Classifier Using Convolutional Neural Network with Adaptive Loss-aware Multi-bit Networks Quantization

📝 MedFuncta: Modality-Agnostic Representations Based on Efficient Neural Fields

🟒 Name Of Dataset: RVL-CDIP

🟒 Description Of Dataset:
The RVL-CDIP dataset consists of scanned document images belonging to 16 classes such as letter, form, email, resume, memo, etc. The dataset has 320,000 training, 40,000 validation and 40,000 test images. The images are characterized by low quality, noise, and low resolution, typically 100 dpi. Source: Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning

🟒 Official Homepage: https://www.cs.cmu.edu/~aharley/rvl-cdip/

🟒 Number of articles that used this dataset: Unknown

🟒 Dataset Loaders:
huggingface/datasets (rvl_cdip):
https://huggingface.co/datasets/rvl_cdip

huggingface/datasets (rvl-cdip_easyOCR):
https://huggingface.co/datasets/jordyvl/rvl-cdip_easyOCR

huggingface/datasets (rvl_cdip):
https://huggingface.co/datasets/aharley/rvl_cdip

huggingface/datasets (rvl_cdip_easyocr):
https://huggingface.co/datasets/jordyvl/rvl_cdip_easyocr

huggingface/datasets (rvl_cdip_mini):
https://huggingface.co/datasets/dvgodoy/rvl_cdip_mini

🟒 Name Of Dataset: FUNSD (Form Understanding in Noisy Scanned Documents)

🟒 Description Of Dataset:
Form Understanding in Noisy Scanned Documents (FUNSD) comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking. Source: FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

🟒 Official Homepage: https://guillaumejaume.github.io/FUNSD/

🟒 Number of articles that used this dataset: Unknown

🟒 Dataset Loaders:
huggingface/datasets:
https://huggingface.co/datasets/nielsr/FUNSD_layoutlmv2

mindee/doctr:
https://mindee.github.io/doctr/latest/datasets.html#doctr.datasets.FUNSD

🟒 Name Of Dataset: IIIT-AR-13K

🟒 Description Of Dataset:
IIIT-AR-13K is created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports. This dataset contains a total of 13k annotated page images with objects in five different popular categories: table, figure, natural image, logo, and signature. It is the largest manually annotated dataset for graphical object detection. Source: IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

🟒 Official Homepage: http://cvit.iiit.ac.in/usodi/iiitar13k.php

🟒 Number of articles that used this dataset: 6

🟒 Dataset Loaders:
Not found

🟒 Articles related to the dataset:
📝 Deep learning for table detection and structure recognition: A survey

📝 RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

📝 The YOLO model that still excels in document layout analysis

📝 IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

📝 Document AI: Benchmarks, Models and Applications

📝 Robust Table Detection and Structure Recognition from Heterogeneous Document Images

🟒 Name Of Dataset: ICDAR 2013

🟒 Description Of Dataset:
The ICDAR 2013 dataset consists of 229 training images and 233 testing images, with word-level annotations provided. It is the standard benchmark dataset for evaluating near-horizontal text detection. Source: Single Shot Text Detector with Regional Attention

🟒 Official Homepage: https://rrc.cvc.uab.es/?ch=2

🟒 Number of articles that used this dataset: Unknown

🟒 Dataset Loaders:
activeloopai/Hub:
https://docs.activeloop.ai/datasets/icdar-2013-dataset

mindee/doctr:
https://mindee.github.io/doctr/latest/datasets.html#doctr.datasets.IC13

tanglang96/DataLoaders_DALI:
https://github.com/tanglang96/DataLoaders_DALI

🟒 Name Of Dataset: UFPR-ALPR

🟒 Description Of Dataset:
This dataset includes 4,500 fully annotated images (over 30,000 license plate characters) from 150 vehicles in real-world scenarios where both the vehicle and the camera (inside another vehicle) are moving. The images were acquired with three different cameras and are available in the Portable Network Graphics (PNG) format with a size of 1,920 × 1,080 pixels. The cameras used were: GoPro Hero4 Silver, Huawei P9 Lite, and iPhone 7 Plus. We collected 1,500 images with each camera, divided as follows:
- 900 of cars with gray license plates;
- 300 of cars with red license plates;
- 300 of motorcycles with gray license plates.
The dataset is split as follows: 40% for training, 40% for testing and 20% for validation. Every image has the following annotations available in a text file: the camera with which the image was taken; the vehicle's position and information such as type (car or motorcycle), manufacturer, model and year; the identification and position of the license plate; and the position of its characters.
Source: A Robust Real-Time Automatic License Plate Recognition Based on the YOLO Detector

🟒 Official Homepage: https://web.inf.ufpr.br/vri/databases/ufpr-alpr/

🟒 Number of articles that used this dataset: Unknown

🟒 Dataset Loaders:
ultralytics/yolov5:
https://github.com/ultralytics/yolov5
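
The image counts in the UFPR-ALPR description can be cross-checked with a little arithmetic. This is a minimal sketch using only the numbers quoted in the description; the variable and function names are hypothetical, and the split sizes assume the 40/40/20 percentages apply to the full 4,500 images.

```python
CAMERAS = ("GoPro Hero4 Silver", "Huawei P9 Lite", "iPhone 7 Plus")

# Images collected with each camera, by vehicle and plate type.
PER_CAMERA = {
    "cars, gray plates": 900,
    "cars, red plates": 300,
    "motorcycles, gray plates": 300,
}

# Split percentages as stated in the description.
SPLIT_PERCENT = {"train": 40, "test": 40, "val": 20}


def total_images():
    """Total images across all cameras."""
    return len(CAMERAS) * sum(PER_CAMERA.values())


def split_sizes(total):
    """Number of images per split, using integer percentages."""
    return {name: total * pct // 100 for name, pct in SPLIT_PERCENT.items()}


print(total_images())               # 4500
print(split_sizes(total_images()))  # {'train': 1800, 'test': 1800, 'val': 900}
```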

🟒 Name Of Dataset: PHM2017

🟒 Description Of Dataset:
PHM2017 is a new dataset consisting of 7,192 English tweets across six diseases and conditions: Alzheimer's Disease, heart attack (any severity), Parkinson's disease, cancer (any type), Depression (any severity), and Stroke. The Twitter search API was used to retrieve the data using the colloquial disease names as search keywords, with the expectation of retrieving a high-recall, low-precision dataset. After removing the re-tweets and replies, the tweets were manually annotated. The labels are:
- self-mention. The tweet contains a health mention with a health self-report of the Twitter account owner, e.g., "However, I worked hard and ran for Tokyo Mayer Election Campaign in January through February, 2014, without publicizing the cancer."
- other-mention. The tweet contains a health mention of a health report about someone other than the account owner, e.g., "Designer with Parkinson's couldn't work then engineer invents bracelet + changes her world"
- awareness. The tweet contains the disease name, but does not mention a specific person, e.g., "A Month Before a Heart Attack, Your Body Will Warn You With These 8 Signals"
- non-health. The tweet contains the disease name, but the tweet topic is not about health, e.g., "Now I can have cancer on my wall for all to see <3"
Source: Did You Really Just Have a Heart Attack? Towards Robust Detection of Personal Health Mentions in Social Media

🟒 Official Homepage: https://github.com/emory-irlab/PHM2017

🟒 Number of articles that used this dataset: 7

🟒 Dataset Loaders:
emory-irlab/PHM2017:
https://github.com/emory-irlab/PHM2017

🟒 Articles related to the dataset:
📝 PHMD: An easy data access tool for prognosis and health management datasets

📝 Did You Really Just Have a Heart Attack? Towards Robust Detection of Personal Health Mentions in Social Media

📝 Incorporating Emotions into Health Mention Classification Task on Social Media

📝 A Novel Approach to Train Diverse Types of Language Models for Health Mention Classification of Tweets

📝 Neural Architecture Search For Fault Diagnosis

📝 Improving Health Mentioning Classification of Tweets using Contrastive Adversarial Training

📝 Multi-task Learning for Personal Health Mention Detection on Social Media

==================================================
πŸ“ 1. BASIC DATASET INFO
----------------------------------------
Title: E. coli Resistance Dataset
Basic Description: Antibiotic resistance profiles in E. coli clinical isolates

📖 2. FULL DATASET DESCRIPTION
----------------------------------------
Full Description:
This dataset contains 195,000+ raw records of Escherichia coli clinical isolates and their antimicrobial susceptibility test results. The data was extracted from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC), a public repository funded by NIAID.
Each entry captures how a specific E. coli genome responds to a given antibiotic, along with phenotypic interpretation, lab methods, measurement values (e.g., MIC), and supporting publication links.

📥 3. API DOWNLOAD INFORMATION
----------------------------------------
API Download Link: https://www.kaggle.com/api/v1/datasets/download/valeriamaciel/e-coli-resistance-dataset
Dataset Size: 3 MB (zip)
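
The API download link above follows an owner/slug pattern. As a small sketch, the helper below rebuilds that URL from its two parts; the function name `kaggle_download_url` is hypothetical, and the pattern is inferred only from the single link shown in this post. Actually fetching the file may require Kaggle credentials, which this example does not handle.

```python
from urllib.parse import quote

# Base path taken from the download link shown in this post.
KAGGLE_DOWNLOAD_BASE = "https://www.kaggle.com/api/v1/datasets/download"


def kaggle_download_url(owner, slug):
    """Build a dataset download URL from an owner name and dataset slug."""
    return f"{KAGGLE_DOWNLOAD_BASE}/{quote(owner, safe='')}/{quote(slug, safe='')}"


print(kaggle_download_url("valeriamaciel", "e-coli-resistance-dataset"))
```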

📊 4. FILE COUNT
----------------------------------------
File count not found

📈 5. VIEWS & DOWNLOADS
----------------------------------------
Views: 418
Downloads: 72

📚 6. RELATED NOTEBOOKS
----------------------------------------
1. Antibiotic Dataset
Upvotes: 38
URL: https://www.kaggle.com/datasets/kanchana1990/antibiotic-dataset
2. Malaria Dataset
Upvotes: 35
URL: https://www.kaggle.com/datasets/miracle9to9/files1
3. SARS-CoV-2 Genetics
Upvotes: 13
URL: https://www.kaggle.com/datasets/rtwillett/sarscov2-genetics
4. E.coli_Data_cleaning
Upvotes: 7
URL: https://www.kaggle.com/code/valeriamaciel/e-coli-data-cleaning
5. E.coli_data_analysis
Upvotes: 4
URL: https://www.kaggle.com/code/valeriamaciel/e-coli-data-analysis