π’ Name Of Dataset: AMASS
π’ Description Of Dataset:
AMASS is a large database of human motion unifying different optical marker-based motion capture datasets by representing them within a common framework and parameterization. AMASS is readily useful for animation, visualization, and generating training data for deep learning.
π’ Official Homepage: https://amass.is.tue.mpg.de/
π’ Number of articles that used this dataset: 354
π’ Dataset Loaders:
Not found
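No indexed loader is listed, but the AMASS distribution is typically a set of per-sequence .npz files with SMPL-style arrays. A minimal inspection sketch (assuming keys such as 'poses', 'betas', 'trans', and 'mocap_framerate', and a hypothetical file path; verify both against your download):
```python
# Hedged sketch: inspect one AMASS motion sequence with NumPy.
# Assumes the usual AMASS packaging of one .npz per mocap sequence with
# SMPL-style arrays; key names may differ per sub-dataset, so check them.
import numpy as np

def load_amass_sequence(npz_path):
    data = np.load(npz_path)
    print("available keys:", list(data.keys()))
    poses = data["poses"]               # (num_frames, num_pose_params) axis-angle body pose
    trans = data["trans"]               # (num_frames, 3) global translation
    betas = data["betas"]               # body shape coefficients
    fps = float(data["mocap_framerate"])
    print(f"{poses.shape[0]} frames at {fps:.0f} fps")
    return poses, betas, trans, fps

# Hypothetical path -- replace with a sequence from your own download:
# poses, betas, trans, fps = load_amass_sequence("ACCAD/Female1Walking_c3d/B3_-_walk1_poses.npz")
```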
π’ Articles related to the dataset:
π VIBE: Video Inference for Human Body Pose and Shape Estimation
π MotionGPT: Human Motion as a Foreign Language
π MotionBERT: A Unified Perspective on Learning Human Motion Representations
π DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition
π AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
π EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
π Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling
π MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
π WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
π Deep motifs and motion signatures
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: Office-Home
π’ Description Of Dataset:
Office-Home is a benchmark dataset for domain adaptation that contains 4 domains, each consisting of 65 categories. The four domains are: Art (artistic images in the form of sketches, paintings, ornamentation, etc.), Clipart (a collection of clipart images), Product (images of objects without a background), and Real-World (images of objects captured with a regular camera). It contains 15,500 images, with an average of around 70 images per class and a maximum of 99 images in a class. Source: Multi-component Image Translation for Deep Domain Generalization
π’ Official Homepage: https://www.hemanthdv.org/officeHomeDataset.html
π’ Number of articles that used this dataset: 1064
π’ Dataset Loaders:
activeloopai/Hub:
https://docs.activeloop.ai/datasets/office-home-dataset
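Besides the Activeloop loader above, the raw download extracts to one folder per domain with one sub-folder per class, so a plain torchvision ImageFolder is usually enough. A hedged sketch, assuming the archive is extracted to OfficeHomeDataset/ with an Art domain folder (folder names may differ in your copy):
```python
# Hedged sketch: load one Office-Home domain as an image-classification dataset.
# Assumes the extracted layout OfficeHomeDataset/<Domain>/<class>/<image>.jpg.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

art = datasets.ImageFolder("OfficeHomeDataset/Art", transform=transform)
loader = torch.utils.data.DataLoader(art, batch_size=32, shuffle=True)

print(len(art.classes), "classes,", len(art), "images")  # expected: 65 classes
images, labels = next(iter(loader))
print(images.shape, labels.shape)
```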
π’ Articles related to the dataset:
π Domain Conditional Predictors for Domain Adaptation
π Transfer Learning with Dynamic Distribution Adaptation
π FIXED: Frustratingly Easy Domain Generalization with Mixup
π Visual Domain Adaptation with Manifold Embedded Distribution Alignment
π Easy Transfer Learning By Exploiting Intra-domain Structures
π Generalizing to Unseen Domains: A Survey on Domain Generalization
π Learning to Match Distributions for Domain Adaptation
π Unsupervised Domain Adaptation by Backpropagation
π Domain-Adversarial Training of Neural Networks
π A Review of Single-Source Deep Unsupervised Visual Domain Adaptation
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: M3-VOS (M3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation)
π’ Description Of Dataset:
M3-VOS (Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation) is a benchmark for verifying the ability of models to understand object phases. It consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios, with 205,181 collected masks and an average track duration of 14.27 s. M3-VOS covers 120+ object categories across 6 phases within 14 scenarios, encompassing 23 specific phase transitions. Venue: CVPR 2025. Paper: arxiv.org/html/2412.13803v2. Points of contact: Jiaxin Li, Zixuan Chen.
π’ Official Homepage: https://zixuan-chen.github.io/M-cube-VOS.github.io/
π’ Number of articles that used this dataset: 4
π’ Dataset Loaders:
Not found
π’ Articles related to the dataset:
π SAM 2: Segment Anything in Images and Videos
π XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
π Putting the Object Back into Video Object Segmentation
π M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: Moving MNIST
π’ Description Of Dataset:
The Moving MNIST dataset contains 10,000 video sequences, each consisting of 20 frames. In each video sequence, two digits move independently around the frame, which has a spatial resolution of 64×64 pixels. The digits frequently intersect with each other and bounce off the edges of the frame. Source: Mutual Suppression Network for Video Prediction using Disentangled Features
π’ Official Homepage: http://www.cs.toronto.edu/~nitish/unsupervised_video/
π’ Number of articles that used this dataset: 194
π’ Dataset Loaders:
tensorflow/datasets:
https://www.tensorflow.org/datasets/catalog/moving_mnist
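Through the TensorFlow Datasets loader above, the pre-generated sequences can be read directly. A hedged sketch, assuming the catalog entry exposes the 10,000 sequences as a single 'test' split with an 'image_sequence' feature:
```python
# Hedged sketch: load Moving MNIST through tensorflow-datasets.
# Assumes the TFDS catalog entry provides one 'test' split whose
# 'image_sequence' feature holds (20, 64, 64, 1) uint8 frames.
import tensorflow_datasets as tfds

ds = tfds.load("moving_mnist", split="test")
for example in ds.take(1):
    seq = example["image_sequence"]
    print(seq.shape, seq.dtype)  # expected: (20, 64, 64, 1) uint8
```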
π’ Articles related to the dataset:
π Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
π What Makes for Good Views for Contrastive Learning?
π VideoGPT: Video Generation using VQ-VAE and Transformers
π Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning
π PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs
π OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
π Eidetic 3D LSTM: A Model for Video Prediction and Beyond
π SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning
π MogaNet: Multi-order Gated Aggregation Network
π Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: Cityscapes
π’ Description Of Dataset:
Cityscapes is a large-scale database focused on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5,000 finely annotated images and 20,000 coarsely annotated ones. Data was captured in 50 cities over several months, at different times of day, and in good weather conditions. It was originally recorded as video, so the frames were manually selected to have the following features: a large number of dynamic objects, varying scene layout, and varying background. Source: A Review on Deep Learning Techniques Applied to Semantic Segmentation
π’ Official Homepage: https://www.cityscapes-dataset.com/dataset-overview/
π’ Number of articles that used this dataset: 3679
π’ Dataset Loaders:
facebookresearch/detectron2:
https://detectron2.readthedocs.io/en/latest/tutorials/builtin_datasets.html#expected-dataset-structure-for-cityscapes
open-mmlab/mmdetection:
https://github.com/open-mmlab/mmdetection/blob/master/docs/1_exist_data_model.md
pytorch/vision:
https://pytorch.org/vision/stable/datasets.html#torchvision.datasets.Cityscapes
voxel51/fiftyone:
https://docs.voxel51.com/user_guide/dataset_zoo/datasets.html#cityscapes
open-mmlab/mmsegmentation:
https://github.com/open-mmlab/mmsegmentation/blob/master/docs/dataset_prepare.md
Kaggle/kaggle-api:
https://www.kaggle.com/datasets/sakshaymahna/cityscapes-depth-and-segmentation
tensorflow/datasets:
https://www.tensorflow.org/datasets/catalog/cityscapes
facebookresearch/MaskFormer:
https://github.com/facebookresearch/MaskFormer
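As one example, the torchvision loader listed above can serve fine-annotation semantic masks once the registered download (leftImg8bit and gtFine) has been extracted under a common root, which the sketch below assumes:
```python
# Sketch: load fine-annotated Cityscapes semantic masks with torchvision.
# Assumes the registered download has been extracted as
#   cityscapes/leftImg8bit/...  and  cityscapes/gtFine/...
from torchvision import datasets

train = datasets.Cityscapes(
    root="cityscapes",
    split="train",
    mode="fine",
    target_type="semantic",
)
image, mask = train[0]          # PIL images
print(image.size, mask.size)    # 2048x1024 frames by default
```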
π’ Articles related to the dataset:
π Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
π DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
π Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
π Searching for MobileNetV3
π Rethinking Atrous Convolution for Semantic Image Segmentation
π Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
π Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
π Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
π MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context
π Mask R-CNN
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: Visual Question Answering v2.0 (VQA v2.0)
π’ Description Of Dataset:
Visual Question Answering (VQA) v2.0 is a dataset containing open-ended questions about images. These questions require an understanding of vision, language, and commonsense knowledge to answer. It is the second version of the VQA dataset and includes:
- 265,016 images (COCO and abstract scenes)
- At least 3 questions (5.4 on average) per image
- 10 ground-truth answers per question
- 3 plausible (but likely incorrect) answers per question
- An automatic evaluation metric
The first version of the dataset was released in October 2015.
π’ Official Homepage: https://visualqa.org/
π’ Number of articles that used this dataset: 365
π’ Dataset Loaders:
facebookresearch/ParlAI:
https://parl.ai/docs/tasks.html#vqav2
allenai/allennlp-models:
https://docs.allennlp.org/models/main/models/vision/dataset_readers/vqav2/
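For custom pipelines, the official question and annotation JSON files can also be paired by question_id. A hedged sketch, assuming the usual train2014 file names and the 'questions'/'annotations' top-level keys (check both against your download from visualqa.org):
```python
# Hedged sketch: pair VQA v2 questions with their ground-truth answers.
# File names and key layout are assumptions based on the official release.
import json

with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = json.load(f)["questions"]
with open("v2_mscoco_train2014_annotations.json") as f:
    annotations = {a["question_id"]: a for a in json.load(f)["annotations"]}

q = questions[0]
a = annotations[q["question_id"]]
print(q["image_id"], q["question"], "->", a["multiple_choice_answer"])
```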
π’ Articles related to the dataset:
π Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
π Language Models are General-Purpose Interfaces
π VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
π MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
π CoCa: Contrastive Captioners are Image-Text Foundation Models
π BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
π Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
π Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
π InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
π MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: HowTo100M
π’ Description Of Dataset:
HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos where content creators teach complex tasks with an explicit intention of explaining the visual content on screen. HowTo100M features a total of:
- 136M video clips with captions, sourced from 1.2M YouTube videos (15 years of video)
- 23k activities from domains such as cooking, hand crafting, personal care, gardening, and fitness
Each video is associated with a narration available as subtitles automatically downloaded from YouTube. Source: HowTo100M
π’ Official Homepage: https://www.di.ens.fr/willow/research/howto100m/
π’ Number of articles that used this dataset: 286
π’ Dataset Loaders:
Not found
π’ Articles related to the dataset:
π VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
π VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
π VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
π Self-Supervised MultiModal Versatile Networks
π Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
π UnLoc: A Unified Framework for Video Localization Tasks
π Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
π Harvest Video Foundation Models via Efficient Post-Pretraining
π InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
π InternVideo: General Video Foundation Models via Generative and Discriminative Learning
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: CoQA (Conversational Question Answering Challenge)
π’ Description Of Dataset:
CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure the ability of machines to understand a text passage and answer a series of interconnected questions that appear in a conversation. CoQA contains 127,000+ questions with answers collected from 8,000+ conversations. Each conversation is collected by pairing two crowdworkers to chat about a passage in the form of questions and answers. The unique features of CoQA include: 1) the questions are conversational; 2) the answers can be free-form text; 3) each answer also comes with an evidence subsequence highlighted in the passage; and 4) the passages are collected from seven diverse domains. CoQA has many challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. Source: https://stanfordnlp.github.io/coqa/
π’ Official Homepage: https://stanfordnlp.github.io/coqa/
π’ Number of articles that used this dataset: 277
π’ Dataset Loaders:
huggingface/datasets (coqa):
https://huggingface.co/datasets/coqa
huggingface/datasets (pcmr):
https://huggingface.co/datasets/Ruohao/pcmr
huggingface/datasets (coqa):
https://huggingface.co/datasets/stanfordnlp/coqa
facebookresearch/ParlAI:
https://parl.ai/docs/tasks.html#conversational-question-answering-challenge
activeloopai/Hub:
https://docs.activeloop.ai/datasets/coqa-dataset
tensorflow/datasets:
https://www.tensorflow.org/datasets/catalog/coqa
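With the Hugging Face loader listed above, a conversation can be pulled in a few lines. A hedged sketch, assuming the 'stanfordnlp/coqa' card exposes 'story', 'questions', and 'answers' fields as described on the hub:
```python
# Hedged sketch: load CoQA through the Hugging Face hub loader listed above.
# Field names ('story', 'questions', 'answers' with 'input_text') are taken
# from the dataset card and should be verified on first use.
from datasets import load_dataset

coqa = load_dataset("stanfordnlp/coqa", split="train")
ex = coqa[0]
print(ex["story"][:200])
for q, a in zip(ex["questions"][:3], ex["answers"]["input_text"][:3]):
    print("Q:", q, "| A:", a)
```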
π’ Articles related to the dataset:
π MVP: Multi-task Supervised Pre-training for Natural Language Generation
π BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
π Language Models are Unsupervised Multitask Learners
π Unified Language Model Pre-training for Natural Language Understanding and Generation
π Language Models are Few-Shot Learners
π UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
π Pre-Training with Whole Word Masking for Chinese BERT
π StarCoder: may the source be with you!
π ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
π Natural Questions: a Benchmark for Question Answering Research
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: AISHELL-1
π’ Description Of Dataset:
AISHELL-1 is a corpus for speech recognition research and for building Mandarin speech recognition systems. Source: AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
π’ Official Homepage: http://www.openslr.org/33/
π’ Number of articles that used this dataset: 197
π’ Dataset Loaders:
Not found
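No indexed loader is listed, but the OpenSLR release pairs per-utterance WAV files with a single transcript file keyed by utterance ID. A hedged sketch, assuming the usual extracted layout and transcript file name (both are assumptions; verify them against your copy):
```python
# Hedged sketch: pair AISHELL-1 utterances with transcripts after extracting
# the OpenSLR 33 download. Assumed layout:
#   data_aishell/wav/<split>/<speaker>/<utt_id>.wav
#   data_aishell/transcript/aishell_transcript_v0.8.txt  (utt_id + text per line)
from pathlib import Path

root = Path("data_aishell")
transcripts = {}
with open(root / "transcript" / "aishell_transcript_v0.8.txt", encoding="utf-8") as f:
    for line in f:
        utt_id, *text = line.strip().split()
        transcripts[utt_id] = "".join(text)

for wav in sorted((root / "wav" / "train").rglob("*.wav"))[:3]:
    print(wav.name, "->", transcripts.get(wav.stem, "<no transcript>"))
```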
π’ Articles related to the dataset:
π FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
π Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition
π AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
π PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
π Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
π FunASR: A Fundamental End-to-End Speech Recognition Toolkit
π BAT: Boundary aware transducer for memory-efficient and low-latency ASR
π SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
π Extremely Low Footprint End-to-End ASR System for Smart Device
π Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: Indian Diabetic Retinopathy Image Dataset (IDRiD)
π’ Description Of Dataset:
The Indian Diabetic Retinopathy Image Dataset (IDRiD) consists of typical diabetic retinopathy lesions and normal retinal structures annotated at the pixel level. It also provides disease-severity grades for diabetic retinopathy and diabetic macular edema for each image, making it well suited to developing and evaluating image-analysis algorithms for early detection of diabetic retinopathy.
π’ Dataset Loaders:
milan01234/MachineLearning:
https://github.com/milan01234/MachineLearning
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: DeepMind Control Suite
π’ Description Of Dataset:
The DeepMind Control Suite (DMCS) is a set of simulated continuous-control environments with a standardized structure and interpretable rewards. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. Control Suite tasks include Pendulum, Acrobot, Cart-pole, Cart-k-pole, Ball in cup, Point-mass, Reacher, Finger, Hopper, Fish, Cheetah, Walker, Manipulator, Manipulator extra, Stacker, Swimmer, Humanoid, Humanoid_CMU, and LQR. Source: Unsupervised Learning of Object Structure and Dynamics from Videos
π’ Official Homepage: https://github.com/deepmind/dm_control
π’ Number of articles that used this dataset: 360
π’ Dataset Loaders:
deepmind/dm_control:
https://github.com/deepmind/dm_control
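The dm_control package above doubles as the loader; for example, a random policy on one task looks roughly like this (cartpole/swingup chosen arbitrarily):
```python
# Sketch: run a random policy on one DeepMind Control Suite task via dm_control.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cartpole", task_name="swingup")
spec = env.action_spec()
time_step = env.reset()
total_reward = 0.0
while not time_step.last():
    # Sample a uniformly random action within the bounded action spec.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward or 0.0
print("episode return:", total_reward)
```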
π’ Articles related to the dataset:
π State Entropy Maximization with Random Encoders for Efficient Exploration
π Critic Regularized Regression
π The Distracting Control Suite -- A Challenging Benchmark for Reinforcement Learning from Pixels
π TRAIL: Near-Optimal Imitation Learning with Suboptimal Data
π Unsupervised Learning of Object Structure and Dynamics from Videos
π Deep Reinforcement Learning
π dm_control: Software and Tasks for Continuous Control
π DeepMind Control Suite
π CoBERL: Contrastive BERT for Reinforcement Learning
π Acme: A Research Framework for Distributed Reinforcement Learning
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: VideoSet
π’ Description Of Dataset:
VideoSet is a large-scale compressed video quality dataset based on just-noticeable-difference (JND) measurement. It consists of 220 five-second sequences in four resolutions (1920×1080, 1280×720, 960×540, and 640×360). Each of the 880 video clips is encoded with the H.264 codec at QP = 1, ..., 51, and the first three JND points are measured with 30+ subjects per clip. The dataset is called "VideoSet", an acronym for "Video Subject Evaluation Test (SET)".
π’ Official Homepage: https://ieee-dataport.org/documents/videoset
π’ Number of articles that used this dataset: 12
π’ Dataset Loaders:
Not found
π’ Articles related to the dataset:
π Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
π VideoSet: A Large-Scale Compressed Video Quality Dataset Based on JND Measurement
π Full RGB Just Noticeable Difference (JND) Modelling
π A user model for JND-based video quality assessment: theory and applications
π Prediction of Satisfied User Ratio for Compressed Video
π Analysis and prediction of JND-based video quality model
π Subjective Image Quality Assessment with Boosted Triplet Comparisons
π Subjective and Objective Analysis of Streamed Gaming Videos
π A Framework to Map VMAF with the Probability of Just Noticeable Difference between Video Encoding Recipes
π On the benefit of parameter-driven approaches for the modeling and the prediction of Satisfied User Ratio for compressed video
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: iNaturalist
π’ Description Of Dataset:
The iNaturalist 2017 dataset (iNat) contains 675,170 training and validation images from 5,089 natural fine-grained categories. These categories belong to 13 super-categories, including Plantae (Plant), Insecta (Insect), Aves (Bird), Mammalia (Mammal), and so on. The iNat dataset is highly imbalanced, with dramatically different numbers of images per category. For example, the largest super-category, Plantae (Plant), has 196,613 images from 2,101 categories, whereas the smallest super-category, Protozoa, has only 381 images from 4 categories. Source: Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
π’ Official Homepage: https://github.com/visipedia/inat_comp/tree/master/2017
π’ Number of articles that used this dataset: 600
π’ Dataset Loaders:
pytorch/vision:
https://pytorch.org/vision/stable/generated/torchvision.datasets.INaturalist.html
tensorflow/datasets:
https://www.tensorflow.org/datasets/catalog/i_naturalist2017
visipedia/inat_comp:
https://github.com/visipedia/inat_comp
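With the torchvision loader above, the 2017 edition can be read once its archive is in place. A hedged sketch, assuming the data have already been placed under the given root (download=True would fetch the large archive instead):
```python
# Hedged sketch: load iNaturalist 2017 with torchvision's INaturalist dataset.
# Assumes the 2017 archive is already extracted under `root`.
from torchvision import datasets, transforms

inat = datasets.INaturalist(
    root="inaturalist",
    version="2017",
    transform=transforms.ToTensor(),
    download=False,
)
image, target = inat[0]
print(image.shape, target)  # target is the fine-grained category index
```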
π’ Articles related to the dataset:
π The iNaturalist Species Classification and Detection Dataset
π SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
π A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems
π Class-Balanced Distillation for Long-Tailed Visual Recognition
π Ranking Neural Checkpoints
π DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
π Going deeper with Image Transformers
π ResNet strikes back: An improved training procedure in timm
π LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
π On Data Scaling in Masked Image Modeling
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: Common Voice
π’ Description Of Dataset:
Common Voice is an audio dataset consisting of unique MP3 files and corresponding text files. There are 9,283 recorded hours in the dataset, of which 7,335 validated hours span 60 languages. The dataset also includes demographic metadata such as age, sex, and accent.
π’ Official Homepage: https://commonvoice.mozilla.org
π’ Number of articles that used this dataset: 438
π’ Dataset Loaders:
huggingface/datasets (common_voice_21_0):
https://huggingface.co/datasets/2Jyq/common_voice_21_0
huggingface/datasets (common_voice_16_0):
https://huggingface.co/datasets/eldad-akhaumere/common_voice_16_0
huggingface/datasets (common_voice_16_0_):
https://huggingface.co/datasets/eldad-akhaumere/common_voice_16_0_
huggingface/datasets (c-v):
https://huggingface.co/datasets/xi0v/c-v
huggingface/datasets (common_voice):
https://huggingface.co/datasets/common_voice
huggingface/datasets (common_voice_5_1):
https://huggingface.co/datasets/mozilla-foundation/common_voice_5_1
huggingface/datasets (common_voice_7_0):
https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0
huggingface/datasets (common_voice_7_0_test):
https://huggingface.co/datasets/anton-l/common_voice_7_0_test
huggingface/datasets (common_voice_7_0_test1):
https://huggingface.co/datasets/anton-l/common_voice_7_0_test1
huggingface/datasets (common_voice_1_0):
https://huggingface.co/datasets/anton-l/common_voice_1_0
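With the Hugging Face loaders above, a gated mozilla-foundation config can be streamed after accepting the terms on the hub and logging in. A hedged sketch using version 7.0 and the English config (field names assumed from the dataset card):
```python
# Hedged sketch: stream one English split of Common Voice 7.0 from the Hub.
# The mozilla-foundation configs are gated, so this assumes you have accepted
# the terms on huggingface.co and are logged in (`huggingface-cli login`).
from datasets import load_dataset

cv = load_dataset(
    "mozilla-foundation/common_voice_7_0",
    "en",
    split="validation",
    streaming=True,
)
sample = next(iter(cv))
print(sample["sentence"])
print(sample["audio"]["sampling_rate"], len(sample["audio"]["array"]))
```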
π’ Articles related to the dataset:
π Unsupervised Cross-lingual Representation Learning for Speech Recognition
π Robust Speech Recognition via Large-Scale Weak Supervision
π YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
π Scaling Speech Technology to 1,000+ Languages
π Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
π Unsupervised Speech Recognition
π Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
π Towards End-to-end Unsupervised Speech Recognition
π Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: SuperGLUE
π’ Description Of Dataset:
SuperGLUE is a benchmark dataset designed to pose a more rigorous test of language understanding than GLUE. SuperGLUE has the same high-level motivation as GLUE: to provide a simple, hard-to-game measure of progress toward general-purpose language understanding technologies for English. SuperGLUE follows the basic design of GLUE: it consists of a public leaderboard built around eight language understanding tasks, drawing on existing data, accompanied by a single-number performance metric and an analysis toolkit. However, it improves upon GLUE in several ways:
- More challenging tasks: SuperGLUE retains the two hardest tasks in GLUE. The remaining tasks were identified from those submitted to an open call for task proposals and were selected based on difficulty for current NLP approaches.
- More diverse task formats: the task formats in GLUE are limited to sentence- and sentence-pair classification. The authors expand the set of task formats in SuperGLUE to include coreference resolution and question answering (QA).
- Comprehensive human baselines: the authors include human performance estimates for all benchmark tasks, which verify that substantial headroom exists between a strong BERT-based baseline and human performance.
- Improved code support: SuperGLUE is distributed with a new, modular toolkit for work on pretraining, multi-task learning, and transfer learning in NLP, built around standard tools including PyTorch (Paszke et al., 2017) and AllenNLP (Gardner et al., 2017).
- Refined usage rules: the conditions for inclusion on the SuperGLUE leaderboard were revamped to ensure fair competition, an informative leaderboard, and full credit assignment to data and task creators.
π’ Official Homepage: https://super.gluebenchmark.com/
π’ Number of articles that used this dataset: 418
π’ Dataset Loaders:
huggingface/datasets (superglue):
https://huggingface.co/datasets/Hyukkyu/superglue
huggingface/datasets (super_glue):
https://huggingface.co/datasets/super_glue
huggingface/datasets (test_data):
https://huggingface.co/datasets/zzzzhhh/test_data
huggingface/datasets (super_glue):
https://huggingface.co/datasets/aps/super_glue
huggingface/datasets (test):
https://huggingface.co/datasets/ThierryZhou/test
huggingface/datasets (ceshi0119):
https://huggingface.co/datasets/Xieyiyiyi/ceshi0119
facebookresearch/ParlAI:
https://parl.ai/docs/tasks.html#superglue
tensorflow/datasets:
https://www.tensorflow.org/datasets/catalog/super_glue
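With the Hugging Face loader above, each SuperGLUE task is a separate config (boolq, cb, copa, multirc, record, rte, wic, wsc). A hedged sketch for BoolQ; on newer datasets versions the aps/super_glue mirror listed above may be needed instead of the script-based 'super_glue' name:
```python
# Sketch: load one SuperGLUE task (BoolQ) with Hugging Face datasets.
from datasets import load_dataset

boolq = load_dataset("super_glue", "boolq", split="validation")
ex = boolq[0]
print(ex["question"])
print(ex["passage"][:120], "...")
print("label:", ex["label"])  # integer class label (0 or 1)
```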
π’ Articles related to the dataset:
π Leveraging redundancy in attention with Reuse Transformers
π Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
π GLU Variants Improve Transformer
π Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
π Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
π UL2: Unifying Language Learning Paradigms
π Few-shot Learning with Multilingual Language Models
π Kosmos-2: Grounding Multimodal Large Language Models to the World
π Language Models are Few-Shot Learners
π ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1
π’ Name Of Dataset: ScanNet
π’ Description Of Dataset:
ScanNet is an instance-level indoor RGB-D dataset that includes both 2D and 3D data. It is a collection of labeled voxels rather than points or objects. ScanNet v2, the newest version of ScanNet, has collected 1,513 annotated scans with approximately 90% surface coverage. In the semantic segmentation task, this dataset is labeled with 20 classes of annotated 3D voxelized objects. Source: A Review of Point Cloud Semantic Segmentation
π’ Official Homepage: http://www.scan-net.org/
π’ Number of articles that used this dataset: 1574
π’ Dataset Loaders:
Pointcept/Pointcept:
https://github.com/Pointcept/Pointcept
ScanNet/ScanNet:
http://www.scan-net.org/
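Access requires signing the ScanNet terms of use; after downloading, each scan ships (among other files) a cleaned mesh that can be inspected with Open3D. A hedged sketch, assuming the per-scan file name pattern <scene>_vh_clean_2.ply used by the official download script (verify against your copy):
```python
# Hedged sketch: inspect one ScanNet scan mesh with Open3D.
# Directory layout and file name are assumptions based on the official
# download tooling; adjust to your local copy.
import numpy as np
import open3d as o3d

scene = "scene0000_00"
mesh = o3d.io.read_triangle_mesh(f"scans/{scene}/{scene}_vh_clean_2.ply")
vertices = np.asarray(mesh.vertices)
triangles = np.asarray(mesh.triangles)
print(vertices.shape[0], "vertices,", triangles.shape[0], "faces")
```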
π’ Articles related to the dataset:
π Mask R-CNN
π ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
π NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
π ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
π FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection
π PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
π Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research
π PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
π SuperGlue: Learning Feature Matching with Graph Neural Networks
π MIMIC-IT: Multi-Modal In-Context Instruction Tuning
==================================
π΄ For more datasets resources:
β https://t.me/Datasets1