ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🌟 MiraData: Large, long-duration video dataset with structured annotations.

When training generative models, the training dataset plays an important role in the inference quality of the resulting models.
One good source is MiraData from Tencent, a ready-made dataset with a total video duration of 16 thousand hours, designed for training text-to-video generation models. It includes long videos (72.1 seconds on average) with high motion intensity and detailed structured annotations (318 words per video on average).

To assess the quality of the dataset, the authors also created MiraBench, a benchmark of 17 metrics that evaluate temporal consistency, motion in the frame, video quality, and other parameters. According to their results, MiraData outperforms other well-known open datasets, which mainly consist of short videos of inconsistent quality with short descriptions.

🟡 Project page
🟡 Arxiv
🤗 Hugging Face
🖥 GitHub

#Text2Video #Dataset #ML

https://t.me/DataScienceT ⭐️