ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Multivariate Probabilistic Time Series Forecasting with Informer

An efficient Transformer-based model for long sequence time-series forecasting (LSTF).

The method introduces a probabilistic (ProbSparse) attention mechanism that selects the “active” queries rather than the “lazy” ones. The result is a sparse Transformer that mitigates the quadratic compute and memory requirements of vanilla attention.
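
The query-selection idea can be sketched in NumPy. This is a simplified single-head sketch under stated assumptions, not the Informer reference code. Each query is scored by the max minus the mean of its scaled dot products against a random sample of keys. The top u = factor·⌈ln L⌉ "active" queries get full softmax attention, and the "lazy" ones fall back to the mean of V. The function name `probsparse_attention` and the `factor=5` default are illustrative assumptions. No masking or batching is handled.

```python
import numpy as np

def probsparse_attention(Q, K, V, factor=5, rng=None):
    """Minimal single-head ProbSparse self-attention sketch (no causal mask).

    Assumptions (hypothetical, not the reference implementation): inputs are
    2-D numpy arrays (L, d), one head, no batching; `factor` is illustrative.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    L_q, d = Q.shape
    L_k = K.shape[0]
    scale = 1.0 / np.sqrt(d)

    # 1) Estimate each query's sparsity M(q, K) = max - mean of its scores,
    #    using only a random sample of keys per query (cost ~ L_q * ln L_k).
    n_sample = min(L_k, max(1, factor * int(np.ceil(np.log(L_k)))))
    idx = rng.integers(0, L_k, size=(L_q, n_sample))
    sampled = np.einsum("qd,qsd->qs", Q, K[idx]) * scale
    M = sampled.max(axis=1) - sampled.mean(axis=1)

    # 2) Keep the top-u "active" queries, u = factor * ceil(ln L_q).
    #    max(1, ...) avoids u == 0, since argsort(...)[-0:] returns everything.
    u = min(L_q, max(1, factor * int(np.ceil(np.log(L_q)))))
    active = np.argsort(M)[-u:]

    # 3) "Lazy" queries get the mean of V (what near-uniform attention yields).
    out = np.tile(V.mean(axis=0), (L_q, 1))

    # 4) Active queries get full softmax attention over all keys.
    s = Q[active] @ K.T * scale
    s -= s.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(s)
    w /= w.sum(axis=1, keepdims=True)
    out[active] = w @ V
    return out, active


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    Q, K, V = (rng.standard_normal((96, 16)) for _ in range(3))
    out, active = probsparse_attention(Q, K, V)
    print(out.shape, len(active))  # (96, 16) 25
```

With L = 96, only 25 queries (5·⌈ln 96⌉) pay for full attention. That subset is where the savings over the dense L×L score matrix come from.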

🤗 Hugging Face blog:
https://huggingface.co/blog/informer

Transformers docs:
https://huggingface.co/docs/transformers/main/en/model_doc/informer

⭐️ Colab:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multivariate_informer.ipynb

💨 Datasets docs (Dataset.set_transform):
https://huggingface.co/docs/datasets/v2.7.0/en/package_reference/main_classes#datasets.Dataset.set_transform

https://t.me/DataScienceT
DigiData: Training and Evaluating General-Purpose Mobile Control Agents

📝 Summary:
DigiData provides a diverse, high-quality dataset for training mobile control agents on complex goals derived from exploring app features. DigiData-Bench adds dynamic, AI-powered evaluation protocols that assess agents more thoroughly than common metrics do.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07413
• PDF: https://arxiv.org/pdf/2511.07413
• Project Page: https://facebookresearch.github.io/DigiData

==================================

For more data science resources:
https://t.me/DataScienceT

#MobileAgents #ArtificialIntelligence #MachineLearning #Datasets #AgentTraining
Grounding Computer Use Agents on Human Demonstrations

📝 Summary:
GroundCUA is a large desktop grounding dataset built from expert human demonstrations. GroundNext models trained on it achieve state-of-the-art performance at mapping instructions to UI elements while using less training data, and they show strong agentic capabilities.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07332
• PDF: https://arxiv.org/pdf/2511.07332
• Project Page: https://groundcua.github.io/

==================================

#AI #Agents #HCI #Datasets #HumanDemonstrations
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

📝 Summary:
FiNERweb is a new pipeline that scales multilingual Named Entity Recognition dataset creation to 91 languages using LLMs. It produces 225k high-quality passages, enabling models to achieve comparable or improved zero-shot performance with 19x less data.

🔹 Publication Date: Published on Dec 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13884
• PDF: https://arxiv.org/pdf/2512.13884
• Github: https://github.com/whoisjones/FiNERweb

==================================

#NER #NLP #LLMs #MultilingualAI #Datasets
ModelTables: A Corpus of Tables about Models

📝 Summary:
ModelTables is a new benchmark corpus of 90K structured performance and configuration tables about AI models, each linked to its surrounding context. Evaluating table search on it shows a clear need for better methods of understanding structured model knowledge.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16106
• PDF: https://arxiv.org/pdf/2512.16106
• Github: https://github.com/RJMillerLab/ModelTables

==================================

#AI #Datasets #MachineLearning #StructuredData #TableSearch