Data Science | Machine Learning with Python for Researchers
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers.

🔹 Title: The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13996
• PDF: https://arxiv.org/pdf/2510.13996

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.me/DataScienceT
🔹 Title: RealDPO: Real or Not Real, that is the Preference

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14955
• PDF: https://arxiv.org/pdf/2510.14955
• Project Page: https://vchitect.github.io/RealDPO-Project/
• Github: https://github.com/Vchitect/RealDPO

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14949
• PDF: https://arxiv.org/pdf/2510.14949
• Project Page: https://dialectgen.github.io/
• Github: https://github.com/DialectGen/DialectGen

🔹 Datasets citing this paper:
https://huggingface.co/datasets/uclanlp/DialectGen

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14902
• PDF: https://arxiv.org/pdf/2510.14902
• Project Page: https://vla-2.github.io

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis

🔹 Publication Date: Published on Oct 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.06694
• PDF: https://arxiv.org/pdf/2510.06694

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14847
• PDF: https://arxiv.org/pdf/2510.14847
• Github: https://github.com/AMAP-ML/ImagerySearch

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13910
• PDF: https://arxiv.org/pdf/2510.13910

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA

🔹 Publication Date: Published on Oct 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.04849
• PDF: https://arxiv.org/pdf/2510.04849

🔹 Datasets citing this paper:
https://huggingface.co/datasets/s-nlp/PsiloQA

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: On Pretraining for Project-Level Code Completion

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13697
• PDF: https://arxiv.org/pdf/2510.13697
• Project Page: https://huggingface.co/collections/JetBrains-Research/repository-level-pre-trained-opencoder-68e938c003be1cfba9c3595e

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🤖🧠 NVIDIA, MIT, HKU and Tsinghua University Introduce QeRL: A Powerful Quantum Leap in Reinforcement Learning for LLMs

🗓️ 17 Oct 2025
📚 AI News & Trends

The rise of large language models (LLMs) has redefined artificial intelligence, powering everything from conversational AI to autonomous reasoning systems. However, training these models, especially through reinforcement learning (RL), is computationally expensive, requiring massive GPU resources and long training cycles. To address this, a team of researchers from NVIDIA, Massachusetts Institute of Technology (MIT), The ...

#QuantumLearning #ReinforcementLearning #LLMs #NVIDIA #MIT #TsinghuaUniversity
🔹 Title: Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14961
• PDF: https://arxiv.org/pdf/2510.14961
• Github: https://github.com/seal-rg/recurrent-pretraining

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🤖🧠 Agentic Entropy-Balanced Policy Optimization (AEPO): Balancing Exploration and Stability in Reinforcement Learning for Web Agents

🗓️ 17 Oct 2025
📚 AI News & Trends

AEPO (Agentic Entropy-Balanced Policy Optimization) represents a major advancement in the evolution of Agentic Reinforcement Learning (RL). As large language models (LLMs) increasingly act as autonomous web agents – searching, reasoning and interacting with tools – the need for balanced exploration and stability has become crucial. Traditional RL methods often rely heavily on entropy to ...

#AgenticRL #ReinforcementLearning #LLMs #WebAgents #EntropyBalanced #PolicyOptimization
🔹 Title: SimKO: Simple Pass@K Policy Optimization

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14807
• PDF: https://arxiv.org/pdf/2510.14807
• Project Page: https://spherelab.ai/simko/
• Github: https://github.com/CLR-Lab/SimKO

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Agentic Design of Compositional Machines

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14980
• PDF: https://arxiv.org/pdf/2510.14980
• Project Page: https://besiegefield.github.io/
• Github: https://besiegefield.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14942
• PDF: https://arxiv.org/pdf/2510.14942

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13161
• PDF: https://arxiv.org/pdf/2510.13161

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13913
• PDF: https://arxiv.org/pdf/2510.13913

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth

🔹 Publication Date: Published on Oct 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.10472
• PDF: https://arxiv.org/pdf/2510.10472
• Project Page: https://github.com/qrzou/FML-bench
• Github: https://github.com/qrzou/FML-bench

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14095
• PDF: https://arxiv.org/pdf/2510.14095
• Github: https://github.com/Awni00/algorithmic-generalization-transformer-architectures

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14240
• PDF: https://arxiv.org/pdf/2510.14240

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================
