Papers.Data.Code
Only meaningful ML signals: papers, repos & datasets. Selected, not collected. 3–4 posts/day. 📄💻📊
papers.data.code@gmail.com
📈 Monthly · Efficient ML · Apr 17 – May 17
#MonthlyDigest #EfficientML

📄 Papers

Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
#DiffusionModels #DecisionTrees #KnowledgeDistillation
Trees and flows ⟶ faster tabular generation
→ Learn more...

📊 Datasets

MSR-ACC/TAE25
#QuantumChemistry #AtomizationEnergy #CoupledCluster
Quantum chemistry dataset ⟶ trains atomization energy models
→ Learn more...

AI Index Data: Growth, Talent (Cambridge/Harvard)
#GlobalAI #CountryIndicators #LongitudinalData
Global AI panel dataset ⟶ cross-country trend forecasting
→ Learn more...

WHO Global Health Indicators for Prediction
#GlobalHealth #CountryLevel #WorldBank
Global health panel data ⟶ cross-country trend analysis
→ Learn more...

⚑ Trends

β–Έ Longitudinal country-level datasets increasingly target forecasting and cross-country trend analysis.
β–Έ Wide, linked, multi-table dataset formats are becoming standard for benchmarking.
β–Έ Efficiency gains come from unifying model families and distilling complex systems.

🧭 TL;DR

πŸ“„ Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
Unifies trees and diffusion, delivering faster tabular generation and effective distillation.

πŸ’‘ Efficiency advances increasingly come from unifying classical structures with generative modeling.

via @Papers.Data.Code
📄 Paper #Paper #LLM #Reasoning #ReinforcementLearning

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
👤 Yafu Li, Runzhe Zhan, Haoran Zhang et al.

🎯 Task
Olympiad-level mathematical and scientific reasoning

💡 Idea
Instead of domain-specific systems, it uses one scaling recipe: reverse-perplexity long-CoT SFT to instill proof search and self-checking, then coarse verifiable-reward RL, proof-level RL with self-refinement/replay, and test-time verification loops.

✨ Why it's interesting
SU-01 gets 57.6% on IMO-ProofBench, 70.2% with test-time scaling, and reaches the IMO 2025 gold line with 35 points.

💻 Repo
⭐ Simplified-Reasoning/SU-01 – 68 stars

🔗 paper

📊 Dataset #Dataset #LLM #SoftwareEngineering #ToolUse

Orchard
👤 microsoft

🎯 Task
Agentic software engineering and web GUI interaction

💡 Idea
~110K agent trajectories in 2 parallel subsets: 107,185 SWE chat+tool rollouts over 2,788 GitHub repos with hidden-test pass/fail labels, plus 3,070 GUI decision-point rows with screenshots, chat context, and judge-verified rewards across 409 web tasks.

✨ Why it's interesting
Verified patch outcomes and judge-scored GUI steps make agent training and evaluation measurable across real coding and browser tasks.

Ⓢ 110,255 samples, ~10.97 GB

🔗 dataset

💻 Repo #Repo #CV #DepthEstimation #CameraPose

vggt Omega
👤 facebookresearch

🎯 Task
multi-view camera and depth reconstruction

💡 Idea
Infer camera parameters and per-image depth from a set of images in one forward pass, and optionally produce text-aligned embeddings for the same visual inputs.

✨ Why it's interesting
Runs end-to-end on a single A100, using 6.02 GB of memory for 1 frame and 43.15 GB for 500 frames; 1B checkpoints and demo code are released.

💻 Repo
⭐ facebookresearch/vggt-omega – 413 stars (+413 3d)
Python


📄 Paper #Paper #CV #VideoGeneration #DiffusionModels

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
👤 Min Zhao, Hongzhou Zhu, Kaiwen Zheng et al.

🎯 Task
Real-time autoregressive video generation

💡 Idea
Instead of costly AR-teacher ODE trajectory distillation, it initializes few-step AR students with causal consistency distillation: same AR flow-map target, but learned from single online adjacent-step teacher updates, making frame-wise 1-2 step rollout practical.

✨ Why it's interesting
With frame-wise 2-step sampling, it beats 4-step chunk-wise Causal Forcing by +0.1 VBench Total, +0.3 Quality, and +0.335 VisionReward, with 50% lower first-frame latency and a ~4x cheaper Stage 2.

💻 Repo
⭐ thu-ml/Causal-Forcing – 665 stars

🔗 paper

📄 Paper #Paper #Multimodal #LongContextModeling #VisionLanguageModels

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
👤 Zhaowei Wang, Lishu Luo, Haodong Duan et al.

🎯 Task
Long-context vision-language modeling

💡 Idea
Instead of OCR-style long-data training, use instruction-formatted long-document VQA. A balanced length mix and retrieval-heavy task mixture beat 128K-focused or transcription-based training for extending LVLM context.

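A minimal sketch of a length-balanced mix, assuming "balanced" means each document-length bucket gets an equal share of a batch. The bucket edges and helper names are hypothetical; the paper's actual mixture ratios are not given in this summary.

```python
import random
from bisect import bisect_left

# Hypothetical bucket upper bounds in tokens; not taken from the paper.
BUCKET_EDGES = [8_000, 32_000, 64_000, 128_000]


def bucket_of(length: int) -> int:
    """Index of the first bucket whose upper bound is >= length (overflow goes to the last bucket)."""
    return min(bisect_left(BUCKET_EDGES, length), len(BUCKET_EDGES) - 1)


def balanced_length_sample(docs, n_samples, seed=0):
    """docs: iterable of (doc_id, token_length). Every non-empty bucket gets an equal share."""
    rng = random.Random(seed)
    buckets = {}
    for doc_id, length in docs:
        buckets.setdefault(bucket_of(length), []).append(doc_id)
    keys = sorted(buckets)
    per_bucket, remainder = divmod(n_samples, len(keys))
    sample = []
    for i, key in enumerate(keys):
        quota = per_bucket + (1 if i < remainder else 0)
        # Sampling with replacement lets sparse buckets (e.g. very long documents) fill their quota.
        sample.extend(rng.choices(buckets[key], k=quota))
    return sample


docs = [("short", 1_000)] * 50 + [("mid", 20_000)] * 5 + [("long", 100_000)]
batch = balanced_length_sample(docs, n_samples=9)
print({name: batch.count(name) for name in ("short", "mid", "long")})  # {'short': 3, 'mid': 3, 'long': 3}
```

Even though short documents outnumber long ones 50 to 1 here, each length band ends up with the same number of slots.
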
✨ Why it's interesting
With 5B tokens, MMProLong improves long-doc VQA by 7.1%, stays strong at 256K/512K beyond 128K training, and exceeds baselines by 20%+ there.

πŸ”— paper

📊 Dataset #Dataset #LLM #HallucinationDetection #Multilingual

LLM Hallucination Benchmark Dataset
👤 alitaqishah

🎯 Task
LLM hallucination detection and analysis

💡 Idea
200 annotated LLM responses spanning 5 models, 8 domains, 7 languages, 7 hallucination types, 4 annotator types, and 4 mitigation strategies, with prompt, response, hallucination label, span, severity, and verified correction.

✨ Why it's interesting
Makes cross-model, multilingual hallucination detection and mitigation evaluation directly measurable with typed error labels and corrected references.

Ⓢ 200 annotated responses

🔗 dataset

📄 Paper #Paper #LLM #ReinforcementLearning #KnowledgeDistillation

Self-Distilled Agentic Reinforcement Learning
👤 Zhengxi Lu, Zhiyuan Yao, Zhuowen Han et al.

🎯 Task
Agentic RL for multi-turn LLMs

💡 Idea
Instead of naively mixing OPSD with RL, SDAR keeps RL as the backbone and uses detached token-level gates to apply distillation selectively, amplifying positive teacher-student gaps and softening negative teacher rejections.

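One way to picture detached token-level gating, assuming the gate depends only on each sampled token's teacher-student log-prob gap. The tanh/exp gate shape and the `pos_boost`, `neg_floor`, `temperature` and `lam` parameters are hypothetical choices for illustration, not SDAR's published formula.

```python
import math


def sdar_style_gates(teacher_logp, student_logp, pos_boost=2.0, neg_floor=0.1, temperature=1.0):
    """Per-token gates from the teacher-student log-prob gap of each sampled token (illustrative).

    gap > 0: teacher likes the token more than the student -> weight grows toward pos_boost.
    gap < 0: teacher 'rejects' the token -> weight decays toward neg_floor instead of a full penalty.
    """
    gates = []
    for t_lp, s_lp in zip(teacher_logp, student_logp):
        gap = (t_lp - s_lp) / temperature
        if gap >= 0:
            gates.append(1.0 + (pos_boost - 1.0) * math.tanh(gap))
        else:
            gates.append(neg_floor + (1.0 - neg_floor) * math.exp(gap))
    return gates


def combined_token_loss(rl_loss, distill_loss, gates, lam=0.1):
    """RL stays the backbone; the gated distillation term is only an auxiliary add-on."""
    return [r + lam * g * d for r, d, g in zip(rl_loss, distill_loss, gates)]


teacher_lp = [-0.2, -3.0, -1.0]
student_lp = [-1.5, -0.5, -1.0]
print(sdar_style_gates(teacher_lp, student_lp))  # amplified, softened, neutral
```

The gates are plain numbers derived from the gaps, so they act like detached weights; in an autograd framework they would be computed under a stop-gradient so that only the RL and distillation terms carry gradients.
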
✨ Why it's interesting
Beats GRPO by +9.4% on ALFWorld, +7.0% on Search-QA, and +10.2% WebShop-Acc, while avoiding naive GRPO+OPSD instability.

πŸ’» Repo
⭐ ZJU-REAL/SDAR β€” 96 stars

πŸ”— paper

💻 Repo #Repo #CV #VideoGeneration #CameraControl

Warp As History
👤 yyfz

🎯 Task
camera-controlled video generation

💡 Idea
Generate videos that follow user-specified camera trajectories from one input frame, using a single training video and optional interactive/autoregressive control in a drop-in Helios pipeline.

✨ Why it's interesting
Enables interactive viewpoint control from only one camera-annotated training example, with released training, inference, and browser demo code.

💻 Repo
⭐ yyfz/Warp-as-History – 117 stars (+58 3d)
Python

🔗 paper

📄 Paper #Paper #NLP #TheoremProving #RetrievalAugmentedGeneration

OProver: A Unified Framework for Agentic Formal Theorem Proving
👤 David Ma, Kaijing Ma, Shawn Guo et al.

🎯 Task
Formal theorem proving

💡 Idea
Instead of bolting retrieval and self-repair onto a fixed prover at test time, OProver trains that agentic loop itself: multi-round proof revision conditioned on retrieved verified proofs and raw Lean feedback, with new verified proofs and repair traces fed back into training.

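The agentic loop can be sketched as a generate, check, revise cycle whose successes and repair traces are kept for the next training round. A minimal sketch under stated assumptions: `generate`, `check` and `retrieve` are hypothetical stand-ins for the prover model, a Lean checker wrapper and a retriever, and the toy stubs only simulate Lean feedback.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    round_idx: int
    proof: str
    feedback: str
    verified: bool


@dataclass
class ProverMemory:
    verified_pool: List[str] = field(default_factory=list)  # reused for retrieval and training
    repair_traces: List[List[Attempt]] = field(default_factory=list)


def agentic_prove(statement, generate, check, retrieve, memory, max_rounds=4) -> Optional[str]:
    """Multi-round revision loop (all callables are hypothetical):
    generate(statement, retrieved, history) -> proof text
    check(proof) -> (ok, raw_feedback)
    retrieve(statement, pool) -> list of verified proofs
    """
    history: List[Attempt] = []
    retrieved = retrieve(statement, memory.verified_pool)
    for r in range(max_rounds):
        proof = generate(statement, retrieved, history)
        ok, feedback = check(proof)
        history.append(Attempt(r, proof, feedback, ok))
        if ok:
            memory.verified_pool.append(proof)
            if len(history) > 1:
                # Failed-then-fixed episodes are kept as repair data for the next training round.
                memory.repair_traces.append(list(history))
            return proof
    return None


# Toy stubs: the first attempt uses `sorry`, the revision succeeds.
def toy_generate(statement, retrieved, history):
    return "by norm_num" if history else "sorry"


def toy_check(proof):
    return (proof != "sorry", "declaration uses 'sorry'" if proof == "sorry" else "")


memory = ProverMemory()
print(agentic_prove("theorem t : 1 + 1 = 2", toy_generate, toy_check, lambda s, pool: pool[:3], memory))
```
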
✨ Why it's interesting
OProver-32B gets best Pass@32 on MiniF2F 93.3%, ProverBench 58.2%, PutnamBench 11.3%, and second on MathOlympiad 22.8% and ProofNet 33.2%.

πŸ’» Repo
⭐ multimodal-art-projection/OProver β€” 7 stars

πŸ”— paper

📊 Dataset #Dataset #Tabular #Education #MentalHealth

Impact of Ai on Students
👤 laveshjadon

🎯 Task
Student outcome and burnout prediction

💡 Idea
50,000 student records with 16 features spanning academic profile, GenAI usage, study habits, institutional policy, anxiety, skill retention, and burnout, with targets for GPA regression, skill retention, and burnout classification.

✨ Why it's interesting
Makes it possible to model academic and well-being outcomes against AI usage and policy in one complete, balanced student dataset.

Ⓢ 50,000 samples, 16 columns, CSV

🔗 dataset

💻 Repo #Repo #LLM #FederatedLearning #Lora

Smart Fed
👤 benmagnifico

🎯 Task
federated LLM fine-tuning with LoRA reuse

💡 Idea
Compose a frozen pool of existing task LoRAs into a federated adapter by splitting them into rank-wise experts and learning a small input-conditioned router that selects and combines them on each client.

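A minimal sketch of composing frozen LoRAs through rank-wise experts and an input-conditioned router. Splitting each LoRA into rank-1 pieces b_i (a_i · x) follows the description; the top-k selection, the softmax renormalization over selected experts and all names are assumptions for illustration.

```python
import math


def split_lora(A, B):
    """A: r x d (rows a_i), B: d_out x r (columns b_i). Expert i computes b_i * (a_i . x)."""
    return [([row[i] for row in B], A[i]) for i in range(len(A))]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def routed_delta(x, experts, router_W, top_k=2):
    """Frozen rank-wise experts plus a small trainable router (router_W: n_experts x d)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_W]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in top])  # renormalized over the selected experts only
    out = [0.0] * len(experts[0][0])
    for g, i in zip(gates, top):
        b_col, a_row = experts[i]
        coeff = g * sum(a * xi for a, xi in zip(a_row, x))
        for j, b in enumerate(b_col):
            out[j] += coeff * b
    return out


# Two hypothetical frozen LoRAs (rank 2 and rank 1) pooled into three rank-1 experts.
lora1 = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 2.0]])
lora2 = ([[1.0, 1.0]], [[0.5], [0.5]])
experts = split_lora(*lora1) + split_lora(*lora2)
x = [3.0, 5.0]
print(routed_delta(x, experts, router_W=[[10.0, 0.0], [0.0, 0.0], [0.0, 0.0]], top_k=1))  # [3.0, 0.0]
```

Only the router weights would be trained (and communicated) per client, which is where the cost savings over training adapters from scratch would come from.
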
✨ Why it's interesting
Cuts training, communication, and energy cost versus federated train-from-scratch baselines, and beats both knowledge-free and knowledge-reuse baselines on three skill-composition tasks.

πŸ’» Repo
⭐ benmagnifico/SmartFed β€” 15 stars (+15 3d)


📄 Paper #Paper #Multimodal #ImageGeneration #VideoGeneration

Lance: Unified Multimodal Modeling by Multi-Task Synergy
👤 Fengyi Fu, Mengqi Huang, Shaojin Wu et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Instead of one shared visual path or bolted-on modules, Lance uses a shared interleaved multimodal context with dual MoE streams: one expert for text+semantic understanding, one for VAE-latent generation, plus modality-aware RoPE and staged multi-task training.

✨ Why it's interesting
With only 3B activated params and a 128-GPU budget, it substantially outperforms prior open-source unified models on image and video generation while keeping strong understanding.

💻 Repo
⭐ bytedance/Lance – 314 stars

🔗 paper

📄 Paper #Paper #Multimodal #ComputerUseAgents #Benchmarking

OpenComputer: Verifiable Software Worlds for Computer-Use Agents
👤 Jinbiao Wei, Qianran Ma, Yilun Zhao et al.

🎯 Task
Computer-use agent evaluation and benchmark generation

💡 Idea
Instead of screenshot- or LLM-judge-based evaluation, it uses app-specific state verifiers over real software, then self-refines them from execution disagreements to synthesize and score realistic desktop tasks automatically.

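A minimal sketch of what an app-specific state verifier looks like, as opposed to a screenshot or LLM judge: it runs predicates over the application's final state. The `Task` type, the spreadsheet state keys and the `disagreements` helper are hypothetical; the paper's verifiers and refinement procedure are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Task:
    instruction: str
    checks: List[Callable[[State], bool]]  # app-specific predicates over the final app state


def verify(task: Task, final_state: State) -> bool:
    """Pass only if every predicate holds on the real application state."""
    return all(check(final_state) for check in task.checks)


def disagreements(tasks, final_states, other_verdicts):
    """Indices where the state verifier disagrees with another signal; these are the cases
    to inspect when refining a verifier."""
    return [i for i, (t, s, o) in enumerate(zip(tasks, final_states, other_verdicts))
            if verify(t, s) != o]


# Hypothetical spreadsheet task and state layout.
task = Task(
    instruction="Rename Sheet1 to Budget and put 42 in A1",
    checks=[lambda s: "Budget" in s["sheets"], lambda s: s["cells"].get("A1") == 42],
)
done = {"sheets": ["Budget"], "cells": {"A1": 42}}
not_done = {"sheets": ["Sheet1"], "cells": {"A1": 42}}
print(verify(task, done), verify(task, not_done))  # True False
print(disagreements([task, task], [done, not_done], [True, True]))  # [1]
```
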
✨ Why it's interesting
Covers 33 apps and 1,000 tasks. Verifiers align better with humans than LLM judges. Best agent hits 68.3% success; open models drop sharply vs OSWorld.

πŸ’» Repo
⭐ echo0715/OpenComputer

πŸ”— paper

📄 Paper #Paper #LLM #ReinforcementLearning #LongContext

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
👤 Minxuan Lv, Tiehua Mei, Tanlong Du et al.

🎯 Task
Long-context reinforcement learning for LLMs

💡 Idea
Instead of retrieval-path-heavy QA data and uniform rewards, it trains on 9 long-context capability tasks with task-native metrics, then replaces vanilla GRPO's prompt-level scaling with task-mean normalization plus difficulty-adaptive reweighting.

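The contrast between per-prompt GRPO scaling and a task-level scale can be sketched as follows. This assumes "task-mean normalization" means dividing by the mean of per-prompt reward stds within a task, and that difficulty is a prompt's mean reward in [0, 1]; both are readings of the summary, not the paper's exact formulas, and the task name is made up.

```python
import statistics
from collections import defaultdict


def grpo_advantages(groups, eps=1e-6):
    """Vanilla GRPO: centre and scale each prompt's rollout rewards by that prompt's own std."""
    out = {}
    for pid, (_task, rewards) in groups.items():
        mu, sd = statistics.fmean(rewards), statistics.pstdev(rewards)
        out[pid] = [(r - mu) / (sd + eps) for r in rewards]
    return out


def tmn_reweight_advantages(groups, alpha=1.0, eps=1e-6):
    """Assumed TMN-Reweight: centre per prompt, scale by the task-level mean of per-prompt
    stds, then up-weight harder prompts."""
    stds_by_task = defaultdict(list)
    for task, rewards in groups.values():
        stds_by_task[task].append(statistics.pstdev(rewards))
    task_scale = {task: statistics.fmean(stds) for task, stds in stds_by_task.items()}
    out = {}
    for pid, (task, rewards) in groups.items():
        mu = statistics.fmean(rewards)
        weight = 1.0 + alpha * (1.0 - mu)  # lower mean reward -> harder prompt -> larger weight
        out[pid] = [weight * (r - mu) / (task_scale[task] + eps) for r in rewards]
    return out


# prompt_id -> (task name, rewards of its rollouts)
groups = {
    "p1": ("multi_hop_qa", [1.0, 0.0, 1.0, 0.0]),
    "p2": ("multi_hop_qa", [1.0, 1.0, 1.0, 0.0]),
}
print(grpo_advantages(groups))
print(tmn_reweight_advantages(groups))
```

Under vanilla GRPO, the nearly solved prompt p2 gets a large negative advantage for its single failure because its own std is small; the shared task scale and the difficulty weight shift emphasis back toward the harder prompt p1.
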
✨ Why it's interesting
On Qwen3-30B-A3B, average long-context score rises from 60.1 to 69.8; TMN-Reweight reaches 63.0 on 4B vs 62.2 with vanilla GRPO.

πŸ’» Repo
⭐ xiaoxuanNLP/GoLongRL

πŸ”— paper

🔥 Repo #Repo #LLM #Pretraining #HierarchicalReasoningModel

HRM Text
👤 sapientinc

🎯 Task
efficient foundation model pretraining

💡 Idea
Pretrain HRM text generation models from scratch on 8-16 H100s with built-in data packing, distributed training, benchmark evaluation, and checkpoint export to Transformers format.

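"Data packing" in pretraining loaders usually means concatenating tokenized documents into fixed-length sequences so no compute is wasted on padding. A generic greedy packer is sketched below as an illustration; it is not HRM-Text's implementation, and the EOS id and sequence length are arbitrary.

```python
from typing import Iterable, Iterator, List


def pack_sequences(docs: Iterable[List[int]], seq_len: int, eos_id: int) -> Iterator[List[int]]:
    """Greedy packing: append EOS to each document, concatenate, emit fixed-length chunks.

    Documents may straddle chunk boundaries, and the trailing partial chunk is dropped.
    """
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]


print(list(pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4, eos_id=0)))
# [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```
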
✨ Why it's interesting
Claims 130-600x less compute and 150-900x less data; reference runs train 0.6B-1B models in 46-50 hours on 8-16 H100s.

πŸ’» Repo
⭐ sapientinc/HRM-Text β€” 580 stars (+580 3d)
Python


📄 Paper #Paper #CV #VideoGeneration #Quantization

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
👤 Yukang Chen, Luozhou Wang, Wei Huang et al.

🎯 Task
Long video generation infrastructure

💡 Idea
Instead of complex multi-stage long-video pipelines, it directly fine-tunes an AR diffusion model and co-designs sequence parallelism with teacher forcing. Balanced SP pairs clean/noisy chunks per rank, while end-to-end NVFP4 enables W4A4 inference, KV-cache compression, and async decoding.

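NVFP4 stores 4-bit E2M1 values with a shared scale per 16-element block. The sketch below simulates that round trip in plain Python to show where W4A4 error comes from. It keeps the block scale in full precision (real NVFP4 stores it as FP8 E4M3 and adds a per-tensor FP32 scale) and says nothing about LongLive-2.0's actual kernels.

```python
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes representable by FP4 E2M1
BLOCK = 16                                           # NVFP4 scales blocks of 16 elements


def quantize_nvfp4_like(values, block=BLOCK):
    """Scale each block so its max |x| maps to 6.0, then round to the nearest E2M1 magnitude.

    Simplification: the per-block scale stays a Python float instead of FP8 (E4M3).
    """
    codes, scales = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / 6.0 if amax > 0 else 1.0
        scales.append(scale)
        for v in chunk:
            mag = min(FP4_E2M1, key=lambda g: abs(abs(v) / scale - g))
            codes.append(mag if v >= 0 else -mag)
    return codes, scales


def dequantize(codes, scales, block=BLOCK):
    return [c * scales[i // block] for i, c in enumerate(codes)]


weights = [0.9, -0.31, 0.05, 0.6] * 4 + [12.0, 1.0]
codes, scales = quantize_nvfp4_like(weights)
recon = dequantize(codes, scales)
print(max(abs(w - r) for w, r in zip(weights, recon)))
```

Since the widest gap in the E2M1 grid is 2 (between 4 and 6), each reconstructed value stays within one block scale of the original; small values in a block with a large outlier suffer most, which is why the fine 16-element blocks matter.
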
✨ Why it's interesting
Up to 2.15x faster training and 1.84x faster inference; 45.7 FPS, 21.9 ms/frame, and memory cut from 35.4 GB to 19.4 GB.

πŸ’» Repo
⭐ NVlabs/LongLive β€” 1.4k stars
Python

πŸ”— paper

📄 Paper #Paper #LLM #ReinforcementLearning #Reasoning

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
👤 Zhepei Wei, Xinyu Zhu, Wei-Lin Chen et al.

🎯 Task
LLM RL checkpoint extrapolation

💡 Idea
Instead of running full RLVR, estimate each tensor's dominant rank-1 update direction from early checkpoints and linearly extrapolate its coefficient. Unlike raw weight or logit extrapolation, it uses the low-rank RLVR geometry as a denoised predictor.

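The rank-1 extrapolation idea can be sketched on a single weight matrix: take deltas from the base checkpoint, find the dominant singular direction, track its coefficient over early steps, and extend that line. Using the latest early delta for the direction, power iteration for the top singular pair and a least-squares line for the coefficient are assumptions; the function names are hypothetical.

```python
import math


def top_singular_pair(M, iters=100):
    """Dominant left/right singular vectors of M via power iteration on M^T M."""
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols  # assumes this start is not orthogonal to the top right singular vector
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return [x / sigma for x in u], v


def extrapolate_rank1(W0, checkpoints, target_step):
    """checkpoints: [(step, W_step), ...] from an early RLVR run (at least two distinct steps)."""
    deltas = [(s, [[w - b for w, b in zip(rw, rb)] for rw, rb in zip(W, W0)]) for s, W in checkpoints]
    u, v = top_singular_pair(deltas[-1][1])
    # Coefficient of each delta along the fixed rank-1 direction u v^T.
    coeffs = [sum(u[i] * d[i][j] * v[j] for i in range(len(u)) for j in range(len(v))) for _, d in deltas]
    steps = [s for s, _ in deltas]
    t_bar, c_bar = sum(steps) / len(steps), sum(coeffs) / len(coeffs)
    slope = sum((t - t_bar) * (c - c_bar) for t, c in zip(steps, coeffs)) / sum((t - t_bar) ** 2 for t in steps)
    c_target = c_bar + slope * (target_step - t_bar)
    return [[W0[i][j] + c_target * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]


# Synthetic run whose update really is rank-1 and linear in the step count.
U, V = [0.6, 0.8], [1.0, 0.0]
W0 = [[1.0, 0.0], [0.0, 1.0]]
ckpts = [(t, [[W0[i][j] + 0.01 * t * U[i] * V[j] for j in range(2)] for i in range(2)]) for t in (10, 20, 30)]
print(extrapolate_rank1(W0, ckpts, target_step=100))
```

Projecting onto one direction discards the noisy components of each delta, which is the "denoised predictor" intuition; on real checkpoints the direction would only approximately stay fixed.
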
✨ Why it's interesting
With 15-20% of RLVR steps, RELEX matches or nears full RLVR on MATH: 71.6 vs 71.5, 85.6 vs 85.5, 87.4 vs 88.5 across 3 models.

πŸ’» Repo
⭐ weizhepei/RELEX

πŸ”— paper

📄 Paper #Paper #LLM #Reasoning #ReinforcementLearning

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
👤 Guobin Shen, Xiang Cheng, Chenxiao Zhao et al.

🎯 Task
Reasoning reinforcement learning for math and code

💡 Idea
Instead of pulling the policy toward a privileged self-teacher that rewards shortcut tokens and suppresses deliberation, AntiSD reverses the signal: ascend student-teacher JSD, with an entropy gate to stop once teacher confidence collapses.

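A minimal per-token sketch of the reversed signal: compute the Jensen-Shannon divergence between student and teacher next-token distributions and return its negative as a loss, so minimizing the loss pushes the JSD up. The entropy threshold, and reading "teacher confidence collapses" as "teacher entropy exceeds a threshold", are assumptions.

```python
import math


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p, q):
    # Terms with p_i = 0 contribute nothing; the mixture below is > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def anti_sd_token_loss(student, teacher, max_teacher_entropy=1.0):
    """Loss to minimize: -gate * JSD(student, teacher), so optimization ascends the JSD.

    Assumed gate: switch the term off once the teacher's entropy (nats) exceeds a threshold.
    """
    gate = 1.0 if entropy(teacher) <= max_teacher_entropy else 0.0
    return -gate * jsd(student, teacher)


print(anti_sd_token_loss([0.7, 0.2, 0.1], [0.95, 0.04, 0.01]))  # confident teacher: negative loss
print(anti_sd_token_loss([0.7, 0.2, 0.1], [0.4, 0.3, 0.3], max_teacher_entropy=0.5))  # gated off
```

In a full setup this term would sit next to the usual GRPO objective; JSD is bounded by ln 2, which keeps the ascent from blowing up the way an unbounded KL would.
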
✨ Why it's interesting
Across 5 models (4B-30B), it matches GRPO in 2-10x fewer steps and improves final avg accuracy by up to 11.5 points.

πŸ’» Repo
⭐ FloyedShen/AntiSD β€” 11 stars
Python

πŸ”— paper

📊 Dataset #Dataset #LLM #MultiTurnDialogue #UserModeling

ThoughtTrace
👤 SCAI-JHU

🎯 Task
User modeling in multi-turn dialogue

💡 Idea
2,155 real-world conversations from 1,058 users across 20 LLMs, with 10,174 message-level thought annotations: 7 reason types on user turns and 5 reaction types on assistant turns.

✨ Why it's interesting
Makes latent user intent and satisfaction measurable from real chats; the authors report gains for behavior prediction (+41.7%) and alignment (+25.6% win rate).

Ⓢ 2,155 conversations, 10,174 thought annotations

🔗 dataset 🔗 paper 🔗 repo

📄 Paper #Paper #Audio #SpeechRecognition #Robustness

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
👤 Zhifei Xie, Kaiyu Pang, Haobin Zhang et al.

🎯 Task
Robust automatic speech recognition

💡 Idea
Instead of training on isolated mild noise, it scales to 54 physically plausible compound acoustic scenarios and trains ASR progressively from acoustic perception to semantic recovery, then uses WER-gated token- vs sentence-level rewards to handle both local errors and hallucinated/omitted transcripts.

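The WER gate can be sketched with a word-level edit distance. The threshold, the positional per-word credit and the sentence-level penalty are hypothetical choices for illustration; the paper's reward definitions are not given in this summary.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance over word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    ref = reference.split()
    return word_edit_distance(ref, hypothesis.split()) / max(len(ref), 1)


def wer_gated_reward(reference, hypothesis, gate=0.5):
    """Low WER -> dense per-word credit for local errors; high WER (likely hallucinated or
    omitted content) -> one sentence-level penalty for the whole transcript."""
    w = wer(reference, hypothesis)
    ref, hyp = reference.split(), hypothesis.split()
    if w <= gate:
        return "token", [1.0 if j < len(ref) and h == ref[j] else 0.0 for j, h in enumerate(hyp)]
    return "sentence", [-min(w, 1.0)]


print(wer_gated_reward("turn the lights off", "turn the light off"))
print(wer_gated_reward("turn the lights off", "thanks for watching please subscribe to the channel"))
```

Positional matching is a crude stand-in for alignment-based token credit; a fuller version would reuse the edit-distance alignment path to decide which words earn credit.
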
✨ Why it's interesting
Beats prior SOTA on adverse ASR: 45.69% vs 54.01% on VOiCES R4-B-F, 21.49% vs 29.34% on NOIZEUS Sta-0; >30% relative WER drop on compound scenarios.

πŸ’» Repo
⭐ xzf-thu/Mega-ASR

πŸ”— paper
