Papers.Data.Code

📄 Paper #Paper #Multimodal #ImageGeneration #VideoGeneration

Lance: Unified Multimodal Modeling by Multi-Task Synergy
👤 Fengyi Fu, Mengqi Huang, Shaojin Wu et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Instead of one shared visual path or bolted-on modules, Lance uses a shared interleaved multimodal context with dual MoE streams: one expert for text+semantic understanding, one for VAE-latent generation, plus modality-aware RoPE and staged multi-task training.
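A toy sketch of the routing idea, assuming hard routing by modality tag: every token sits in one shared sequence, but its tag decides which expert processes it. The tags, the two expert functions, and `route` are all hypothetical stand-ins, not Lance's actual layers or API.

```python
from typing import Callable, Dict, List, Tuple

# (modality tag, toy scalar feature); real tokens would be vectors.
Token = Tuple[str, float]

def understanding_expert(x: float) -> float:
    """Hypothetical stand-in for the text + semantic-understanding expert."""
    return 2.0 * x

def generation_expert(x: float) -> float:
    """Hypothetical stand-in for the VAE-latent generation expert."""
    return x + 1.0

# Assumed mapping from modality tag to expert.
ROUTES: Dict[str, Callable[[float], float]] = {
    "text": understanding_expert,
    "semantic": understanding_expert,
    "vae_latent": generation_expert,
}

def route(sequence: List[Token]) -> List[float]:
    """Process one interleaved sequence, picking the expert per token."""
    return [ROUTES[tag](x) for tag, x in sequence]

mixed = [("text", 1.0), ("vae_latent", 3.0), ("semantic", 0.5)]
outputs = route(mixed)
```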

✨ Why it's interesting
With only 3B activated params and a 128-GPU budget, it substantially outperforms prior open-source unified models on image and video generation while keeping strong understanding.

💻 Repo
⭐ bytedance/Lance — 314 stars

🔗 paper

via @Papers.Data.Code

A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing. - bytedance/Lance

📄 Paper #Paper #Multimodal #ComputerUseAgents #Benchmarking

OpenComputer: Verifiable Software Worlds for Computer-Use Agents
👤 Jinbiao Wei, Qianran Ma, Yilun Zhao et al.

🎯 Task
Computer-use agent evaluation and benchmark generation

💡 Idea
Instead of screenshot or LLM-judge evaluation, uses app-specific state verifiers over real software, then self-refines them from execution disagreements to synthesize and score realistic desktop tasks automatically.
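A minimal sketch of what a state verifier could look like, assuming a file-rename task: rather than judging a screenshot, it inspects the resulting filesystem state. The task, `verify_rename`, and `run_demo` are hypothetical illustrations, not code from the OpenComputer repo.

```python
import tempfile
from pathlib import Path
from typing import Tuple

def verify_rename(workdir: Path, old_name: str, new_name: str) -> bool:
    """Hypothetical verifier: pass only if the old file is gone and the new one exists."""
    return not (workdir / old_name).exists() and (workdir / new_name).is_file()

def run_demo() -> Tuple[bool, bool]:
    """Check the verifier before and after a simulated agent action."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "draft.txt").write_text("hello")
        before = verify_rename(workdir, "draft.txt", "report.txt")
        # Stand-in for the agent completing the task in a real app.
        (workdir / "draft.txt").rename(workdir / "report.txt")
        after = verify_rename(workdir, "draft.txt", "report.txt")
        return before, after
```

The check depends only on final state, so it gives the same verdict however the agent reached it.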

✨ Why it's interesting
Covers 33 apps and 1,000 tasks. The verifiers agree with human judgments more closely than LLM judges do. The best agent reaches 68.3% success, and open models score sharply lower than they do on OSWorld.

💻 Repo
⭐ echo0715/OpenComputer

🔗 paper

via @Papers.Data.Code

📄 Paper #Paper #LLM #ReinforcementLearning #LongContext

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
👤 Minxuan Lv, Tiehua Mei, Tanlong Du et al.

🎯 Task
Long-context reinforcement learning for LLMs

💡 Idea
Instead of retrieval-path-heavy QA data and uniform rewards, it trains on 9 long-context capability tasks with task-native metrics, then replaces vanilla GRPO's prompt-level scaling with task-mean normalization plus difficulty-adaptive reweighting.
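To see why prompt-level scaling can be a problem, here is a sketch comparing vanilla GRPO's per-prompt standardization with one possible task-level alternative. `grpo_advantages` follows the standard GRPO formula. `task_scaled_advantages` is only an assumed reading of "task-mean normalization" (center per prompt, scale by the task's mean per-prompt std); the post does not give the paper's exact formula. Difficulty-adaptive reweighting is not modeled.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Vanilla GRPO: center and scale within one prompt's group of rollouts."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def task_scaled_advantages(groups, eps=1e-6):
    """ASSUMED variant: center per prompt, scale by the mean per-prompt std of the task."""
    task_scale = mean(pstdev(g) for g in groups)
    return [[(r - mean(g)) / (task_scale + eps) for r in g] for g in groups]

# Two prompts from the same task, scored with a continuous task-native metric.
near_tie = [0.50, 0.52]    # rollouts differ by a hair
clear_gap = [0.00, 1.00]   # rollouts differ a lot

per_prompt = [grpo_advantages(g) for g in (near_tie, clear_gap)]
per_task = task_scaled_advantages([near_tie, clear_gap])
```

With per-prompt scaling both groups end up with advantages of magnitude close to 1, so the near tie is amplified as much as the clear gap. The task-level scale keeps the near tie small.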

✨ Why it's interesting
On Qwen3-30B-A3B, the average long-context score rises from 60.1 to 69.8. On the 4B model, TMN-Reweight (task-mean normalization with reweighting) reaches 63.0 vs 62.2 with vanilla GRPO.

💻 Repo
⭐ xiaoxuanNLP/GoLongRL

🔗 paper

via @Papers.Data.Code

🔥 Repo #Repo #LLM #Pretraining #HierarchicalReasoningModel

HRM-Text
👤 sapientinc

🎯 Task
Efficient foundation model pretraining

💡 Idea
Pretrain HRM text generation models from scratch on 8-16 H100s with built-in data packing, distributed training, benchmark evaluation, and checkpoint export to Transformers format.

✨ Why it's interesting
Claims 130-600x less compute and 150-900x less data; reference runs train 0.6B-1B models in 46-50 hours on 8-16 H100s.

💻 Repo
⭐ sapientinc/HRM-Text — 580 stars (+580 3d)
Python

via @Papers.Data.Code

HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning. - sapientinc/HRM-Text

📄 Paper #Paper #CV #VideoGeneration #Quantization

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
👤 Yukang Chen, Luozhou Wang, Wei Huang et al.

🎯 Task
Long video generation infrastructure

💡 Idea
Instead of complex multi-stage long-video pipelines, it directly fine-tunes an AR diffusion model and co-designs sequence parallelism with teacher forcing. Balanced SP pairs clean/noisy chunks per rank, while end-to-end NVFP4 enables W4A4 inference, KV-cache compression, and async decoding.

✨ Why it's interesting
Up to 2.15x faster training and 1.84x faster inference; 45.7 FPS, 21.9 ms/frame, and memory cut from 35.4 GB to 19.4 GB.

💻 Repo
⭐ NVlabs/LongLive — 1.4k stars
Python

🔗 paper

via @Papers.Data.Code

📄 Paper #Paper #LLM #ReinforcementLearning #Reasoning

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
👤 Zhepei Wei, Xinyu Zhu, Wei-Lin Chen et al.

🎯 Task
LLM RL checkpoint extrapolation

💡 Idea
Instead of running full RLVR, estimate each tensor's dominant rank-1 update direction from early checkpoints and linearly extrapolate its coefficient. Unlike raw weight or logit extrapolation, it uses the low-rank RLVR geometry as a denoised predictor.
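The idea can be sketched for a single weight matrix in pure Python, under stated assumptions: the rank-1 direction is taken from the latest early checkpoint's delta via power iteration, each checkpoint's coefficient is its projection onto that direction, and a least-squares line over training steps is extrapolated. All function names are hypothetical, and the paper's actual estimator may differ.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def normalize(v):
    # Assumes v is nonzero (the delta is not orthogonal to the start vector).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def top_singular_pair(D, iters=100):
    """Power iteration for the dominant left/right singular vectors of D."""
    Dt = [list(col) for col in zip(*D)]
    v = normalize([1.0] * len(D[0]))
    for _ in range(iters):
        u = normalize(matvec(D, v))
        v = normalize(matvec(Dt, u))
    return u, v

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def extrapolate(w0, checkpoints, target_step):
    """checkpoints maps training step -> weight matrix (list of rows)."""
    steps = sorted(checkpoints)
    deltas = [[[a - b for a, b in zip(ra, rb)]
               for ra, rb in zip(checkpoints[s], w0)] for s in steps]
    u, v = top_singular_pair(deltas[-1])  # assumed: direction from latest delta
    # Coefficient of each early delta along the rank-1 direction: u^T D v.
    coeffs = [sum(ui * x for ui, x in zip(u, matvec(D, v))) for D in deltas]
    slope, intercept = fit_line(steps, coeffs)
    c = slope * target_step + intercept
    return [[w + c * ui * vj for w, vj in zip(row, v)]
            for row, ui in zip(w0, u)]

# Synthetic check: weights drift along a fixed rank-1 direction a b^T.
w0 = [[1.0, 0.0], [0.0, 1.0]]
a, b = [1.0, 2.0], [3.0, 1.0]
ckpts = {t: [[w0[i][j] + 0.01 * t * a[i] * b[j] for j in range(2)]
             for i in range(2)] for t in (1, 2, 3)}
predicted = extrapolate(w0, ckpts, target_step=10)
```

On this synthetic drift the prediction at step 10 recovers `w0 + 0.1 * a b^T`. Real RLVR updates are only approximately rank-1, so the match would be looser.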

✨ Why it's interesting
With 15-20% of RLVR steps, RELEX matches or nears full RLVR on MATH: 71.6 vs 71.5, 85.6 vs 85.5, 87.4 vs 88.5 across 3 models.

💻 Repo
⭐ weizhepei/RELEX

🔗 paper

via @Papers.Data.Code

📄 Paper #Paper #LLM #Reasoning #ReinforcementLearning

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
👤 Guobin Shen, Xiang Cheng, Chenxiao Zhao et al.

🎯 Task
Reasoning reinforcement learning for math and code

💡 Idea
Instead of pulling the policy toward a privileged self-teacher that rewards shortcut tokens and suppresses deliberation, AntiSD reverses the signal: ascend student-teacher JSD, with an entropy gate to stop once teacher confidence collapses.
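A small sketch of the two quantities involved, assuming per-token probability vectors: the Jensen-Shannon divergence between student and teacher, and an entropy gate on the teacher. The gate direction (disable the signal once teacher entropy exceeds a threshold) and the threshold value are assumptions for illustration, and `gated_anti_sd_bonus` is a hypothetical name, not the repo's API.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    """KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log 2."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def gated_anti_sd_bonus(student, teacher, entropy_threshold=1.0):
    """ASSUMED form: a quantity to maximize, zeroed when the teacher is too uncertain."""
    if entropy(teacher) > entropy_threshold:
        return 0.0  # gate closed: teacher confidence has collapsed
    return jsd(student, teacher)

student = [0.25, 0.25, 0.25, 0.25]
confident_teacher = [0.90, 0.05, 0.03, 0.02]
collapsed_teacher = [0.25, 0.25, 0.25, 0.25]

open_gate = gated_anti_sd_bonus(student, confident_teacher)
closed_gate = gated_anti_sd_bonus(student, collapsed_teacher)
```

Ordinary self-distillation would minimize a divergence to the teacher; the sign flip to ascent is what the post describes.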

✨ Why it's interesting
Across 5 models (4B-30B), it matches GRPO in 2-10x fewer steps and improves final avg accuracy by up to 11.5 points.

💻 Repo
⭐ FloyedShen/AntiSD — 11 stars
Python

🔗 paper

via @Papers.Data.Code

📊 Dataset #Dataset #LLM #MultiTurnDialogue #UserModeling

ThoughtTrace
👤 SCAI-JHU

🎯 Task
User modeling in multi-turn dialogue

💡 Idea
2,155 real-world conversations from 1,058 users across 20 LLMs, with 10,174 message-level thought annotations: 7 reason types on user turns and 5 reaction types on assistant turns.

✨ Why it's interesting
Makes latent user intent and satisfaction measurable from real chats; authors show gains for behavior prediction (+41.7%) and alignment (+25.6% win rate).

Ⓢ 2,155 conversations, 10,174 thought annotations

🔗 dataset 🔗 paper 🔗 repo

via @Papers.Data.Code

📄 Paper #Paper #Audio #SpeechRecognition #Robustness

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
👤 Zhifei Xie, Kaiyu Pang, Haobin Zhang et al.

🎯 Task
Robust automatic speech recognition

💡 Idea
Instead of training on isolated mild noise, it scales to 54 physically plausible compound acoustic scenarios and trains ASR progressively from acoustic perception to semantic recovery, then uses WER-gated token- vs sentence-level rewards to handle both local errors and hallucinated/omitted transcripts.
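The WER gate can be sketched as follows. `word_error_rate` is the standard word-level edit distance divided by reference length. The gate threshold, its direction (token-level reward for low WER, sentence-level for high WER), and both reward definitions are assumptions for illustration; the post does not specify them.

```python
from typing import Tuple

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

def gated_reward(reference: str, hypothesis: str,
                 wer_gate: float = 0.5) -> Tuple[str, float]:
    """ASSUMED gate: fine-grained reward for local errors, coarse penalty otherwise."""
    wer = word_error_rate(reference, hypothesis)
    if wer <= wer_gate:
        return "token", 1.0 - wer   # hypothetical token-level signal
    return "sentence", -1.0         # hypothetical sentence-level penalty

ref = "turn on the kitchen light"
local_error = gated_reward(ref, "turn on the kitchen lights")
omitted = gated_reward(ref, "")
```

A single substituted word stays in the token-level regime, while an empty transcript is treated as a sentence-level failure.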

✨ Why it's interesting
Beats prior SOTA on adverse-condition ASR (WER, lower is better): 45.69% vs 54.01% on VOiCES R4-B-F, 21.49% vs 29.34% on NOIZEUS Sta-0; >30% relative WER drop on compound scenarios.

💻 Repo
⭐ xzf-thu/Mega-ASR

🔗 paper

via @Papers.Data.Code

First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart.

⚡ Trends

▸ Reinforcement learning is shifting toward structured credit assignment and more stable objectives
▸ Test-time scaling increasingly uses agentic search, verification loops, and multi-agent coordination
▸ LLM agents are trained on richer long-horizon trajectories for search, research, and tool use

🧭 TL;DR

📄 Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Unified SFT-RL-test-time scaling reaches gold-level olympiad reasoning on 30B models.

⭐ antirez/ds4
Practical local DeepSeek serving with 1M context and persistent KV cache.

💡 LLM progress is shifting toward scalable reasoning and agentic interaction optimization.

via Papers.Data.Code

📅 Monthly digest week starts tomorrow: May 2026.
Top papers, repos and datasets land at @papersdatacode_digests Mon–Wed.
