Papers.Data.Code
18 subscribers
101 links
Only meaningful ML signals: papers, repos & datasets. Selected, not collected. 3–4 posts/day. 📄💻📊
papers.data.code@gmail.com
💻 Repo #Repo #Multimodal #VisionEncoder #AutoregressivePretraining

GenLIP
👤 YanFangCS

🎯 Task
Generative vision-language pretraining

💡 Idea
Pretrain a ViT-based vision encoder for MLLMs using a single Transformer and a single autoregressive language modeling objective, without contrastive loss, dual towers, or an extra text decoder.

✨ Why it's interesting
Simplifies MLLM vision pretraining and reports particularly strong gains on Doc and OCR tasks.

💻 Repo
⭐ YanFangCS/GenLIP - 49 stars (+49 3d)
Python

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #Tabular #DiffusionModels #DecisionTrees

Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
👤 Sai Niranjan Ramachandran, Suvrit Sra

🎯 Task
Tabular generation and tree-to-network distillation

💡 Idea
Tree↔flow correspondence maps refined tree partitions to PF-ODEs and diffusion dynamics to hierarchies; GTSM unifies boosting and score matching. TREEFLOW conditions flows on tree paths, and DSM-TREE distills full tree decisions.

✨ Why it's interesting
TREEFLOW is 2× faster; best TSTR on 3/5 and best Wasserstein on 4/5 benchmarks.

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #LLM #SearchAgents #SupervisedFineTuning

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
👤 Yuwen Du, Rui Ye, Shuo Tang et al.

🎯 Task
LLM search agent training

💡 Idea
High-difficulty trajectory synthesis for SFT: enlarge source graphs, expand the tool set, and filter out short trajectories to force longer multi-hop ReAct search without CPT or RL.
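
The short-trajectory filter can be pictured as a threshold on the number of tool calls. This is a minimal sketch, not the authors' pipeline: the `Step` and `Trajectory` records, the `keep_hard` helper, and the `min_tool_calls` cutoff are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    # One ReAct step: a thought plus an optional tool call (hypothetical schema).
    thought: str
    tool: Optional[str] = None


@dataclass
class Trajectory:
    question: str
    steps: List[Step] = field(default_factory=list)

    def tool_calls(self) -> int:
        # Count only the steps that actually invoked a tool.
        return sum(1 for s in self.steps if s.tool is not None)


def keep_hard(trajectories, min_tool_calls=3):
    """Drop trajectories with fewer than `min_tool_calls` tool calls.

    The cutoff is an assumed value, not one reported in the paper.
    """
    return [t for t in trajectories if t.tool_calls() >= min_tool_calls]
```

Counting tool calls rather than raw steps is one possible proxy for multi-hop difficulty; a token-length or hop-count criterion would slot into the same filter.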

✨ Why it's interesting
46.0 BrowseComp, 58.1 BC-ZH, 34.6 HLE, 78.0 xbench; beats Tongyi DeepResearch.

💻 Repo
⭐ PolarSeeker/OpenSeeker - 634 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #Multimodal #ReinforcementLearning #KnowledgeDistillation

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
👤 Sudong Wang, Weiquan Huang, Xiaomin Yu et al.

🎯 Task
Multimodal reasoning post-training

💡 Idea
Black-box adversarial on-policy distillation with an MoE discriminator separates perception and reasoning feedback, aligning post-SFT outputs to supervision before RL without teacher logits.

✨ Why it's interesting
Boosts average accuracy over SFT→RLVR by +4.4 on 4B and +6.0 on 8B.

💻 Repo
⭐ XIAO4579/PRISM - 53 stars

🔗 paper

via @Papers.Data.Code
💻 Repo #Repo #Multimodal #TextToImage #FlowMatching

Leap Align Code
👤 RockeyCoss

🎯 Task
Preference alignment for text-to-image flow matching models

💡 Idea
Aligns flow-matching image generators with human preference rewards by replacing full-trajectory backpropagation with a two-step leap trajectory, so optimization can target any generation step during sampling.

✨ Why it's interesting
Enables gradient propagation to any generation step while avoiding full-trajectory memory cost.

💻 Repo
⭐ RockeyCoss/LeapAlign_Code - 12 stars (+12 3d)
Python

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #LLM #ContextLearning #MultiAgentSystems

From Context to Skills: Can Language Models Learn from Context Skillfully?
👤 Shuzheng Si, Haozhe Zhao, Yu Lei et al.

🎯 Task
Context learning for language models

💡 Idea
Multi-agent self-play builds skills instead of updating weights: Challenger makes tasks/rubrics, Reasoner solves with evolving skills, Judge gives binary feedback, and Cross-time Replay picks the most generalizable skill set.

✨ Why it's interesting
Improves CL-bench solving rates, e.g. GPT-4.1 11.1%→16.5% and GPT-5.1 21.2%→25.8%.

💻 Repo
⭐ S1s-Z/Ctx2Skill - 44 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #Multimodal #AgenticRL #ToolUse

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
👤 Shuang Chen, Kaituo Feng, Hangting Chen et al.

🎯 Task
Multimodal deep search agents

💡 Idea
Wikipedia path-sampled multi-hop VQA plus a unified search/OCR/image-enhancement toolset train agents with fatal-aware GRPO, masking post-failure tokens and clamping advantages to keep useful pre-failure reasoning.
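
One plausible reading of "masking post-failure tokens and clamping advantages" is sketched below. It is an illustration under stated assumptions, not the paper's implementation: the group-standardized advantage is the usual GRPO form, while `fatal_index`, `floor`, and both function names are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """GRPO-style advantage: each reward standardized within its rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]


def token_weights(num_tokens, advantage, fatal_index=None, floor=0.0):
    """Per-token loss weights for one rollout.

    fatal_index: position of the first token after a fatal failure, or None
                 if the rollout never failed (assumed bookkeeping).
    floor:       lower clamp on the advantage of failed rollouts (assumed),
                 so pre-failure reasoning is not penalized too hard.
    """
    if fatal_index is None:
        return [advantage] * num_tokens
    clamped = max(advantage, floor)
    # Pre-failure tokens keep the clamped advantage; later tokens are masked out.
    return [clamped if i < fatal_index else 0.0 for i in range(num_tokens)]
```

For example, `token_weights(4, -1.0, fatal_index=2, floor=-0.2)` gives the first two tokens a weight of -0.2 and zeroes the rest.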

✨ Why it's interesting
Improves average score from 47.8 to 61.6; +13.8 points across 7 benchmarks.

💻 Repo
⭐ shawn0728/OpenSearch-VL - 69 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #NLP #InformationRetrieval #Benchmarking

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
👤 Yilun Zhao, Jinbiao Wei, Tingyu Song et al.

🎯 Task
Reasoning-intensive retrieval

💡 Idea
Aspect-annotated retrieval benchmark plus aspect-decomposed synthetic training. BRIGHT-PRO labels multi-aspect evidence and tests static/agentic search; RTriever-Synth creates complementary positives and positive-conditioned hard negatives for LoRA tuning.

✨ Why it's interesting
RTriever-4B substantially improves over Qwen3-Embedding-4B.

💻 Repo
⭐ yale-nlp/Bright-Pro - 11 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #Multimodal #ImageGeneration #PermissiveLicense

gpic
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

✨ Why it's interesting
Its fully permissive 100M-image scale supports large-scale visual generation research with usable licensing.

Size: 100M images

Downloads: 187 | Likes: 4

🔗 dataset

via @Papers.Data.Code
📄 Paper #Paper #CV #3DGeneration #ArticulatedObjects

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
👤 Yunhan Yang, Chunshi Wang, Junliang Ye et al.

🎯 Task
Physics-grounded 3D asset generation

💡 Idea
VLM-planned hierarchical physical blueprints guide a diffusion model; KineVoxel Injection jointly generates geometry with joint origin, axis, and limits for interactive parts.

✨ Why it's interesting
On PhysDB, CD 22.89 vs 25.30 and interaction 0.96 vs 0.34 over PhysXGen.

💻 Repo
⭐ HKU-MMLab/PhysForge - 44 stars

🔗 paper

via @Papers.Data.Code
📋 Weekly Digest | May 02 – May 09
#WeeklyDigest

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
#SearchAgents #SupervisedFineTuning #ToolUse
10.6k hard trajectories ⟶ SOTA ~30B search agents
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
#3DGeneration #ArticulatedObjects #DiffusionModels
Two-stage 3D generation ⟶ simulation-ready articulated assets
→ Learn more...

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
#MotionCapture #PoseEstimation #3DHumanPoseEstimation
Monocular motion capture ⟶ predicts animation-ready rotations
→ Learn more...

💻 Repos

PKU-YuanGroup/TIDE ⭐
#DiscreteDiffusion #Distillation #CodeGeneration
Cross-architecture distillation ⟶ 0.6B student, lower cost
→ Learn more...

Vinayak-VG/GenWildSplat ⭐
#3DReconstruction #GaussianSplatting #NovelViewSynthesis
Sparse-view 3D reconstruction ⟶ 3D Gaussian splat in 3s
→ Learn more...

YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

📊 Datasets

SWE-chat
#CodingAgent #AgentTraces #HumanAICollaboration
SWE-chat dataset ⟶ studies human-agent coding workflows
→ Learn more...

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

➡️ Tomorrow: Multimodal & Agents Monthly

via @Papers.Data.Code
📈 Monthly: Multimodal & Agents | Apr 10 – May 10
#MonthlyDigest #Multimodal #Agents

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
#ReinforcementLearning #KnowledgeDistillation #Reasoning
PRISM pre-alignment ⟶ boosts accuracy over SFT→RLVR
→ Learn more...

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
#VideoGeneration #DiffusionModels #ParameterEfficientFineTuning
Unified video diffusion ⟶ multimodal pixel-aligned generation
→ Learn more...

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
#SLAM #OpenVocabulary #VisualLanguageModels
Tightly coupled VLM SLAM ⟶ open-vocabulary 3D in dynamics
→ Learn more...

💻 Repos

YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

RockeyCoss/LeapAlign_Code ⭐
#TextToImage #FlowMatching #PreferenceOptimization
Two-step leap trajectory ⟶ preference-aligns flow-matching T2I
→ Learn more...

📊 Datasets

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

MathNet v0 - Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟶ reasoning and retrieval benchmark
→ Learn more...

⚡ Trends

▸ Agents increasingly orchestrate external tools or specialized models through structured interfaces.
▸ Multimodal RL training adds intermediate alignment stages to preserve reasoning quality.
▸ Shared action tokenization is emerging for embodied control and cross-embodiment transfer.

🧭 TL;DR

📄 MolmoAct2: Action Reasoning Models for Real-world Deployment
Open VLA model beats strong baselines with adaptive low-latency embodied reasoning.

💡 Multimodal agents are shifting toward tool-grounded, efficient, real-world deployment.

via @Papers.Data.Code
📄 Paper #Paper #CV #DiffusionDistillation #TextToImageGeneration

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
👤 Tao Liu, Hao Yan, Mengting Chen et al.

🎯 Task
Few-step text-to-image diffusion distillation

💡 Idea
Continuous-time distribution matching with dynamic random-length schedules and velocity-based off-trajectory matching. It supervises arbitrary times, not fixed anchors, to reduce drift and preserve fine details without GAN or reward losses.

✨ Why it's interesting
At 4 NFE on SD3-Medium, it reaches HPSv3 9.561 vs 9.176 for D-DMD.

💻 Repo
⭐ byliutao/cdm - 77 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #CV #ImageGeneration #PermissiveLicense

giant-permissive-image-corpus
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

✨ Why it's interesting
Fully permissive 100M-image scale supports training and studying visual generation without restrictive licensing.

Size: 100M images

Downloads: 86 | Likes: 3

🔗 dataset

via @Papers.Data.Code
💻 Repo #Repo #LLM #Benchmark #SoftwareEngineering

ProgramBench
👤 facebookresearch

🎯 Task
Program reconstruction benchmark

💡 Idea
Evaluates LM-based software agents on recreating complete codebases that match an original program's behavior using only binaries, docs, and test suites.

✨ Why it's interesting
Provides a black-box benchmark for full-program reverse engineering by LM agents.

💻 Repo
⭐ facebookresearch/ProgramBench - 390 stars (+278 3d)
Python


via @Papers.Data.Code
📄 Paper #Paper #Multimodal #TextToImage #KnowledgeDistillation

Flow-OPD: On-Policy Distillation for Flow Matching Models
👤 Zhen Fang, Wenxuan Huang, Yu Zeng et al.

🎯 Task
Text-to-image model alignment

💡 Idea
On-policy multi-teacher distillation for flow matching: route each sampled trajectory to a task-specific teacher for dense velocity-field supervision, then use manifold anchor regularization to keep outputs on a high-quality visual manifold.

✨ Why it's interesting
On SD 3.5 Medium, GenEval rises 63→92 and OCR 59→94, about 10 points over GRPO.

💻 Repo
⭐ CostaliyA/Flow-OPD - 80 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #LLM #TestTimeScaling #Reasoning

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.

✨ Why it's interesting
Discovery costs $39.9/160 min and improves accuracy-cost tradeoffs over hand-crafted baselines.

💻 Repo
⭐ zhengkid/AutoTTS - 43 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators

AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris

🎯 Task
Global AI readiness and growth analysis

💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.
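
"Tidy long format" means one observation per row, which usually has to be pivoted before country-level comparison. The sketch below assumes hypothetical column names (`country`, `year`, `indicator`, `value`); the dataset's actual schema may differ, and the helper name is illustrative.

```python
from collections import defaultdict


def long_to_wide(rows):
    """Pivot tidy long rows into one record per (country, year).

    Each input row is a dict with the assumed keys
    'country', 'year', 'indicator', and 'value'.
    """
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["country"], row["year"])][row["indicator"]] = row["value"]
    return dict(wide)
```

With thousands of indicators the wide table is very sparse, so filtering to a handful of indicators before pivoting is likely the more practical order of operations.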

✨ Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.

Size: 259,546 observations, 24,453 indicators

Downloads: 242 | Likes: 28

🔗 dataset

via @Papers.Data.Code
📄 Paper #Paper #LLM #ReinforcementLearning #PostTraining

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.

🎯 Task
LLM post-training with verifiable rewards

💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.
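
A toy version of the target-projection idea on a handful of sampled responses is sketched below. The exact form of the paper's Gibbs target is not given here, so the sketch assumes a target proportional to exp(reward / temperature) and a policy renormalized over the sampled responses; all function names and the `temperature` parameter are illustrative.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def gibbs_target(rewards, temperature=1.0):
    """Assumed target: q_i proportional to exp(r_i / temperature)."""
    return softmax([r / temperature for r in rewards])


def kl(p, q):
    """KL(p || q) for two distributions on the same finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def projection_losses(seq_logprobs, rewards, temperature=1.0):
    # Policy restricted to, and renormalized over, the sampled responses.
    policy = softmax(seq_logprobs)
    target = gibbs_target(rewards, temperature)
    return {"forward_kl": kl(target, policy), "reverse_kl": kl(policy, target)}
```

Both losses vanish when the renormalized policy already equals the target; forward KL penalizes missing mass on high-reward responses, while reverse KL penalizes mass placed on low-reward ones.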

✨ Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.

🔗 paper

via @Papers.Data.Code
💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised

KISS-IMU
👤 sparolab

🎯 Task
Self-supervised inertial odometry

💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.
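
Motion-balanced sampling can be approximated by inverse-frequency weighting over motion bins. This is a sketch of that general technique only, not the repository's code: the scalar motion magnitude, the bin `edges`, and the function names are assumptions, and the frequency gate is not modeled.

```python
import random
from collections import Counter


def motion_bin(speed, edges=(0.5, 2.0)):
    """Map a scalar motion magnitude to a bin index (edges are illustrative)."""
    return sum(speed >= e for e in edges)


def balanced_weights(speeds, edges=(0.5, 2.0)):
    """Weight each sample by 1 / (size of its bin), normalized to sum to 1."""
    bins = [motion_bin(s, edges) for s in speeds]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    total = sum(raw)
    return [w / total for w in raw]


def sample_indices(speeds, k, seed=0):
    """Draw k training indices so every occupied motion bin is equally likely."""
    rng = random.Random(seed)
    return rng.choices(range(len(speeds)), weights=balanced_weights(speeds), k=k)
```

With three slow windows and one fast window, the single fast window receives half of the total sampling mass, so rare motion regimes are seen as often as common ones.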

✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.

💻 Repo
⭐ sparolab/KISS-IMU - 63 stars (+43 3d)
Python

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains correctness, memory use, and novel exploration.

✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.

💻 Repo
⭐ george-QF/TMAS-code - 4 stars

🔗 paper

via @Papers.Data.Code