Papers.Data.Code

📈 Monthly: Multimodal & Agents | Apr 10 – May 10
#MonthlyDigest #Multimodal #Agents

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
#ReinforcementLearning #KnowledgeDistillation #Reasoning
PRISM pre-alignment ⟶ boosts accuracy over SFT→RLVR
→ Learn more...

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
#VideoGeneration #DiffusionModels #ParameterEfficientFineTuning
Unified video diffusion ⟶ multimodal pixel-aligned generation
→ Learn more...

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
#SLAM #OpenVocabulary #VisualLanguageModels
Tightly coupled VLM SLAM ⟶ open-vocabulary 3D in dynamics
→ Learn more...

💻 Repos

YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

RockeyCoss/LeapAlign_Code ⭐
#TextToImage #FlowMatching #PreferenceOptimization
Two-step leap trajectory ⟶ preference-aligns flow-matching T2I
→ Learn more...

📊 Datasets

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

MathNet v0 — Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟶ reasoning and retrieval benchmark
→ Learn more...

⚡ Trends

▸ Agents increasingly orchestrate external tools or specialized models through structured interfaces.
▸ Multimodal RL training adds intermediate alignment stages to preserve reasoning quality.
▸ Shared action tokenization is emerging for embodied control and cross-embodiment transfer.

🧭 TL;DR

📄 MolmoAct2: Action Reasoning Models for Real-world Deployment
Open VLA model beats strong baselines with adaptive low-latency embodied reasoning.

💡 Multimodal agents are shifting toward tool-grounded, efficient, real-world deployment.

via @Papers.Data.Code

6 views · 15:00

📄 Paper #Paper #CV #DiffusionDistillation #TextToImageGeneration

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
👤 Tao Liu, Hao Yan, Mengting Chen et al.

🎯 Task
Few-step text-to-image diffusion distillation

💡 Idea
Continuous-time distribution matching with dynamic random-length schedules and velocity-based off-trajectory matching. It supervises arbitrary times, not fixed anchors, to reduce drift and preserve fine details without GAN or reward losses.

✨ Why it's interesting
At 4 NFE on SD3-Medium, it reaches HPSv3 9.561 vs 9.176 for D-DMD.

💻 Repo
⭐ byliutao/CDM — 77 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - byliutao/CDM: Continuous-Time Distribution Matching for Few-Step Diffusion Distillation👏

6 views · 08:00

📊 Dataset #Dataset #CV #ImageGeneration #PermissiveLicense

giant-permissive-image-corpus
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

✨ Why it's interesting
Fully permissive 100M-image scale supports training and studying visual generation without restrictive licensing.

Size: 100M images

Downloads: 86 | Likes: 3

🔗 dataset

via @Papers.Data.Code

5 views · 13:00

💻 Repo #Repo #LLM #Benchmark #SoftwareEngineering

ProgramBench
👤 facebookresearch

🎯 Task
Program reconstruction benchmark

💡 Idea
Evaluates LM-based software agents on recreating complete codebases that match an original program's behavior using only binaries, docs, and test suites.

✨ Why it's interesting
Provides a black-box benchmark for full-program reverse engineering by LM agents.

💻 Repo
⭐ facebookresearch/ProgramBench — 390 stars (+278 3d)
Python

via @Papers.Data.Code

GitHub

GitHub - facebookresearch/ProgramBench: Can Language Models Rebuild Programs From Scratch?

5 views · 16:00

📄 Paper #Paper #Multimodal #TextToImage #KnowledgeDistillation

Flow-OPD: On-Policy Distillation for Flow Matching Models
👤 Zhen Fang, Wenxuan Huang, Yu Zeng et al.

🎯 Task
Text-to-image model alignment

💡 Idea
On-policy multi-teacher distillation for flow matching: route each sampled trajectory to a task-specific teacher for dense velocity-field supervision, then use manifold anchor regularization to keep outputs on a high-quality visual manifold.

✨ Why it's interesting
On SD 3.5 Medium, GenEval rises 63→92 and OCR 59→94, about 10 points over GRPO.

💻 Repo
⭐ CostaliyA/Flow-OPD — 80 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - CostaliyA/Flow-OPD: Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"

6 views · 08:00

📄 Paper #Paper #LLM #TestTimeScaling #Reasoning

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.

✨ Why it's interesting
Discovery costs $39.9 and takes 160 min, and it improves accuracy-cost tradeoffs over hand-crafted baselines.

💻 Repo
⭐ zhengkid/AutoTTS — 43 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - zhengkid/AutoTTS: The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"

6 views · 10:00

📊 Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators

AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris

🎯 Task
Global AI readiness and growth analysis

💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.
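
Tidy long format means one row per (country, indicator, year) observation. Below is a minimal sketch of pivoting such rows into a per-country table using only the standard library. The field names (`country`, `indicator`, `year`, `value`) and the sample values are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical long-format rows; field names and values are illustrative only.
rows = [
    {"country": "A", "indicator": "patents", "year": 2020, "value": 10.0},
    {"country": "A", "indicator": "patents", "year": 2021, "value": 12.0},
    {"country": "B", "indicator": "patents", "year": 2020, "value": 4.0},
    {"country": "A", "indicator": "talent", "year": 2020, "value": 0.7},
]

def pivot_wide(rows, indicator):
    """Return {country: {year: value}} for a single indicator."""
    wide = defaultdict(dict)
    for r in rows:
        if r["indicator"] == indicator:
            wide[r["country"]][r["year"]] = r["value"]
    return dict(wide)

patents = pivot_wide(rows, "patents")
print(patents)
```

A wide table like this is a convenient starting point for the cross-country trend or clustering analyses mentioned in the post.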

✨ Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.

Size: 259,546 observations, 24,453 indicators

Downloads: 242 | Likes: 28

🔗 dataset

via @Papers.Data.Code

Kaggle

AI Index Data: Growth, Talent (Cambridge/Harvard)

259K observations across 24K+ AI metrics from Cambridge/Harvard

5 views · 13:00

📄 Paper #Paper #LLM #ReinforcementLearning #PostTraining

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.

🎯 Task
LLM post-training with verifiable rewards

💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.
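
A rough sketch of the target-projection idea on a single group of sampled responses. It assumes the Gibbs target takes the familiar form q_i ∝ p_i · exp(r_i / τ), where p is the policy renormalized over the group. That form, the temperature `tau`, and all numbers below are assumptions for illustration; the paper's exact target and losses may differ.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def gibbs_target(logps, rewards, tau=1.0):
    """Assumed target over the sampled group: q_i ∝ p_i * exp(r_i / tau)."""
    return softmax([lp + r / tau for lp, r in zip(logps, rewards)])

def kl(a, b):
    """KL(a || b) for two discrete distributions on the same support."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

logps = [-1.0, -1.5, -2.0, -1.2]  # hypothetical sequence log-probs of 4 responses
rewards = [1.0, 0.0, 0.0, 1.0]    # hypothetical verifiable rewards

p = softmax(logps)                # policy restricted to the sampled group
q = gibbs_target(logps, rewards, tau=0.5)

forward_kl = kl(q, p)  # KL(target || policy)
reverse_kl = kl(p, q)  # KL(policy || target)
print(q, forward_kl, reverse_kl)
```

In this toy setup the target shifts probability mass toward the rewarded responses, and either KL direction can serve as the projection loss that pulls the policy toward it.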

✨ Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.

🔗 paper

via @Papers.Data.Code

arXiv.org

Listwise Policy Optimization: Group-based RLVR as...

Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes,...

5 views · 16:00

💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised

KISS-IMU
👤 sparolab

🎯 Task
Self-supervised inertial odometry

💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.
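
One common way to balance sampling across motion regimes is inverse-frequency weighting, sketched below. This is an illustrative stand-in, not KISS-IMU's actual sampler: the regime labels, the binning, and the helper name `balanced_weights` are hypothetical, and the frequency gate is not modeled.

```python
import random
from collections import Counter

def balanced_weights(bins):
    """Weight each sample by 1 / (size of its bin) so every bin has equal total mass."""
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]

# Hypothetical motion-regime labels for 10 training windows.
bins = ["straight"] * 7 + ["turn"] * 2 + ["stop"]
weights = balanced_weights(bins)

# Draw a batch of window indices; rare regimes are now as likely as common ones.
rng = random.Random(0)
batch = rng.choices(range(len(bins)), weights=weights, k=6)
print(batch)
```

With these weights, the single "stop" window carries as much sampling mass as all seven "straight" windows combined.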

✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.

💻 Repo
⭐ sparolab/KISS-IMU — 63 stars (+43 3d)
Python

🔗 paper

via @Papers.Data.Code

GitHub

KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference. @ ICRA'26 Award Finalist - sparolab/KISS-IMU

6 views · 08:00

📄 Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains correctness, memory use, and novel exploration.

✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.

💻 Repo
⭐ george-QF/TMAS-code — 4 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - george-QF/TMAS-code



6 views · 10:00

📊 Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing

Hyperspectral Invasive Detection Dataset
👤 ziya07

🎯 Task
Hyperspectral invasive plant classification

💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.

✨ Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.

Downloads: 33 | Likes: 13

🔗 dataset

via @Papers.Data.Code

Kaggle

Hyperspectral Invasive Detection Dataset

Spectral-Spatial Vegetation Features for Intelligent Ecological Mapping

6 views · 13:00

🔥 Repo #Repo #LLM #Metal #KvCache

ds4
👤 antirez

🎯 Task
Local LLM inference and serving

💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.
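
To illustrate what reusing prompt prefixes across sessions can look like, here is a toy disk cache keyed by a hash of the token prefix. It is a sketch under stated assumptions, not ds4's implementation: the class `DiskPrefixCache`, the pickle-based storage, and the placeholder state are all hypothetical, and a real engine would store tensors and likely work at block granularity rather than probing every prefix length.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

def prefix_key(tokens):
    """Stable file name for a token prefix."""
    return hashlib.sha256(repr(list(tokens)).encode()).hexdigest()

class DiskPrefixCache:
    """Hypothetical on-disk store mapping token prefixes to saved state."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, tokens, state):
        (self.root / prefix_key(tokens)).write_bytes(pickle.dumps(state))

    def longest_prefix(self, tokens):
        """Return (n, state) for the longest cached prefix, or (0, None)."""
        for n in range(len(tokens), 0, -1):
            path = self.root / prefix_key(tokens[:n])
            if path.exists():
                return n, pickle.loads(path.read_bytes())
        return 0, None

cache_dir = tempfile.mkdtemp()
system_prompt = [101, 7, 7, 42]  # hypothetical token ids
DiskPrefixCache(cache_dir).save(system_prompt, {"kv": "placeholder for tensors"})

# A "new session": a fresh object pointing at the same directory.
n, state = DiskPrefixCache(cache_dir).longest_prefix(system_prompt + [9, 10])
print(n, state)
```

Only the tokens after position `n` would need a fresh prefill pass, which is where the savings for long shared prefixes come from.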

✨ Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 t/s prefill on M3 Ultra q2.

💻 Repo
⭐ antirez/ds4 — 8.0k stars (+5.3k 3d)
C

via @Papers.Data.Code

GitHub

GitHub - antirez/ds4: DeepSeek 4 Flash local inference engine for Metal and CUDA

5 views · 16:00

📄 Paper #Paper #LLM #MemoryMechanisms #Attention

δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.

🎯 Task
Long-term memory augmentation for LLMs

💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.
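
A fixed-size associative state can be sketched with the classic delta rule, S ← S + β (v − S k) kᵀ, where writing a key-value pair overwrites whatever the state previously returned for that key. Whether δ-mem uses exactly this update is an assumption here (suggested only by the name), and the step that turns the readout into low-rank attention corrections is not modeled.

```python
def matvec(S, k):
    """Read out the value associated with key k: S @ k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]

def delta_update(S, k, v, beta=1.0):
    """Delta-rule write (assumed form): S <- S + beta * (v - S k) k^T."""
    pred = matvec(S, k)
    err = [v_i - p_i for v_i, p_i in zip(v, pred)]
    return [[s_ij + beta * e_i * k_j for s_ij, k_j in zip(row, k)]
            for row, e_i in zip(S, err)]

d = 8                                   # 8x8 state, matching the size quoted in the post
S = [[0.0] * d for _ in range(d)]
k1 = [1.0] + [0.0] * (d - 1)            # hypothetical unit-norm key
v1 = [float(i) for i in range(d)]       # hypothetical value

S = delta_update(S, k1, v1)
first_read = matvec(S, k1)              # recovers v1

v2 = [1.0] * d
S = delta_update(S, k1, v2)             # same key, new value: old content is replaced
second_read = matvec(S, k1)             # recovers v2
print(first_read, second_read)
```

The state never grows: every write touches the same d × d matrix, which is what keeps memory cost constant regardless of history length.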

✨ Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.

💻 Repo
⭐ declare-lab/delta-Mem — 53 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - declare-lab/delta-Mem: The official repo of the paper: delta-Mem: Efficient Online Memory for Large Language Models

6 views · 08:00

📊 Dataset #Dataset #Tabular #Epidemiology #InfectiousDisease

🦠 Hantavirus (Andes Virus) — Global Epidemiology
👤 zkskhurram

🎯 Task
Infectious disease epidemiology analysis

💡 Idea
7 linked tables covering 25 countries across 5 WHO regions: yearly data from 1993–2025, outbreaks, monthly trends, clinical outcomes, environmental risk factors, virus strains, and a consolidated master table.

✨ Why it's interesting
Combines epidemiology, clinical, environmental, and strain data in one dataset, enabling cross-country HPS/HFRS trend and risk analysis from a single source.

Size: 7 tables, 25 countries, 1993–2025

📊 Dataset
📥 662 downloads
❤️ 26 likes

🔗 dataset

via @Papers.Data.Code

Kaggle

🦠 Hantavirus (Andes Virus) — Global Epidemiology

🌍 Comprehensive worldwide dataset covering HPS/HFRS cases, clinical outcomes

6 views · 10:00

💻 Repo #Repo #CV #4dReconstruction #DynamicScenes

d4rt
👤 lucidrains

🎯 Task
Dynamic scene reconstruction from video

💡 Idea
Predict 3D points in dynamic scenes from video plus coordinate and time queries, with a trainable PyTorch model that can return losses for supervision or direct point predictions.

✨ Why it's interesting
Provides a ready-to-use D4RT implementation with batched variable-length video/query handling for 4D reconstruction experiments.

💻 Repo
⭐ lucidrains/d4rt — 50 stars (+50 3d)
Python

via @Papers.Data.Code

GitHub

GitHub - lucidrains/d4rt: Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, Deepmind

6 views · 13:00

📄 Paper #Paper #Multimodal #VisionLanguageModels #ImageGeneration

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
👤 Haiwen Diao, Penghao Wu, Hanming Deng et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Instead of bolting together encoder-based understanding and VAE/diffusion generation, it uses one native pixel-text backbone with shared attention and stream-specific MoT blocks, trained jointly for text prediction and pixel-space flow matching.

✨ Why it's interesting
Authors claim it rivals top understanding-only VLMs and outperforms prior open-source unified models across understanding, reasoning, and generation; generation runs at 32× compression.

💻 Repo
⭐ OpenSenseNova/SenseNova-U1 — 1.7k stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - OpenSenseNova/SenseNova-U1: SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles

6 views · 16:00

📄 Paper #Paper #CV #VideoGeneration #DiffusionModels

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
👤 Yuchao Gu, Guian Fang, Yuxin Jiang et al.

🎯 Task
Any-step video generation

💡 Idea
Instead of endpoint consistency maps for fixed few-step sampling, it learns arbitrary-time flow-map transitions along the full ODE path, then uses shortcut backward simulation for on-policy distillation to cut discretization error and causal exposure bias.

✨ Why it's interesting
On 14B T2V, it gets 84.05 VBench at 4 NFEs and 84.41 at 32; beats Krea-Realtime-14B's 83.25 at 4 and rCM-14B's 83.73 at 4.

💻 Repo
⭐ NVlabs/AnyFlow — 202 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - NVlabs/AnyFlow


6 views · 08:00

📄 Paper #Paper #LLM #ReinforcementLearning #AgentTraining

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
👤 Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang et al.

🎯 Task
Long-form deep research agent training

💡 Idea
Instead of using rubrics only to score final answers, RubricEM uses them to structure execution, reward each stage, and store experience. It decomposes research into Plan/Research/Review/Answer, applies stagewise GRPO for denser credit, and jointly trains a reflection policy as reusable memory.
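
Stagewise credit can be pictured as computing GRPO-style group-relative advantages separately for each stage, so a rollout with a good plan but a weak answer gets mixed signals instead of one blended score. The sketch below assumes the standard normalization (r − mean) / std within a group of rollouts. The rubric scores are made up, and how RubricEM actually combines stage rewards is not specified in the post.

```python
from statistics import mean, pstdev

STAGES = ["plan", "research", "review", "answer"]

def group_advantages(rewards, eps=1e-8):
    """Group-relative advantage: center and scale rewards within one group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rubric scores: one list per stage, one entry per rollout in the group.
stage_rewards = {
    "plan":     [0.2, 0.8, 0.5, 0.5],
    "research": [0.1, 0.9, 0.4, 0.6],
    "review":   [0.5, 0.5, 0.5, 0.5],
    "answer":   [0.0, 1.0, 0.0, 1.0],
}

# Normalize each stage independently, giving every rollout one advantage per stage.
stage_adv = {s: group_advantages(stage_rewards[s]) for s in STAGES}
for s in STAGES:
    print(s, [round(a, 3) for a in stage_adv[s]])
```

A stage where every rollout scores the same (here, "review") contributes zero advantage, so the learning signal concentrates on stages where the rollouts actually differ.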

✨ Why it's interesting
RubricEM-8B outperforms comparable open models on 4 long-form research benchmarks and approaches proprietary deep-research systems after 1400 RL steps.

🔗 paper

via @Papers.Data.Code

arXiv.org

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond...

Training deep research agents, namely systems that plan, search, evaluate evidence, and synthesize long-form reports, pushes reinforcement learning beyond the regime of verifiable rewards. Their...

5 views · 10:00

📊 Dataset #Dataset #NLP #StemReasoning #VisualQuestionAnswering

open-mm-rl
👤 TuringEnterprises

🎯 Task
Multimodal STEM question answering

💡 Idea
40 MIT-licensed STEM QA examples across physics, math, biology, and chemistry, spanning single-image, multi-panel, and multi-image formats with deterministic final answers.
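
Deterministic final answers are what make a simple outcome reward possible. Below is a minimal sketch of such a checker, assuming answers arrive as short strings. The normalization rules, the tolerance `tol`, and the function names are hypothetical choices, not the dataset's official grader.

```python
def normalize(ans):
    """Lowercase and collapse whitespace for string comparison."""
    return " ".join(ans.strip().lower().split())

def outcome_reward(pred, gold, tol=1e-6):
    """Return 1.0 if the final answer matches the reference, else 0.0.

    Numeric answers are compared with a small tolerance; anything that
    does not parse as a number falls back to normalized string equality.
    """
    try:
        p, g = float(pred), float(gold)
        return 1.0 if abs(p - g) <= tol * max(1.0, abs(g)) else 0.0
    except ValueError:
        return 1.0 if normalize(pred) == normalize(gold) else 0.0

print(outcome_reward("3.50", "3.5"))
print(outcome_reward(" NaCl ", "nacl"))
print(outcome_reward("4", "5"))
```

A binary reward like this plugs directly into RL or outcome-supervised training loops, since it needs no learned judge.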

✨ Why it's interesting
Deterministic, programmatically checkable answers make advanced multimodal STEM reasoning benchmarkable for RL and outcome-supervised training.

Size: 40 examples, 15.5 MB

📊 Dataset
📥 2.6k downloads
❤️ 94 likes

🔗 dataset

via @Papers.Data.Code

huggingface.co

TuringEnterprises/Open-MM-RL · Datasets at Hugging Face


5 views · 13:00

💻 Repo #Repo #CV #ImageToVideo #2kGeneration

SwiftI2V
👤 HKUST-LongGroup

🎯 Task
High-resolution image-to-video generation

💡 Idea
Generate native 2K videos from a single image by first producing a low-res motion reference, then refining to high resolution while conditioning on both the input image and the Stage I video.

✨ Why it's interesting
Matches strong 2K end-to-end I2V baselines on key VBench-I2V metrics with 202× less GPU-time; 81-frame 2K output runs in ~111s on one H800 and fits on a 24 GB RTX 4090.

💻 Repo
⭐ HKUST-LongGroup/SwiftI2V — 71 stars (+47 3d)
HTML

🔗 paper

via @Papers.Data.Code

GitHub

Project page for paper "SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation" - HKUST-LongGroup/SwiftI2V

4 views · 16:00

📋 Weekly Digest · May 09 – May 16
#WeeklyDigest

📄 Papers

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
#VideoGeneration #DiffusionModels #Distillation
Any-step video diffusion ⟶ 84.05 VBench at 4 NFEs
→ Learn more...

Flow-OPD: On-Policy Distillation for Flow Matching Models
#TextToImage #KnowledgeDistillation #ReinforcementLearning
On-policy flow distillation ⟶ boosts GenEval and OCR
→ Learn more...

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
#VisionLanguageModels #ImageGeneration #MixtureOfExperts
NEO-unify multimodal model ⟶ unifies understanding and generation
→ Learn more...

δ-mem: Efficient Online Memory for Large Language Models
#MemoryMechanisms #Attention #ParameterEfficientTuning
Online associative memory ⟶ steers attention for long-horizon tasks
→ Learn more...

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
#TestTimeScaling #Reasoning #AgenticSearch
Offline replay controller ⟶ improves accuracy-cost tradeoffs
→ Learn more...

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
#ReinforcementLearning #AgentTraining #LongContextReasoning
Rubric-guided meta-RL ⟶ stagewise credit for research agents
→ Learn more...

💻 Repos

antirez/ds4 ⭐
#Metal #KvCache #OpenaiCompatible
Metal local inference ⟶ 1M context with disk KV cache
→ Learn more...

facebookresearch/ProgramBench ⭐
#Benchmark #SoftwareEngineering #ReverseEngineering
Program reconstruction benchmark ⟶ tests LM reverse engineering
→ Learn more...

sparolab/KISS-IMU ⭐
#InertialOdometry #SelfSupervised #LidarPseudoLabels
Self-supervised IMU odometry ⟶ denoises raw IMU with LiDAR labels
→ Learn more...

📊 Datasets

AI Index Data: Growth, Talent (Cambridge/Harvard)
#GlobalAI #CountryIndicators #LongitudinalData
Global AI panel dataset ⟶ cross-country trend forecasting
→ Learn more...

giant-permissive-image-corpus
#ImageGeneration #PermissiveLicense #ImageDataset
Permissive image corpus ⟶ trains visual generation
→ Learn more...

➡️ Tomorrow — Efficient ML Monthly

via @Papers.Data.Code


6 views · 09:00
