Papers.Data.Code

📄 Paper #Paper #Multimodal #ReinforcementLearning #KnowledgeDistillation

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
👤 Sudong Wang, Weiquan Huang, Xiaomin Yu et al.

🎯 Task
Multimodal reasoning post-training

💡 Idea
Black-box adversarial on-policy distillation with an MoE discriminator separates perception and reasoning feedback, aligning post-SFT outputs to supervision before RL without teacher logits.

✨ Why it's interesting
Raises average accuracy over SFT→RLVR by 4.4 on the 4B model and 6.0 on the 8B model.

💻 Repo
⭐ XIAO4579/PRISM — 53 stars

🔗 paper

via @Papers.Data.Code

6 views | 10:00

Papers.Data.Code

💻 Repo #Repo #Multimodal #TextToImage #FlowMatching

LeapAlign_Code
👤 RockeyCoss

🎯 Task
Preference alignment for text-to-image flow matching models

💡 Idea
Aligns flow-matching image generators with human preference rewards by replacing full-trajectory backpropagation with a two-step leap trajectory, so optimization can target any generation step during sampling.

✨ Why it's interesting
Enables gradient propagation to any generation step while avoiding full-trajectory memory cost.

💻 Repo
⭐ RockeyCoss/LeapAlign_Code — 12 stars (+12 3d)
Python

🔗 paper

via @Papers.Data.Code

GitHub

[CVPR2026] LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories - RockeyCoss/LeapAlign_Code

6 views | 13:00

Papers.Data.Code

📄 Paper #Paper #LLM #ContextLearning #MultiAgentSystems

From Context to Skills: Can Language Models Learn from Context Skillfully?
👤 Shuzheng Si, Haozhe Zhao, Yu Lei et al.

🎯 Task
Context learning for language models

💡 Idea
Multi-agent self-play builds skills instead of updating weights: Challenger makes tasks/rubrics, Reasoner solves with evolving skills, Judge gives binary feedback, and Cross-time Replay picks the most generalizable skill set.

✨ Why it's interesting
Improves CL-bench solving rates, e.g. GPT-4.1 11.1%→16.5% and GPT-5.1 21.2%→25.8%.

💻 Repo
⭐ S1s-Z/Ctx2Skill — 44 stars

🔗 paper

via @Papers.Data.Code

GitHub

Code for "From Context to Skills: Can Language Models Learn from Context Skillfully? " - S1s-Z/Ctx2Skill

7 views | 16:00

Papers.Data.Code

📄 Paper #Paper #Multimodal #AgenticRL #ToolUse

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
👤 Shuang Chen, Kaituo Feng, Hangting Chen et al.

🎯 Task
Multimodal deep search agents

💡 Idea
Wikipedia path-sampled multi-hop VQA plus a unified search/OCR/image-enhancement toolset train agents with fatal-aware GRPO, masking post-failure tokens and clamping advantages to keep useful pre-failure reasoning.
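One possible reading of the masking-and-clamping step, as a minimal sketch and not the authors' implementation: rewards are normalized within a rollout group as in standard GRPO, tokens from the first fatal failure onward get a zero advantage, and a failed rollout's negative advantage is clamped at a floor so the pre-failure reasoning is not penalized too heavily. The function name, the `fatal_steps` argument, and the `floor` value are assumptions made for illustration.

```python
from statistics import mean, pstdev


def fatal_aware_advantages(rewards, lengths, fatal_steps, floor=-0.5, eps=1e-6):
    """Per-token advantages for one rollout group (illustrative sketch).

    rewards:     one scalar reward per rollout
    lengths:     number of generated tokens per rollout
    fatal_steps: index of the first post-failure token, or None if no failure
    floor:       hypothetical lower clamp applied to failed rollouts
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    out = []
    for reward, n_tokens, fatal in zip(rewards, lengths, fatal_steps):
        # Group-normalized advantage, shared by every token of the rollout.
        adv = (reward - mu) / (sigma + eps)
        if fatal is None:
            out.append([adv] * n_tokens)
            continue
        # Clamp: soften the penalty carried by a failed rollout.
        adv = max(adv, floor)
        # Mask: keep pre-failure tokens, zero out the failure and what follows.
        out.append([adv if t < fatal else 0.0 for t in range(n_tokens)])
    return out
```

With two rollouts where the second fails at token index 2, the second rollout keeps a clamped advantage on its first two tokens and contributes nothing afterwards.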

✨ Why it's interesting
Raises the average score across 7 benchmarks from 47.8 to 61.6 (+13.8 points).

💻 Repo
⭐ shawn0728/OpenSearch-VL — 69 stars

🔗 paper

via @Papers.Data.Code

GitHub

🔍 OpenSearch-VL provides a fully open recipe for training strong multimodal deep search agents through high-quality data curation, diverse visual/search tools, and fatal-aware agentic reinforcement...

6 views | 08:00

Papers.Data.Code

📄 Paper #Paper #NLP #InformationRetrieval #Benchmarking

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
👤 Yilun Zhao, Jinbiao Wei, Tingyu Song et al.

🎯 Task
Reasoning-intensive retrieval

💡 Idea
Aspect-annotated retrieval benchmark plus aspect-decomposed synthetic training. BRIGHT-PRO labels multi-aspect evidence and tests static/agentic search; RTriever-Synth creates complementary positives and positive-conditioned hard negatives for LoRA tuning.

✨ Why it's interesting
RTriever-4B substantially improves over Qwen3-Embedding-4B.

💻 Repo
⭐ yale-nlp/Bright-Pro — 11 stars

🔗 paper

via @Papers.Data.Code

GitHub

Data and code for ACL 2026 Paper "Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems" - yale-nlp/Bright-Pro

6 views | 10:00

Papers.Data.Code

📊 Dataset #Dataset #Multimodal #ImageGeneration #PermissiveLicense

gpic
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

✨ Why it's interesting
Its fully permissive 100M-image scale supports large-scale visual generation research with usable licensing.

Size: 100M images

Downloads: 187 | Likes: 4

🔗 dataset

via @Papers.Data.Code

6 views | 13:00

Papers.Data.Code

📄 Paper #Paper #CV #3DGeneration #ArticulatedObjects

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
👤 Yunhan Yang, Chunshi Wang, Junliang Ye et al.

🎯 Task
Physics-grounded 3D asset generation

💡 Idea
VLM-planned hierarchical physical blueprints guide a diffusion model; KineVoxel Injection jointly generates geometry with joint origin, axis, and limits for interactive parts.

✨ Why it's interesting
On PhysDB, it outperforms PhysXGen: CD 22.89 vs 25.30 and interaction score 0.96 vs 0.34.

💻 Repo
⭐ HKU-MMLab/PhysForge — 44 stars

🔗 paper

via @Papers.Data.Code

GitHub

[ICML 2026] PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World - HKU-MMLab/PhysForge

6 views | 16:00

Papers.Data.Code

📈 Monthly: Multimodal & Agents | Apr 10 – May 10
#MonthlyDigest #Multimodal #Agents

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
#ReinforcementLearning #KnowledgeDistillation #Reasoning
PRISM pre-alignment ⟶ boosts accuracy over SFT→RLVR
→ Learn more...

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
#VideoGeneration #DiffusionModels #ParameterEfficientFineTuning
Unified video diffusion ⟶ multimodal pixel-aligned generation
→ Learn more...

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
#SLAM #OpenVocabulary #VisualLanguageModels
Tightly coupled VLM SLAM ⟶ open-vocabulary 3D in dynamics
→ Learn more...

💻 Repos

YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

RockeyCoss/LeapAlign_Code ⭐
#TextToImage #FlowMatching #PreferenceOptimization
Two-step leap trajectory ⟶ preference-aligns flow-matching T2I
→ Learn more...

📊 Datasets

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

MathNet v0 — Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟶ reasoning and retrieval benchmark
→ Learn more...

⚡ Trends

▸ Agents increasingly orchestrate external tools or specialized models through structured interfaces.
▸ Multimodal RL training adds intermediate alignment stages to preserve reasoning quality.
▸ Shared action tokenization is emerging for embodied control and cross-embodiment transfer.

🧭 TL;DR

📄 MolmoAct2: Action Reasoning Models for Real-world Deployment
Open VLA model beats strong baselines with adaptive low-latency embodied reasoning.

💡 Multimodal agents are shifting toward tool-grounded, efficient, real-world deployment.

via @Papers.Data.Code

6 views | 15:00

Papers.Data.Code

📄 Paper #Paper #CV #DiffusionDistillation #TextToImageGeneration

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
👤 Tao Liu, Hao Yan, Mengting Chen et al.

🎯 Task
Few-step text-to-image diffusion distillation

💡 Idea
Continuous-time distribution matching with dynamic random-length schedules and velocity-based off-trajectory matching. It supervises arbitrary times, not fixed anchors, to reduce drift and preserve fine details without GAN or reward losses.

✨ Why it's interesting
At 4 NFE on SD3-Medium, it reaches HPSv3 9.561 vs 9.176 for D-DMD.

💻 Repo
⭐ byliutao/cdm — 77 stars

🔗 paper

via @Papers.Data.Code

GitHub

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation👏 - byliutao/CDM

6 views | 08:00

Papers.Data.Code

📊 Dataset #Dataset #CV #ImageGeneration #PermissiveLicense

giant-permissive-image-corpus
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

✨ Why it's interesting
Fully permissive 100M-image scale supports training and studying visual generation without restrictive licensing.

Size: 100M images

Downloads: 86 | Likes: 3

🔗 dataset

via @Papers.Data.Code

5 views | 13:00

Papers.Data.Code

💻 Repo #Repo #LLM #Benchmark #SoftwareEngineering

ProgramBench
👤 facebookresearch

🎯 Task
Program reconstruction benchmark

💡 Idea
Evaluates LM-based software agents on recreating complete codebases that match an original program's behavior using only binaries, docs, and test suites.

✨ Why it's interesting
Provides a black-box benchmark for full-program reverse engineering by LM agents.

💻 Repo
⭐ facebookresearch/ProgramBench — 390 stars (+278 3d)
Python

via @Papers.Data.Code

GitHub

Can Language Models Rebuild Programs From Scratch? - facebookresearch/ProgramBench

5 views | 16:00

Papers.Data.Code

📄 Paper #Paper #Multimodal #TextToImage #KnowledgeDistillation

Flow-OPD: On-Policy Distillation for Flow Matching Models
👤 Zhen Fang, Wenxuan Huang, Yu Zeng et al.

🎯 Task
Text-to-image model alignment

💡 Idea
On-policy multi-teacher distillation for flow matching: route each sampled trajectory to a task-specific teacher for dense velocity-field supervision, then use manifold anchor regularization to keep outputs on a high-quality visual manifold.

✨ Why it's interesting
On SD 3.5 Medium, GenEval rises 63→92 and OCR 59→94, about 10 points over GRPO.

💻 Repo
⭐ CostaliyA/Flow-OPD — 80 stars

🔗 paper

via @Papers.Data.Code

GitHub

Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models" - CostaliyA/Flow-OPD

6 views | 08:00

Papers.Data.Code

📄 Paper #Paper #LLM #TestTimeScaling #Reasoning

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.

✨ Why it's interesting
Discovery costs $39.9 and 160 minutes, and the discovered controller improves accuracy-cost tradeoffs over hand-crafted baselines.

💻 Repo
⭐ zhengkid/AutoTTS — 43 stars

🔗 paper

via @Papers.Data.Code

GitHub

The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling" - zhengkid/AutoTTS

6 views | 10:00

Papers.Data.Code

📊 Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators

AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris

🎯 Task
Global AI readiness and growth analysis

💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.
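For readers unfamiliar with tidy long format, here is a minimal sketch of pivoting such rows into one time series per country. The column names (`country`, `year`, `indicator`, `value`) are hypothetical, since the post does not list the dataset's schema, and any values passed in are the caller's own.

```python
from collections import defaultdict


def pivot_long(rows, indicator):
    """Turn tidy long rows into {country: {year: value}} for one indicator.

    rows: iterable of dicts with the hypothetical keys
          'country', 'year', 'indicator', 'value'
    """
    table = defaultdict(dict)
    for row in rows:
        if row["indicator"] == indicator:
            table[row["country"]][row["year"]] = row["value"]
    # Sort each country's series by year so it reads as a time series.
    return {country: dict(sorted(years.items())) for country, years in table.items()}
```

In long format each row holds a single observation, so adding an indicator adds rows, not columns; the pivot is only needed at analysis time.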

✨ Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.

Size: 259,546 observations, 24,453 indicators

Downloads: 242 | Likes: 28

🔗 dataset

via @Papers.Data.Code

Kaggle

259K observations across 24K+ AI metrics from Cambridge/Harvard

5 views | 13:00

Papers.Data.Code

📄 Paper #Paper #LLM #ReinforcementLearning #PostTraining

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.

🎯 Task
LLM post-training with verifiable rewards

💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.
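The target-projection idea can be sketched on a single group of sampled responses. This is a toy illustration under stated assumptions, not the paper's exact objective: the target is taken to be a Gibbs distribution proportional to the old policy's probability times `exp(reward / beta)`, restricted to the sampled group, and the current policy is renormalized over the same group. The function names and the `beta` default are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def gibbs_target(old_logps, rewards, beta=1.0):
    """Closed-form target over the group: q_i ∝ pi_old(y_i) * exp(r_i / beta)."""
    return softmax([lp + r / beta for lp, r in zip(old_logps, rewards)])


def kl(p, q):
    """KL(p || q) for two distributions on the same finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def listwise_loss(logps, old_logps, rewards, beta=1.0, mode="forward"):
    """Project the group-normalized policy onto the Gibbs target."""
    policy = softmax(logps)  # policy restricted to the sampled responses
    target = gibbs_target(old_logps, rewards, beta)
    if mode == "forward":
        return kl(target, policy)  # forward KL: KL(target || policy)
    return kl(policy, target)      # reverse KL: KL(policy || target)
```

The loss is zero exactly when the policy's group-normalized probabilities equal the target, which makes the optimization goal explicit instead of implied by a policy-gradient estimator.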

✨ Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.

🔗 paper

via @Papers.Data.Code

arXiv.org

Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes,...

5 views | 16:00

Papers.Data.Code

💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised

KISS-IMU
👤 sparolab

🎯 Task
Self-supervised inertial odometry

💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.
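Motion-balanced sampling can be approximated with inverse-frequency weights, sketched below. This is an assumption about the general technique, not the repo's code: the motion bin labels are hypothetical, and the frequency gate is not modeled here.

```python
import random
from collections import Counter


def motion_balanced_weights(motion_bins):
    """Weight each sample so every motion bin gets equal total probability."""
    counts = Counter(motion_bins)
    n_bins = len(counts)
    return [1.0 / (n_bins * counts[b]) for b in motion_bins]


def sample_indices(motion_bins, k, seed=0):
    """Draw k training indices with replacement, balanced across motion bins."""
    rng = random.Random(seed)
    weights = motion_balanced_weights(motion_bins)
    return rng.choices(range(len(motion_bins)), weights=weights, k=k)
```

With 8 straight-line windows and 2 turning windows, each turning window is drawn four times as often as a straight one, so both regimes contribute equally in expectation.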

✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.

💻 Repo
⭐ sparolab/KISS-IMU — 63 stars (+43 3d)
Python

🔗 paper

via @Papers.Data.Code

GitHub

KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference. @ ICRA'26 Award Finalist - sparolab/KISS-IMU

6 views | 08:00

Papers.Data.Code

📄 Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains correctness, memory use, and novel exploration.

✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.

💻 Repo
⭐ george-QF/TMAS-code — 4 stars

🔗 paper

via @Papers.Data.Code

6 views | 10:00

Papers.Data.Code

📊 Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing

Hyperspectral Invasive Detection Dataset
👤 ziya07

🎯 Task
Hyperspectral invasive plant classification

💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.

✨ Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.

Downloads: 33 | Likes: 13

🔗 dataset

via @Papers.Data.Code

Kaggle

Spectral-Spatial Vegetation Features for Intelligent Ecological Mapping

6 views | 13:00

Papers.Data.Code

🔥 Repo #Repo #LLM #Metal #KvCache

ds4
👤 antirez

🎯 Task
Local LLM inference and serving

💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.
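The prefix-reuse idea behind a disk-persistent KV cache can be sketched in a few lines. This is a conceptual Python illustration only: ds4 itself is written in C, and the class name, file layout, and use of `pickle` here are assumptions, not the project's on-disk format.

```python
import hashlib
import os
import pickle


class PrefixKVCache:
    """Toy disk cache keyed by a hash of the token prefix (illustrative only)."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, tokens):
        digest = hashlib.sha256(repr(list(tokens)).encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".kv")

    def save(self, tokens, kv_state):
        """Persist the KV state computed for exactly this token prefix."""
        with open(self._path(tokens), "wb") as f:
            pickle.dump(kv_state, f)

    def longest_prefix(self, tokens):
        """Return (n, kv_state) for the longest cached prefix, or (0, None)."""
        for n in range(len(tokens), 0, -1):
            path = self._path(tokens[:n])
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return n, pickle.load(f)
        return 0, None
```

On a cache hit of length `n`, only `tokens[n:]` would need to be prefilled, which is where the cross-session saving comes from. A real engine would avoid rehashing every prefix length; the linear scan here just keeps the sketch short.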

✨ Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 t/s prefill on M3 Ultra q2.

💻 Repo
⭐ antirez/ds4 — 8.0k stars (+5.3k 3d)
C

via @Papers.Data.Code

GitHub

DeepSeek 4 Flash local inference engine for Metal and CUDA - antirez/ds4

5 views | 16:00

Papers.Data.Code

📄 Paper #Paper #LLM #MemoryMechanisms #Attention

δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.

🎯 Task
Long-term memory augmentation for LLMs

💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.
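A fixed-size online associative state can be illustrated with a classic delta-rule memory. The delta rule is an assumption suggested by the method's name; the post only states that the state is fixed-size and online. The low-rank attention correction is not modeled here, and the function names are illustrative.

```python
def matvec(S, x):
    """Multiply matrix S (list of rows) by vector x."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]


def delta_write(S, k, v, beta=1.0):
    """In-place delta-rule update: S += beta * (v - S k) k^T."""
    prediction = matvec(S, k)
    error = [vi - pi for vi, pi in zip(v, prediction)]
    for i, e in enumerate(error):
        for j, kj in enumerate(k):
            S[i][j] += beta * e * kj
    return S


def read(S, q):
    """Read out the value associated with query q."""
    return matvec(S, q)
```

The state keeps the same shape no matter how many writes occur, and writing a new value under an existing unit-norm key overwrites the old association instead of accumulating it.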

✨ Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.

💻 Repo
⭐ declare-lab/delta-Mem — 53 stars

🔗 paper

via @Papers.Data.Code

GitHub

The official repo of the paper: delta-Mem: Efficient Online Memory for Large Language Models - declare-lab/delta-Mem

6 views | 08:00
