Papers.Data.Code
Only meaningful ML signals: papers, repos & datasets. Selected, not collected. 3–4 posts/day. 📄💻📊
papers.data.code@gmail.com
📈 Monthly: Multimodal & Agents | Apr 10 – May 10
#MonthlyDigest #Multimodal #Agents

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
#ReinforcementLearning #KnowledgeDistillation #Reasoning
PRISM pre-alignment ⟶ boosts accuracy over SFT→RLVR
→ Learn more...

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
#VideoGeneration #DiffusionModels #ParameterEfficientFineTuning
Unified video diffusion ⟶ multimodal pixel-aligned generation
→ Learn more...

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
#SLAM #OpenVocabulary #VisualLanguageModels
Tightly coupled VLM SLAM ⟶ open-vocabulary 3D in dynamic scenes
→ Learn more...

💻 Repos

YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

RockeyCoss/LeapAlign_Code ⭐
#TextToImage #FlowMatching #PreferenceOptimization
Two-step leap trajectory ⟶ preference-aligns flow-matching T2I
→ Learn more...

📊 Datasets

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

MathNet v0 - Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟶ reasoning and retrieval benchmark
→ Learn more...

⚡ Trends

▸ Agents increasingly orchestrate external tools or specialized models through structured interfaces.
▸ Multimodal RL training adds intermediate alignment stages to preserve reasoning quality.
▸ Shared action tokenization is emerging for embodied control and cross-embodiment transfer.

🧭 TL;DR

📄 MolmoAct2: Action Reasoning Models for Real-world Deployment
Open VLA model beats strong baselines with adaptive low-latency embodied reasoning.

💡 Multimodal agents are shifting toward tool-grounded, efficient, real-world deployment.

via @Papers.Data.Code
📄 Paper #Paper #CV #DiffusionDistillation #TextToImageGeneration

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
👤 Tao Liu, Hao Yan, Mengting Chen et al.

🎯 Task
Few-step text-to-image diffusion distillation

💡 Idea
Continuous-time distribution matching with dynamic random-length schedules and velocity-based off-trajectory matching. It supervises arbitrary times, not fixed anchors, to reduce drift and preserve fine details without GAN or reward losses.

✨ Why it's interesting
At 4 NFE on SD3-Medium, it reaches HPSv3 9.561 vs 9.176 for D-DMD.

💻 Repo
⭐ byliutao/cdm - 77 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #CV #ImageGeneration #PermissiveLicense

giant-permissive-image-corpus
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

✨ Why it's interesting
Fully permissive 100M-image scale supports training and studying visual generation without restrictive licensing.

Size: 100M images

Downloads: 86 | Likes: 3

🔗 dataset

via @Papers.Data.Code
💻 Repo #Repo #LLM #Benchmark #SoftwareEngineering

ProgramBench
👤 facebookresearch

🎯 Task
Program reconstruction benchmark

💡 Idea
Evaluates LM-based software agents on recreating complete codebases that match an original program's behavior using only binaries, docs, and test suites.

✨ Why it's interesting
Provides a black-box benchmark for full-program reverse engineering by LM agents.

💻 Repo
⭐ facebookresearch/ProgramBench - 390 stars (+278 3d)
Python


via @Papers.Data.Code
📄 Paper #Paper #Multimodal #TextToImage #KnowledgeDistillation

Flow-OPD: On-Policy Distillation for Flow Matching Models
👤 Zhen Fang, Wenxuan Huang, Yu Zeng et al.

🎯 Task
Text-to-image model alignment

💡 Idea
On-policy multi-teacher distillation for flow matching: route each sampled trajectory to a task-specific teacher for dense velocity-field supervision, then use manifold anchor regularization to keep outputs on a high-quality visual manifold.

✨ Why it's interesting
On SD 3.5 Medium, GenEval rises 63→92 and OCR 59→94, about 10 points over GRPO.

💻 Repo
⭐ CostaliyA/Flow-OPD - 80 stars

🔗 paper
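
A minimal sketch of the routing and velocity-matching idea in this post, in plain Python. The teacher registry, the task labels, and the toy velocity functions are all hypothetical; the post does not spell out the paper's routing rule or loss weighting, and manifold anchor regularization is left out entirely.

```python
from typing import Callable, Dict, List, Tuple

Velocity = Callable[[List[float], float], List[float]]

def route_teacher(task: str, teachers: Dict[str, Velocity]) -> Velocity:
    """Pick the teacher registered for this prompt's task (assumed routing rule)."""
    return teachers[task]

def velocity_loss(student: Velocity, teacher: Velocity,
                  trajectory: List[Tuple[List[float], float]]) -> float:
    """Mean squared velocity error over every (x_t, t) the student itself visited."""
    total, count = 0.0, 0
    for x_t, t in trajectory:
        v_student, v_teacher = student(x_t, t), teacher(x_t, t)
        total += sum((a - b) ** 2 for a, b in zip(v_student, v_teacher))
        count += len(v_student)
    return total / count

# Toy stand-ins: constant velocity fields instead of real networks.
teachers = {
    "ocr": lambda x, t: [1.0 for _ in x],
    "geneval": lambda x, t: [-1.0 for _ in x],
}
student = lambda x, t: [0.5 for _ in x]

# "On-policy" here means the states come from the student's own sampling run.
trajectory = [([0.0, 0.0], 0.0), ([0.5, 0.5], 0.5)]

loss_ocr = velocity_loss(student, route_teacher("ocr", teachers), trajectory)
loss_geneval = velocity_loss(student, route_teacher("geneval", teachers), trajectory)
print(loss_ocr, loss_geneval)
```

The point of the sketch is only that supervision is dense (one error term per visited state) and that the teacher is chosen per trajectory.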

via @Papers.Data.Code
📄 Paper #Paper #LLM #TestTimeScaling #Reasoning

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.

✨ Why it's interesting
Discovery costs $39.9 and 160 min, and improves accuracy-cost tradeoffs over hand-crafted baselines.

💻 Repo
⭐ zhengkid/AutoTTS - 43 stars

🔗 paper
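
To make "offline replay" concrete, here is a rough sketch under stated assumptions: traces are recorded once as lists of (confidence, answer) steps, and a candidate controller is then scored by replaying them with no new model calls. The thresholding rule tied to a single β is a guess made for illustration; the post does not say how the paper parameterizes its controller.

```python
from collections import Counter

def replay_controller(trajectories, beta, max_width=4):
    """Replay a width-depth policy over pre-collected traces.

    trajectories: list of traces, each a list of (confidence, answer) steps.
    beta: single knob (assumed): stop a trace at confidence >= beta,
          prune it at confidence < 1 - beta, otherwise continue.
    Returns (majority answer or None, steps consumed as a cost proxy).
    """
    # "Branch": open at most max_width traces.
    active = list(range(min(max_width, len(trajectories))))
    finished, cost, depth = [], 0, 0
    while active:
        survivors = []
        for i in active:
            trace = trajectories[i]
            conf, answer = trace[depth]      # "probe" the recorded confidence
            cost += 1
            if conf >= beta or depth == len(trace) - 1:
                finished.append(answer)      # "stop": accept this trace's answer
            elif conf < 1 - beta:
                continue                     # "prune": drop the trace
            else:
                survivors.append(i)          # "continue" one step deeper
        active = survivors
        depth += 1
    answer = Counter(finished).most_common(1)[0][0] if finished else None
    return answer, cost

traces = [
    [(0.5, "A"), (0.9, "A")],
    [(0.1, "B"), (0.95, "B")],
    [(0.6, "A"), (0.7, "A"), (0.85, "A")],
]
strict = replay_controller(traces, beta=0.8)
loose = replay_controller(traces, beta=0.5)
print(strict, loose)
```

On this toy data both settings return the same answer, but the looser threshold spends half the steps, which is the kind of accuracy-cost comparison a replay harness makes cheap.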

via @Papers.Data.Code
📊 Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators

AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris

🎯 Task
Global AI readiness and growth analysis

💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.

✨ Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.

Size: 259,546 observations, 24,453 indicators

Downloads: 242 | Likes: 28

🔗 dataset
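
"Tidy long format" means one row per (country, year, indicator) observation. A small sketch of turning such rows into a per-country time series with the standard library; the column names and all values below are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows in long format: one observation per row.
rows = [
    {"country": "KOR", "year": 2023, "indicator": "ai_patents", "value": 10.0},
    {"country": "KOR", "year": 2022, "indicator": "ai_patents", "value": 8.0},
    {"country": "BRA", "year": 2023, "indicator": "ai_patents", "value": 3.0},
    {"country": "BRA", "year": 2023, "indicator": "ai_talent", "value": 0.4},
]

def pivot(rows, indicator):
    """Return {country: {year: value}} for one indicator, years in ascending order."""
    table = defaultdict(dict)
    for row in rows:
        if row["indicator"] == indicator:
            table[row["country"]][row["year"]] = row["value"]
    return {country: dict(sorted(years.items())) for country, years in table.items()}

patents = pivot(rows, "ai_patents")
print(patents)
```

Long format keeps thousands of indicators in one narrow table; the pivot is done per analysis rather than baked into the file.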

via @Papers.Data.Code
📄 Paper #Paper #LLM #ReinforcementLearning #PostTraining

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.

🎯 Task
LLM post-training with verifiable rewards

💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.

✨ Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.

🔗 paper
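
A small numeric sketch of the target-projection view, assuming the Gibbs target is a plain softmax of rewards over the sampled group. The paper's exact target may differ (for example by including a prior from the old policy); the temperature and the toy policy values here are illustrative only.

```python
import math

def gibbs_target(rewards, temperature=1.0):
    """Softmax of rewards / temperature over the sampled responses."""
    top = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

def kl(p, q):
    """KL(p || q) for two distributions on the same finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

rewards = [1.0, 0.0, 0.0, 1.0]        # verifiable 0/1 rewards for 4 samples
target = gibbs_target(rewards, temperature=0.5)
policy = [0.25, 0.25, 0.25, 0.25]     # policy mass renormalized over the group

forward_kl = kl(target, policy)       # mass-covering projection objective
reverse_kl = kl(policy, target)       # mode-seeking projection objective
print(target, forward_kl, reverse_kl)
```

Both divergences reach zero exactly when the policy's distribution over the group equals the target, which is what "projecting onto the target" means here.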

via @Papers.Data.Code
💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised

KISS-IMU
👤 sparolab

🎯 Task
Self-supervised inertial odometry

💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.

✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.

💻 Repo
⭐ sparolab/KISS-IMU - 63 stars (+43 3d)
Python

🔗 paper
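
One common way to realize motion-balanced sampling is inverse-frequency weighting over motion bins, sketched below. Binning by speed and the bin width are assumptions made for illustration; the post does not say which motion statistic the repo balances on, and the frequency gate is not modeled.

```python
import random
from collections import Counter

def motion_balanced_weights(speeds, bin_width=1.0):
    """Weight each sample so that every occupied speed bin gets equal total mass."""
    bins = [int(s // bin_width) for s in speeds]
    counts = Counter(bins)
    num_bins = len(counts)
    return [1.0 / (num_bins * counts[b]) for b in bins]

# Four slow windows and one fast window: the fast one is under-represented.
speeds = [0.1, 0.2, 0.3, 0.4, 2.5]
weights = motion_balanced_weights(speeds)
print(weights)

# Draw a training batch of window indices with those weights.
rng = random.Random(0)
batch = rng.choices(range(len(speeds)), weights=weights, k=8)
print(batch)
```

With these weights the single fast window is drawn about as often as all four slow windows combined.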

via @Papers.Data.Code
📄 Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains correctness, memory use, and novel exploration.

✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.

💻 Repo
⭐ george-QF/TMAS-code - 4 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing

Hyperspectral Invasive Detection Dataset
👤 ziya07

🎯 Task
Hyperspectral invasive plant classification

💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.

✨ Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.

Downloads: 33 | Likes: 13

🔗 dataset

via @Papers.Data.Code
🔥 Repo #Repo #LLM #Metal #KvCache

ds4
👤 antirez

🎯 Task
Local LLM inference and serving

💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.

✨ Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 t/s prefill on M3 Ultra q2.

💻 Repo
⭐ antirez/ds4 - 8.0k stars (+5.3k 3d)
C
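
The idea of reusing prompt prefixes from disk can be sketched as a cache keyed by a hash of the token prefix. This is an illustration in Python, not ds4's implementation (ds4 is written in C and its on-disk format is not described in this post); the class name, file layout, and the string standing in for the KV tensors are all hypothetical.

```python
import hashlib
import json
import os
import tempfile

def prefix_key(tokens):
    """Stable file name for a token prefix."""
    return hashlib.sha256(json.dumps(tokens).encode("utf-8")).hexdigest()

class DiskPrefixCache:
    """Toy disk cache: one file per saved prefix, found again by hashing."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, tokens, state):
        path = os.path.join(self.root, prefix_key(tokens))
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"n": len(tokens), "state": state}, f)

    def longest_prefix(self, tokens):
        """Return (matched length, state) for the longest saved prefix, else (0, None)."""
        for n in range(len(tokens), 0, -1):
            path = os.path.join(self.root, prefix_key(tokens[:n]))
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    return n, json.load(f)["state"]
        return 0, None

cache = DiskPrefixCache(tempfile.mkdtemp())
cache.save([1, 2, 3], "kv-state-for-3-tokens")   # placeholder for real KV tensors

hit = cache.longest_prefix([1, 2, 3, 4, 5])      # new session, same prompt prefix
miss = cache.longest_prefix([9, 9])
print(hit, miss)
```

On a hit, only the tokens after the matched length would need prefill. The lookup here rehashes every candidate prefix, which is fine for a sketch but not for million-token contexts.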


via @Papers.Data.Code
📄 Paper #Paper #LLM #MemoryMechanisms #Attention

δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.

🎯 Task
Long-term memory augmentation for LLMs

💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.

✨ Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.

💻 Repo
⭐ declare-lab/delta-Mem - 53 stars

🔗 paper
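
For intuition about a fixed-size online associative state, here is the classic delta-rule memory on an 8×8 matrix. That the paper uses this exact update is an assumption suggested only by its name; the step that turns a readout into low-rank attention corrections is not shown.

```python
def matvec(S, x):
    """Read from the memory: S @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in S]

def delta_write(S, key, value, lr=1.0):
    """Delta rule: S <- S + lr * (value - S @ key) key^T. The state size never grows."""
    error = [v - p for v, p in zip(value, matvec(S, key))]
    return [[S[i][j] + lr * error[i] * key[j] for j in range(len(key))]
            for i in range(len(value))]

def unit(i, d=8):
    """Unit-norm key along axis i."""
    return [1.0 if j == i else 0.0 for j in range(d)]

d = 8
S = [[0.0] * d for _ in range(d)]

k1, v1 = unit(0), [float(i + 1) for i in range(d)]
k2, v2 = unit(1), [0.5] * d

S = delta_write(S, k1, v1)            # store v1 under k1
S = delta_write(S, k2, v2)            # store v2 under an orthogonal key
read1, read2 = matvec(S, k1), matvec(S, k2)

S = delta_write(S, k1, [0.0] * d)     # overwrite: the error term erases v1
read1_after = matvec(S, k1)
print(read1, read2, read1_after)
```

Unlike appending tokens to a context, each write replaces what the key previously retrieved, so memory cost is constant in history length.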

via @Papers.Data.Code
📊 Dataset #Dataset #Tabular #Epidemiology #InfectiousDisease

🦠 Hantavirus (Andes Virus) - Global Epidemiology
👤 zkskhurram

🎯 Task
Infectious disease epidemiology analysis

💡 Idea
7 linked tables covering 25 countries across 5 WHO regions: yearly data from 1993–2025, outbreaks, monthly trends, clinical outcomes, environmental risk factors, virus strains, and a consolidated master table.

✨ Why it's interesting
Combines epidemiology, clinical, environmental, and strain data in one dataset, enabling cross-country HPS/HFRS trend and risk analysis from a single source.

Size: 7 tables, 25 countries, 1993–2025

Downloads: 662 | Likes: 26

🔗 dataset

via @Papers.Data.Code
💻 Repo #Repo #CV #4dReconstruction #DynamicScenes

D4RT
👤 lucidrains

🎯 Task
Dynamic scene reconstruction from video

💡 Idea
Predict 3D points in dynamic scenes from video plus coordinate and time queries, with a trainable PyTorch model that can return losses for supervision or direct point predictions.

✨ Why it's interesting
Provides a ready-to-use D4RT implementation with batched variable-length video/query handling for 4D reconstruction experiments.

💻 Repo
⭐ lucidrains/d4rt - 50 stars (+50 3d)
Python


via @Papers.Data.Code
📄 Paper #Paper #Multimodal #VisionLanguageModels #ImageGeneration

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
👤 Haiwen Diao, Penghao Wu, Hanming Deng et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Instead of bolting together encoder-based understanding and VAE/diffusion generation, it uses one native pixel-text backbone with shared attention and stream-specific MoT blocks, trained jointly for text prediction and pixel-space flow matching.

✨ Why it's interesting
Authors claim it rivals top understanding-only VLMs and outperforms prior open-source unified models across understanding, reasoning, and generation; generation runs at 32× compression.

💻 Repo
⭐ OpenSenseNova/SenseNova-U1 - 1.7k stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #CV #VideoGeneration #DiffusionModels

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
👤 Yuchao Gu, Guian Fang, Yuxin Jiang et al.

🎯 Task
Any-step video generation

💡 Idea
Instead of endpoint consistency maps for fixed few-step sampling, it learns arbitrary-time flow-map transitions along the full ODE path, then uses shortcut backward simulation for on-policy distillation to cut discretization error and causal exposure bias.

✨ Why it's interesting
On 14B T2V, it scores 84.05 VBench at 4 NFEs and 84.41 at 32, ahead of Krea-Realtime-14B (83.25) and rCM-14B (83.73), both at 4 NFEs.

💻 Repo
⭐ NVlabs/AnyFlow - 202 stars

🔗 paper
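
Why a flow map allows "any-step" sampling can be seen on a toy ODE where the map is known in closed form. The sketch below uses dx/dt = -x, whose exact jump from time t to time s is x·exp(-(s - t)); it has nothing to do with video and only illustrates the concept, with the learned network replaced by that formula.

```python
import math

def flow_map(x, t, s):
    """Exact jump from time t to time s along dx/dt = -x (stand-in for a learned map)."""
    return x * math.exp(-(s - t))

def sample_with_flow_map(x0, num_steps, t_end=1.0):
    """Compose flow-map jumps over a uniform schedule of num_steps steps."""
    times = [t_end * i / num_steps for i in range(num_steps + 1)]
    x = x0
    for t, s in zip(times, times[1:]):
        x = flow_map(x, t, s)
    return x

def sample_with_euler(x0, num_steps, t_end=1.0):
    """Plain Euler integration of the same ODE, for comparison."""
    x, h = x0, t_end / num_steps
    for _ in range(num_steps):
        x += h * (-x)
    return x

one_jump = sample_with_flow_map(1.0, 1)
many_jumps = sample_with_flow_map(1.0, 32)
euler_4 = sample_with_euler(1.0, 4)
print(one_jump, many_jumps, euler_4)
```

The flow-map sampler lands on the same endpoint with 1 step or 32, while 4-step Euler is visibly off; a learned flow map aims for the former behavior, up to its approximation error.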

via @Papers.Data.Code
📄 Paper #Paper #LLM #ReinforcementLearning #AgentTraining

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
👤 Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang et al.

🎯 Task
Long-form deep research agent training

💡 Idea
Instead of using rubrics only to score final answers, RubricEM uses them to structure execution, reward each stage, and store experience. It decomposes research into Plan/Research/Review/Answer, applies stagewise GRPO for denser credit, and jointly trains a reflection policy as reusable memory.

✨ Why it's interesting
RubricEM-8B outperforms comparable open models on 4 long-form research benchmarks and approaches proprietary deep-research systems after 1400 RL steps.

🔗 paper
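
"Stagewise GRPO" can be read as computing group-relative advantages separately for each stage instead of once per rollout. The sketch below uses the standard GRPO normalization (reward minus group mean, divided by group standard deviation); the rubric scores are invented, and the paper's exact variant is not given in this post.

```python
import statistics

STAGES = ("plan", "research", "review", "answer")

def stagewise_advantages(group_rewards, eps=1e-8):
    """Normalize rewards within the rollout group, one stage at a time."""
    advantages = [{} for _ in group_rewards]
    for stage in STAGES:
        scores = [rollout[stage] for rollout in group_rewards]
        mean = statistics.fmean(scores)
        std = statistics.pstdev(scores)
        for adv, score in zip(advantages, scores):
            adv[stage] = (score - mean) / (std + eps)
    return advantages

# Two rollouts for the same question, scored per stage by a rubric (toy values).
group = [
    {"plan": 1.0, "research": 0.5, "review": 0.5, "answer": 1.0},
    {"plan": 0.0, "research": 0.5, "review": 0.5, "answer": 1.0},
]
adv = stagewise_advantages(group)
print(adv)
```

Both rollouts end with the same answer score, so an outcome-only signal would be zero; the per-stage view still credits the better plan.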

via @Papers.Data.Code
📊 Dataset #Dataset #NLP #StemReasoning #VisualQuestionAnswering

open-mm-rl
👤 TuringEnterprises

🎯 Task
Multimodal STEM question answering

💡 Idea
40 MIT-licensed STEM QA examples across physics, math, biology, and chemistry, spanning single-image, multi-panel, and multi-image formats with deterministic final answers.

✨ Why it's interesting
Deterministic, programmatically checkable answers make advanced multimodal STEM reasoning benchmarkable for RL and outcome-supervised training.

Size: 40 examples, 15.5 MB

Downloads: 2.6k | Likes: 94

🔗 dataset
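
"Programmatically checkable" answers are what let a dataset double as an RL reward source. A minimal checker sketch: the matching rules (numeric tolerance first, then case- and whitespace-insensitive string match) are assumptions for illustration, not the dataset's official grader.

```python
def normalize(answer):
    """Lowercase and strip all spaces so cosmetic differences do not count."""
    return answer.strip().lower().replace(" ", "")

def reward(prediction, gold, tol=1e-6):
    """Return 1.0 for a correct final answer, 0.0 otherwise."""
    try:
        return float(abs(float(prediction) - float(gold)) <= tol)
    except ValueError:
        # At least one side is not a number: fall back to normalized text match.
        return float(normalize(prediction) == normalize(gold))

print(reward("3.50", "3.5"), reward(" H2O", "h2o"), reward("4", "5"))
```

Because the reward is a pure function of the final answer, it can be applied to every sampled response without a judge model.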

via @Papers.Data.Code
💻 Repo #Repo #CV #ImageToVideo #2kGeneration

SwiftI2V
👤 HKUST-LongGroup

🎯 Task
High-resolution image-to-video generation

💡 Idea
Generate native 2K videos from a single image by first producing a low-res motion reference, then refining to high resolution while conditioning on both the input image and the Stage I video.

✨ Why it's interesting
Matches strong 2K end-to-end I2V baselines on key VBench-I2V metrics with 202× less GPU-time; 81-frame 2K output runs in ~111s on one H800 and fits on a 24 GB RTX 4090.

💻 Repo
⭐ HKUST-LongGroup/SwiftI2V - 71 stars (+47 3d)
HTML

🔗 paper

via @Papers.Data.Code
📋 Weekly Digest · May 09 – May 16
#WeeklyDigest

📄 Papers

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
#VideoGeneration #DiffusionModels #Distillation
Any-step video diffusion ⟶ 84.05 VBench at 4 NFEs
→ Learn more...

Flow-OPD: On-Policy Distillation for Flow Matching Models
#TextToImage #KnowledgeDistillation #ReinforcementLearning
On-policy flow distillation ⟶ boosts GenEval and OCR
→ Learn more...

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
#VisionLanguageModels #ImageGeneration #MixtureOfExperts
NEO-unify multimodal model ⟶ unifies understanding and generation
→ Learn more...

δ-mem: Efficient Online Memory for Large Language Models
#MemoryMechanisms #Attention #ParameterEfficientTuning
Online associative memory ⟶ steers attention for long-horizon tasks
→ Learn more...

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
#TestTimeScaling #Reasoning #AgenticSearch
Offline replay controller ⟶ improves accuracy-cost tradeoffs
→ Learn more...

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
#ReinforcementLearning #AgentTraining #LongContextReasoning
Rubric-guided meta-RL ⟶ stagewise credit for research agents
→ Learn more...

💻 Repos

antirez/ds4 ⭐
#Metal #KvCache #OpenaiCompatible
Metal local inference ⟶ 1M context with disk KV cache
→ Learn more...

facebookresearch/ProgramBench ⭐
#Benchmark #SoftwareEngineering #ReverseEngineering
Program reconstruction benchmark ⟶ tests LM reverse engineering
→ Learn more...

sparolab/KISS-IMU ⭐
#InertialOdometry #SelfSupervised #LidarPseudoLabels
Self-supervised IMU odometry ⟶ denoises raw IMU with LiDAR labels
→ Learn more...

📊 Datasets

AI Index Data: Growth, Talent (Cambridge/Harvard)
#GlobalAI #CountryIndicators #LongitudinalData
Global AI panel dataset ⟶ cross-country trend forecasting
→ Learn more...

giant-permissive-image-corpus
#ImageGeneration #PermissiveLicense #ImageDataset
Permissive image corpus ⟶ trains visual generation
→ Learn more...

➡️ Tomorrow: Efficient ML Monthly

via @Papers.Data.Code
πŸ‘1