Papers.Data.Code
18 subscribers
101 links
Only meaningful ML signals: papers, repos & datasets. Selected, not collected. 3–4 posts/day. 📄💻📊
papers.data.code@gmail.com
📊 Dataset #Dataset #Multimodal #ImageGeneration #PermissiveLicense

gpic
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

Why it's interesting
A 100M-image corpus under fully permissive licensing supports large-scale visual generation research without licensing obstacles.

Size: 100M images

Downloads: 187 | Likes: 4

🔗 dataset

via @Papers.Data.Code
📄 Paper #Paper #CV #3DGeneration #ArticulatedObjects

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
👤 Yunhan Yang, Chunshi Wang, Junliang Ye et al.

🎯 Task
Physics-grounded 3D asset generation

💡 Idea
VLM-planned hierarchical physical blueprints guide a diffusion model; KineVoxel Injection jointly generates geometry with joint origin, axis, and limits for interactive parts.

Why it's interesting
On PhysDB, it improves on PhysXGen in both CD (22.89 vs 25.30) and interaction score (0.96 vs 0.34).

💻 Repo
HKU-MMLab/PhysForge — 44 stars

🔗 paper

via @Papers.Data.Code
📋 Weekly Digest | May 02 – May 09
#WeeklyDigest

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
#SearchAgents #SupervisedFineTuning #ToolUse
10.6k hard trajectories ⟶ SOTA ~30B search agents
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
#3DGeneration #ArticulatedObjects #DiffusionModels
Two-stage 3D generation ⟶ simulation-ready articulated assets
→ Learn more...

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
#MotionCapture #PoseEstimation #3DHumanPoseEstimation
Monocular motion capture ⟶ predicts animation-ready rotations
→ Learn more...

💻 Repos

PKU-YuanGroup/TIDE
#DiscreteDiffusion #Distillation #CodeGeneration
Cross-architecture distillation ⟶ 0.6B student, lower cost
→ Learn more...

Vinayak-VG/GenWildSplat
#3DReconstruction #GaussianSplatting #NovelViewSynthesis
Sparse-view 3D reconstruction ⟶ 3D Gaussian splat in 3s
→ Learn more...

YanFangCS/GenLIP
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

📊 Datasets

SWE-chat
#CodingAgent #AgentTraces #HumanAICollaboration
SWE-chat dataset ⟶ studies human-agent coding workflows
→ Learn more...

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

➡️ Tomorrow — Multimodal & Agents Monthly

via @Papers.Data.Code
📈 Monthly: Multimodal & Agents | Apr 10 – May 10
#MonthlyDigest #Multimodal #Agents

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
#ReinforcementLearning #KnowledgeDistillation #Reasoning
PRISM pre-alignment ⟶ boosts accuracy over SFT→RLVR
→ Learn more...

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
#VideoGeneration #DiffusionModels #ParameterEfficientFineTuning
Unified video diffusion ⟶ multimodal pixel-aligned generation
→ Learn more...

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
#SLAM #OpenVocabulary #VisualLanguageModels
Tightly coupled VLM SLAM ⟶ open-vocabulary 3D in dynamics
→ Learn more...

💻 Repos

YanFangCS/GenLIP
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

RockeyCoss/LeapAlign_Code
#TextToImage #FlowMatching #PreferenceOptimization
Two-step leap trajectory ⟶ preference-aligns flow-matching T2I
→ Learn more...

📊 Datasets

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

MathNet v0 — Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟶ reasoning and retrieval benchmark
→ Learn more...

Trends

▸ Agents increasingly orchestrate external tools or specialized models through structured interfaces.
▸ Multimodal RL training adds intermediate alignment stages to preserve reasoning quality.
▸ Shared action tokenization is emerging for embodied control and cross-embodiment transfer.

🧭 TL;DR

📄 MolmoAct2: Action Reasoning Models for Real-world Deployment
Open VLA model beats strong baselines with adaptive low-latency embodied reasoning.

💡 Multimodal agents are shifting toward tool-grounded, efficient, real-world deployment.

via @Papers.Data.Code
📄 Paper #Paper #CV #DiffusionDistillation #TextToImageGeneration

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
👤 Tao Liu, Hao Yan, Mengting Chen et al.

🎯 Task
Few-step text-to-image diffusion distillation

💡 Idea
Continuous-time distribution matching with dynamic random-length schedules and velocity-based off-trajectory matching. It supervises arbitrary times, not fixed anchors, to reduce drift and preserve fine details without GAN or reward losses.

Why it's interesting
At 4 NFE on SD3-Medium, it reaches HPSv3 9.561 vs 9.176 for D-DMD.

💻 Repo
byliutao/cdm — 77 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #CV #ImageGeneration #PermissiveLicense

giant-permissive-image-corpus
👤 stanford-vision-lab

🎯 Task
Visual generation

💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.

Why it's interesting
Fully permissive 100M-image scale supports training and studying visual generation without restrictive licensing.

Size: 100M images

Downloads: 86 | Likes: 3

🔗 dataset

via @Papers.Data.Code
💻 Repo #Repo #LLM #Benchmark #SoftwareEngineering

ProgramBench
👤 facebookresearch

🎯 Task
Program reconstruction benchmark

💡 Idea
Evaluates LM-based software agents on recreating complete codebases that match an original program's behavior using only binaries, docs, and test suites.

Why it's interesting
Provides a black-box benchmark for full-program reverse engineering by LM agents.

💻 Repo
facebookresearch/ProgramBench — 390 stars (+278 3d)
Python


via @Papers.Data.Code
📄 Paper #Paper #Multimodal #TextToImage #KnowledgeDistillation

Flow-OPD: On-Policy Distillation for Flow Matching Models
👤 Zhen Fang, Wenxuan Huang, Yu Zeng et al.

🎯 Task
Text-to-image model alignment

💡 Idea
On-policy multi-teacher distillation for flow matching: route each sampled trajectory to a task-specific teacher for dense velocity-field supervision, then use manifold anchor regularization to keep outputs on a high-quality visual manifold.

Why it's interesting
On SD 3.5 Medium, GenEval rises 63→92 and OCR 59→94, about 10 points over GRPO.

💻 Repo
CostaliyA/Flow-OPD — 80 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #LLM #TestTimeScaling #Reasoning

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.

Why it's interesting
Discovery costs $39.9 and takes 160 min, and the discovered controller improves accuracy-cost tradeoffs over hand-crafted baselines.

💻 Repo
zhengkid/AutoTTS — 43 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators

AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris

🎯 Task
Global AI readiness and growth analysis

💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.
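
A minimal sketch of what "tidy long format" means in practice: one observation per row, pivoted into one record per country-year. The column names (`country`, `year`, `indicator`, `value`) and the sample rows are hypothetical, not the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical rows in tidy long format: one observation per row.
rows = [
    {"country": "AAA", "year": 2020, "indicator": "patents", "value": 10.0},
    {"country": "AAA", "year": 2020, "indicator": "talent", "value": 0.5},
    {"country": "BBB", "year": 2020, "indicator": "patents", "value": 3.0},
]

def pivot_wide(rows):
    """Group long-format rows into one {indicator: value} dict per (country, year)."""
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["country"], row["year"])][row["indicator"]] = row["value"]
    return dict(wide)

wide = pivot_wide(rows)
print(wide[("AAA", 2020)])
```

Indicators missing for a country-year are simply absent from that record, which is the usual tradeoff when widening a sparse panel.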

Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.

Size: 259,546 observations, 24,453 indicators

Downloads: 242 | Likes: 28

🔗 dataset

via @Papers.Data.Code
📄 Paper #Paper #LLM #ReinforcementLearning #PostTraining

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.

🎯 Task
LLM post-training with verifiable rewards

💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.
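
A rough sketch of the target-projection idea, under stated assumptions: the target is taken to be proportional to exp(reference log-prob + reward / beta) over the sampled group, and "forward" is taken to mean KL(target || policy). The paper's exact target, temperature, and KL conventions may differ; all function names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    return [w / z for w in weights]

def gibbs_target(ref_logps, rewards, beta=1.0):
    """Assumed closed-form target on the group: q_i proportional to exp(ref_logp_i + r_i / beta)."""
    return softmax([lp + r / beta for lp, r in zip(ref_logps, rewards)])

def kl(p, q):
    """KL(p || q) for two distributions over the same sampled responses."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def projection_loss(policy_logps, target, mode="forward"):
    """Project the policy (renormalized over the group) onto the target."""
    p = softmax(policy_logps)
    return kl(target, p) if mode == "forward" else kl(p, target)

# Four sampled responses, uniform reference policy, binary verifiable rewards.
ref = [math.log(0.25)] * 4
target = gibbs_target(ref, rewards=[1.0, 0.0, 0.0, 1.0])
print(projection_loss(ref, target, mode="forward"))
```

The loss is zero exactly when the policy's group-normalized probabilities equal the target, which is what makes the objective an explicit projection rather than an implicit policy-gradient update.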

Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.

🔗 paper

via @Papers.Data.Code
💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised

KISS-IMU
👤 sparolab

🎯 Task
Self-supervised inertial odometry

💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.
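
One plausible reading of motion-balanced sampling, sketched with inverse-frequency weights over motion-magnitude bins. The bin edges, the scalar `speed` feature, and the function names are assumptions for illustration; this is not the repo's code, and the frequency gate is not covered.

```python
import random
from collections import Counter

def motion_bin(speed, edges=(0.5, 2.0)):
    """Map a scalar motion magnitude to a bin index (hypothetical bin edges)."""
    return sum(speed >= e for e in edges)

def balanced_weights(speeds, edges=(0.5, 2.0)):
    """Weight each sample so every occupied motion bin gets equal total mass."""
    bins = [motion_bin(s, edges) for s in speeds]
    counts = Counter(bins)
    return [1.0 / (len(counts) * counts[b]) for b in bins]

def sample_indices(speeds, k, seed=0):
    """Draw k training indices with replacement using the balanced weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(speeds)), weights=balanced_weights(speeds), k=k)

# Eight slow windows, one medium, one fast: the rare regimes get upweighted.
speeds = [0.1] * 8 + [1.0, 3.0]
weights = balanced_weights(speeds)
```

With these weights, the single medium-speed and single fast window are each drawn as often as all eight slow windows combined.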

Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.

💻 Repo
sparolab/KISS-IMU — 63 stars (+43 3d)
Python

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains correctness, memory use, and novel exploration.

Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.

💻 Repo
george-QF/TMAS-code — 4 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing

Hyperspectral Invasive Detection Dataset
👤 ziya07

🎯 Task
Hyperspectral invasive plant classification

💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.

Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.

Downloads: 33 | Likes: 13

🔗 dataset

via @Papers.Data.Code
🔥 Repo #Repo #LLM #Metal #KvCache

ds4
👤 antirez

🎯 Task
Local LLM inference and serving

💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.
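
To illustrate the prefix-reuse idea only: a toy disk cache keyed by a hash of the token prefix, so a later session can resume from the longest prefix already processed. ds4 itself is written in C; the class name, file layout, and pickled `state` below are hypothetical and say nothing about its actual on-disk format.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

class DiskPrefixCache:
    """Toy cache: one file per token prefix, named by the prefix hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, tokens):
        digest = hashlib.sha256(repr(list(tokens)).encode()).hexdigest()
        return self.root / f"{digest}.kv"

    def save(self, tokens, state):
        self._path(tokens).write_bytes(pickle.dumps(state))

    def longest_prefix(self, tokens):
        """Return (n, state) for the longest cached prefix, or (0, None)."""
        for n in range(len(tokens), 0, -1):
            path = self._path(tokens[:n])
            if path.exists():
                return n, pickle.loads(path.read_bytes())
        return 0, None

def demo():
    """Two cache objects on one directory stand in for two sessions."""
    with tempfile.TemporaryDirectory() as root:
        DiskPrefixCache(root).save([1, 2, 3], {"kv": "hypothetical state"})
        return DiskPrefixCache(root).longest_prefix([1, 2, 3, 4, 5])
```

In this sketch the second session would only need to run prefill on tokens 4 and 5. A real engine would store tensors rather than pickled objects and would avoid probing every prefix length.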

Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 tokens/s prefill on M3 Ultra (q2).

💻 Repo
antirez/ds4 — 8.0k stars (+5.3k 3d)
C


via @Papers.Data.Code
📄 Paper #Paper #LLM #MemoryMechanisms #Attention

δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.

🎯 Task
Long-term memory augmentation for LLMs

💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.
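
A sketch of a fixed-size associative state, assuming a delta-rule write of the form S ← S + lr · (v − S k) kᵀ. The name suggests a delta rule, but the paper's actual update and the step that turns the readout into low-rank attention corrections are not shown here; all names are hypothetical.

```python
def matvec(S, k):
    """Multiply matrix S (list of rows) by vector k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]

def delta_write(S, k, v, lr=1.0):
    """Assumed delta-rule write: move the readout for key k toward value v."""
    pred = matvec(S, k)
    return [
        [s_ij + lr * (v_i - p_i) * k_j for s_ij, k_j in zip(row, k)]
        for row, v_i, p_i in zip(S, v, pred)
    ]

def read(S, q):
    """Read the value associated with query q."""
    return matvec(S, q)

# The state keeps its size no matter how many writes happen.
S = [[0.0, 0.0], [0.0, 0.0]]
S = delta_write(S, k=[1.0, 0.0], v=[3.0, 4.0])
first = read(S, [1.0, 0.0])
S = delta_write(S, k=[1.0, 0.0], v=[1.0, 1.0])  # overwrite the same key
second = read(S, [1.0, 0.0])
```

Unlike appending tokens to the context, memory cost here is constant in history length; the 2×2 state is only for readability, whereas the post cites an 8×8 state.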

Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.

💻 Repo
declare-lab/delta-Mem — 53 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #Tabular #Epidemiology #InfectiousDisease

🦠 Hantavirus (Andes Virus) — Global Epidemiology
👤 zkskhurram

🎯 Task
Infectious disease epidemiology analysis

💡 Idea
7 linked tables covering 25 countries across 5 WHO regions: yearly data from 1993–2025, outbreaks, monthly trends, clinical outcomes, environmental risk factors, virus strains, and a consolidated master table.

Why it's interesting
Combines epidemiology, clinical, environmental, and strain data in one dataset, enabling cross-country HPS/HFRS trend and risk analysis from a single source.

Size: 7 tables, 25 countries, 1993–2025

Downloads: 662 | Likes: 26

🔗 dataset

via @Papers.Data.Code
💻 Repo #Repo #CV #4dReconstruction #DynamicScenes

D4RT
👤 lucidrains

🎯 Task
Dynamic scene reconstruction from video

💡 Idea
Predict 3D points in dynamic scenes from video plus coordinate and time queries, with a trainable PyTorch model that can return losses for supervision or direct point predictions.

Why it's interesting
Provides a ready-to-use D4RT implementation with batched variable-length video/query handling for 4D reconstruction experiments.

💻 Repo
lucidrains/d4rt — 50 stars (+50 3d)
Python


via @Papers.Data.Code
📄 Paper #Paper #Multimodal #VisionLanguageModels #ImageGeneration

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
👤 Haiwen Diao, Penghao Wu, Hanming Deng et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Instead of bolting together encoder-based understanding and VAE/diffusion generation, it uses one native pixel-text backbone with shared attention and stream-specific MoT blocks, trained jointly for text prediction and pixel-space flow matching.

Why it's interesting
Authors claim it rivals top understanding-only VLMs and outperforms prior open-source unified models across understanding, reasoning, and generation; generation runs at 32× compression.

💻 Repo
OpenSenseNova/SenseNova-U1 — 1.7k stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #CV #VideoGeneration #DiffusionModels

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
👤 Yuchao Gu, Guian Fang, Yuxin Jiang et al.

🎯 Task
Any-step video generation

💡 Idea
Instead of endpoint consistency maps for fixed few-step sampling, it learns arbitrary-time flow-map transitions along the full ODE path, then uses shortcut backward simulation for on-policy distillation to cut discretization error and causal exposure bias.
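
A toy illustration of why arbitrary-time flow-map transitions help any-step sampling. The 1D ODE dx/dt = −x stands in for a learned velocity field, and its exact solution stands in for a learned flow map; nothing here is AnyFlow's model or training procedure.

```python
import math

def velocity(x, t):
    """Toy ODE dx/dt = -x, standing in for a learned velocity field."""
    return -x

def flow_map(x, t, s):
    """Exact transition from time t to time s for the toy ODE."""
    return x * math.exp(-(s - t))

def sample_euler(x0, steps):
    """Integrate from t=0 to t=1 with Euler steps on the velocity field."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x

def sample_flow_map(x0, steps):
    """Compose flow-map transitions over the same time grid."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = flow_map(x, i * dt, (i + 1) * dt)
    return x

exact = math.exp(-1.0)
for steps in (4, 32):
    print(steps, abs(sample_euler(1.0, steps) - exact), abs(sample_flow_map(1.0, steps) - exact))
```

Euler's error shrinks only as the step count grows, while composed flow-map transitions land on the same endpoint at any step count. A learned flow map is approximate, so in practice the gap narrows rather than vanishes.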

Why it's interesting
On 14B T2V, it gets 84.05 VBench at 4 NFEs and 84.41 at 32; beats Krea-Realtime-14B's 83.25 at 4 and rCM-14B's 83.73 at 4.

💻 Repo
NVlabs/AnyFlow — 202 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #LLM #ReinforcementLearning #AgentTraining

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
👤 Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang et al.

🎯 Task
Long-form deep research agent training

💡 Idea
Instead of using rubrics only to score final answers, RubricEM uses them to structure execution, reward each stage, and store experience. It decomposes research into Plan/Research/Review/Answer, applies stagewise GRPO for denser credit, and jointly trains a reflection policy as reusable memory.
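
A sketch of what stagewise credit could look like, assuming the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation) is computed separately for each stage. The reward values and function name are hypothetical; the paper's rubric scoring and reflection policy are not modeled.

```python
import statistics

STAGES = ("plan", "research", "review", "answer")

def stagewise_advantages(group_rewards, eps=1e-6):
    """group_rewards: one dict per rollout, mapping stage -> rubric reward.

    Returns one dict per rollout with a group-normalized advantage per stage.
    """
    advantages = [dict() for _ in group_rewards]
    for stage in STAGES:
        scores = [rollout[stage] for rollout in group_rewards]
        mean = statistics.fmean(scores)
        std = statistics.pstdev(scores)
        for adv, score in zip(advantages, scores):
            adv[stage] = (score - mean) / (std + eps)
    return advantages

# Two rollouts that differ only in plan quality (hypothetical rubric scores).
group = [
    {"plan": 1.0, "research": 0.5, "review": 0.5, "answer": 0.7},
    {"plan": 0.0, "research": 0.5, "review": 0.5, "answer": 0.7},
]
adv = stagewise_advantages(group)
```

Here only the plan stage receives a nonzero learning signal, whereas a single final-answer reward would give both rollouts the same advantage.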

Why it's interesting
RubricEM-8B outperforms comparable open models on 4 long-form research benchmarks and approaches proprietary deep-research systems after 1400 RL steps.

🔗 paper

via @Papers.Data.Code