📋 Weekly Digest | May 02 – May 09
#WeeklyDigest
📄 Papers
MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
#SearchAgents #SupervisedFineTuning #ToolUse
10.6k hard trajectories ⟶ SOTA ~30B search agents
→ Learn more...
Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
#3DGeneration #ArticulatedObjects #DiffusionModels
Two-stage 3D generation ⟶ simulation-ready articulated assets
→ Learn more...
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
#MotionCapture #PoseEstimation #3DHumanPoseEstimation
Monocular motion capture ⟶ predicts animation-ready rotations
→ Learn more...
💻 Repos
PKU-YuanGroup/TIDE ⭐
#DiscreteDiffusion #Distillation #CodeGeneration
Cross-architecture distillation ⟶ 0.6B student, lower cost
→ Learn more...
Vinayak-VG/GenWildSplat ⭐
#3DReconstruction #GaussianSplatting #NovelViewSynthesis
Sparse-view 3D reconstruction ⟶ 3D Gaussian splat in 3s
→ Learn more...
YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong document and OCR gains
→ Learn more...
📊 Datasets
SWE-chat
#CodingAgent #AgentTraces #HumanAICollaboration
SWE-chat dataset ⟶ studies human-agent coding workflows
→ Learn more...
gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...
➡️ Tomorrow — Multimodal & Agents Monthly
via @Papers.Data.Code