Papers.Data.Code
Only meaningful ML signals: papers, repos & datasets. Selected, not collected. 3–4 posts/day. 📄💻📊
papers.data.code@gmail.com
📊 Dataset #Dataset #LLM #CodingAgent #AgentTraces

SWE-chat
👤 SALT-NLP

🎯 Task
AI coding session modeling

💡 Idea
205+ repositories of real developer–AI coding sessions with full chat transcripts, tool calls, thinking traces, code changes, and authorship attribution between humans and agents.

✨ Why it's interesting
Combines interaction traces with code edits and authorship labels, enabling study of real human-agent coding workflows.

Size: 205+ repositories

Downloads: 1.5k | Likes: 34

🔗 dataset

via @Papers.Data.Code
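
To make the SWE-chat description above more concrete, here is a minimal Python sketch of how one session record might be modeled and how authorship labels could be aggregated. The class names, field names, and the "human"/"agent" labels are illustrative assumptions derived only from the post's wording, not the dataset's published schema, and the sample values are made up.

```python
from collections import Counter
from dataclasses import dataclass, field


# NOTE: hypothetical layout. These names are assumptions for illustration,
# not the actual SWE-chat schema.
@dataclass
class CodeChange:
    path: str
    lines_added: int
    author: str  # assumed labels: "human" or "agent"


@dataclass
class Session:
    repo: str
    transcript: list[str] = field(default_factory=list)   # chat messages
    tool_calls: list[str] = field(default_factory=list)   # tool invocations
    thinking: list[str] = field(default_factory=list)     # thinking traces
    changes: list[CodeChange] = field(default_factory=list)


def authorship_share(sessions):
    """Return the fraction of added lines attributed to each author label."""
    totals = Counter()
    for session in sessions:
        for change in session.changes:
            totals[change.author] += change.lines_added
    total = sum(totals.values())
    if total == 0:
        return {}
    return {author: count / total for author, count in totals.items()}


# Made-up sample session, used only to exercise the function.
sessions = [
    Session(
        repo="example/repo",
        transcript=["Add a retry wrapper", "Done, see app.py"],
        tool_calls=["edit_file"],
        changes=[
            CodeChange("app.py", 30, "agent"),
            CodeChange("app.py", 10, "human"),
        ],
    )
]
shares = authorship_share(sessions)
print(shares)  # {'agent': 0.75, 'human': 0.25}
```

Keeping code changes as separate labeled records, rather than folding them into the transcript, is one way to keep the authorship question answerable with a simple aggregation.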
📋 Weekly Digest | May 02 – May 09
#WeeklyDigest

📄 Papers

MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
#SearchAgents #SupervisedFineTuning #ToolUse
10.6k hard trajectories ⟶ SOTA ~30B search agents
→ Learn more...

Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
#3DGeneration #ArticulatedObjects #DiffusionModels
Two-stage 3D generation ⟶ simulation-ready articulated assets
→ Learn more...

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
#MotionCapture #PoseEstimation #3DHumanPoseEstimation
Monocular motion capture ⟶ predicts animation-ready rotations
→ Learn more...

💻 Repos

PKU-YuanGroup/TIDE ⭐
#DiscreteDiffusion #Distillation #CodeGeneration
Cross-architecture distillation ⟶ 0.6B student, lower cost
→ Learn more...

Vinayak-VG/GenWildSplat ⭐
#3DReconstruction #GaussianSplatting #NovelViewSynthesis
Sparse-view 3D reconstruction ⟶ 3D Gaussian splat in 3s
→ Learn more...

YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...

📊 Datasets

SWE-chat
#CodingAgent #AgentTraces #HumanAICollaboration
SWE-chat dataset ⟶ studies human-agent coding workflows
→ Learn more...

gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...

➡️ Tomorrow: Multimodal & Agents Monthly

via @Papers.Data.Code