Papers.Data.Code – Telegram

@papersdatacode

18 subscribers

101 links

Only meaningful ML signals: papers, repos & datasets. Selected, not collected. 3–4 posts/day. 📄💻📊
papers.data.code@gmail.com

Papers.Data.Code

📄 Paper #Paper #CV #MultimodalLearning #ImageGeneration

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
👤 Zhiheng Liu, Weiming Ren, Xiaoke Huang et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Direct patch embeddings replace VAE and representation encoders, so one transformer handles text, images, and pixel-space generation end to end. A masking-based visual feature learning scheme stabilizes training and improves pixel-space representations.

✨ Why it's interesting
At 7B, it reaches SOTA among native UMMs on understanding and stays competitive on generation.

💻 Repo
⭐ facebookresearch/tuna-2 — 139 stars

🔗 paper

via @Papers.Data.Code

Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation - facebookresearch/tuna-2

7 views · 16:00

Papers.Data.Code

📄 Paper #Paper #MultiAgentSystems #Reasoning

Recursive Multi-Agent Systems
👤 Xiyuan Yang, Jiaru Zou, Rui Pan et al.

🎯 Task
Multi-agent LLM reasoning

💡 Idea
Latent-state recursion across agents via lightweight RecursiveLink modules — agents pass and refine hidden states in a loop, with inner-outer training for whole-system credit assignment instead of text-based coordination.

✨ Why it's interesting
Avg +8.3% accuracy, 1.2-2.4x faster inference, and 34.6-75.6% fewer tokens vs baselines.

💻 Repo
⭐ RecursiveMAS/RecursiveMAS — 30 stars

🔗 paper

via @Papers.Data.Code

Official Implementation for "Recursive Multi-Agent Systems" - RecursiveMAS/RecursiveMAS

7 views · 08:00

Papers.Data.Code

📄 Paper #Paper #AudioReasoning #Rlhf

Step-Audio-R1.5 Technical Report
👤 Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu et al.

🎯 Task
Audio reasoning for multi-turn spoken dialogue

💡 Idea
RLHF with a rubric-guided generated reward model compares responses in multi-turn audio chats, optimizing naturalness, coherence, and instruction retention beyond label-only RLVR.

✨ Why it's interesting
77.97 avg across 8 benchmarks, +5.47 over Step-Audio-R1; 41.15 on Audio MC.

💻 Repo
⭐ stepfun-ai/Step-Audio-R1 — 647 stars

🔗 paper 🔗 dataset

via @Papers.Data.Code

🔥 1

8 views · 10:00

Papers.Data.Code

💻 Repo #Repo #TestDrivenDevelopment #CodingAgents

Evan Flow
👤 evanklem

🎯 Task
AI-assisted software development workflow

💡 Idea
Single-entry workflow for Claude Code that orchestrates brainstorm → plan → execute → iterate, with vertical-slice TDD inside coding tasks, optional parallel coder/overseer subagents, and a hook blocking dangerous git commands.

✨ Why it's interesting
Keeps users in control with approval checkpoints, no auto-commits, and blocked destructive git ops.

💻 Repo
⭐ evanklem/evanflow — 356 stars (+356 3d)
Shell

🔗 paper

via @Papers.Data.Code

A TDD-driven iterative feedback loop for software development. 16 cohesive Claude Code skills walk an idea from brainstorm → plan → execute → iterate, with checkpoints throughout. - evanklem/evanflow

8 views · 13:00

Papers.Data.Code

📄 Paper #Paper #InstructionTuning #KnowledgeGraphs

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
👤 Chenkai Pan, Xinglong Xu, Yuhang Xu et al.

🎯 Task
Domain-specific LLM fine-tuning

💡 Idea
Shared L1 concepts, L2 relations, and L3 reasoning chains drive both SFT data and benchmarks; failures are traced to concept gaps or reasoning deficits and repaired with targeted data patches.

✨ Why it's interesting
Across 16 disciplines, one debug round let a 32B model beat GPT-5.4, Gemini-3-flash, and DeepSeek-v3.2.

💻 Repo
⭐ OpenRaiser/ProDa — 43 stars

🔗 paper

via @Papers.Data.Code

📖 Data Engineering from Raw Corpora - OpenRaiser/ProDa

7 views · 16:00

Papers.Data.Code

📄 Paper #Paper #MultimodalAgents #ToolUse

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
👤 V Team, Wenyi Hong, Xiaotao Gu et al.

🎯 Task
Multimodal agent foundation model

💡 Idea
Native multimodal agent model with CogViT and multimodal multi-token prediction using <|image|> placeholders, plus joint RL over 30+ perception, reasoning, coding, and GUI tasks for end-to-end tool use.

✨ Why it's interesting
Scores 94.8 on Design2Code and 75.7 on AndroidWorld; RL adds +4.9 on OSWorld.

💻 Repo
⭐ zai-org/GLM-V — 2.3k stars
⭐ zai-org/ImageMining — 2.3k stars
⭐ zai-org/GLM-skills — 2.3k stars

🔗 paper

via @Papers.Data.Code

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning - zai-org/GLM-V

8 views · 08:00

Papers.Data.Code

📊 Dataset #Dataset #RubricBasedEvaluation #PhysicianWritten

HealthBench Professional
👤 openai

🎯 Task
Clinical response evaluation

💡 Idea
Structured medical eval examples with conversations, physician responses, and scored rubric items, labeled by use case, red-teaming vs good-faith, difficulty, and specialty.

✨ Why it's interesting
Physician answers plus rubrics enable consistent scoring of model performance across clinical use cases.

Downloads: 5.7k | Likes: 43

🔗 dataset 🔗 repo

via @Papers.Data.Code

openai/healthbench-professional · Datasets at Hugging Face

8 views · 10:00

Papers.Data.Code

💻 Repo #Repo #Gbnf #LlamaCpp

Structured CoT
👤 andthattoo

🎯 Task
Reasoning token compression for code generation

💡 Idea
Constrain a model's thinking into short structured fields like GOAL/APPROACH/EDGE or GOAL/STATE/ALGO/EDGE/VERIFY at inference time, then let it generate code normally to reduce verbose CoT and compare free vs constrained runs.

✨ Why it's interesting
No training; 22.4× fewer think tokens on HumanEval+ and +14 pp pass@1 on LiveCodeBench.

💻 Repo
⭐ andthattoo/structured-cot — 196 stars (+154 3d)
Python

🔗 paper

via @Papers.Data.Code

Structured Chain-of-Thought - andthattoo/structured-cot

7 views · 13:00
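
To make the Structured CoT idea above concrete, here is a minimal sketch of the GOAL/APPROACH/EDGE field format. It is only an illustration: the hashtags suggest the repo constrains decoding with a GBNF grammar in llama.cpp, whereas this sketch merely checks a finished thinking block after the fact. The function name `parse_structured_think` and the 20-word budget are assumptions, not taken from the repo.

```python
import re

# Field order taken from the GOAL/APPROACH/EDGE example in the post.
FIELDS = ("GOAL", "APPROACH", "EDGE")
# Assumed per-field word budget; not a value from the repo.
MAX_WORDS_PER_FIELD = 20


def parse_structured_think(text, fields=FIELDS, max_words=MAX_WORDS_PER_FIELD):
    """Return {field: value} if text is one short line per field, in order; else None."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != len(fields):
        return None
    parsed = {}
    for name, line in zip(fields, lines):
        # Each line must look like "NAME: some short text".
        match = re.fullmatch(rf"{name}:\s*(.+)", line)
        if match is None or len(match.group(1).split()) > max_words:
            return None
        parsed[name] = match.group(1)
    return parsed
```

A free-form block such as "Let me think about this..." returns `None`, while a block with exactly the three labeled lines returns a dictionary of the field values. This is the kind of check one could use when comparing free and constrained runs.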

Papers.Data.Code

📄 Paper #Paper #LLM #MultiAgentSystems #AgentOrchestration

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
👤 Zhengxu Yu, Yu Fu, Zhiyuan He et al.

🎯 Task
Multi-agent organization and coordination

💡 Idea
Talent-Container architecture separates agent identity from runtime, while a Talent Market recruits verified agents on demand and E2R tree search plans, executes, and reviews tasks with formal guarantees.

✨ Why it's interesting
84.67% success on PRDBench, beating prior SOTA by 15.48 points.

💻 Repo
⭐ 1mancompany/OneManCompany — 119 stars

🔗 paper

via @Papers.Data.Code

Build Your Agent Company with OMC - 1mancompany/OneManCompany

🔥 1

7 views · 16:00

Papers.Data.Code

📋 Weekly Digest | Apr 25 – May 02
#WeeklyDigest

📄 Papers

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video fine-tuning ⟶ boosts 3D consistency PSNR
→ Learn more...

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
#MultimodalAgents #ToolUse #ReinforcementLearning
Multimodal agent model ⟶ tool use and GUI interaction
→ Learn more...

Recursive Multi-Agent Systems
#MultiAgentSystems #Reasoning #LatentSpace
RecursiveMAS latent recursion ⟶ +8.3% accuracy, fewer tokens
→ Learn more...

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings ⟶ unified multimodal understanding and generation
→ Learn more...

Step-Audio-R1.5 Technical Report
#AudioReasoning #Rlhf #SpokenDialogue
Rubric-guided audio RLHF ⟶ improves long-turn spoken dialogue
→ Learn more...

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
#MultiAgentSystems #AgentOrchestration #TaskPlanning
OMC multi-agent orchestration ⟶ 84.67% on PRDBench
→ Learn more...

💻 Repos

andthattoo/structured-cot ⭐
#Gbnf #LlamaCpp #CodeBenchmarks
Grammar-constrained CoT ⟶ 22.4× fewer think tokens
→ Learn more...

deepseek-ai/TileKernels ⭐
#MixtureOfExperts #Quantization #GpuKernels
TileLang GPU kernels ⟶ LLM ops near hardware limits
→ Learn more...

antirez/llama.cpp-deepseek-v4-flash ⭐
#Cpp #Gguf #Quantization
llama.cpp DeepSeek v4 Flash ⟶ local MacBook inference
→ Learn more...

📊 Datasets

H³D: High-quality Holistic 3D Editing Dataset
#3DEditing #InstructionFollowing #PartLevel
3D editing dataset ⟶ trains part-level 3D editors
→ Learn more...

MathNet v0 — Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟶ reasoning and retrieval benchmark
→ Learn more...

➡️ Tomorrow — Computer Vision Monthly

via @Papers.Data.Code

7 views · 09:00

Papers.Data.Code

📈 Monthly: Computer Vision | Apr 03 – May 03
#MonthlyDigest #CV

📄 Papers

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video tuning ⟶ boosts 3D consistency PSNR
→ Learn more...

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
#MultimodalLearning #ImageGeneration #DiffusionModels
Discrete diffusion LLM ⟶ unifies multimodal understanding and generation
→ Learn more...

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings ⟶ unified multimodal understanding and generation
→ Learn more...

Vista4D: Video Reshooting with 4D Point Clouds
#NovelViewSynthesis #VideoDiffusion #3DReconstruction
4D point cloud reshooting ⟶ best camera and 3D consistency
→ Learn more...

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
#VideoGeneration #DiffusionTransformers #HumanObjectInteraction
Diffusion HOI video synthesis ⟶ improves stability and contact realism
→ Learn more...

💻 Repos

facex-engine/facex ⭐
#FaceVerification #Webassembly #CpuInference
Local face embeddings ⟶ browser CPU verification
→ Learn more...

📊 Datasets

Sleep Health & Daily Performance Dataset
#Classification #Regression #HealthConditions
Synthetic sleep health dataset ⟶ benchmarks 3 prediction tasks
→ Learn more...

⚡ Trends

▸ Unified multimodal models increasingly merge visual understanding, generation, and editing end-to-end.
▸ Video generation methods add explicit 3D or geometry grounding for consistency.
▸ Specialized architectural priors target realism in controllable human and camera-centric video synthesis.

🧭 TL;DR

📄 World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
RL adds strong 3D-consistent video generation without architecture or inference changes

⭐ facex-engine/facex ⭐
Tiny local face verification runs fast on browser and CPU

💡 Vision models are converging toward unified, geometry-grounded, practically deployable generation systems.

via @Papers.Data.Code

7 views · 15:00

Papers.Data.Code

📄 Paper #Paper #Multimodal #AgentSystems #FoundationModels

Heterogeneous Scientific Foundation Model Collaboration
👤 Zihao Li, Jiaru Zou, Feihao Fang et al.

🎯 Task
Heterogeneous scientific agent systems

💡 Idea
LLM-to-FM interfaces wrap specialized foundation models as agents: a query compiler creates structured calls, a response adapter feeds outputs back to reasoning, and a planner can orchestrate mixed LLM and FM agents.

✨ Why it's interesting
~7% higher utility, ~30% fewer tokens, and ~10% faster than single-LLM agents.

💻 Repo
⭐ Violet24K/Eywa — 18 stars

🔗 paper

via @Papers.Data.Code

Heterogeneous Scientific Foundation Model Collaboration - Violet24K/Eywa

7 views · 08:00

Papers.Data.Code

📊 Dataset #Dataset #LLM #CodingAgent #AgentTraces

SWE-chat
👤 SALT-NLP

🎯 Task
AI coding session modeling

💡 Idea
205+ repositories of real developer–AI coding sessions with full chat transcripts, tool calls, thinking traces, code changes, and authorship attribution between humans and agents.

✨ Why it's interesting
Combines interaction traces with code edits and authorship labels, enabling study of real human-agent coding workflows.

Size: 205+ repositories

Downloads: 1.5k | Likes: 34

🔗 dataset

via @Papers.Data.Code

6 views · 10:00

Papers.Data.Code

💻 Repo #Repo #LLM #DiscreteDiffusion #Distillation

TIDE
👤 PKU-YuanGroup

🎯 Task
Diffusion LLM distillation

💡 Idea
Distill large diffusion LLM teachers into a 0.6B student even when teacher and student differ in architecture, attention, and tokenizer, with released training scripts, checkpoints, datasets, and 8-benchmark evaluation.

✨ Why it's interesting
+1.53 avg over BD3LM, +16.48 HumanEval over AR, 22× lower peak memory, and 5.2× faster inference.

💻 Repo
⭐ PKU-YuanGroup/TIDE — 64 stars (+62 3d)
Python

via @Papers.Data.Code

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models - PKU-YuanGroup/TIDE

7 views · 13:00

Papers.Data.Code

📄 Paper #Paper #Robotics #SLAM #OpenVocabulary

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
👤 Zaid Nasser, Mikhail Iumanov, Tianhao Li et al.

🎯 Task
Open-vocabulary semantic SLAM

💡 Idea
Tightly coupled bundle adjustment fuses dense RADIO/RADSeg vision-language embeddings with geometry, plus temporally adaptive robust kernels to down-weight moving or displaced objects.

✨ Why it's interesting
Best average ATE on dynamic TUM-RGBD: 1.63 cm; top-3 on Replica semantic mapping.

💻 Repo
⭐ be2rlab/RADIO-ViPE — 74 stars

🔗 paper

via @Papers.Data.Code

Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM - be2rlab/RADIO-ViPE

6 views · 16:00

Papers.Data.Code

📄 Paper #Paper #Multimodal #VideoGeneration #DiffusionModels

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
👤 Houyuan Chen, Hong Li, Xianghao Kong et al.

🎯 Task
Multimodal video generation

💡 Idea
Stochastic condition masking enables omni-directional generation; decoupled gated LoRA adds per-modality adapters only for targets; cross-modal self-attention shares keys/values across modalities for alignment.

✨ Why it's interesting
Competitive with state of the art across tasks; robust in-the-wild with <1k training videos.

💻 Repo
⭐ houyuanchen111/UniVidX — 44 stars

🔗 paper

via @Papers.Data.Code

[SIGGRAPH 2026 / TOG] Official code of the paper "UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors". - houyuanchen111/UniVidX

6 views · 08:00
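
One way to picture the stochastic condition masking mentioned in the UniVidX post: at each training step, every modality is randomly assigned to be either a clean condition or a generation target, so a single model sees many condition-to-target directions. The sketch below is a guess at the general idea only, not the paper's procedure; the modality names, the 0.5 probability, and the "at least one target" rule are all assumptions.

```python
import random

# Hypothetical modality names; the post does not list them.
MODALITIES = ("rgb", "depth", "normal")


def sample_condition_mask(modalities, rng, p_condition=0.5):
    """Randomly mark each modality as a condition (True) or a generation target (False)."""
    mask = {m: rng.random() < p_condition for m in modalities}
    if all(mask.values()):
        # Assumed rule: keep at least one target so there is something to generate.
        mask[rng.choice(list(modalities))] = False
    return mask


rng = random.Random(0)  # seeded only to make the sketch reproducible
mask = sample_condition_mask(MODALITIES, rng)
targets = [m for m, is_condition in mask.items() if not is_condition]
```

Under the post's description, per-modality adapters would then be applied only to the modalities in `targets`; that wiring is not shown here.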

Papers.Data.Code

📊 Dataset #Dataset #TimeSeries #GlobalHealth #CountryLevel

WHO Global Health Indicators for Prediction
👤 patelris

🎯 Task
Global health time series forecasting

💡 Idea
100k+ country-year health records across wide, long, latest-value, and metadata tables: 200+ countries, 2000-2024, 43 indicator definitions, demographics, mortality, spending, immunization, nutrition, and GDP.

✨ Why it's interesting
Wide + long formats and country metadata support cross-country trend analysis, forecasting, and dashboarding.

Size: 100k+ data points; 5,275 rows main table

Downloads: 284 | Likes: 26

🔗 dataset

via @Papers.Data.Code

Health Indicators Dataset for Forecasting (WHO)

150+ countries with 60 years of health, mortality & development data

6 views · 10:00
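
The wide and long tables mentioned in the WHO post hold the same records in two shapes: wide has one row per country-year with one column per indicator, long has one row per country-year-indicator. A minimal sketch of the conversion follows, using only the standard library. The column names and the numbers are invented placeholders for illustration, not the dataset's actual schema or values.

```python
def wide_to_long(rows, id_keys=("country", "year")):
    """Turn one-row-per-country-year records into one-row-per-indicator records."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_keys}
        for name, value in row.items():
            # Skip the identifier columns and missing measurements.
            if name in id_keys or value is None:
                continue
            long_rows.append({**ids, "indicator": name, "value": value})
    return long_rows


# Toy rows with made-up values; column names are hypothetical.
wide = [
    {"country": "A", "year": 2000, "life_expectancy": 70.1, "immunization_pct": None},
    {"country": "A", "year": 2001, "life_expectancy": 70.4, "immunization_pct": 88.0},
]
long_rows = wide_to_long(wide)
```

The long shape tends to be easier for plotting and per-indicator forecasting, while the wide shape suits models that take several indicators as features at once.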

Papers.Data.Code

📄 Paper #Paper #CV #MotionCapture #PoseEstimation

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
👤 Kehong Gong, Zhengyu Wen, Dao Thien Phong et al.

🎯 Task
Arbitrary-skeleton motion capture

💡 Idea
Learnable Video-to-Pose-to-Rotation pipeline with GL-GMHA attention. A reference pose-rotation pair plus rest pose anchors each asset's coordinate system, making pose-to-rotation prediction learnable and end-to-end.

✨ Why it's interesting
Cuts rotation error from ~17° to ~10°, reaches 6.54° on unseen skeletons, and runs ~20× faster.

💻 Repo
⭐ animotionlab26/MocapAnything — 166 stars

🔗 paper

via @Papers.Data.Code

6 views · 13:00

Papers.Data.Code

💻 Repo #Repo #CV #3DReconstruction #GaussianSplatting

GenWildSplat
👤 Vinayak-VG

🎯 Task
Sparse-view 3D reconstruction from unconstrained images

💡 Idea
Reconstructs a 3D Gaussian splat from 2-6 unposed photos, jointly estimating camera poses, depth, and appearance while masking transient objects and optionally refining renderings for multi-view consistency.

✨ Why it's interesting
Produces a 3D Gaussian splat in roughly 3 seconds on a single A6000 GPU.

💻 Repo
⭐ Vinayak-VG/GenWildSplat — 24 stars (+24 3d)
Python

via @Papers.Data.Code

[CVPR 2026] GenWildSplat: Generalizable Sparse-View 3D Reconstruction from Unconstrained Images - Vinayak-VG/GenWildSplat

5 views · 16:00

Papers.Data.Code

📄 Paper #Paper #Robotics #VisionLanguageAction #EmbodiedReasoning

MolmoAct2: Action Reasoning Models for Real-world Deployment
👤 Haoquan Fang, Jiafei Duan, Donovan Clay et al.

🎯 Task
Vision-language-action robot control

💡 Idea
Embodied-reasoning VLM + flow-matching action expert via per-layer KV-cache conditioning, plus adaptive depth tokens that update only changed regions to cut reasoning latency.

✨ Why it's interesting
Beats strong VLA baselines incl. π0.5 on 7 benchmarks; Molmo2-ER gets 63.8% avg on 13 ER benchmarks.

💻 Repo
⭐ allenai/molmoact2 — 30 stars

🔗 paper

via @Papers.Data.Code

Official Repository for MolmoAct2 - allenai/molmoact2

7 views · 08:00

Papers.Data.Code

📊 Dataset #Dataset #Tabular #QuantumChemistry #AtomizationEnergy

MSR-ACC/TAE25
👤 microsoft

🎯 Task
Molecular property prediction

💡 Idea
73,040 QCSchema molecular records with CCSD(T)/CBS total atomization energies via W1-F12, covering closed-shell neutral equilibrium molecules with up to 5 non-hydrogen atoms from elements up to argon, plus geometry, graphs, and related energy fields.

✨ Why it's interesting
Large, accurate, chemically diverse labels enable broad benchmarking and training beyond typical organic-only sets.

Size: 73,040 molecules

Downloads: 360 | Likes: 4

🔗 dataset

via @Papers.Data.Code

microsoft/msr-acc-tae25 · Datasets at Hugging Face

6 views · 10:00