Papers.Data.Code
Only meaningful ML signals: papers, repos & datasets. Selected, not collected. 3–4 posts/day. 📄💻📊
papers.data.code@gmail.com
💻 Repo #Repo #CV #FaceVerification #Webassembly

Face X
👤 facex-engine

🎯 Task
Face verification

💡 Idea
Local face embedding engine for browser, C, Go, Python, and CLI. It computes 512-d embeddings and cosine similarity, with no dependencies, optional encrypted weights, and SIMD-optimized CPU inference.

✨ Why it's interesting
Claims 3.0 ms native latency, 99.73% LFW accuracy, and 1.30x faster inference than ONNX Runtime.

💻 Repo
⭐ facex-engine/facex - 82 stars (+82 3d)
C

🔗 paper

via @Papers.Data.Code
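The FaceX post above describes verification as comparing 512-d embeddings by cosine similarity. A minimal sketch of that comparison in plain Python; the function names and the 0.5 threshold are illustrative assumptions, not values or APIs taken from the facex repo.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def same_person(a, b, threshold=0.5):
    # threshold is a hypothetical value; a real system would tune it
    # on a validation set for the desired false-accept rate.
    return cosine_similarity(a, b) >= threshold
```

With unit-normalized embeddings the two norms are 1, so the score reduces to a plain dot product.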
📄 Paper #Paper #CV #TextToVideo #ReinforcementLearning

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
👤 Weijie Wang, Xiaoxuan He, Youping Gu et al.

🎯 Task
3D-consistent text-to-video generation

💡 Idea
Flow-GRPO fine-tunes a video model with rewards from 3D reconstruction, meta-view VLM scoring, trajectory alignment, and aesthetics; camera motion is injected by warping latent noise instead of adding control modules.

✨ Why it's interesting
Improves 3D consistency by 10.23 dB and 7.91 dB PSNR while preserving general video quality.

💻 Repo
⭐ microsoft/World-R1 - 197 stars

🔗 paper

via @Papers.Data.Code
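The World-R1 post above reports its gains in dB of PSNR. As a reminder of what that unit measures, here is the standard PSNR definition, 10·log10(MAX² / MSE), as a small sketch. It assumes flat lists of pixel values in [0, max_value]; how World-R1 renders and pairs the views it compares is not described in the post and is not modeled here.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a 10 dB gain corresponds to a tenfold reduction in mean squared error.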
📊 Dataset #Dataset #NLP #CompetitionMath #Multimodal

MathNet v0 - Olympiad Math Reasoning & Retrieval
👤 ShadenA

🎯 Task
Olympiad math reasoning and retrieval

💡 Idea
27,817 problems in v0 from 58 country/regional configs, with problem markdown, official solutions, topic paths, language, provenance, and 7,541 inline images; sourced from official booklets across 47 countries and 17 languages.

✨ Why it's interesting
30K-scale multilingual expert data enables hard reasoning, retrieval, and RAG evaluation beyond small English-only math sets.

Size: 27,817 problems, 7,541 images, 58 configs

Downloads: 9.3k | Likes: 26

🔗 dataset 🔗 paper 🔗 repo

via @Papers.Data.Code
💻 Repo #Repo #Cpp #Gguf

llama.cpp DeepSeek V4 Flash
👤 antirez

🎯 Task
Local LLM inference

💡 Idea
DeepSeek v4 Flash support in llama.cpp with generated GGUFs using 2-bit quantization of routed experts, targeting MacBooks with 128 GB RAM; works with CPU and Metal backends.

✨ Why it's interesting
Targets 128 GB MacBooks for local DeepSeek v4 Flash inference; the Metal backend is faster than CPU.

💻 Repo
⭐ antirez/llama.cpp-deepseek-v4-flash - 124 stars (+124 3d)
C++

🔗 paper 🔗 paper 🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #CV #MultimodalLearning #ImageGeneration

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
👤 Zhiheng Liu, Weiming Ren, Xiaoke Huang et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Direct patch embeddings replace VAE and representation encoders, so one transformer handles text, images, and pixel-space generation end to end. A masking-based visual feature learning scheme stabilizes training and improves pixel-space representations.

✨ Why it's interesting
At 7B, it reaches SOTA among native UMMs on understanding and stays competitive on generation.

💻 Repo
⭐ facebookresearch/tuna-2 - 139 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #MultiAgentSystems #Reasoning

Recursive Multi-Agent Systems
👤 Xiyuan Yang, Jiaru Zou, Rui Pan et al.

🎯 Task
Multi-agent LLM reasoning

💡 Idea
Latent-state recursion across agents via lightweight RecursiveLink modules: agents pass and refine hidden states in a loop, with inner-outer training for whole-system credit assignment instead of text-based coordination.

✨ Why it's interesting
Avg +8.3% accuracy, 1.2-2.4x faster inference, and 34.6-75.6% fewer tokens vs baselines.

💻 Repo
⭐ RecursiveMAS/RecursiveMAS - 30 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #AudioReasoning #Rlhf

Step-Audio-R1.5 Technical Report
👤 Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu et al.

🎯 Task
Audio reasoning for multi-turn spoken dialogue

💡 Idea
RLHF with a rubric-guided generated reward model compares responses in multi-turn audio chats, optimizing naturalness, coherence, and instruction retention beyond label-only RLVR.

✨ Why it's interesting
77.97 avg across 8 benchmarks, +5.47 over Step-Audio-R1; 41.15 on Audio MC.

💻 Repo
⭐ stepfun-ai/Step-Audio-R1 - 647 stars

🔗 paper 🔗 dataset

via @Papers.Data.Code
💻 Repo #Repo #TestDrivenDevelopment #CodingAgents

Evan Flow
👤 evanklem

🎯 Task
AI-assisted software development workflow

💡 Idea
Single-entry workflow for Claude Code that orchestrates brainstorm → plan → execute → iterate, with vertical-slice TDD inside coding tasks, optional parallel coder/overseer subagents, and a hook blocking dangerous git commands.

✨ Why it's interesting
Keeps users in control with approval checkpoints, no auto-commits, and blocked destructive git ops.

💻 Repo
⭐ evanklem/evanflow - 356 stars (+356 3d)
Shell

🔗 paper

via @Papers.Data.Code
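The Evan Flow post above mentions a hook that blocks dangerous git commands. The sketch below shows one way such a check could look as a pure function. The deny-list and the name `is_destructive_git` are hypothetical: the actual evanflow hook is a shell script and may block a different set of commands. How the check is wired into an agent's hook system is not shown.

```python
import shlex

def is_destructive_git(command):
    """Return True if a shell command looks like a history- or file-destroying git call.

    Hypothetical deny-list for illustration only. Global options placed before
    the subcommand (e.g. `git -C dir reset --hard`) are not handled.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input is treated as unsafe
    if len(tokens) < 2 or tokens[0] != "git":
        return False
    sub, args = tokens[1], tokens[2:]
    if sub == "reset":
        return "--hard" in args
    if sub == "push":
        return any(a == "-f" or a.startswith("--force") for a in args)
    if sub == "clean":
        # matches --force and short flag bundles such as -f or -fd
        return any(
            a == "--force" or (a.startswith("-") and not a.startswith("--") and "f" in a)
            for a in args
        )
    return False
```

A deny-list like this is easy to audit but easy to bypass, so it works best as a guard rail next to the approval checkpoints the post describes rather than as the only safeguard.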
📄 Paper #Paper #InstructionTuning #KnowledgeGraphs

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
👤 Chenkai Pan, Xinglong Xu, Yuhang Xu et al.

🎯 Task
Domain-specific LLM fine-tuning

💡 Idea
Shared L1 concepts, L2 relations, and L3 reasoning chains drive both SFT data and benchmarks; failures are traced to concept gaps or reasoning deficits and repaired with targeted data patches.

✨ Why it's interesting
Across 16 disciplines, one debug round let a 32B model beat GPT-5.4, Gemini-3-flash, and DeepSeek-v3.2.

💻 Repo
⭐ OpenRaiser/ProDa - 43 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #MultimodalAgents #ToolUse

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
👤 V Team, Wenyi Hong, Xiaotao Gu et al.

🎯 Task
Multimodal agent foundation model

💡 Idea
Native multimodal agent model with CogViT and multimodal multi-token prediction using <|image|> placeholders, plus joint RL over 30+ perception, reasoning, coding, and GUI tasks for end-to-end tool use.

✨ Why it's interesting
Scores 94.8 on Design2Code and 75.7 on AndroidWorld; RL adds +4.9 on OSWorld.

💻 Repo
⭐ zai-org/GLM-V - 2.3k stars
⭐ zai-org/ImageMining - 2.3k stars
⭐ zai-org/GLM-skills - 2.3k stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #RubricBasedEvaluation #PhysicianWritten

HealthBench Professional
👤 openai

🎯 Task
Clinical response evaluation

💡 Idea
Structured medical eval examples with conversations, physician responses, and scored rubric items, labeled by use case, red-teaming vs good-faith, difficulty, and specialty.

✨ Why it's interesting
Physician answers plus rubrics enable consistent scoring of model performance across clinical use cases.

Downloads: 5.7k | Likes: 43

🔗 dataset 🔗 repo

via @Papers.Data.Code
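The HealthBench Professional post above describes examples that carry scored rubric items. One plausible way to turn such items into a single score is a points-weighted ratio clipped to [0, 1], sketched below. This is an assumption for illustration: the keys `points` and `met` are hypothetical, and the dataset's real field names and scoring rule should be checked against its card.

```python
def rubric_score(items):
    """Aggregate rubric items into a score in [0, 1].

    items: list of dicts with hypothetical keys
      'points' (int, negative for undesirable behaviour) and
      'met'    (bool, whether the response satisfied the criterion).
    """
    max_points = sum(i["points"] for i in items if i["points"] > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted item")
    earned = sum(i["points"] for i in items if i["met"])
    # Clip so that penalties cannot push the score below zero.
    return min(1.0, max(0.0, earned / max_points))
```

Negative-point items let a rubric penalize harmful content without changing the maximum achievable score.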
💻 Repo #Repo #Gbnf #LlamaCpp

Structured CoT
👤 andthattoo

🎯 Task
Reasoning token compression for code generation

💡 Idea
Constrain a model's thinking into short structured fields like GOAL/APPROACH/EDGE or GOAL/STATE/ALGO/EDGE/VERIFY at inference time, then let it generate code normally to reduce verbose CoT and compare free vs constrained runs.

✨ Why it's interesting
No training; 22.4× fewer think tokens on HumanEval+ and +14 pp pass@1 on LiveCodeBench.

💻 Repo
⭐ andthattoo/structured-cot - 196 stars (+154 3d)
Python

🔗 paper

via @Papers.Data.Code
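The Structured CoT post above constrains the thinking phase to fields such as GOAL/APPROACH/EDGE. The sketch below builds a small grammar in llama.cpp's GBNF notation that forces one non-empty line per field, plus a regex check for the same shape. The grammar and both function names are illustrative assumptions; the repo's own grammar may differ, and the unconstrained code-generation phase that follows is not modeled.

```python
import re

FIELDS = ["GOAL", "APPROACH", "EDGE"]

def build_gbnf(fields):
    """Return a GBNF grammar requiring 'FIELD: <one line>' for each field, in order."""
    parts = " ".join(f'"{name}: " line' for name in fields)
    # \\n keeps a literal backslash-n in the grammar text, which GBNF reads as newline.
    return f'root ::= {parts}\nline ::= [^\\n]+ "\\n"\n'

def matches_fields(text, fields):
    """Check that text consists of exactly the structured fields, one per line."""
    pattern = "".join(rf"{re.escape(name)}: [^\n]+\n" for name in fields)
    return re.fullmatch(pattern, text) is not None

print(build_gbnf(FIELDS))
```

Because each field is limited to a single line, the grammar caps how verbose the reasoning can get, which is the mechanism behind the token savings the post cites.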
📄 Paper #Paper #LLM #MultiAgentSystems #AgentOrchestration

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
👤 Zhengxu Yu, Yu Fu, Zhiyuan He et al.

🎯 Task
Multi-agent organization and coordination

💡 Idea
Talent-Container architecture separates agent identity from runtime, while a Talent Market recruits verified agents on demand and E2R tree search plans, executes, and reviews tasks with formal guarantees.

✨ Why it's interesting
84.67% success on PRDBench, beating prior SOTA by 15.48 points.

💻 Repo
⭐ 1mancompany/OneManCompany - 119 stars

🔗 paper

via @Papers.Data.Code
📋 Weekly Digest | Apr 25 – May 02
#WeeklyDigest

📄 Papers

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video fine-tuning ⟢ boosts 3D consistency PSNR
→ Learn more...

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
#MultimodalAgents #ToolUse #ReinforcementLearning
Multimodal agent model ⟢ tool use and GUI interaction
→ Learn more...

Recursive Multi-Agent Systems
#MultiAgentSystems #Reasoning #LatentSpace
RecursiveMAS latent recursion ⟢ +8.3% accuracy, fewer tokens
→ Learn more...

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings ⟢ unified multimodal understanding and generation
→ Learn more...

Step-Audio-R1.5 Technical Report
#AudioReasoning #Rlhf #SpokenDialogue
Rubric-guided audio RLHF ⟢ improves multi-turn spoken dialogue
→ Learn more...

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
#MultiAgentSystems #AgentOrchestration #TaskPlanning
OMC multi-agent orchestration ⟢ 84.67% on PRDBench
→ Learn more...

💻 Repos

andthattoo/structured-cot ⭐
#Gbnf #LlamaCpp #CodeBenchmarks
Grammar-constrained CoT ⟢ 22.4× fewer think tokens
→ Learn more...

deepseek-ai/TileKernels ⭐
#MixtureOfExperts #Quantization #GpuKernels
TileLang GPU kernels ⟢ LLM ops near hardware limits
→ Learn more...

antirez/llama.cpp-deepseek-v4-flash ⭐
#Cpp #Gguf #Quantization
llama.cpp DeepSeek v4 Flash ⟢ local MacBook inference
→ Learn more...

📊 Datasets

H³D: High-quality Holistic 3D Editing Dataset
#3DEditing #InstructionFollowing #PartLevel
3D editing dataset ⟢ trains part-level 3D editors
→ Learn more...

MathNet v0 - Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟢ reasoning and retrieval benchmark
→ Learn more...

➡️ Tomorrow: Computer Vision Monthly

via @Papers.Data.Code
📈 Monthly: Computer Vision | Apr 03 – May 03
#MonthlyDigest #CV

📄 Papers

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video tuning ⟢ boosts 3D consistency PSNR
→ Learn more...

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
#MultimodalLearning #ImageGeneration #DiffusionModels
Discrete diffusion LLM ⟢ unifies multimodal understanding and generation
→ Learn more...

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings ⟢ unified multimodal understanding and generation
→ Learn more...

Vista4D: Video Reshooting with 4D Point Clouds
#NovelViewSynthesis #VideoDiffusion #3DReconstruction
4D point cloud reshooting ⟢ best camera and 3D consistency
→ Learn more...

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
#VideoGeneration #DiffusionTransformers #HumanObjectInteraction
Diffusion HOI video synthesis ⟢ improves stability and contact realism
→ Learn more...

💻 Repos

facex-engine/facex ⭐
#FaceVerification #Webassembly #CpuInference
Local face embeddings ⟢ browser CPU verification
→ Learn more...

📊 Datasets

Sleep Health & Daily Performance Dataset
#Classification #Regression #HealthConditions
Synthetic sleep health dataset ⟢ benchmarks 3 prediction tasks
→ Learn more...

⚡ Trends

▸ Unified multimodal models increasingly merge visual understanding, generation, and editing end-to-end.
▸ Video generation methods add explicit 3D or geometry grounding for consistency.
▸ Specialized architectural priors target realism in controllable human and camera-centric video synthesis.

🧭 TL;DR

📄 World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
RL adds strong 3D-consistent video generation without architecture or inference changes

⭐ facex-engine/facex ⭐
Tiny local face verification runs fast in the browser and on CPU

💡 Vision models are converging toward unified, geometry-grounded, practically deployable generation systems.

via @Papers.Data.Code
📄 Paper #Paper #Multimodal #AgentSystems #FoundationModels

Heterogeneous Scientific Foundation Model Collaboration
👤 Zihao Li, Jiaru Zou, Feihao Fang et al.

🎯 Task
Heterogeneous scientific agent systems

💡 Idea
LLM-to-FM interfaces wrap specialized foundation models as agents: a query compiler creates structured calls, a response adapter feeds outputs back to reasoning, and a planner can orchestrate mixed LLM and FM agents.

✨ Why it's interesting
~7% higher utility, ~30% fewer tokens, and ~10% faster than single-LLM agents.

💻 Repo
⭐ Violet24K/Eywa - 18 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #LLM #CodingAgent #AgentTraces

SWE-chat
👤 SALT-NLP

🎯 Task
AI coding session modeling

💡 Idea
205+ repositories of real developer–AI coding sessions with full chat transcripts, tool calls, thinking traces, code changes, and authorship attribution between humans and agents.

✨ Why it's interesting
Combines interaction traces with code edits and authorship labels, enabling study of real human-agent coding workflows.

Size: 205+ repositories

Downloads: 1.5k | Likes: 34

🔗 dataset

via @Papers.Data.Code
💻 Repo #Repo #LLM #DiscreteDiffusion #Distillation

TIDE
👤 PKU-YuanGroup

🎯 Task
Diffusion LLM distillation

💡 Idea
Distill large diffusion LLM teachers into a 0.6B student even when teacher and student differ in architecture, attention, and tokenizer, with released training scripts, checkpoints, datasets, and 8-benchmark evaluation.

✨ Why it's interesting
+1.53 avg over BD3LM, +16.48 HumanEval over AR, 22× lower peak memory, and 5.2× faster inference.

💻 Repo
⭐ PKU-YuanGroup/TIDE - 64 stars (+62 3d)
Python


via @Papers.Data.Code
📄 Paper #Paper #Robotics #SLAM #OpenVocabulary

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
👤 Zaid Nasser, Mikhail Iumanov, Tianhao Li et al.

🎯 Task
Open-vocabulary semantic SLAM

💡 Idea
Tightly coupled bundle adjustment fuses dense RADIO/RADSeg vision-language embeddings with geometry, plus temporally adaptive robust kernels to down-weight moving or displaced objects.

✨ Why it's interesting
Best average ATE on dynamic TUM-RGBD: 1.63 cm; top-3 on Replica semantic mapping.

💻 Repo
⭐ be2rlab/RADIO-ViPE - 74 stars

🔗 paper

via @Papers.Data.Code
📄 Paper #Paper #Multimodal #VideoGeneration #DiffusionModels

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
👤 Houyuan Chen, Hong Li, Xianghao Kong et al.

🎯 Task
Multimodal video generation

💡 Idea
Stochastic condition masking enables omni-directional generation; decoupled gated LoRA adds per-modality adapters only for targets; cross-modal self-attention shares keys/values across modalities for alignment.

✨ Why it's interesting
Competitive with state of the art across tasks; robust in the wild with fewer than 1k training videos.

💻 Repo
⭐ houyuanchen111/UniVidX - 44 stars

🔗 paper

via @Papers.Data.Code
📊 Dataset #Dataset #TimeSeries #GlobalHealth #CountryLevel

WHO Global Health Indicators for Prediction
👤 patelris

🎯 Task
Global health time series forecasting

💡 Idea
100k+ country-year health records across wide, long, latest-value, and metadata tables: 200+ countries, 2000-2024, 43 indicator definitions, demographics, mortality, spending, immunization, nutrition, and GDP.

✨ Why it's interesting
Wide + long formats and country metadata support cross-country trend analysis, forecasting, and dashboarding.

Size: 100k+ data points; 5,275 rows in the main table

Downloads: 284 | Likes: 26

🔗 dataset

via @Papers.Data.Code
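The WHO dataset post above ships both wide and long tables. For readers unfamiliar with the distinction, this sketch converts a wide country-year row (one column per indicator) into long rows (one row per indicator value). All column names here are hypothetical and are not taken from the dataset's schema.

```python
def wide_to_long(rows, id_cols=("country", "year")):
    """Melt wide rows into (ids..., indicator, value) rows, skipping missing values."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_cols}
        for key, value in row.items():
            if key in id_cols or value is None:
                continue
            long_rows.append({**ids, "indicator": key, "value": value})
    return long_rows

# Hypothetical wide rows; real indicator names will differ.
wide = [
    {"country": "A", "year": 2020, "life_expectancy": 80.1, "gdp_per_capita": None},
    {"country": "B", "year": 2020, "life_expectancy": 71.4, "gdp_per_capita": 5200.0},
]
long = wide_to_long(wide)
```

Wide tables suit per-country feature matrices for forecasting, while the long form is usually easier to filter and plot by indicator.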