Dataset #Dataset #RubricBasedEvaluation #PhysicianWritten
HealthBench Professional
👤 openai
🎯 Task
Clinical response evaluation
💡 Idea
Structured medical evaluation examples with conversations, physician responses, and scored rubric items, labeled by use case, red-teaming vs. good-faith, difficulty, and specialty.
✨ Why it's interesting
Physician answers plus rubrics enable consistent scoring of model performance across clinical use cases.
Downloads: 5.7k | Likes: 43
Links: dataset, repo
via @Papers.Data.Code
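One way to read "scored rubric items": each item carries a point value (possibly negative for harmful behavior) and a met/not-met judgment, and a response's score is the points earned divided by the maximum positive points, clipped to [0, 1]. The sketch below assumes that scoring rule; the `RubricItem` class, its field names, and the example criteria are hypothetical and may not match the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    criterion: str   # what the grader checks for (hypothetical field)
    points: float    # positive for desired behavior, negative for harmful
    met: bool        # grader's judgment for this response


def rubric_score(items):
    """Points earned over maximum positive points, clipped to [0, 1]."""
    max_points = sum(i.points for i in items if i.points > 0)
    if max_points == 0:
        return 0.0
    earned = sum(i.points for i in items if i.met)
    return min(1.0, max(0.0, earned / max_points))


# Illustrative items only, not taken from the dataset.
example = [
    RubricItem("Recommends urgent evaluation for chest pain", 5, True),
    RubricItem("Asks about medication history", 3, False),
    RubricItem("States a definitive diagnosis without an exam", -4, True),
]
score = rubric_score(example)  # (5 - 4) / (5 + 3) = 0.125
```

Clipping keeps a response that triggers many negative items from going below zero, so scores stay comparable across examples with different rubrics.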
openai/healthbench-professional · Datasets at Hugging Face
💻 Repo #Repo #Gbnf #LlamaCpp
Structured CoT
👤 andthattoo
🎯 Task
Reasoning token compression for code generation
💡 Idea
Constrain a model's thinking at inference time into short structured fields such as GOAL/APPROACH/EDGE or GOAL/STATE/ALGO/EDGE/VERIFY, then let it generate code normally. This cuts verbose CoT and allows a direct comparison of free vs. constrained runs.
✨ Why it's interesting
No training; 22.4× fewer think tokens on HumanEval+ and +14 pp pass@1 on LiveCodeBench.
💻 Repo
⭐ andthattoo/structured-cot: 196 stars (+154 in 3 days)
Python
Links: paper
via @Papers.Data.Code
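The field-constrained thinking described in this post can be expressed as a llama.cpp GBNF grammar. The sketch below is an illustration under assumptions, not the repo's actual grammar: `build_gbnf` and `matches_fields` are hypothetical helpers, each field is assumed to occupy exactly one line, and the grammar covers only the thinking block (the free-form code that follows would need its own rule).

```python
import re

FIELDS = ("GOAL", "APPROACH", "EDGE")


def build_gbnf(fields):
    """Build a GBNF grammar that forces one 'NAME: text' line per field, in order."""
    body = " ".join(f'"{name}: " line' for name in fields)
    # 'line' is one or more non-newline characters followed by a newline.
    return f'root ::= {body}\nline ::= [^\\n]+ "\\n"\n'


def matches_fields(text, fields):
    """Regex mirror of the grammar, handy for checking outputs offline."""
    pattern = "".join(rf"{re.escape(name)}: [^\n]+\n" for name in fields)
    return re.fullmatch(pattern, text) is not None


grammar = build_gbnf(FIELDS)
sample = "GOAL: sum a list\nAPPROACH: single pass\nEDGE: empty list\n"
ok = matches_fields(sample, FIELDS)
```

The grammar text could be saved to a file and supplied to llama.cpp through its grammar support; adding a per-line length cap would be a natural extension for keeping each field short.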
GitHub - andthattoo/structured-cot: Structured Chain-of-Thought
Paper #Paper #LLM #MultiAgentSystems #AgentOrchestration
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
👤 Zhengxu Yu, Yu Fu, Zhiyuan He et al.
🎯 Task
Multi-agent organization and coordination
💡 Idea
Talent-Container architecture separates agent identity from runtime, while a Talent Market recruits verified agents on demand and E2R tree search plans, executes, and reviews tasks with formal guarantees.
✨ Why it's interesting
84.67% success on PRDBench, beating prior SOTA by 15.48 points.
💻 Repo
⭐ 1mancompany/OneManCompany: 119 stars
Links: paper
via @Papers.Data.Code
GitHub - 1mancompany/OneManCompany: Build Your Agent Company with OMC
Weekly Digest | Apr 25 – May 02
#WeeklyDigest

Papers

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video fine-tuning → boosts 3D consistency PSNR

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
#MultimodalAgents #ToolUse #ReinforcementLearning
Multimodal agent model → tool use and GUI interaction

Recursive Multi-Agent Systems
#MultiAgentSystems #Reasoning #LatentSpace
RecursiveMAS latent recursion → +8.3% accuracy, fewer tokens

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings → unified multimodal understanding and generation

Step-Audio-R1.5 Technical Report
#AudioReasoning #Rlhf #SpokenDialogue
Rubric-guided audio RLHF → improves long-turn spoken dialogue

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
#MultiAgentSystems #AgentOrchestration #TaskPlanning
OMC multi-agent orchestration → 84.67% on PRDBench

💻 Repos

andthattoo/structured-cot ⭐
#Gbnf #LlamaCpp #CodeBenchmarks
Grammar-constrained CoT → 22.4× fewer think tokens

deepseek-ai/TileKernels ⭐
#MixtureOfExperts #Quantization #GpuKernels
TileLang GPU kernels → LLM ops near hardware limits

antirez/llama.cpp-deepseek-v4-flash ⭐
#Cpp #Gguf #Quantization
llama.cpp DeepSeek v4 Flash → local MacBook inference

Datasets

H³D: High-quality Holistic 3D Editing Dataset
#3DEditing #InstructionFollowing #PartLevel
3D editing dataset → trains part-level 3D editors

MathNet v0 – Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset → reasoning and retrieval benchmark

➡️ Tomorrow: Computer Vision Monthly
via @Papers.Data.Code
Monthly: Computer Vision | Apr 03 – May 03
#MonthlyDigest #CV

Papers

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video tuning → boosts 3D consistency PSNR

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
#MultimodalLearning #ImageGeneration #DiffusionModels
Discrete diffusion LLM → unifies multimodal understanding and generation

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings → unified multimodal understanding and generation

Vista4D: Video Reshooting with 4D Point Clouds
#NovelViewSynthesis #VideoDiffusion #3DReconstruction
4D point cloud reshooting → best camera and 3D consistency

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
#VideoGeneration #DiffusionTransformers #HumanObjectInteraction
Diffusion HOI video synthesis → improves stability and contact realism

💻 Repos

facex-engine/facex ⭐
#FaceVerification #Webassembly #CpuInference
Local face embeddings → browser CPU verification

Datasets

Sleep Health & Daily Performance Dataset
#Classification #Regression #HealthConditions
Synthetic sleep health dataset → benchmarks 3 prediction tasks

⚡ Trends
▸ Unified multimodal models increasingly merge visual understanding, generation, and editing end-to-end.
▸ Video generation methods add explicit 3D or geometry grounding for consistency.
▸ Specialized architectural priors target realism in controllable human and camera-centric video synthesis.

🧠 TL;DR
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
RL adds strong 3D-consistent video generation without architecture or inference changes
⭐ facex-engine/facex
Tiny local face verification runs fast in the browser and on CPU
💡 Vision models are converging toward unified, geometry-grounded, practically deployable generation systems.
via @Papers.Data.Code
Paper #Paper #Multimodal #AgentSystems #FoundationModels
Heterogeneous Scientific Foundation Model Collaboration
👤 Zihao Li, Jiaru Zou, Feihao Fang et al.
🎯 Task
Heterogeneous scientific agent systems
💡 Idea
LLM-to-FM interfaces wrap specialized foundation models as agents: a query compiler creates structured calls, a response adapter feeds outputs back to reasoning, and a planner can orchestrate mixed LLM and FM agents.
✨ Why it's interesting
~7% higher utility, ~30% fewer tokens, and ~10% faster than single-LLM agents.
💻 Repo
⭐ Violet24K/Eywa: 18 stars
Links: paper
via @Papers.Data.Code
GitHub - Violet24K/Eywa: Heterogeneous Scientific Foundation Model Collaboration
Dataset #Dataset #LLM #CodingAgent #AgentTraces
SWE-chat
👤 SALT-NLP
🎯 Task
AI coding session modeling
💡 Idea
205+ repositories of real developer–AI coding sessions with full chat transcripts, tool calls, thinking traces, code changes, and authorship attribution between humans and agents.
✨ Why it's interesting
Combines interaction traces with code edits and authorship labels, enabling study of real human-agent coding workflows.
Size: 205+ repositories
Downloads: 1.5k | Likes: 34
Links: dataset
via @Papers.Data.Code
💻 Repo #Repo #LLM #DiscreteDiffusion #Distillation
TIDE
👤 PKU-YuanGroup
🎯 Task
Diffusion LLM distillation
💡 Idea
Distill large diffusion LLM teachers into a 0.6B student even when teacher and student differ in architecture, attention, and tokenizer, with released training scripts, checkpoints, datasets, and 8-benchmark evaluation.
✨ Why it's interesting
+1.53 avg over BD3LM, +16.48 HumanEval over AR, 22× lower peak memory, and 5.2× faster inference.
💻 Repo
⭐ PKU-YuanGroup/TIDE: 64 stars (+62 in 3 days)
Python
via @Papers.Data.Code
GitHub - PKU-YuanGroup/TIDE: Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Paper #Paper #Robotics #SLAM #OpenVocabulary
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
👤 Zaid Nasser, Mikhail Iumanov, Tianhao Li et al.
🎯 Task
Open-vocabulary semantic SLAM
💡 Idea
Tightly coupled bundle adjustment fuses dense RADIO/RADSeg vision-language embeddings with geometry, plus temporally adaptive robust kernels to down-weight moving or displaced objects.
✨ Why it's interesting
Best average ATE on dynamic TUM-RGBD: 1.63 cm; top-3 on Replica semantic mapping.
💻 Repo
⭐ be2rlab/RADIO-ViPE: 74 stars
Links: paper
via @Papers.Data.Code
GitHub - be2rlab/RADIO-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM
Paper #Paper #Multimodal #VideoGeneration #DiffusionModels
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
👤 Houyuan Chen, Hong Li, Xianghao Kong et al.
🎯 Task
Multimodal video generation
💡 Idea
Stochastic condition masking enables omni-directional generation; decoupled gated LoRA adds per-modality adapters only for targets; cross-modal self-attention shares keys/values across modalities for alignment.
✨ Why it's interesting
Competitive with state of the art across tasks; robust in-the-wild with <1k training videos.
💻 Repo
⭐ houyuanchen111/UniVidX: 44 stars
Links: paper
via @Papers.Data.Code
GitHub - houyuanchen111/UniVidX: [SIGGRAPH 2026 / TOG] Official code of the paper "UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors"
Dataset #Dataset #TimeSeries #GlobalHealth #CountryLevel
WHO Global Health Indicators for Prediction
👤 patelris
🎯 Task
Global health time series forecasting
💡 Idea
100k+ country-year health records across wide, long, latest-value, and metadata tables: 200+ countries, 2000-2024, 43 indicator definitions, demographics, mortality, spending, immunization, nutrition, and GDP.
✨ Why it's interesting
Wide + long formats and country metadata support cross-country trend analysis, forecasting, and dashboarding.
Size: 100k+ data points; 5,275 rows in the main table
Downloads: 284 | Likes: 26
Links: dataset
via @Papers.Data.Code
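To make the wide vs. long distinction concrete: a wide table has one row per country-year with one column per indicator, while a long table has one row per (country, year, indicator) value. The sketch below converts between them in plain Python; the column names and numbers are made up for illustration and are not taken from the dataset.

```python
def wide_to_long(rows, id_cols=("country", "year")):
    """Turn one-row-per-country-year records into one row per indicator value."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_cols}
        for col, value in row.items():
            # Skip identifier columns and missing measurements.
            if col in id_cols or value is None:
                continue
            long_rows.append({**ids, "indicator": col, "value": value})
    return long_rows


# Hypothetical wide rows; values are placeholders, not real WHO figures.
wide = [
    {"country": "AAA", "year": 2000, "life_expectancy": 70.1, "dtp3_coverage": None},
    {"country": "AAA", "year": 2001, "life_expectancy": 70.4, "dtp3_coverage": 88.0},
]
long_rows = wide_to_long(wide)
```

The long form drops missing cells naturally and is usually easier to filter by indicator, while the wide form is convenient as a feature matrix for forecasting models.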
Kaggle
Health Indicators Dataset for Forecasting (WHO)
150+ countries with 60 years of health, mortality & development data
Paper #Paper #CV #MotionCapture #PoseEstimation
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
👤 Kehong Gong, Zhengyu Wen, Dao Thien Phong et al.
🎯 Task
Arbitrary-skeleton motion capture
💡 Idea
Learnable Video-to-Pose-to-Rotation pipeline with GL-GMHA attention. A reference pose-rotation pair plus rest pose anchors each asset's coordinate system, making pose-to-rotation prediction learnable and end-to-end.
✨ Why it's interesting
Cuts rotation error from ~17° to ~10°, reaches 6.54° on unseen skeletons, and runs ~20× faster.
💻 Repo
⭐ animotionlab26/MocapAnything: 166 stars
Links: paper
via @Papers.Data.Code
GitHub - animotionlab26/MocapAnything
💻 Repo #Repo #CV #3DReconstruction #GaussianSplatting
GenWildSplat
👤 Vinayak-VG
🎯 Task
Sparse-view 3D reconstruction from unconstrained images
💡 Idea
Reconstructs a 3D Gaussian splat from 2-6 unposed photos, jointly estimating camera poses, depth, and appearance while masking transient objects and optionally refining renderings for multi-view consistency.
✨ Why it's interesting
Produces a 3D Gaussian splat in roughly 3 seconds on a single A6000 GPU.
💻 Repo
⭐ Vinayak-VG/GenWildSplat: 24 stars (+24 in 3 days)
Python
via @Papers.Data.Code
GitHub - Vinayak-VG/GenWildSplat: [CVPR 2026] GenWildSplat: Generalizable Sparse-View 3D Reconstruction from Unconstrained Images
Paper #Paper #Robotics #VisionLanguageAction #EmbodiedReasoning
MolmoAct2: Action Reasoning Models for Real-world Deployment
👤 Haoquan Fang, Jiafei Duan, Donovan Clay et al.
🎯 Task
Vision-language-action robot control
💡 Idea
Embodied-reasoning VLM + flow-matching action expert via per-layer KV-cache conditioning, plus adaptive depth tokens that update only changed regions to cut reasoning latency.
✨ Why it's interesting
Beats strong VLA baselines incl. π0.5 on 7 benchmarks; Molmo2-ER gets 63.8% avg on 13 ER benchmarks.
💻 Repo
⭐ allenai/molmoact2: 30 stars
Links: paper
via @Papers.Data.Code
GitHub - allenai/molmoact2: Official Repository for MolmoAct2
Dataset #Dataset #Tabular #QuantumChemistry #AtomizationEnergy
MSR-ACC/TAE25
👤 microsoft
🎯 Task
Molecular property prediction
💡 Idea
73,040 QCSchema molecular records with CCSD(T)/CBS total atomization energies via W1-F12, covering closed-shell neutral equilibrium molecules with up to 5 non-hydrogen atoms from elements up to argon, plus geometry, graphs, and related energy fields.
✨ Why it's interesting
Large, accurate, chemically diverse labels enable broad benchmarking and training beyond typical organic-only sets.
Size: 73,040 molecules
Downloads: 360 | Likes: 4
Links: dataset
via @Papers.Data.Code
microsoft/msr-acc-tae25 · Datasets at Hugging Face
💻 Repo #Repo #Multimodal #VisionEncoder #AutoregressivePretraining
GenLIP
👤 YanFangCS
🎯 Task
Generative vision-language pretraining
💡 Idea
Pretrain a ViT-based vision encoder for MLLMs using a single Transformer and a single autoregressive language modeling objective, without contrastive loss, dual towers, or an extra text decoder.
✨ Why it's interesting
Simplifies MLLM vision pretraining and reports particularly strong gains on Doc and OCR tasks.
💻 Repo
⭐ YanFangCS/GenLIP: 49 stars (+49 in 3 days)
Python
Links: paper
via @Papers.Data.Code
GitHub - YanFangCS/GenLIP: Official repo for "Let ViT Speak: Generative Language-Image Pre-training"
Paper #Paper #Tabular #DiffusionModels #DecisionTrees
Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
👤 Sai Niranjan Ramachandran, Suvrit Sra
🎯 Task
Tabular generation and tree-to-network distillation
💡 Idea
Tree-flow correspondence maps refined tree partitions to PF-ODEs and diffusion dynamics to hierarchies; GTSM unifies boosting and score matching. TREEFLOW conditions flows on tree paths, and DSM-TREE distills full tree decisions.
✨ Why it's interesting
TREEFLOW is 2× faster; best TSTR on 3/5 and best Wasserstein on 4/5 benchmarks.
Links: paper
via @Papers.Data.Code
arXiv.org
Decision trees and diffusion models are ostensibly disparate model classes, one discrete and hierarchical, the other continuous and dynamic. This work unifies the two by establishing a crisp...
Paper #Paper #LLM #SearchAgents #SupervisedFineTuning
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
👤 Yuwen Du, Rui Ye, Shuo Tang et al.
🎯 Task
LLM search agent training
💡 Idea
High-difficulty trajectory synthesis for SFT: enlarge source graphs, expand the tool set, and filter out short trajectories to force longer multi-hop ReAct search without CPT or RL.
✨ Why it's interesting
46.0 BrowseComp, 58.1 BC-ZH, 34.6 HLE, 78.0 xbench; beats Tongyi DeepResearch.
💻 Repo
⭐ PolarSeeker/OpenSeeker: 634 stars
Links: paper
via @Papers.Data.Code
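The "filter out short trajectories" step can be pictured as a simple length filter over synthesized trajectories before SFT. This is a rough sketch under assumptions, not the paper's pipeline: the trajectory schema (`steps` with a `type` field), the `keep_long_trajectories` helper, and the threshold are all hypothetical.

```python
def keep_long_trajectories(trajectories, min_tool_calls=8):
    """Keep only trajectories with at least min_tool_calls tool-call steps."""
    kept = []
    for traj in trajectories:
        n_calls = sum(1 for step in traj["steps"] if step.get("type") == "tool_call")
        if n_calls >= min_tool_calls:
            kept.append(traj)
    return kept


# Toy data in an assumed schema; real trajectories would carry queries and observations.
trajectories = [
    {"id": "short", "steps": [{"type": "tool_call"}, {"type": "answer"}]},
    {"id": "long", "steps": [{"type": "tool_call"}] * 3 + [{"type": "answer"}]},
]
kept = keep_long_trajectories(trajectories, min_tool_calls=3)
```

Counting tool calls rather than raw tokens is one plausible proxy for multi-hop difficulty; other proxies (distinct sources visited, hops in the source graph) would slot into the same filter.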
GitHub - PolarSeeker/OpenSeeker: OpenSeeker: A search agent with open-source data and models
Paper #Paper #Multimodal #ReinforcementLearning #KnowledgeDistillation
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
👤 Sudong Wang, Weiquan Huang, Xiaomin Yu et al.
🎯 Task
Multimodal reasoning post-training
💡 Idea
Black-box adversarial on-policy distillation with an MoE discriminator separates perception and reasoning feedback, aligning post-SFT outputs to supervision before RL without teacher logits.
✨ Why it's interesting
Boosts average accuracy over SFT→RLVR by +4.4 on 4B and +6.0 on 8B.
💻 Repo
⭐ XIAO4579/PRISM: 53 stars
Links: paper
via @Papers.Data.Code
GitHub - XIAO4579/PRISM
💻 Repo #Repo #Multimodal #TextToImage #FlowMatching
LeapAlign
👤 RockeyCoss
🎯 Task
Preference alignment for text-to-image flow matching models
💡 Idea
Aligns flow-matching image generators with human preference rewards by replacing full-trajectory backpropagation with a two-step leap trajectory, so optimization can target any generation step during sampling.
✨ Why it's interesting
Enables gradient propagation to any generation step while avoiding full-trajectory memory cost.
💻 Repo
⭐ RockeyCoss/LeapAlign_Code: 12 stars (+12 in 3 days)
Python
Links: paper
via @Papers.Data.Code
GitHub - RockeyCoss/LeapAlign_Code: [CVPR2026] LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
Paper #Paper #LLM #ContextLearning #MultiAgentSystems
From Context to Skills: Can Language Models Learn from Context Skillfully?
👤 Shuzheng Si, Haozhe Zhao, Yu Lei et al.
🎯 Task
Context learning for language models
💡 Idea
Multi-agent self-play builds skills instead of updating weights: Challenger makes tasks/rubrics, Reasoner solves with evolving skills, Judge gives binary feedback, and Cross-time Replay picks the most generalizable skill set.
✨ Why it's interesting
Improves CL-bench solving rates, e.g. GPT-4.1 11.1%→16.5% and GPT-5.1 21.2%→25.8%.
💻 Repo
⭐ S1s-Z/Ctx2Skill: 44 stars
Links: paper
via @Papers.Data.Code
GitHub - S1s-Z/Ctx2Skill: Code for "From Context to Skills: Can Language Models Learn from Context Skillfully?"