📊 Dataset #Dataset #Multimodal #ImageGeneration #PermissiveLicense
gpic
👤 stanford-vision-lab
🎯 Task
Visual generation
💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.
✨ Why it's interesting
At 100M images under fully permissive licensing, the corpus supports large-scale visual generation research without licensing obstacles.
Size: 100M images
Downloads: 187 | Likes: 4
🔗 dataset
via @Papers.Data.Code
📄 Paper #Paper #CV #3DGeneration #ArticulatedObjects
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
👤 Yunhan Yang, Chunshi Wang, Junliang Ye et al.
🎯 Task
Physics-grounded 3D asset generation
💡 Idea
VLM-planned hierarchical physical blueprints guide a diffusion model; KineVoxel Injection jointly generates geometry with joint origin, axis, and limits for interactive parts.
✨ Why it's interesting
On PhysDB, it outperforms PhysXGen on both CD (22.89 vs 25.30) and interaction score (0.96 vs 0.34).
💻 Repo
⭐ HKU-MMLab/PhysForge — 44 stars
🔗 paper
via @Papers.Data.Code
GitHub
GitHub - HKU-MMLab/PhysForge: [ICML 2026] PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
📋 Weekly Digest | May 02 – May 09
#WeeklyDigest
📄 Papers
MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
#SearchAgents #SupervisedFineTuning #ToolUse
10.6k hard trajectories ⟶ SOTA ~30B search agents
→ Learn more...
Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
#3DGeneration #ArticulatedObjects #DiffusionModels
Two-stage 3D generation ⟶ simulation-ready articulated assets
→ Learn more...
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
#MotionCapture #PoseEstimation #3DHumanPoseEstimation
Monocular motion capture ⟶ predicts animation-ready rotations
→ Learn more...
💻 Repos
PKU-YuanGroup/TIDE ⭐
#DiscreteDiffusion #Distillation #CodeGeneration
Cross-architecture distillation ⟶ 0.6B student, lower cost
→ Learn more...
Vinayak-VG/GenWildSplat ⭐
#3DReconstruction #GaussianSplatting #NovelViewSynthesis
Sparse-view 3D reconstruction ⟶ 3D Gaussian splat in 3s
→ Learn more...
YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...
📊 Datasets
SWE-chat
#CodingAgent #AgentTraces #HumanAICollaboration
SWE-chat dataset ⟶ studies human-agent coding workflows
→ Learn more...
gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...
➡️ Tomorrow — Multimodal & Agents Monthly
via @Papers.Data.Code
📈 Monthly: Multimodal & Agents | Apr 10 – May 10
#MonthlyDigest #Multimodal #Agents
📄 Papers
MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ⟶ beats VLA baselines on 7 benchmarks
→ Learn more...
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ⟶ +13.8 points on 7 benchmarks
→ Learn more...
Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ⟶ scientific tasks on structured data
→ Learn more...
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
#ReinforcementLearning #KnowledgeDistillation #Reasoning
PRISM pre-alignment ⟶ boosts accuracy over SFT→RLVR
→ Learn more...
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
#VideoGeneration #DiffusionModels #ParameterEfficientFineTuning
Unified video diffusion ⟶ multimodal pixel-aligned generation
→ Learn more...
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
#SLAM #OpenVocabulary #VisualLanguageModels
Tightly coupled VLM SLAM ⟶ open-vocabulary 3D in dynamics
→ Learn more...
💻 Repos
YanFangCS/GenLIP ⭐
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ⟶ strong Doc and OCR gains
→ Learn more...
RockeyCoss/LeapAlign_Code ⭐
#TextToImage #FlowMatching #PreferenceOptimization
Two-step leap trajectory ⟶ preference-aligns flow-matching T2I
→ Learn more...
📊 Datasets
gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ⟶ visual generation research
→ Learn more...
MathNet v0 — Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ⟶ reasoning and retrieval benchmark
→ Learn more...
⚡ Trends
▸ Agents increasingly orchestrate external tools or specialized models through structured interfaces.
▸ Multimodal RL training adds intermediate alignment stages to preserve reasoning quality.
▸ Shared action tokenization is emerging for embodied control and cross-embodiment transfer.
🧭 TL;DR
📄 MolmoAct2: Action Reasoning Models for Real-world Deployment
Open VLA model beats strong baselines with adaptive low-latency embodied reasoning.
💡 Multimodal agents are shifting toward tool-grounded, efficient, real-world deployment.
via @Papers.Data.Code
📄 Paper #Paper #CV #DiffusionDistillation #TextToImageGeneration
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
👤 Tao Liu, Hao Yan, Mengting Chen et al.
🎯 Task
Few-step text-to-image diffusion distillation
💡 Idea
Continuous-time distribution matching with dynamic random-length schedules and velocity-based off-trajectory matching. It supervises arbitrary times, not fixed anchors, to reduce drift and preserve fine details without GAN or reward losses.
✨ Why it's interesting
At 4 NFE on SD3-Medium, it reaches HPSv3 9.561 vs 9.176 for D-DMD.
💻 Repo
⭐ byliutao/cdm — 77 stars
🔗 paper
via @Papers.Data.Code
📊 Dataset #Dataset #CV #ImageGeneration #PermissiveLicense
giant-permissive-image-corpus
👤 stanford-vision-lab
🎯 Task
Visual generation
💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.
✨ Why it's interesting
Fully permissive 100M-image scale supports training and studying visual generation without restrictive licensing.
Size: 100M images
Downloads: 86 | Likes: 3
🔗 dataset
via @Papers.Data.Code
💻 Repo #Repo #LLM #Benchmark #SoftwareEngineering
ProgramBench
👤 facebookresearch
🎯 Task
Program reconstruction benchmark
💡 Idea
Evaluates LM-based software agents on recreating complete codebases that match an original program's behavior using only binaries, docs, and test suites.
✨ Why it's interesting
Provides a black-box benchmark for full-program reverse engineering by LM agents.
💻 Repo
⭐ facebookresearch/ProgramBench — 390 stars (+278 3d)
Python
via @Papers.Data.Code
📄 Paper #Paper #Multimodal #TextToImage #KnowledgeDistillation
Flow-OPD: On-Policy Distillation for Flow Matching Models
👤 Zhen Fang, Wenxuan Huang, Yu Zeng et al.
🎯 Task
Text-to-image model alignment
💡 Idea
On-policy multi-teacher distillation for flow matching: route each sampled trajectory to a task-specific teacher for dense velocity-field supervision, then use manifold anchor regularization to keep outputs on a high-quality visual manifold.
✨ Why it's interesting
On SD 3.5 Medium, GenEval rises 63→92 and OCR 59→94, about 10 points over GRPO.
💻 Repo
⭐ CostaliyA/Flow-OPD — 80 stars
🔗 paper
via @Papers.Data.Code
📄 Paper #Paper #LLM #TestTimeScaling #Reasoning
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.
🎯 Task
Test-time scaling for LLM reasoning
💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.
✨ Why it's interesting
Discovery costs $39.9 and 160 min, and the discovered controllers improve accuracy-cost tradeoffs over hand-crafted baselines.
💻 Repo
⭐ zhengkid/AutoTTS — 43 stars
🔗 paper
via @Papers.Data.Code
📊 Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators
AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris
🎯 Task
Global AI readiness and growth analysis
💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.
✨ Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.
Size: 259,546 observations, 24,453 indicators
Downloads: 242 | Likes: 28
🔗 dataset
via @Papers.Data.Code
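A minimal sketch of working with the tidy long format mentioned in the AI Index post above. The post does not give the dataset's schema, so the column names (`country`, `indicator`, `year`, `value`) and the helper names are assumptions, and the rows are placeholders rather than real figures.

```python
from collections import defaultdict


def pivot_wide(rows, indicator):
    """Collect one indicator from long-format rows into {country: {year: value}}."""
    table = defaultdict(dict)
    for row in rows:
        if row["indicator"] == indicator:
            table[row["country"]][row["year"]] = row["value"]
    return dict(table)


def latest_value(series):
    """Most recent (year, value) pair from a {year: value} mapping."""
    year = max(series)
    return year, series[year]


if __name__ == "__main__":
    # Placeholder rows, not real figures from the dataset.
    rows = [
        {"country": "A", "indicator": "x", "year": 2020, "value": 1.0},
        {"country": "A", "indicator": "x", "year": 2021, "value": 1.5},
        {"country": "B", "indicator": "x", "year": 2021, "value": 0.7},
        {"country": "A", "indicator": "y", "year": 2021, "value": 9.0},
    ]
    wide = pivot_wide(rows, "x")
    print(wide)
    print(latest_value(wide["A"]))
```

One row per (country, indicator, year) keeps thousands of indicators in a single table; pivoting is only needed once a specific indicator is chosen for a cross-country comparison.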
📄 Paper #Paper #LLM #ReinforcementLearning #PostTraining
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.
🎯 Task
LLM post-training with verifiable rewards
💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.
✨ Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.
🔗 paper
via @Papers.Data.Code
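A minimal sketch of the target-projection idea in the Listwise Policy Optimization post above, not the authors' implementation. It assumes the Gibbs target over a group of sampled responses is a softmax of rewards at temperature `tau`; the function names, the temperature, and the toy numbers are illustrative assumptions.

```python
import math


def gibbs_target(rewards, tau=1.0):
    """Softmax of rewards / tau over the sampled group (assumed target form)."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / tau) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]


def kl(p, q, eps=1e-12):
    """KL(p || q) between two distributions on the same finite support."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def projection_losses(policy_probs, rewards, tau=1.0):
    """Return (forward KL, reverse KL) between the Gibbs target and the policy.

    policy_probs: policy probabilities renormalized over the sampled group.
    """
    target = gibbs_target(rewards, tau)
    forward = kl(target, policy_probs)  # KL(target || policy)
    reverse = kl(policy_probs, target)  # KL(policy || target)
    return forward, reverse


if __name__ == "__main__":
    # Four sampled responses with binary verifiable rewards (toy values).
    rewards = [1.0, 0.0, 0.0, 1.0]
    policy = [0.25, 0.25, 0.25, 0.25]
    print(gibbs_target(rewards, tau=0.5))
    print(projection_losses(policy, rewards, tau=0.5))
```

Either divergence is zero only when the policy's group distribution equals the target, which is what makes the target explicit rather than implied by a policy-gradient update.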
💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised
KISS-IMU
👤 sparolab
🎯 Task
Self-supervised inertial odometry
💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.
✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.
💻 Repo
⭐ sparolab/KISS-IMU — 63 stars (+43 3d)
Python
🔗 paper
via @Papers.Data.Code
GitHub
KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference. @ ICRA'26 Award Finalist - sparolab/KISS-IMU
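A minimal sketch of what motion-balanced sampling, as named in the KISS-IMU post above, could look like. This is not the repository's code. It assumes training windows are binned by a scalar speed and drawn with inverse-frequency weights; the bin edge, function names, and speeds are hypothetical, and the frequency gate is not covered.

```python
import random
from collections import Counter


def motion_bin(speed, edges):
    """Index of the first bin edge that `speed` falls below (last bin otherwise)."""
    for i, edge in enumerate(edges):
        if speed < edge:
            return i
    return len(edges)


def balanced_weights(speeds, edges):
    """Inverse-frequency weights so every occupied bin gets equal total mass."""
    bins = [motion_bin(s, edges) for s in speeds]
    counts = Counter(bins)
    return [1.0 / (len(counts) * counts[b]) for b in bins]


def sample_batch(speeds, edges, batch_size, seed=0):
    """Draw sample indices with replacement using the balanced weights."""
    rng = random.Random(seed)
    weights = balanced_weights(speeds, edges)
    return rng.choices(range(len(speeds)), weights=weights, k=batch_size)


if __name__ == "__main__":
    # Toy speeds in m/s: mostly slow motion, a few fast segments.
    speeds = [0.1, 0.2, 0.1, 0.3, 0.2, 2.5, 0.1, 3.0]
    edges = [1.0]  # hypothetical split between "slow" and "fast"
    print(balanced_weights(speeds, edges))
    print(sample_batch(speeds, edges, batch_size=8))
```

With these weights the two fast windows carry as much total sampling mass as the six slow ones, so rare motion regimes appear in batches far more often than under uniform sampling.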
📄 Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.
🎯 Task
Test-time scaling for LLM reasoning
💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains correctness, memory use, and novel exploration.
✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.
💻 Repo
⭐ george-QF/TMAS-code — 4 stars
🔗 paper
via @Papers.Data.Code
📊 Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing
Hyperspectral Invasive Detection Dataset
👤 ziya07
🎯 Task
Hyperspectral invasive plant classification
💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.
✨ Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.
Downloads: 33 | Likes: 13
🔗 dataset
via @Papers.Data.Code
🔥 Repo #Repo #LLM #Metal #KvCache
ds4
👤 antirez
🎯 Task
Local LLM inference and serving
💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.
✨ Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 t/s prefill on M3 Ultra q2.
💻 Repo
⭐ antirez/ds4 — 8.0k stars (+5.3k 3d)
C
via @Papers.Data.Code
GitHub
GitHub - antirez/ds4: DeepSeek 4 Flash local inference engine for Metal and CUDA
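A toy sketch of the prefix-reuse idea behind a disk-persistent KV cache, as described in the ds4 post above. It is not ds4's on-disk format or API (ds4 is written in C). The class, the hashing scheme, and the pickled "state" are assumptions made for illustration.

```python
import hashlib
import os
import pickle
import tempfile


def prefix_key(tokens):
    """Stable hash of a token-id prefix, used as the cache file name."""
    data = ",".join(str(t) for t in tokens).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class DiskPrefixCache:
    """Toy disk cache mapping token prefixes to an opaque saved state."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, tokens):
        return os.path.join(self.directory, prefix_key(tokens) + ".pkl")

    def save(self, tokens, state):
        with open(self._path(tokens), "wb") as f:
            pickle.dump(state, f)

    def longest_prefix(self, tokens):
        """Return (n, state) for the longest cached prefix tokens[:n], else (0, None)."""
        for n in range(len(tokens), 0, -1):
            path = self._path(tokens[:n])
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return n, pickle.load(f)
        return 0, None


if __name__ == "__main__":
    cache = DiskPrefixCache(tempfile.mkdtemp())
    system_prompt = [101, 7, 42, 9]  # toy token ids
    cache.save(system_prompt, {"kv": "state after 4 tokens"})
    n, state = cache.longest_prefix(system_prompt + [55, 66])
    print(n, state)  # only tokens after position n would need a fresh prefill
```

Because the state lives on disk, a new process pointed at the same directory can pick up a long shared prefix without recomputing it. The linear probe over prefix lengths is kept simple here and would be too slow for very long contexts.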
📄 Paper #Paper #LLM #MemoryMechanisms #Attention
δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.
🎯 Task
Long-term memory augmentation for LLMs
💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.
✨ Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.
💻 Repo
⭐ declare-lab/delta-Mem — 53 stars
🔗 paper
via @Papers.Data.Code
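A minimal sketch of a fixed-size online associative state like the one described in the δ-mem post above. The post does not spell out the update rule, so this assumes a classic delta-rule write, S ← S + β (v − S k) kᵀ. The function names and the toy vectors are illustrative, and the step that turns the readout into low-rank attention corrections is not shown.

```python
def matvec(S, k):
    """Multiply matrix S (list of rows) by vector k."""
    return [sum(row[j] * k[j] for j in range(len(k))) for row in S]


def delta_write(S, k, v, beta=1.0):
    """In-place delta-rule update: S += beta * (v - S k) k^T."""
    pred = matvec(S, k)
    for i in range(len(S)):
        err = v[i] - pred[i]  # what the state currently gets wrong for key k
        for j in range(len(k)):
            S[i][j] += beta * err * k[j]
    return S


def read(S, q):
    """Read the value associated with query q."""
    return matvec(S, q)


if __name__ == "__main__":
    d = 8  # an 8x8 state, matching the size quoted in the post
    S = [[0.0] * d for _ in range(d)]
    key = [1.0] + [0.0] * (d - 1)  # unit-norm key
    delta_write(S, key, [float(i) for i in range(d)])
    print(read(S, key))
    # Writing a new value under the same key overwrites the old one.
    delta_write(S, key, [1.0] * d)
    print(read(S, key))
```

The state stays d×d no matter how many writes occur, which is the contrast with storing history as extra tokens or retrieved text.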
📊 Dataset #Dataset #Tabular #Epidemiology #InfectiousDisease
🦠 Hantavirus (Andes Virus) — Global Epidemiology
👤 zkskhurram
🎯 Task
Infectious disease epidemiology analysis
💡 Idea
7 linked tables covering 25 countries across 5 WHO regions: yearly data from 1993–2025, outbreaks, monthly trends, clinical outcomes, environmental risk factors, virus strains, and a consolidated master table.
✨ Why it's interesting
Combines epidemiology, clinical, environmental, and strain data in one dataset, enabling cross-country HPS/HFRS trend and risk analysis from a single source.
Size: 7 tables, 25 countries, 1993–2025
Downloads: 662 | Likes: 26
🔗 dataset
via @Papers.Data.Code
💻 Repo #Repo #CV #4dReconstruction #DynamicScenes
D4RT
👤 lucidrains
🎯 Task
Dynamic scene reconstruction from video
💡 Idea
Predict 3D points in dynamic scenes from video plus coordinate and time queries, with a trainable PyTorch model that can return losses for supervision or direct point predictions.
✨ Why it's interesting
Provides a ready-to-use D4RT implementation with batched variable-length video/query handling for 4D reconstruction experiments.
💻 Repo
⭐ lucidrains/d4rt — 50 stars (+50 3d)
Python
via @Papers.Data.Code
GitHub
GitHub - lucidrains/d4rt: Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, Deepmind
📄 Paper #Paper #Multimodal #VisionLanguageModels #ImageGeneration
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
👤 Haiwen Diao, Penghao Wu, Hanming Deng et al.
🎯 Task
Unified multimodal understanding and generation
💡 Idea
Instead of bolting together encoder-based understanding and VAE/diffusion generation, it uses one native pixel-text backbone with shared attention and stream-specific MoT blocks, trained jointly for text prediction and pixel-space flow matching.
✨ Why it's interesting
Authors claim it rivals top understanding-only VLMs and outperforms prior open-source unified models across understanding, reasoning, and generation; generation runs at 32× compression.
💻 Repo
⭐ OpenSenseNova/SenseNova-U1 — 1.7k stars
🔗 paper
via @Papers.Data.Code
📄 Paper #Paper #CV #VideoGeneration #DiffusionModels
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
👤 Yuchao Gu, Guian Fang, Yuxin Jiang et al.
🎯 Task
Any-step video generation
💡 Idea
Instead of endpoint consistency maps for fixed few-step sampling, it learns arbitrary-time flow-map transitions along the full ODE path, then uses shortcut backward simulation for on-policy distillation to cut discretization error and causal exposure bias.
✨ Why it's interesting
On 14B T2V, it gets 84.05 VBench at 4 NFEs and 84.41 at 32; beats Krea-Realtime-14B's 83.25 at 4 and rCM-14B's 83.73 at 4.
💻 Repo
⭐ NVlabs/AnyFlow — 202 stars
🔗 paper
via @Papers.Data.Code
📄 Paper #Paper #LLM #ReinforcementLearning #AgentTraining
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
👤 Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang et al.
🎯 Task
Long-form deep research agent training
💡 Idea
Instead of using rubrics only to score final answers, RubricEM uses them to structure execution, reward each stage, and store experience. It decomposes research into Plan/Research/Review/Answer, applies stagewise GRPO for denser credit, and jointly trains a reflection policy as reusable memory.
✨ Why it's interesting
RubricEM-8B outperforms comparable open models on 4 long-form research benchmarks and approaches proprietary deep-research systems after 1400 RL steps.
🔗 paper
via @Papers.Data.Code
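A minimal sketch of stagewise group-relative advantages, as one reading of the "stagewise GRPO" idea in the RubricEM post above. It is not the authors' training code. It assumes each rollout receives one rubric score per stage and that advantages are normalized within the group separately for each stage; the stage keys and scores are illustrative.

```python
import statistics

STAGES = ("plan", "research", "review", "answer")


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: reward standardized against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def stagewise_advantages(rollouts):
    """Map a list of {stage: rubric score} dicts to {stage: advantage} dicts."""
    out = [dict() for _ in rollouts]
    for stage in STAGES:
        advantages = group_advantages([r[stage] for r in rollouts])
        for result, adv in zip(out, advantages):
            result[stage] = adv
    return out


if __name__ == "__main__":
    # Three rollouts for the same query, with toy rubric scores per stage.
    rollouts = [
        {"plan": 0.9, "research": 0.4, "review": 0.5, "answer": 0.6},
        {"plan": 0.3, "research": 0.8, "review": 0.5, "answer": 0.7},
        {"plan": 0.6, "research": 0.6, "review": 0.5, "answer": 0.2},
    ]
    for adv in stagewise_advantages(rollouts):
        print(adv)
```

Scoring each stage separately lets a rollout be credited for a strong plan even when its final answer is weak, which is the denser credit the post refers to.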