Paper #Paper #Multimodal #ReinforcementLearning #KnowledgeDistillation
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
👤 Sudong Wang, Weiquan Huang, Xiaomin Yu et al.
🎯 Task
Multimodal reasoning post-training
💡 Idea
Black-box adversarial on-policy distillation with an MoE discriminator separates perception and reasoning feedback, aligning post-SFT outputs to supervision before RL without teacher logits.
✨ Why it's interesting
Boosts average accuracy over SFT→RLVR by +4.4 on 4B and +6.0 on 8B.
💻 Repo
XIAO4579/PRISM, 53 stars
paper
via @Papers.Data.Code
💻 Repo #Repo #Multimodal #TextToImage #FlowMatching
LeapAlign Code
👤 RockeyCoss
🎯 Task
Preference alignment for text-to-image flow matching models
💡 Idea
Aligns flow-matching image generators with human preference rewards by replacing full-trajectory backpropagation with a two-step leap trajectory, so optimization can target any generation step during sampling.
✨ Why it's interesting
Enables gradient propagation to any generation step while avoiding full-trajectory memory cost.
💻 Repo
RockeyCoss/LeapAlign_Code, 12 stars (+12 3d)
Python
paper
via @Papers.Data.Code
[CVPR2026] LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories - RockeyCoss/LeapAlign_Code
Paper #Paper #LLM #ContextLearning #MultiAgentSystems
From Context to Skills: Can Language Models Learn from Context Skillfully?
👤 Shuzheng Si, Haozhe Zhao, Yu Lei et al.
🎯 Task
Context learning for language models
💡 Idea
Multi-agent self-play builds skills instead of updating weights: Challenger makes tasks/rubrics, Reasoner solves with evolving skills, Judge gives binary feedback, and Cross-time Replay picks the most generalizable skill set.
✨ Why it's interesting
Improves CL-bench solving rates, e.g. GPT-4.1 11.1%→16.5% and GPT-5.1 21.2%→25.8%.
💻 Repo
S1s-Z/Ctx2Skill, 44 stars
paper
via @Papers.Data.Code
Paper #Paper #Multimodal #AgenticRL #ToolUse
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
👤 Shuang Chen, Kaituo Feng, Hangting Chen et al.
🎯 Task
Multimodal deep search agents
💡 Idea
Wikipedia path-sampled multi-hop VQA plus a unified search/OCR/image-enhancement toolset train agents with fatal-aware GRPO, masking post-failure tokens and clamping advantages to keep useful pre-failure reasoning.
✨ Why it's interesting
Improves average score from 47.8 to 61.6; +13.8 points across 7 benchmarks.
💻 Repo
shawn0728/OpenSearch-VL, 69 stars
paper
via @Papers.Data.Code
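One way to read "masking post-failure tokens and clamping advantages" in the OpenSearch-VL post is sketched below. It assumes group-normalized GRPO advantages, a known token index at which a fatal tool failure occurs, and a floor on the advantage of failed rollouts. `fatal_index`, `adv_floor`, and both function names are hypothetical, and the actual recipe may differ.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards standardized within one sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def fatal_aware_weights(num_tokens, advantage, fatal_index=None, adv_floor=-0.2):
    """Per-token advantage weights for one rollout (illustrative assumption).

    Tokens at or after fatal_index are masked to 0.0, and the advantage of a
    failed rollout is clamped from below so pre-failure reasoning is not
    punished at full strength.
    """
    if fatal_index is None:
        return [advantage] * num_tokens
    clamped = max(advantage, adv_floor)
    return [clamped if t < fatal_index else 0.0 for t in range(num_tokens)]

# Hypothetical group of four rollouts with 0/1 rewards.
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_advantages(rewards)

ok = fatal_aware_weights(5, advs[0])                      # no failure
failed = fatal_aware_weights(5, advs[1], fatal_index=3)   # fails at token 3
print(ok, failed)
```

In this toy run the failed rollout keeps a small negative weight on its first three tokens and contributes nothing after the failure point.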
Paper #Paper #NLP #InformationRetrieval #Benchmarking
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
👤 Yilun Zhao, Jinbiao Wei, Tingyu Song et al.
🎯 Task
Reasoning-intensive retrieval
💡 Idea
Aspect-annotated retrieval benchmark plus aspect-decomposed synthetic training. BRIGHT-PRO labels multi-aspect evidence and tests static/agentic search; RTriever-Synth creates complementary positives and positive-conditioned hard negatives for LoRA tuning.
✨ Why it's interesting
RTriever-4B substantially improves over Qwen3-Embedding-4B.
💻 Repo
yale-nlp/Bright-Pro, 11 stars
paper
via @Papers.Data.Code
Data and code for ACL 2026 Paper "Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems" - yale-nlp/Bright-Pro
Dataset #Dataset #Multimodal #ImageGeneration #PermissiveLicense
gpic
👤 stanford-vision-lab
🎯 Task
Visual generation
💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.
✨ Why it's interesting
Its fully permissive 100M-image scale supports large-scale visual generation research with usable licensing.
Size: 100M images
Downloads: 187 | Likes: 4
dataset
via @Papers.Data.Code
Paper #Paper #CV #3DGeneration #ArticulatedObjects
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
👤 Yunhan Yang, Chunshi Wang, Junliang Ye et al.
🎯 Task
Physics-grounded 3D asset generation
💡 Idea
VLM-planned hierarchical physical blueprints guide a diffusion model; KineVoxel Injection jointly generates geometry with joint origin, axis, and limits for interactive parts.
✨ Why it's interesting
On PhysDB, CD 22.89 vs 25.30 and interaction 0.96 vs 0.34 over PhysXGen.
💻 Repo
HKU-MMLab/PhysForge, 44 stars
paper
via @Papers.Data.Code
[ICML 2026] PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World - HKU-MMLab/PhysForge
Weekly Digest | May 02 - May 09
#WeeklyDigest
Papers
MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ▶ beats VLA baselines on 7 benchmarks
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ▶ +13.8 points on 7 benchmarks
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
#SearchAgents #SupervisedFineTuning #ToolUse
10.6k hard trajectories ▶ SOTA ~30B search agents
Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ▶ scientific tasks on structured data
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
#3DGeneration #ArticulatedObjects #DiffusionModels
Two-stage 3D generation ▶ simulation-ready articulated assets
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
#MotionCapture #PoseEstimation #3DHumanPoseEstimation
Monocular motion capture ▶ predicts animation-ready rotations
💻 Repos
PKU-YuanGroup/TIDE
#DiscreteDiffusion #Distillation #CodeGeneration
Cross-architecture distillation ▶ 0.6B student, lower cost
Vinayak-VG/GenWildSplat
#3DReconstruction #GaussianSplatting #NovelViewSynthesis
Sparse-view 3D reconstruction ▶ 3D Gaussian splat in 3s
YanFangCS/GenLIP
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ▶ strong Doc and OCR gains
Datasets
SWE-chat
#CodingAgent #AgentTraces #HumanAICollaboration
SWE-chat dataset ▶ studies human-agent coding workflows
gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ▶ visual generation research
➡️ Tomorrow: Multimodal & Agents Monthly
via @Papers.Data.Code
Monthly: Multimodal & Agents | Apr 10 - May 10
#MonthlyDigest #Multimodal #Agents
Papers
MolmoAct2: Action Reasoning Models for Real-world Deployment
#VisionLanguageAction #EmbodiedReasoning #ImitationLearning
Vision-language-action model ▶ beats VLA baselines on 7 benchmarks
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
#AgenticRL #ToolUse #MultimodalReasoning
Failure-aware multimodal search RL ▶ +13.8 points on 7 benchmarks
Heterogeneous Scientific Foundation Model Collaboration
#AgentSystems #FoundationModels #ScientificAI
LLM-FM agent interface ▶ scientific tasks on structured data
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
#ReinforcementLearning #KnowledgeDistillation #Reasoning
PRISM pre-alignment ▶ boosts accuracy over SFT→RLVR
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
#VideoGeneration #DiffusionModels #ParameterEfficientFineTuning
Unified video diffusion ▶ multimodal pixel-aligned generation
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
#SLAM #OpenVocabulary #VisualLanguageModels
Tightly coupled VLM SLAM ▶ open-vocabulary 3D in dynamics
💻 Repos
YanFangCS/GenLIP
#VisionEncoder #AutoregressivePretraining #OCR
Autoregressive ViT pretraining ▶ strong Doc and OCR gains
RockeyCoss/LeapAlign_Code
#TextToImage #FlowMatching #PreferenceOptimization
Two-step leap trajectory ▶ preference-aligns flow-matching T2I
Datasets
gpic
#ImageGeneration #PermissiveLicense #ImageText
Permissive 100M image corpus ▶ visual generation research
MathNet v0 - Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ▶ reasoning and retrieval benchmark
⚡ Trends
▸ Agents increasingly orchestrate external tools or specialized models through structured interfaces.
▸ Multimodal RL training adds intermediate alignment stages to preserve reasoning quality.
▸ Shared action tokenization is emerging for embodied control and cross-embodiment transfer.
🧠 TL;DR
MolmoAct2: Action Reasoning Models for Real-world Deployment
Open VLA model beats strong baselines with adaptive low-latency embodied reasoning.
💡 Multimodal agents are shifting toward tool-grounded, efficient, real-world deployment.
via @Papers.Data.Code
Paper #Paper #CV #DiffusionDistillation #TextToImageGeneration
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
👤 Tao Liu, Hao Yan, Mengting Chen et al.
🎯 Task
Few-step text-to-image diffusion distillation
💡 Idea
Continuous-time distribution matching with dynamic random-length schedules and velocity-based off-trajectory matching. It supervises arbitrary times, not fixed anchors, to reduce drift and preserve fine details without GAN or reward losses.
✨ Why it's interesting
At 4 NFE on SD3-Medium, it reaches HPSv3 9.561 vs 9.176 for D-DMD.
💻 Repo
byliutao/cdm, 77 stars
paper
via @Papers.Data.Code
Dataset #Dataset #CV #ImageGeneration #PermissiveLicense
giant-permissive-image-corpus
👤 stanford-vision-lab
🎯 Task
Visual generation
💡 Idea
100M high-quality, diverse images in a fully permissive image corpus for visual generation.
✨ Why it's interesting
Fully permissive 100M-image scale supports training and studying visual generation without restrictive licensing.
Size: 100M images
Downloads: 86 | Likes: 3
dataset
via @Papers.Data.Code
💻 Repo #Repo #LLM #Benchmark #SoftwareEngineering
ProgramBench
👤 facebookresearch
🎯 Task
Program reconstruction benchmark
💡 Idea
Evaluates LM-based software agents on recreating complete codebases that match an original program's behavior using only binaries, docs, and test suites.
✨ Why it's interesting
Provides a black-box benchmark for full-program reverse engineering by LM agents.
💻 Repo
facebookresearch/ProgramBench, 390 stars (+278 3d)
Python
via @Papers.Data.Code
Can Language Models Rebuild Programs From Scratch? - facebookresearch/ProgramBench
Paper #Paper #Multimodal #TextToImage #KnowledgeDistillation
Flow-OPD: On-Policy Distillation for Flow Matching Models
👤 Zhen Fang, Wenxuan Huang, Yu Zeng et al.
🎯 Task
Text-to-image model alignment
💡 Idea
On-policy multi-teacher distillation for flow matching: route each sampled trajectory to a task-specific teacher for dense velocity-field supervision, then use manifold anchor regularization to keep outputs on a high-quality visual manifold.
✨ Why it's interesting
On SD 3.5 Medium, GenEval rises 63→92 and OCR 59→94, about 10 points over GRPO.
💻 Repo
CostaliyA/Flow-OPD, 80 stars
paper
via @Papers.Data.Code
Paper #Paper #LLM #TestTimeScaling #Reasoning
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.
🎯 Task
Test-time scaling for LLM reasoning
💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.
✨ Why it's interesting
Discovery costs $39.9/160 min and improves accuracy-cost tradeoffs over hand-crafted baselines.
💻 Repo
zhengkid/AutoTTS, 43 stars
paper
via @Papers.Data.Code
Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators
AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris
🎯 Task
Global AI readiness and growth analysis
💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.
✨ Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.
Size: 259,546 observations, 24,453 indicators
Downloads: 242 | Likes: 28
dataset
via @Papers.Data.Code
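"Tidy long format" means one row per observation, so cross-country comparisons usually start with a pivot to one row per country-year. The sketch below shows that reshaping on made-up rows. The column names (`country`, `year`, `indicator`, `value`) and the indicator names are assumptions for illustration and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical long-format rows; the real column names may differ.
rows = [
    {"country": "A", "year": 2020, "indicator": "ai_patents", "value": 10.0},
    {"country": "A", "year": 2020, "indicator": "ai_talent", "value": 3.5},
    {"country": "B", "year": 2020, "indicator": "ai_patents", "value": 7.0},
]

def to_wide(long_rows):
    """Pivot long rows into {(country, year): {indicator: value}}."""
    wide = defaultdict(dict)
    for row in long_rows:
        wide[(row["country"], row["year"])][row["indicator"]] = row["value"]
    return dict(wide)

wide = to_wide(rows)
print(wide[("A", 2020)])
```

Indicators missing for a country-year simply stay absent from its dictionary, which keeps sparse panels explicit instead of filling them with placeholder values.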
Kaggle: 259K observations across 24K+ AI metrics from Cambridge/Harvard
Paper #Paper #LLM #ReinforcementLearning #PostTraining
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.
🎯 Task
LLM post-training with verifiable rewards
💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.
✨ Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.
paper
via @Papers.Data.Code
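As a toy illustration of the Listwise Policy Optimization idea, the sketch below builds a Gibbs target over one sampled group and measures both KL directions against the current policy. It assumes the target has the familiar form q_i ∝ π_i · exp(r_i / τ), normalized over the sampled responses. That form, the temperature `tau`, and all function names are assumptions for illustration and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gibbs_target(policy_logps, rewards, tau=1.0):
    """Assumed target: q_i proportional to pi_i * exp(r_i / tau) over the group."""
    return softmax([lp + r / tau for lp, r in zip(policy_logps, rewards)])

def kl(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical group of four sampled responses with 0/1 verifiable rewards.
logps = [-1.2, -0.7, -2.0, -1.5]
rewards = [1.0, 0.0, 1.0, 0.0]

policy = softmax(logps)                       # policy renormalized over the group
target = gibbs_target(logps, rewards, tau=0.5)

forward_kl = kl(target, policy)               # KL(target || policy)
reverse_kl = kl(policy, target)               # KL(policy || target)
print(forward_kl, reverse_kl)
```

The target shifts probability mass toward the rewarded responses, and either KL can then serve as the projection loss on the simplex of sampled responses.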
💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised
KISS-IMU
👤 sparolab
🎯 Task
Self-supervised inertial odometry
💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.
✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.
💻 Repo
sparolab/KISS-IMU, 63 stars (+43 3d)
Python
paper
via @Papers.Data.Code
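Motion-balanced sampling in the KISS-IMU post can be pictured as inverse-frequency weighting over motion bins, so that rare regimes are drawn as often as common ones. The sketch below is a generic version of that idea, not the repository's implementation: binning by speed, `bin_width`, and the function name are all assumptions, and the frequency gate is not modeled.

```python
import random
from collections import Counter

def motion_balanced_weights(speeds, bin_width=1.0):
    """Weight each sample by 1 / (size of its speed bin)."""
    bins = [int(s // bin_width) for s in speeds]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]

# Hypothetical per-window speeds (m/s): six slow windows, two fast ones.
speeds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 3.2, 3.4]
weights = motion_balanced_weights(speeds)

# Draw a training batch of window indices with the balanced weights.
rng = random.Random(0)
batch = rng.choices(range(len(speeds)), weights=weights, k=4)
print(weights, batch)
```

Each bin ends up with the same total weight, so the two fast windows together are sampled as often as the six slow ones.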
KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference. @ ICRA'26 Award Finalist - sparolab/KISS-IMU
Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.
🎯 Task
Test-time scaling for LLM reasoning
💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains correctness, memory use, and novel exploration.
✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.
💻 Repo
george-QF/TMAS-code, 4 stars
paper
via @Papers.Data.Code
Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing
Hyperspectral Invasive Detection Dataset
👤 ziya07
🎯 Task
Hyperspectral invasive plant classification
💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.
✨ Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.
Downloads: 33 | Likes: 13
dataset
via @Papers.Data.Code
Kaggle: Spectral-Spatial Vegetation Features for Intelligent Ecological Mapping
🔥 Repo #Repo #LLM #Metal #KvCache
ds4
👤 antirez
🎯 Task
Local LLM inference and serving
💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.
✨ Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 t/s prefill on M3 Ultra q2.
💻 Repo
antirez/ds4, 8.0k stars (+5.3k 3d)
C
via @Papers.Data.Code
DeepSeek 4 Flash local inference engine for Metal and CUDA - antirez/ds4
Paper #Paper #LLM #MemoryMechanisms #Attention
δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.
🎯 Task
Long-term memory augmentation for LLMs
💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.
✨ Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.
💻 Repo
declare-lab/delta-Mem, 53 stars
paper
via @Papers.Data.Code
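The fixed-size associative state in the δ-mem post can be pictured with a delta-rule update, a standard way to write key-value pairs into a matrix without growing it. This is a minimal sketch under that assumption: the update rule, `beta`, and the function names are illustrative and not confirmed by the post, and the step that turns the readout into low-rank attention corrections is omitted.

```python
def matvec(S, x):
    """Multiply matrix S (list of rows) by vector x."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]

def delta_update(S, k, v, beta=1.0):
    """Delta rule: S <- S + beta * (v - S k) k^T. The state size never grows."""
    pred = matvec(S, k)
    err = [vi - pi for vi, pi in zip(v, pred)]
    return [[s + beta * e * kj for s, kj in zip(row, k)]
            for row, e in zip(S, err)]

d = 8                                   # 8x8 state, matching the size in the post
S = [[0.0] * d for _ in range(d)]

k = [1.0] + [0.0] * (d - 1)             # hypothetical unit-norm key
v = [float(i) for i in range(d)]        # hypothetical value to store

S = delta_update(S, k, v)
readout = matvec(S, k)                  # querying with the same key recovers v
print(readout)
```

With a unit-norm key and `beta=1.0`, one write makes the readout for that key equal to the stored value, and the state stays 8×8 however many writes follow.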