💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised
KISS-IMU
👤 sparolab
🎯 Task
Self-supervised inertial odometry
💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.
✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.
💻 Repo
⭐ sparolab/KISS-IMU · 63 stars (+43 3d)
Python
paper
via @Papers.Data.Code
KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference. @ ICRA'26 Award Finalist - sparolab/KISS-IMU
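To make "motion-balanced sampling" concrete, here is a minimal Python sketch of one generic way to balance a training set: bin samples by speed and weight each sample inversely to the size of its bin. It is not taken from the KISS-IMU code; the speed binning, `bin_width`, and both function names are assumptions for illustration.

```python
import random
from collections import Counter


def balanced_weights(speeds, bin_width=0.5):
    """One sampling weight per sample, inversely proportional to the number
    of samples that fall in the same speed bin (hypothetical binning)."""
    bins = [int(s // bin_width) for s in speeds]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]


def sample_indices(speeds, k, seed=0):
    """Draw k training indices so that every speed bin has equal total mass."""
    rng = random.Random(seed)
    weights = balanced_weights(speeds)
    return rng.choices(range(len(speeds)), weights=weights, k=k)
```

With four slow samples and one fast sample, each bin ends up with the same total weight, so the single fast sample is drawn about as often as the four slow ones combined.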
Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.
🎯 Task
Test-time scaling for LLM reasoning
💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL trains for correctness, memory use, and novel exploration.
✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior test-time scaling (TTS) baselines.
💻 Repo
⭐ george-QF/TMAS-code · 4 stars
paper
via @Papers.Data.Code
Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing
Hyperspectral Invasive Detection Dataset
👤 ziya07
🎯 Task
Hyperspectral invasive plant classification
💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.
✨ Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.
Downloads: 33 | Likes: 13
dataset
via @Papers.Data.Code
Spectral-Spatial Vegetation Features for Intelligent Ecological Mapping
🔥 Repo #Repo #LLM #Metal #KvCache
ds4
👤 antirez
🎯 Task
Local LLM inference and serving
💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.
✨ Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 tokens/s prefill on an M3 Ultra (q2).
💻 Repo
⭐ antirez/ds4 · 8.0k stars (+5.3k 3d)
C
via @Papers.Data.Code
DeepSeek 4 Flash local inference engine for Metal and CUDA - antirez/ds4
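The idea of reusing prompt prefixes from a disk-persistent cache can be sketched in a few lines of Python. This is a toy illustration, not ds4's actual C implementation or file format: the class `DiskPrefixCache`, the SHA-256 file naming, and the pickled "state" are all assumptions, and a real engine would store KV tensors and avoid rehashing every prefix.

```python
import hashlib
import os
import pickle
import tempfile


def prefix_key(tokens):
    """Stable file name for a token prefix (hypothetical naming scheme)."""
    data = ",".join(map(str, tokens)).encode()
    return hashlib.sha256(data).hexdigest()


class DiskPrefixCache:
    """Toy on-disk cache mapping a token prefix to a saved state."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, tokens, state):
        with open(os.path.join(self.root, prefix_key(tokens)), "wb") as f:
            pickle.dump(state, f)

    def longest_prefix(self, tokens):
        """Return (n, state) for the longest cached prefix tokens[:n]."""
        for n in range(len(tokens), 0, -1):
            path = os.path.join(self.root, prefix_key(tokens[:n]))
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return n, pickle.load(f)
        return 0, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cache = DiskPrefixCache(root)
        cache.save([1, 2, 3], "state-after-3-tokens")
        # A later session with a longer prompt only needs to prefill
        # the tokens after the cached prefix.
        print(cache.longest_prefix([1, 2, 3, 4, 5]))
```

Because the cache lives on disk, a new process can pick up a previously computed prefix and only run prefill on the remaining tokens.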
Paper #Paper #LLM #MemoryMechanisms #Attention
δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.
🎯 Task
Long-term memory augmentation for LLMs
💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.
✨ Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.
💻 Repo
⭐ declare-lab/delta-Mem · 53 stars
paper
via @Papers.Data.Code
The official repo of the paper: delta-Mem: Efficient Online Memory for Large Language Models - declare-lab/delta-Mem
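A "fixed-size online associative state" can be illustrated with the classic delta-rule memory, where a matrix is nudged so that reading with a key returns the stored value. Whether δ-mem uses exactly this update is an assumption (the post does not give the rule), and the low-rank attention correction is not modeled here; `read`, `delta_write`, and `beta` are illustrative names.

```python
def read(state, key):
    """Readout of the associative state: state @ key."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def delta_write(state, key, value, beta=1.0):
    """Delta-rule update: S <- S + beta * (value - S @ key) key^T.

    The state keeps its shape no matter how many writes happen,
    which is what makes the memory fixed-size.
    """
    predicted = read(state, key)
    error = [v - p for v, p in zip(value, predicted)]
    return [
        [s + beta * e * k for s, k in zip(row, key)]
        for row, e in zip(state, error)
    ]
```

With a unit-norm key and `beta=1.0`, a single write makes `read(state, key)` return the written value exactly, and a later write with the same key overwrites it rather than accumulating.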
Dataset #Dataset #Tabular #Epidemiology #InfectiousDisease
🦠 Hantavirus (Andes Virus) - Global Epidemiology
👤 zkskhurram
🎯 Task
Infectious disease epidemiology analysis
💡 Idea
7 linked tables covering 25 countries across 5 WHO regions: yearly data from 1993–2025, outbreaks, monthly trends, clinical outcomes, environmental risk factors, virus strains, and a consolidated master table.
✨ Why it's interesting
Combines epidemiology, clinical, environmental, and strain data in one dataset, enabling cross-country HPS/HFRS trend and risk analysis from a single source.
Size: 7 tables, 25 countries, 1993–2025
Dataset
📥 662 downloads
❤️ 26 likes
dataset
via @Papers.Data.Code
Comprehensive worldwide dataset covering HPS/HFRS cases, clinical outcomes
💻 Repo #Repo #CV #4dReconstruction #DynamicScenes
D4RT
👤 lucidrains
🎯 Task
Dynamic scene reconstruction from video
💡 Idea
Predict 3D points in dynamic scenes from video plus coordinate and time queries, with a trainable PyTorch model that can return losses for supervision or direct point predictions.
✨ Why it's interesting
Provides a ready-to-use D4RT implementation with batched variable-length video/query handling for 4D reconstruction experiments.
💻 Repo
⭐ lucidrains/d4rt · 50 stars (+50 3d)
Python
via @Papers.Data.Code
Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, Deepmind - lucidrains/d4rt
Paper #Paper #Multimodal #VisionLanguageModels #ImageGeneration
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
👤 Haiwen Diao, Penghao Wu, Hanming Deng et al.
🎯 Task
Unified multimodal understanding and generation
💡 Idea
Instead of bolting together encoder-based understanding and VAE/diffusion generation, it uses one native pixel-text backbone with shared attention and stream-specific MoT blocks, trained jointly for text prediction and pixel-space flow matching.
✨ Why it's interesting
Authors claim it rivals top understanding-only VLMs and outperforms prior open-source unified models across understanding, reasoning, and generation; generation runs at 32× compression.
💻 Repo
⭐ OpenSenseNova/SenseNova-U1 · 1.7k stars
paper
via @Papers.Data.Code
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles - OpenSenseNova/SenseNova-U1
Paper #Paper #CV #VideoGeneration #DiffusionModels
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
👤 Yuchao Gu, Guian Fang, Yuxin Jiang et al.
🎯 Task
Any-step video generation
💡 Idea
Instead of endpoint consistency maps for fixed few-step sampling, it learns arbitrary-time flow-map transitions along the full ODE path, then uses shortcut backward simulation for on-policy distillation to cut discretization error and causal exposure bias.
✨ Why it's interesting
On 14B T2V, it gets 84.05 VBench at 4 NFEs and 84.41 at 32; beats Krea-Realtime-14B's 83.25 at 4 and rCM-14B's 83.73 at 4.
💻 Repo
⭐ NVlabs/AnyFlow · 202 stars
paper
via @Papers.Data.Code
Paper #Paper #LLM #ReinforcementLearning #AgentTraining
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
👤 Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang et al.
🎯 Task
Long-form deep research agent training
💡 Idea
Instead of using rubrics only to score final answers, RubricEM uses them to structure execution, reward each stage, and store experience. It decomposes research into Plan/Research/Review/Answer, applies stagewise GRPO for denser credit, and jointly trains a reflection policy as reusable memory.
✨ Why it's interesting
RubricEM-8B outperforms comparable open models on 4 long-form research benchmarks and approaches proprietary deep-research systems after 1400 RL steps.
paper
via @Papers.Data.Code
Training deep research agents, namely systems that plan, search, evaluate evidence, and synthesize long-form reports, pushes reinforcement learning beyond the regime of verifiable rewards. Their...
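"Stagewise GRPO for denser credit" can be pictured as computing a group-relative advantage separately for each stage instead of once per rollout. The sketch below assumes the standard GRPO normalization (reward minus group mean, divided by group standard deviation) applied per stage; how RubricEM actually combines stage rewards is not described in the post, so `stagewise_advantages` and the reward dictionaries are hypothetical.

```python
from statistics import mean, pstdev

# Stage names as listed in the post.
STAGES = ("plan", "research", "review", "answer")


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its group.

    Uses the population standard deviation; implementations differ here.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def stagewise_advantages(rollouts):
    """rollouts: list of dicts mapping stage name -> rubric reward.

    Returns one dict per rollout with a separate advantage for each stage,
    so a rollout can get credit for a good plan even if its answer is weak.
    """
    per_stage = {s: group_advantages([r[s] for r in rollouts]) for s in STAGES}
    return [{s: per_stage[s][i] for s in STAGES} for i in range(len(rollouts))]
```

When every rollout in the group gets the same reward on a stage, that stage's advantage is zero for all of them, so only stages where rollouts differ produce a learning signal.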
Dataset #Dataset #NLP #StemReasoning #VisualQuestionAnswering
Open-MM-RL
👤 TuringEnterprises
🎯 Task
Multimodal STEM question answering
💡 Idea
40 MIT-licensed STEM QA examples across physics, math, biology, and chemistry, spanning single-image, multi-panel, and multi-image formats with deterministic final answers.
✨ Why it's interesting
Deterministic, programmatically checkable answers make advanced multimodal STEM reasoning benchmarkable for RL and outcome-supervised training.
Size: 40 examples, 15.5 MB
Dataset
📥 2.6k downloads
❤️ 94 likes
dataset
via @Papers.Data.Code
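A deterministic final answer is what lets a reward be computed by a program rather than a judge model. Here is a minimal sketch of such an outcome reward in Python; the function names, the numeric tolerance, and the text normalization are assumptions, not the dataset's own grading code.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial formatting is ignored."""
    return " ".join(answer.strip().lower().split())


def outcome_reward(predicted, reference, tol=1e-6):
    """Return 1.0 if the final answer matches the reference, else 0.0.

    Numeric answers are compared with a tolerance; anything that does not
    parse as a number falls back to normalized exact string match.
    """
    try:
        return 1.0 if abs(float(predicted) - float(reference)) <= tol else 0.0
    except ValueError:
        return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

For example, `outcome_reward("3.50", "3.5")` and `outcome_reward(" H2O ", "h2o")` both count as correct, while `outcome_reward("4", "5")` does not.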
💻 Repo #Repo #CV #ImageToVideo #2kGeneration
SwiftI2V
👤 HKUST-LongGroup
🎯 Task
High-resolution image-to-video generation
💡 Idea
Generate native 2K videos from a single image by first producing a low-res motion reference, then refining to high resolution while conditioning on both the input image and the Stage I video.
✨ Why it's interesting
Matches strong 2K end-to-end I2V baselines on key VBench-I2V metrics with 202× less GPU-time; 81-frame 2K output runs in ~111s on one H800 and fits on a 24 GB RTX 4090.
💻 Repo
⭐ HKUST-LongGroup/SwiftI2V · 71 stars (+47 3d)
HTML
paper
via @Papers.Data.Code
Project page for paper "SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation" - HKUST-LongGroup/SwiftI2V
Weekly Digest · May 09 – May 16
#WeeklyDigest
Papers
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
#VideoGeneration #DiffusionModels #Distillation
Any-step video diffusion ▶ 84.05 VBench at 4 NFEs
→ Learn more...
Flow-OPD: On-Policy Distillation for Flow Matching Models
#TextToImage #KnowledgeDistillation #ReinforcementLearning
On-policy flow distillation ▶ boosts GenEval and OCR
→ Learn more...
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
#VisionLanguageModels #ImageGeneration #MixtureOfExperts
NEO-unify multimodal model ▶ unifies understanding and generation
→ Learn more...
δ-mem: Efficient Online Memory for Large Language Models
#MemoryMechanisms #Attention #ParameterEfficientTuning
Online associative memory ▶ steers attention for long-horizon tasks
→ Learn more...
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
#TestTimeScaling #Reasoning #AgenticSearch
Offline replay controller ▶ improves accuracy-cost tradeoffs
→ Learn more...
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
#ReinforcementLearning #AgentTraining #LongContextReasoning
Rubric-guided meta-RL ▶ stagewise credit for research agents
→ Learn more...
💻 Repos
antirez/ds4
#Metal #KvCache #OpenaiCompatible
Metal local inference ▶ 1M context with disk KV cache
→ Learn more...
facebookresearch/ProgramBench
#Benchmark #SoftwareEngineering #ReverseEngineering
Program reconstruction benchmark ▶ tests LM reverse engineering
→ Learn more...
sparolab/KISS-IMU
#InertialOdometry #SelfSupervised #LidarPseudoLabels
Self-supervised IMU odometry ▶ denoises raw IMU with LiDAR labels
→ Learn more...
Datasets
AI Index Data: Growth, Talent (Cambridge/Harvard)
#GlobalAI #CountryIndicators #LongitudinalData
Global AI panel dataset ▶ cross-country trend forecasting
→ Learn more...
giant-permissive-image-corpus
#ImageGeneration #PermissiveLicense #ImageDataset
Permissive image corpus ▶ trains visual generation
→ Learn more...
➡️ Tomorrow: Efficient ML Monthly
via @Papers.Data.Code
Monthly · Efficient ML · Apr 17 – May 17
#MonthlyDigest #EfficientML
Papers
Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
#DiffusionModels #DecisionTrees #KnowledgeDistillation
Trees and flows ▶ faster tabular generation
→ Learn more...
Datasets
MSR-ACC/TAE25
#QuantumChemistry #AtomizationEnergy #CoupledCluster
Quantum chemistry dataset ▶ trains atomization energy models
→ Learn more...
AI Index Data: Growth, Talent (Cambridge/Harvard)
#GlobalAI #CountryIndicators #LongitudinalData
Global AI panel dataset ▶ cross-country trend forecasting
→ Learn more...
WHO Global Health Indicators for Prediction
#GlobalHealth #CountryLevel #WorldBank
Global health panel data ▶ cross-country trend analysis
→ Learn more...
⚡ Trends
▸ Longitudinal country-level datasets increasingly target forecasting and cross-country trend analysis.
▸ Wide, linked, multi-table dataset formats are becoming standard for benchmarking.
▸ Efficiency gains come from unifying model families and distilling complex systems.
🧠 TL;DR
Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
Unifies trees and diffusion, delivering faster tabular generation and effective distillation.
💡 Efficiency advances increasingly come from unifying classical structures with generative modeling.
via @Papers.Data.Code
Paper #Paper #LLM #Reasoning #ReinforcementLearning
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
👤 Yafu Li, Runzhe Zhan, Haoran Zhang et al.
🎯 Task
Olympiad-level mathematical and scientific reasoning
💡 Idea
Instead of domain-specific systems, it uses one scaling recipe: reverse-perplexity long-CoT SFT to instill proof search and self-checking, then coarse verifiable-reward RL, proof-level RL with self-refinement/replay, and test-time verification loops.
✨ Why it's interesting
SU-01 gets 57.6% on IMO-ProofBench, 70.2% with test-time scaling, and reaches the IMO 2025 gold line with 35 points.
💻 Repo
⭐ Simplified-Reasoning/SU-01 · 68 stars
paper
via @Papers.Data.Code
SU-01: Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling - Simplified-Reasoning/SU-01
Dataset #Dataset #LLM #SoftwareEngineering #ToolUse
Orchard
👤 microsoft
🎯 Task
Agentic software engineering and web GUI interaction
💡 Idea
~110K agent trajectories in 2 parallel subsets: 107,185 SWE chat+tool rollouts over 2,788 GitHub repos with hidden-test pass/fail labels, plus 3,070 GUI decision-point rows with screenshots, chat context, and judge-verified rewards across 409 web tasks.
✨ Why it's interesting
Verified patch outcomes and judge-scored GUI steps make agent training and evaluation measurable across real coding and browser tasks.
110,255 samples, ~10.97 GB
dataset
via @Papers.Data.Code
💻 Repo #Repo #CV #DepthEstimation #CameraPose
VGGT Omega
👤 facebookresearch
🎯 Task
Multi-view camera and depth reconstruction
💡 Idea
Infer camera parameters and per-image depth from a set of images in one forward pass, and optionally produce text-aligned embeddings for the same visual inputs.
✨ Why it's interesting
Runs end-to-end on a single A100 with 6.02 GB for 1 frame and 43.15 GB for 500 frames, with released 1B checkpoints and demo code.
💻 Repo
⭐ facebookresearch/vggt-omega · 413 stars (+413 3d)
Python
via @Papers.Data.Code
GitHub - facebookresearch/vggt-omega: [CVPR 2026 Oral] VGGT Omega
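The two memory figures quoted for VGGT Omega (6.02 GB at 1 frame, 43.15 GB at 500 frames) give a rough per-frame cost if memory is assumed to grow linearly with frame count. That linearity is an assumption, not something the post states, and `estimate_gb` is an illustrative helper for back-of-the-envelope sizing only.

```python
ONE_FRAME_GB = 6.02      # reported memory for 1 frame
MANY_FRAMES_GB = 43.15   # reported memory for 500 frames
MANY_FRAMES = 500


def per_frame_gb():
    """Marginal memory per additional frame under a linear-growth assumption."""
    return (MANY_FRAMES_GB - ONE_FRAME_GB) / (MANY_FRAMES - 1)


def estimate_gb(frames):
    """Linear interpolation between the two reported data points."""
    return ONE_FRAME_GB + (frames - 1) * per_frame_gb()
```

Under this assumption each extra frame costs about 0.074 GB, since (43.15 - 6.02) / 499 ≈ 0.0744.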
Paper #Paper #CV #VideoGeneration #DiffusionModels
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
👤 Min Zhao, Hongzhou Zhu, Kaiwen Zheng et al.
🎯 Task
Real-time autoregressive video generation
💡 Idea
Instead of costly AR-teacher ODE trajectory distillation, it initializes few-step AR students with causal consistency distillation: same AR flow-map target, but learned from single online adjacent-step teacher updates, making frame-wise 1-2 step rollout practical.
✨ Why it's interesting
At frame-wise 2-step, beats 4-step chunk-wise Causal Forcing by +0.1 VBench Total, +0.3 Quality, +0.335 VisionReward; 50% lower first-frame latency, ~4x cheaper Stage 2.
💻 Repo
⭐ thu-ml/Causal-Forcing · 665 stars
paper
via @Papers.Data.Code
[ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" & Causal Forci...
Paper #Paper #Multimodal #LongContextModeling #VisionLanguageModels
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
👤 Zhaowei Wang, Lishu Luo, Haodong Duan et al.
🎯 Task
Long-context vision-language modeling
💡 Idea
Instead of OCR-style long-data training, use instruction-formatted long-document VQA. A balanced length mix and retrieval-heavy task mixture beat 128K-focused or transcription-based training for extending LVLM context.
✨ Why it's interesting
With 5B tokens, MMProLong improves long-doc VQA by 7.1%, stays strong at 256K/512K beyond 128K training, and exceeds baselines by 20%+ there.
paper
via @Papers.Data.Code
Long-context modeling is becoming a core capability of modern large vision-language models (LVLMs), enabling sustained context management across long-document understanding, video analysis, and...
Dataset #Dataset #LLM #HallucinationDetection #Multilingual
LLM Hallucination Benchmark Dataset
👤 alitaqishah
🎯 Task
LLM hallucination detection and analysis
💡 Idea
200 annotated LLM responses spanning 5 models, 8 domains, 7 languages, 7 hallucination types, 4 annotator types, and 4 mitigation strategies, with prompt, response, hallucination label, span, severity, and verified correction.
✨ Why it's interesting
Makes cross-model, multilingual hallucination detection and mitigation evaluation directly measurable with typed error labels and corrected references.
200 annotated responses
dataset
via @Papers.Data.Code
Multi-model hallucination labels across domains, tasks & languages (2024–25)
Paper #Paper #LLM #ReinforcementLearning #KnowledgeDistillation
Self-Distilled Agentic Reinforcement Learning
👤 Zhengxi Lu, Zhiyuan Yao, Zhuowen Han et al.
🎯 Task
Agentic RL for multi-turn LLMs
💡 Idea
Instead of naively mixing OPSD with RL, SDAR keeps RL as the backbone and uses detached token-level gates to apply distillation selectively, amplifying positive teacher-student gaps and softening negative teacher rejections.
✨ Why it's interesting
Beats GRPO by +9.4% on ALFWorld, +7.0% on Search-QA, and +10.2% WebShop-Acc, while avoiding naive GRPO+OPSD instability.
💻 Repo
⭐ ZJU-REAL/SDAR · 96 stars
paper
via @Papers.Data.Code
Official code for "Self-Distilled Agentic Reinforcement Learning" - ZJU-REAL/SDAR