Papers.Data.Code

📄 Paper #Paper #LLM #TestTimeScaling #Reasoning

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
👤 Tong Zheng, Haolin Liu, Chengsong Huang et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Offline replay controller synthesis for width-depth TTS: an agent learns branch, continue, probe, prune, and stop rules from pre-collected trajectories, using single-β parameterization and execution-trace feedback.

✨ Why it's interesting
Discovery costs $39.9 and takes 160 min, and the discovered controller improves accuracy-cost tradeoffs over hand-crafted baselines.

💻 Repo
⭐ zhengkid/AutoTTS — 43 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - zhengkid/AutoTTS: The offical repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"

6 views · 10:00

Papers.Data.Code

📊 Dataset #Dataset #TimeSeries #GlobalAI #CountryIndicators

AI Index Data: Growth, Talent (Cambridge/Harvard)
👤 patelris

🎯 Task
Global AI readiness and growth analysis

💡 Idea
259,546 verified rows in tidy long format across 24,453 indicators, 227 countries and territories, and 8 source systems spanning benchmarks, talent, infrastructure, governance, patents, skills, and GovTech.
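
A minimal sketch of what "tidy long format" means in practice: each row holds one observation for one (country, year, indicator) triple, and a pivot turns it into a wide table for modeling. The column names and values below are hypothetical and chosen for illustration; the dataset's actual schema may differ.

```python
from collections import defaultdict

# Hypothetical long-format rows: one observation per (country, year, indicator).
# Column names and values are illustrative assumptions, not the dataset's schema.
long_rows = [
    {"country": "AAA", "year": 2022, "indicator": "ai_patents", "value": 120.0},
    {"country": "AAA", "year": 2022, "indicator": "ai_talent_index", "value": 0.61},
    {"country": "BBB", "year": 2022, "indicator": "ai_patents", "value": 45.0},
]

def pivot_wide(rows):
    """Group long rows by (country, year) and spread indicators into columns."""
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["country"], row["year"])][row["indicator"]] = row["value"]
    return dict(wide)

wide = pivot_wide(long_rows)
# Indicators missing for a country-year simply do not appear in its dict,
# which is how sparse panels show up after pivoting.
```

The long layout keeps sparse indicators cheap to store; the wide layout is what most clustering and forecasting code expects.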

✨ Why it's interesting
Harmonized multi-source panel data enables cross-country trend, clustering, and forecasting analyses over 27 years.

Size: 259,546 observations, 24,453 indicators

Downloads: 242 | Likes: 28

🔗 dataset

via @Papers.Data.Code

Kaggle

AI Index Data: Growth, Talent (Cambridge/Harvard)

259K observations across 24K+ AI metrics from Cambridge/Harvard

5 views · 13:00

Papers.Data.Code

📄 Paper #Paper #LLM #ReinforcementLearning #PostTraining

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
👤 Yun Qu, Qi Wang, Yixiu Mao et al.

🎯 Task
LLM post-training with verifiable rewards

💡 Idea
Explicit target-projection on the response simplex: build a closed-form Gibbs target over sampled responses, then project the policy to it with forward or reverse KL instead of implicit group-based policy gradients.
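
A minimal sketch of target-projection on one group of sampled responses. It assumes the Gibbs target is a softmax of rewards over the sampled set with temperature `tau`; that form, the function names, and the toy numbers are illustrative assumptions and may not match the paper's exact target.

```python
import math

def gibbs_target(rewards, tau=1.0):
    """Softmax of rewards / tau over the sampled responses (assumed target form)."""
    scaled = [r / tau for r in rewards]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    return [w / z for w in weights]

def normalize(probs):
    """Renormalize policy probabilities onto the sampled-response simplex."""
    z = sum(probs)
    return [p / z for p in probs]

def kl(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

rewards = [1.0, 0.0, 0.0, 1.0]               # toy verifiable rewards for 4 responses
policy = normalize([0.1, 0.4, 0.3, 0.2])     # toy policy mass on those responses
target = gibbs_target(rewards, tau=0.5)

forward_kl = kl(target, policy)  # KL(target || policy), the usual "forward" direction
reverse_kl = kl(policy, target)  # KL(policy || target), the usual "reverse" direction
```

In training, one of these divergences would serve as the loss to minimize with respect to the policy; here they are only evaluated to show the two projection directions.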

✨ Why it's interesting
Across reasoning tasks and LLM backbones, LPO consistently beats matched PG baselines in Pass@1/Pass@k.

🔗 paper

via @Papers.Data.Code

arXiv.org

Listwise Policy Optimization: Group-based RLVR as...

Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes,...

5 views · 16:00

Papers.Data.Code

💻 Repo #Repo #Robotics #InertialOdometry #SelfSupervised

KISS-IMU
👤 sparolab

🎯 Task
Self-supervised inertial odometry

💡 Idea
Train an IMU odometry model from raw IMU plus LiDAR-odometry pseudo-labels, using motion-balanced sampling and a frequency gate to better cover under-represented motion regimes.

✨ Why it's interesting
Handles under-represented motion regimes during training via motion-balanced sampling and frequency gating.

💻 Repo
⭐ sparolab/KISS-IMU — 63 stars (+43 3d)
Python

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - sparolab/KISS-IMU: KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference.…

KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference. @ ICRA'26 Award Finalist - sparolab/KISS-IMU

6 views · 08:00

Papers.Data.Code

📄 Paper #Paper #LLM #TestTimeScaling #MultiAgentReasoning

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
👤 George Wu, Nan Jing, Qing Yi et al.

🎯 Task
Test-time scaling for LLM reasoning

💡 Idea
Multi-agent inference with hierarchical memories: an experience bank stores reliable intermediate conclusions and feedback, while a guideline bank tracks explored strategies to avoid redundancy; hybrid-reward RL optimizes for correctness, memory use, and novel exploration.

✨ Why it's interesting
On challenging reasoning benchmarks, it shows stronger iterative scaling than prior TTS baselines.

💻 Repo
⭐ george-QF/TMAS-code — 4 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - george-QF/TMAS-code

🔥1

6 views · 10:00

Papers.Data.Code

📊 Dataset #Dataset #Multimodal #HyperspectralImaging #RemoteSensing

Hyperspectral Invasive Detection Dataset
👤 ziya07

🎯 Task
Hyperspectral invasive plant classification

💡 Idea
Hyperspectral vegetation observations with .mat image cubes plus tabular metadata: 10 spectral bands, PCA and Gabor features, geolocation, environmental variables, species/status labels, confidence, and ground truth across ecological regions.

✨ Why it's interesting
Combines spectra, texture, environment, and verified labels to support invasive-species detection and ecological mapping studies.

Downloads: 33 | Likes: 13

🔗 dataset

via @Papers.Data.Code

Kaggle

Hyperspectral Invasive Detection Dataset

Spectral-Spatial Vegetation Features for Intelligent Ecological Mapping

6 views · 13:00

Papers.Data.Code

🔥 Repo #Repo #LLM #Metal #KvCache

ds4
👤 antirez

🎯 Task
Local LLM inference and serving

💡 Idea
Run DeepSeek V4 Flash locally on Apple Metal with a model-specific engine, chat CLI, OpenAI/Anthropic-compatible server, long-context support, and disk-persistent KV cache to reuse prompt prefixes across sessions.

✨ Why it's interesting
Supports up to 1M-token context and disk KV persistence; reports 468 t/s prefill on M3 Ultra q2.

💻 Repo
⭐ antirez/ds4 — 8.0k stars (+5.3k 3d)
C

via @Papers.Data.Code

GitHub

GitHub - antirez/ds4: DeepSeek 4 Flash local inference engine for Metal and CUDA

5 views · 16:00

Papers.Data.Code

📄 Paper #Paper #LLM #MemoryMechanisms #Attention

δ-mem: Efficient Online Memory for Large Language Models
👤 Jingdi Lei, Di Zhang, Junxian Li et al.

🎯 Task
Long-term memory augmentation for LLMs

💡 Idea
Instead of storing history as extra tokens, retrieval text, or static adapters, it keeps a fixed-size online associative state and turns its readout into low-rank attention corrections for a frozen backbone.
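
A minimal sketch of a fixed-size online associative state. The post does not spell out the update rule, so this assumes a classic delta-rule write, S ← S + β (v − S k) kᵀ, with an 8×8 state; the function names, β, and the toy key/value are illustrative assumptions. Mapping the readout into a low-rank attention correction is not shown.

```python
def matvec(S, k):
    """Read from the memory: return S @ k for a list-of-lists matrix."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]

def delta_update(S, k, v, beta=0.5):
    """Assumed delta-rule write: move the readout for key k toward value v."""
    pred = matvec(S, k)
    err = [v_i - p_i for v_i, p_i in zip(v, pred)]
    return [
        [s_ij + beta * e_i * k_j for s_ij, k_j in zip(row, k)]
        for row, e_i in zip(S, err)
    ]

d = 8                                   # state stays d x d however long the history is
S = [[0.0] * d for _ in range(d)]
k = [1.0] + [0.0] * (d - 1)             # toy unit-norm key
v = [0.0, 2.0] + [0.0] * (d - 2)        # toy value to associate with that key

for _ in range(20):                     # repeated online writes of the same pair
    S = delta_update(S, k, v)

readout = matvec(S, k)                  # converges toward v as writes accumulate
```

The point of the sketch is the memory footprint: the state never grows with the number of writes, unlike storing history as extra tokens.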

✨ Why it's interesting
With only an 8×8 state, average score reaches 1.10× the frozen backbone and 1.15× the best non-δ-mem baseline; 1.31× on MemoryAgentBench and 1.20× on LoCoMo.

💻 Repo
⭐ declare-lab/delta-Mem — 53 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - declare-lab/delta-Mem: The official repo of the paper: delta-Mem: Efficient Online Memory for Large Language Models

6 views · 08:00

Papers.Data.Code

📊 Dataset #Dataset #Tabular #Epidemiology #InfectiousDisease

🦠 Hantavirus (Andes Virus) — Global Epidemiology
👤 zkskhurram

🎯 Task
Infectious disease epidemiology analysis

💡 Idea
7 linked tables covering 25 countries across 5 WHO regions: yearly data from 1993–2025, outbreaks, monthly trends, clinical outcomes, environmental risk factors, virus strains, and a consolidated master table.

✨ Why it's interesting
Combines epidemiology, clinical, environmental, and strain data in one dataset, enabling cross-country HPS/HFRS trend and risk analysis from a single source.

Size: 7 tables, 25 countries, 1993–2025

📊 Dataset
📥 662 downloads
❤️ 26 likes

🔗 dataset

via @Papers.Data.Code

Kaggle

🦠 Hantavirus (Andes Virus) — Global Epidemiology

🌍 Comprehensive worldwide dataset covering HPS/HFRS cases, clinical outcomes

6 views · 10:00

Papers.Data.Code

💻 Repo #Repo #CV #4dReconstruction #DynamicScenes

D4RT
👤 lucidrains

🎯 Task
Dynamic scene reconstruction from video

💡 Idea
Predict 3D points in dynamic scenes from video plus coordinate and time queries, with a trainable PyTorch model that can return losses for supervision or direct point predictions.

✨ Why it's interesting
Provides a ready-to-use D4RT implementation with batched variable-length video/query handling for 4D reconstruction experiments.

💻 Repo
⭐ lucidrains/d4rt — 50 stars (+50 3d)
Python

via @Papers.Data.Code

GitHub

GitHub - lucidrains/d4rt: Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, Deepmind

6 views · 13:00

Papers.Data.Code

📄 Paper #Paper #Multimodal #VisionLanguageModels #ImageGeneration

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
👤 Haiwen Diao, Penghao Wu, Hanming Deng et al.

🎯 Task
Unified multimodal understanding and generation

💡 Idea
Instead of bolting together encoder-based understanding and VAE/diffusion generation, it uses one native pixel-text backbone with shared attention and stream-specific MoT blocks, trained jointly for text prediction and pixel-space flow matching.

✨ Why it's interesting
Authors claim it rivals top understanding-only VLMs and outperforms prior open-source unified models across understanding, reasoning, and generation; generation runs at 32× compression.

💻 Repo
⭐ OpenSenseNova/SenseNova-U1 — 1.7k stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - OpenSenseNova/SenseNova-U1: SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles

6 views · 16:00

Papers.Data.Code

📄 Paper #Paper #CV #VideoGeneration #DiffusionModels

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
👤 Yuchao Gu, Guian Fang, Yuxin Jiang et al.

🎯 Task
Any-step video generation

💡 Idea
Instead of endpoint consistency maps for fixed few-step sampling, it learns arbitrary-time flow-map transitions along the full ODE path, then uses shortcut backward simulation for on-policy distillation to cut discretization error and causal exposure bias.
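
A toy illustration of why a flow map supports any step count. It uses the scalar ODE dx/dt = −x, whose flow map is known in closed form; in the paper the flow map is a learned network, so the analytic `flow_map` here is a stand-in assumption, as are all function names.

```python
import math

def velocity(x, t):
    """Toy velocity field for dx/dt = -x (stand-in for a learned model)."""
    return -x

def euler_sample(x0, steps, t0=0.0, t1=1.0):
    """Integrate the ODE with fixed-step Euler; error depends on step count."""
    x, h = x0, (t1 - t0) / steps
    for i in range(steps):
        x = x + h * velocity(x, t0 + i * h)
    return x

def flow_map(x, s, t):
    """Exact transition from time s to time t for this toy ODE."""
    return x * math.exp(-(t - s))

def flow_map_sample(x0, steps, t0=0.0, t1=1.0):
    """Chain flow-map jumps; the endpoint is the same for any step count."""
    x, h = x0, (t1 - t0) / steps
    for i in range(steps):
        s = t0 + i * h
        x = flow_map(x, s, s + h)
    return x

exact = math.exp(-1.0)
euler_err_4 = abs(euler_sample(1.0, 4) - exact)     # few-step discretization error
euler_err_32 = abs(euler_sample(1.0, 32) - exact)   # shrinks as steps increase
map_err_4 = abs(flow_map_sample(1.0, 4) - exact)    # near zero at any step count
```

A learned flow map is only approximately exact, so real samplers still trade quality against step count; the toy only isolates the discretization-error part of the argument.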

✨ Why it's interesting
On 14B T2V, it gets 84.05 VBench at 4 NFEs and 84.41 at 32; beats Krea-Realtime-14B's 83.25 at 4 and rCM-14B's 83.73 at 4.

💻 Repo
⭐ NVlabs/AnyFlow — 202 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - NVlabs/AnyFlow

6 views · 08:00

Papers.Data.Code

📄 Paper #Paper #LLM #ReinforcementLearning #AgentTraining

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
👤 Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang et al.

🎯 Task
Long-form deep research agent training

💡 Idea
Instead of using rubrics only to score final answers, RubricEM uses them to structure execution, reward each stage, and store experience. It decomposes research into Plan/Research/Review/Answer, applies stagewise GRPO for denser credit, and jointly trains a reflection policy as reusable memory.
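
A minimal sketch of how stagewise group-relative advantages could give denser credit. It applies the standard GRPO normalization, (r − mean) / std within a group of rollouts, separately to each stage; the per-stage rubric scores, the use of population standard deviation, and the function names are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean, pstdev

STAGES = ["plan", "research", "review", "answer"]

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]

def stagewise_advantages(rollouts):
    """Normalize each stage's rubric score across the group independently."""
    per_stage = {s: group_advantages([r[s] for r in rollouts]) for s in STAGES}
    return [{s: per_stage[s][i] for s in STAGES} for i in range(len(rollouts))]

# Hypothetical rubric scores for three rollouts of the same research task.
rollouts = [
    {"plan": 0.9, "research": 0.4, "review": 0.5, "answer": 0.7},
    {"plan": 0.3, "research": 0.8, "review": 0.5, "answer": 0.2},
    {"plan": 0.6, "research": 0.6, "review": 0.5, "answer": 0.6},
]
adv = stagewise_advantages(rollouts)
```

With a single final-answer reward, the first rollout's strong plan and weak research would be blended into one number; per-stage normalization keeps those signals separate.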

✨ Why it's interesting
RubricEM-8B outperforms comparable open models on 4 long-form research benchmarks and approaches proprietary deep-research systems after 1400 RL steps.

🔗 paper

via @Papers.Data.Code

arXiv.org

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond...

Training deep research agents, namely systems that plan, search, evaluate evidence, and synthesize long-form reports, pushes reinforcement learning beyond the regime of verifiable rewards. Their...

5 views · 10:00

Papers.Data.Code

📊 Dataset #Dataset #NLP #StemReasoning #VisualQuestionAnswering

Open-MM-RL
👤 TuringEnterprises

🎯 Task
Multimodal STEM question answering

💡 Idea
40 MIT-licensed STEM QA examples across physics, math, biology, and chemistry, spanning single-image, multi-panel, and multi-image formats with deterministic final answers.

✨ Why it's interesting
Deterministic, programmatically checkable answers make advanced multimodal STEM reasoning benchmarkable for RL and outcome-supervised training.

Size: 40 examples, 15.5 MB

📊 Dataset
📥 2.6k downloads
❤️ 94 likes

🔗 dataset

via @Papers.Data.Code

huggingface.co

TuringEnterprises/Open-MM-RL · Datasets at Hugging Face

5 views · 13:00

Papers.Data.Code

💻 Repo #Repo #CV #ImageToVideo #2kGeneration

SwiftI2V
👤 HKUST-LongGroup

🎯 Task
High-resolution image-to-video generation

💡 Idea
Generate native 2K videos from a single image by first producing a low-res motion reference, then refining to high resolution while conditioning on both the input image and the Stage I video.

✨ Why it's interesting
Matches strong 2K end-to-end I2V baselines on key VBench-I2V metrics with 202× less GPU-time; 81-frame 2K output runs in ~111s on one H800 and fits on a 24 GB RTX 4090.

💻 Repo
⭐ HKUST-LongGroup/SwiftI2V — 71 stars (+47 3d)
HTML

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - HKUST-LongGroup/SwiftI2V: Project page for paper "SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional…

Project page for paper "SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation" - HKUST-LongGroup/SwiftI2V

4 views · 16:00

Papers.Data.Code

📋 Weekly Digest · May 09 – May 16
#WeeklyDigest

📄 Papers

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
#VideoGeneration #DiffusionModels #Distillation
Any-step video diffusion ⟶ 84.05 VBench at 4 NFEs
→ Learn more...

Flow-OPD: On-Policy Distillation for Flow Matching Models
#TextToImage #KnowledgeDistillation #ReinforcementLearning
On-policy flow distillation ⟶ boosts GenEval and OCR
→ Learn more...

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
#VisionLanguageModels #ImageGeneration #MixtureOfExperts
NEO-unify multimodal model ⟶ unifies understanding and generation
→ Learn more...

δ-mem: Efficient Online Memory for Large Language Models
#MemoryMechanisms #Attention #ParameterEfficientTuning
Online associative memory ⟶ steers attention for long-horizon tasks
→ Learn more...

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
#TestTimeScaling #Reasoning #AgenticSearch
Offline replay controller ⟶ improves accuracy-cost tradeoffs
→ Learn more...

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
#ReinforcementLearning #AgentTraining #LongContextReasoning
Rubric-guided meta-RL ⟶ stagewise credit for research agents
→ Learn more...

💻 Repos

antirez/ds4 ⭐
#Metal #KvCache #OpenaiCompatible
Metal local inference ⟶ 1M context with disk KV cache
→ Learn more...

facebookresearch/ProgramBench ⭐
#Benchmark #SoftwareEngineering #ReverseEngineering
Program reconstruction benchmark ⟶ tests LM reverse engineering
→ Learn more...

sparolab/KISS-IMU ⭐
#InertialOdometry #SelfSupervised #LidarPseudoLabels
Self-supervised IMU odometry ⟶ denoises raw IMU with LiDAR labels
→ Learn more...

📊 Datasets

AI Index Data: Growth, Talent (Cambridge/Harvard)
#GlobalAI #CountryIndicators #LongitudinalData
Global AI panel dataset ⟶ cross-country trend forecasting
→ Learn more...

giant-permissive-image-corpus
#ImageGeneration #PermissiveLicense #ImageDataset
Permissive image corpus ⟶ trains visual generation
→ Learn more...

➡️ Tomorrow — Efficient ML Monthly

via @Papers.Data.Code

👍1

6 views · 09:00

Papers.Data.Code

📈 Monthly · Efficient ML · Apr 17 – May 17
#MonthlyDigest #EfficientML

📄 Papers

Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
#DiffusionModels #DecisionTrees #KnowledgeDistillation
Trees and flows ⟶ faster tabular generation
→ Learn more...

📊 Datasets

MSR-ACC/TAE25
#QuantumChemistry #AtomizationEnergy #CoupledCluster
Quantum chemistry dataset ⟶ trains atomization energy models
→ Learn more...

AI Index Data: Growth, Talent (Cambridge/Harvard)
#GlobalAI #CountryIndicators #LongitudinalData
Global AI panel dataset ⟶ cross-country trend forecasting
→ Learn more...

WHO Global Health Indicators for Prediction
#GlobalHealth #CountryLevel #WorldBank
Global health panel data ⟶ cross-country trend analysis
→ Learn more...

⚡ Trends

▸ Longitudinal country-level datasets increasingly target forecasting and cross-country trend analysis.
▸ Wide, linked, multi-table dataset formats are becoming standard for benchmarking.
▸ Efficiency gains come from unifying model families and distilling complex systems.

🧭 TL;DR

📄 Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
Unifies trees and diffusion, delivering faster tabular generation and effective distillation.

💡 Efficiency advances increasingly come from unifying classical structures with generative modeling.

via @Papers.Data.Code

5 views · 15:00

Papers.Data.Code

📄 Paper #Paper #LLM #Reasoning #ReinforcementLearning

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
👤 Yafu Li, Runzhe Zhan, Haoran Zhang et al.

🎯 Task
Olympiad-level mathematical and scientific reasoning

💡 Idea
Instead of domain-specific systems, it uses one scaling recipe: reverse-perplexity long-CoT SFT to instill proof search and self-checking, then coarse verifiable-reward RL, proof-level RL with self-refinement/replay, and test-time verification loops.

✨ Why it's interesting
SU-01 gets 57.6% on IMO-ProofBench, 70.2% with test-time scaling, and reaches the IMO 2025 gold line with 35 points.

💻 Repo
⭐ Simplified-Reasoning/SU-01 — 68 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - Simplified-Reasoning/SU-01: SU-01: Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

4 views · 08:00

Papers.Data.Code

📊 Dataset #Dataset #LLM #SoftwareEngineering #ToolUse

Orchard
👤 microsoft

🎯 Task
Agentic software engineering and web GUI interaction

💡 Idea
~110K agent trajectories in 2 parallel subsets: 107,185 SWE chat+tool rollouts over 2,788 GitHub repos with hidden-test pass/fail labels, plus 3,070 GUI decision-point rows with screenshots, chat context, and judge-verified rewards across 409 web tasks.

✨ Why it's interesting
Verified patch outcomes and judge-scored GUI steps make agent training and evaluation measurable across real coding and browser tasks.

Size: 110,255 samples, ~10.97 GB

🔗 dataset

via @Papers.Data.Code

huggingface.co

microsoft/Orchard · Datasets at Hugging Face

5 views · 10:00

Papers.Data.Code

💻 Repo #Repo #CV #DepthEstimation #CameraPose

VGGT Omega
👤 facebookresearch

🎯 Task
Multi-view camera and depth reconstruction

💡 Idea
Infer camera parameters and per-image depth from a set of images in one forward pass, and optionally produce text-aligned embeddings for the same visual inputs.

✨ Why it's interesting
Runs end-to-end on a single A100 with 6.02 GB for 1 frame and 43.15 GB for 500 frames, with released 1B checkpoints and demo code.
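
A rough back-of-the-envelope reading of the two reported memory points. It assumes memory grows linearly with frame count between 1 and 500 frames, which the post does not state, so intermediate estimates are only indicative.

```python
# Reported points from the post: (frames, GB on a single A100).
MEM_1_FRAME_GB = 6.02
MEM_500_FRAMES_GB = 43.15

# Assumed linear growth between the two reported points.
per_frame_gb = (MEM_500_FRAMES_GB - MEM_1_FRAME_GB) / (500 - 1)

def estimate_gb(frames):
    """Linear interpolation of memory use; an assumption, not a measurement."""
    return MEM_1_FRAME_GB + per_frame_gb * (frames - 1)

estimate_100 = estimate_gb(100)  # indicative figure for a 100-frame input
```

Under that assumption each extra frame costs roughly 0.074 GB on top of the 6.02 GB baseline for a single frame.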

💻 Repo
⭐ facebookresearch/vggt-omega — 413 stars (+413 3d)
Python

via @Papers.Data.Code

GitHub

GitHub - facebookresearch/vggt-omega: [CVPR 2026 Oral] VGGT Omega

5 views · 13:00

Papers.Data.Code

📄 Paper #Paper #CV #VideoGeneration #DiffusionModels

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
👤 Min Zhao, Hongzhou Zhu, Kaiwen Zheng et al.

🎯 Task
Real-time autoregressive video generation

💡 Idea
Instead of costly AR-teacher ODE trajectory distillation, it initializes few-step AR students with causal consistency distillation: same AR flow-map target, but learned from single online adjacent-step teacher updates, making frame-wise 1-2 step rollout practical.

✨ Why it's interesting
At frame-wise 2-step, beats 4-step chunk-wise Causal Forcing by +0.1 VBench Total, +0.3 Quality, +0.335 VisionReward; 50% lower first-frame latency, ~4x cheaper Stage 2.

💻 Repo
⭐ thu-ml/Causal-Forcing — 665 stars

🔗 paper

via @Papers.Data.Code

GitHub

GitHub - thu-ml/Causal-Forcing: [ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right…

[ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" & Causal Forci...

4 views · 16:00
