🔥 Repo #Repo #MixtureOfExperts #Quantization
TileKernels
👤 deepseek-ai
🎯 Task
LLM GPU kernel optimization
💡 Idea
Optimized TileLang kernels for LLM ops, including top-k MoE gating/routing, FP8/FP4/E5M6 quantization, batched transpose, Engram, and Manifold HyperConnection, plus trainable torch.autograd.Function wrappers for higher-level layers.
✨ Why it's interesting
The authors say most kernels approach hardware limits for compute intensity and memory bandwidth.
💻 Repo
⭐ deepseek-ai/TileKernels · 1.2k stars (+1.1k in 3 days)
Python
via @Papers.Data.Code
GitHub - deepseek-ai/TileKernels: A kernel library written in tilelang
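For intuition, here is a minimal plain-Python sketch of what top-k MoE gating computes for a single token. It is not a TileLang kernel and not TileKernels' API: the function name `topk_gate` is hypothetical, and it assumes softmax router scores with the selected weights renormalized to sum to 1, whereas the repo's kernels (and DeepSeek models) may use a different scoring function or normalization.

```python
import math
from typing import List, Tuple


def topk_gate(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Router scores are a softmax over all logits; the k selected weights are
    then renormalized so they sum to 1 (a common, but not universal, choice).
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts; ties go to the lower index.
    chosen = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


if __name__ == "__main__":
    # Four experts; route this token to the top two.
    print(topk_gate([0.1, 2.0, -1.0, 1.5], k=2))
```

A fused GPU kernel performs the same selection for every token in a batch at once; the scalar version only pins down the math.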
Paper #Paper #CV #NovelViewSynthesis #VideoDiffusion
Vista4D: Video Reshooting with 4D Point Clouds
👤 Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca et al.
🎯 Task
Video reshooting
💡 Idea
A video diffusion model is guided by 4D-grounded point clouds in which static pixels persist over time, and is trained on noisy reconstructed multiview data so that it preserves seen content and keeps camera control reliable under real-world artifacts.
✨ Why it's interesting
Best camera and 3D consistency; user-study win rates of 67.06% for preservation, 68.17% for camera control, and 77.38% for fidelity.
💻 Repo
⭐ Eyeline-Labs/Vista4D · 88 stars
Links: paper
via @Papers.Data.Code
GitHub - Eyeline-Labs/Vista4D: Official code, models, and data for Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight)
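Reshooting from a 4D point cloud ultimately means rendering that cloud from a new camera. The sketch below shows only the standard pinhole projection step, assuming world-to-camera extrinsics `R`, `t` and intrinsics `fx`, `fy`, `cx`, `cy`; it is not Vista4D's code, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_points(
    points: Sequence[Vec3],
    R: Sequence[Sequence[float]],
    t: Vec3,
    fx: float, fy: float, cx: float, cy: float,
) -> List[Tuple[float, float, float]]:
    """Project world-space 3D points into a pinhole camera.

    R (3x3) and t (3,) map world to camera coordinates: X_cam = R @ X + t.
    Returns (u, v, depth) for points in front of the camera.
    """
    out = []
    for X in points:
        # World -> camera coordinates.
        xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
        z = xc[2]
        if z <= 0:
            continue  # behind the camera, so not visible in this view
        # Perspective divide, then apply focal lengths and principal point.
        out.append((fx * xc[0] / z + cx, fy * xc[1] / z + cy, z))
    return out
```

To build a guidance frame, each projected point would be splatted into its pixel while keeping the nearest depth (a z-buffer), repeated for every frame along the new camera path.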
Paper #Paper #LLM #TimeSeries #VisionLanguageModels
LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
👤 Yueyang Ding, HaoPeng Zhang, Rui Dai et al.
🎯 Task
Time series reasoning
💡 Idea
A dual-view VLM input pairs a time-series plot with an index-value table for precise numerical grounding; the model is then curriculum fine-tuned across reasoning levels L1-L3 on the 83k-sample HiTSR dataset.
✨ Why it's interesting
Best out-of-distribution (OOD) accuracy: 86.8% on L1, 75.6% on local L2, 97.5% on global L2, and 67.0% on L3.
💻 Repo
⭐ RainingNovember/LLaTiSA · 76 stars
Links: paper
via @Papers.Data.Code
GitHub - RainingNovember/LLaTiSA: This is the official repository of "LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics".
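The index-value half of the dual-view input can be illustrated with a small formatter. This is a sketch under assumptions: the column layout, the two-decimal precision, and the name `index_value_table` are hypothetical, and the plot half would be rendered separately before both views are passed to the VLM.

```python
from typing import Sequence


def index_value_table(values: Sequence[float], precision: int = 2) -> str:
    """Render a time series as a plain-text 'index | value' table.

    Exact values next to their indices let the model read numbers directly
    instead of estimating them from the plotted line.
    """
    rows = ["index | value", "----- | -----"]
    for i, v in enumerate(values):
        rows.append(f"{i} | {v:.{precision}f}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(index_value_table([1.0, 2.25, 3.5]))
```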
Dataset #Dataset #Classification #Regression
Sleep Health & Daily Performance Dataset
👤 mohankrishnathalla
🎯 Task
Sleep health prediction
💡 Idea
100K records, 32 columns, and 3 targets spanning regression, multiclass, and binary tasks. Structured daily snapshots cover sleep metrics, behaviors, mental state, cognitive outcomes, 12 occupations, and 15 countries, with no missing values.
✨ Why it's interesting
100K rows and 3 targets support benchmarking sleep, risk, and cognition models at every level from beginner to expert.
Size: 100K records, 32 columns, 14.3 MB
Downloads: 2.9k | Likes: 49
Links: dataset
via @Papers.Data.Code
Kaggle - Sleep Health & Daily Performance Dataset: 100K records · sleep, lifestyle & cognitive scores across 12 occupations
Paper #Paper #LLM #AgenticRL #ToolUse
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
👤 Venus Team, Sunhao Dai, Yong Deng et al.
🎯 Task
Edge-scale deep research agents
💡 Idea
A 4B agent is trained on only ~10K open data, using cleaned and resampled long-horizon trajectories plus IGPO-based turn-level RL with information-gain rewards and format penalties, which gives dense credit assignment over long research runs.
✨ Why it's interesting
Beats prior agentic models under 9B parameters on multiple deep research benchmarks.
💻 Repo
⭐ inclusionAI/DR-Venus · 50 stars
⭐ verl-project/verl · 50 stars
Links: paper · dataset · dataset
via @Papers.Data.Code
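One way to read "information-gain rewards with format penalties": each turn is rewarded by how much it raises the policy's probability of the ground-truth answer, minus a penalty when the turn breaks the required format. The sketch assumes those probabilities are already computed; the penalty value and the function name are hypothetical, and IGPO's exact formulation may differ.

```python
from typing import List


def info_gain_rewards(
    answer_probs: List[float],
    format_ok: List[bool],
    format_penalty: float = 0.5,  # hypothetical value
) -> List[float]:
    """Turn-level rewards from information gain, minus a format penalty.

    answer_probs[0] is the policy's probability of the ground-truth answer
    before any turn; answer_probs[t] is that probability after turn t.
    """
    if len(answer_probs) != len(format_ok) + 1:
        raise ValueError("need one probability per turn plus a starting value")
    rewards = []
    for t, ok in enumerate(format_ok, start=1):
        gain = answer_probs[t] - answer_probs[t - 1]  # how much turn t helped
        rewards.append(gain - (0.0 if ok else format_penalty))
    return rewards
```

Because every turn gets its own reward, a long trajectory yields a dense signal instead of a single outcome reward at the end.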
💻 Repo #Repo #CV #FaceVerification #Webassembly
FaceX
👤 facex-engine
🎯 Task
Face verification
💡 Idea
A local face embedding engine for the browser, C, Go, Python, and the CLI. It computes 512-d embeddings and their cosine similarity, with no dependencies, optional encrypted weights, and SIMD-optimized CPU inference.
✨ Why it's interesting
Claims 3.0 ms native latency, 99.73% LFW accuracy, and 1.30× faster inference than ONNX Runtime.
💻 Repo
⭐ facex-engine/facex · 82 stars (+82 in 3 days)
C
Links: paper
via @Papers.Data.Code
GitHub - facex-engine/facex: Face verification in the browser. 74 KB WebAssembly. No server, no cloud, no dependencies. Also runs natively at 3 ms on CPU.
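Verification with 512-d embeddings reduces to a cosine similarity and a threshold. The sketch assumes the embeddings have already been produced by the engine; the 0.5 threshold is a placeholder rather than a FaceX default and would need calibration on labeled pairs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-length embedding")
    return dot / (na * nb)


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    # Placeholder threshold; calibrate on a validation set of genuine/impostor pairs.
    return cosine_similarity(a, b) >= threshold
```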
Paper #Paper #CV #TextToVideo #ReinforcementLearning
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
👤 Weijie Wang, Xiaoxuan He, Youping Gu et al.
🎯 Task
3D-consistent text-to-video generation
💡 Idea
Flow-GRPO fine-tunes a video model with rewards from 3D reconstruction, meta-view VLM scoring, trajectory alignment, and aesthetics; camera motion is injected by warping the latent noise rather than by adding control modules.
✨ Why it's interesting
Improves 3D-consistency PSNR by 10.23 dB and 7.91 dB while preserving general video quality.
💻 Repo
⭐ microsoft/World-R1 · 197 stars
Links: paper
via @Papers.Data.Code
GitHub - microsoft/World-R1: [ICML 2026] World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
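Flow-GRPO, like GRPO, scores a group of samples generated for the same prompt and standardizes the rewards within that group to obtain advantages. The sketch combines several reward terms using hypothetical names and weights, then computes group-relative advantages; it omits the flow-matching policy update itself and is not World-R1's code.

```python
import statistics
from typing import Dict, List


def combined_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-component rewards (component names are hypothetical)."""
    return sum(weights[name] * scores[name] for name in weights)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    weights = {"recon_3d": 1.0, "meta_view_vlm": 0.5, "trajectory": 0.5, "aesthetic": 0.25}
    group = [
        {"recon_3d": 0.8, "meta_view_vlm": 0.6, "trajectory": 0.9, "aesthetic": 0.5},
        {"recon_3d": 0.3, "meta_view_vlm": 0.7, "trajectory": 0.4, "aesthetic": 0.6},
    ]
    print(group_advantages([combined_reward(s, weights) for s in group]))
```

Samples that beat their group's average get positive advantages and are reinforced, so no separate value network is needed.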
Dataset #Dataset #NLP #CompetitionMath #Multimodal
MathNet v0 – Olympiad Math Reasoning & Retrieval
👤 ShadenA
🎯 Task
Olympiad math reasoning and retrieval
💡 Idea
27,817 problems in v0 from 58 country/regional configs, with problem markdown, official solutions, topic paths, language, provenance, and 7,541 inline images, sourced from official booklets across 47 countries and 17 languages.
✨ Why it's interesting
Nearly 28K multilingual, expert-written problems enable hard reasoning, retrieval, and RAG evaluation beyond small English-only math sets.
Size: 27,817 problems, 7,541 images, 58 configs
Downloads: 9.3k | Likes: 26
Links: dataset · paper · repo
via @Papers.Data.Code
ShadenA/MathNet · Datasets at Hugging Face
💻 Repo #Repo #Cpp #Gguf
llama.cpp DeepSeek V4 Flash
👤 antirez
🎯 Task
Local LLM inference
💡 Idea
DeepSeek v4 Flash support in llama.cpp, with generated GGUFs that use 2-bit quantization for the routed experts, targeting MacBooks with 128GB RAM; works with the CPU and Metal backends.
✨ Why it's interesting
Aims at local DSv4 inference on 128GB MacBooks; the Metal backend runs faster than the CPU backend.
💻 Repo
⭐ antirez/llama.cpp-deepseek-v4-flash · 124 stars (+124 in 3 days)
C++
Links: paper · paper · paper
via @Papers.Data.Code
GitHub - antirez/llama.cpp-deepseek-v4-flash: Experimental implementation of DeepSeek v4 Flash in llama.cpp
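To give a feel for what 2-bit quantization of routed experts means, here is a simple block-wise asymmetric scheme: each block stores a scale and a minimum, and every weight becomes one of four levels. This is an illustrative assumption only; llama.cpp's GGUF 2-bit formats (such as Q2_K) use a different, more compact super-block layout, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_2bit(block: List[float]) -> Tuple[float, float, List[int]]:
    """Asymmetric 2-bit quantization of one block: x ~ lo + q * scale, q in 0..3."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 3 if hi > lo else 1.0  # 4 levels span [lo, hi]
    q = [min(3, max(0, round((x - lo) / scale))) for x in block]
    return scale, lo, q


def dequantize_2bit(scale: float, lo: float, q: List[int]) -> List[float]:
    """Reconstruct approximate weights from the 2-bit codes."""
    return [lo + qi * scale for qi in q]
```

With 32-weight blocks and a 16-bit scale and minimum, storage is (32 × 2 + 2 × 16) / 32 = 3 bits per weight; formats that share scales across larger blocks get closer to 2.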
Paper #Paper #CV #MultimodalLearning #ImageGeneration
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
👤 Zhiheng Liu, Weiming Ren, Xiaoke Huang et al.
🎯 Task
Unified multimodal understanding and generation
💡 Idea
Direct patch embeddings replace the VAE and representation encoders, so a single transformer handles text, images, and pixel-space generation end to end. A masking-based visual feature learning scheme stabilizes training and improves pixel-space representations.
✨ Why it's interesting
At 7B, it reaches SOTA among native unified multimodal models (UMMs) on understanding and stays competitive on generation.
💻 Repo
⭐ facebookresearch/tuna-2 · 139 stars
Links: paper
via @Papers.Data.Code
GitHub - facebookresearch/tuna-2: Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
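Direct patch embeddings replace a pretrained vision encoder with a simple operation: cut the image into non-overlapping patches, flatten each one, and apply a learned linear projection. The sketch uses nested lists to stay dependency-free; the weights, bias, and function names are hypothetical, and Tuna-2's actual patch size and projection may differ.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C


def patchify(img: Image, p: int) -> List[List[float]]:
    """Split an H x W x C image into p x p patches, each flattened row-major."""
    H, W = len(img), len(img[0])
    if H % p or W % p:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for py in range(0, H, p):
        for px in range(0, W, p):
            vec: List[float] = []
            for y in range(py, py + p):
                for x in range(px, px + p):
                    vec.extend(img[y][x])  # append all channels of this pixel
            patches.append(vec)
    return patches


def embed(patches: List[List[float]], weight: List[List[float]], bias: List[float]) -> List[List[float]]:
    """Linear projection of each flattened patch: e = W @ v + b."""
    return [
        [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in patches
    ]
```

Each resulting vector becomes one image token in the transformer's input sequence, next to the text tokens.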
Paper #Paper #MultiAgentSystems #Reasoning
Recursive Multi-Agent Systems
👤 Xiyuan Yang, Jiaru Zou, Rui Pan et al.
🎯 Task
Multi-agent LLM reasoning
💡 Idea
Agents coordinate through latent-state recursion instead of text: lightweight RecursiveLink modules let them pass and refine hidden states in a loop, and inner-outer training assigns credit across the whole system.
✨ Why it's interesting
Average +8.3% accuracy, 1.2-2.4× faster inference, and 34.6-75.6% fewer tokens than baselines.
💻 Repo
⭐ RecursiveMAS/RecursiveMAS · 30 stars
Links: paper
via @Papers.Data.Code
GitHub - RecursiveMAS/RecursiveMAS: Official implementation for "Recursive Multi-Agent Systems"
Paper #Paper #AudioReasoning #Rlhf
Step-Audio-R1.5 Technical Report
👤 Yuxin Zhang, Xiangyu Tony Zhang, Daijiao Liu et al.
🎯 Task
Audio reasoning for multi-turn spoken dialogue
💡 Idea
RLHF with a rubric-guided generative reward model that compares responses in multi-turn audio chats, optimizing naturalness, coherence, and instruction retention beyond what label-only RLVR captures.
✨ Why it's interesting
77.97 average across 8 benchmarks, +5.47 over Step-Audio-R1; 41.15 on Audio MC.
💻 Repo
⭐ stepfun-ai/Step-Audio-R1 · 647 stars
Links: paper · dataset
via @Papers.Data.Code
💻 Repo #Repo #TestDrivenDevelopment #CodingAgents
EvanFlow
👤 evanklem
🎯 Task
AI-assisted software development workflow
💡 Idea
A single-entry workflow for Claude Code that orchestrates brainstorm → plan → execute → iterate, with vertical-slice TDD inside coding tasks, optional parallel coder/overseer subagents, and a hook that blocks dangerous git commands.
✨ Why it's interesting
Keeps users in control with approval checkpoints, no auto-commits, and blocked destructive git operations.
💻 Repo
⭐ evanklem/evanflow · 356 stars (+356 in 3 days)
Shell
Links: paper
via @Papers.Data.Code
GitHub - evanklem/evanflow: A TDD-driven iterative feedback loop for software development. 16 cohesive Claude Code skills walk an idea from brainstorm → plan → execute → iterate, with checkpoints throughout.
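A hook that blocks destructive git commands can be sketched as a small pattern check. This Python version is an illustration, not evanflow's hook (the repo is written in Shell). It assumes Claude Code passes a PreToolUse event as JSON with `tool_name` and `tool_input.command`, and that exit code 2 blocks the call; the deny-list patterns and function names are hypothetical.

```python
import json
import re
import sys

# Hypothetical deny-list of destructive git invocations.
DANGEROUS_PATTERNS = [
    r"\bgit\s+push\b.*\s(?:--force|-f)\b",  # force push (also matches --force-with-lease)
    r"\bgit\s+reset\s+--hard\b",            # discard working-tree changes
    r"\bgit\s+clean\s+-\w*f",               # delete untracked files
    r"\bgit\s+branch\s+-D\b",               # force-delete a branch
]


def is_dangerous(command: str) -> bool:
    """True if the shell command matches any destructive git pattern."""
    return any(re.search(p, command) for p in DANGEROUS_PATTERNS)


def decide(event: dict) -> int:
    """Return 2 to block a Bash tool call that runs a dangerous git command, else 0."""
    if event.get("tool_name") != "Bash":
        return 0
    command = event.get("tool_input", {}).get("command", "")
    if is_dangerous(command):
        print(f"Blocked destructive git command: {command}", file=sys.stderr)
        return 2
    return 0


def main() -> int:
    # Hook entry point: the event arrives as JSON on stdin (assumed contract).
    return decide(json.load(sys.stdin))
```

As a standalone script it would end with `if __name__ == "__main__": sys.exit(main())` and be registered as a PreToolUse hook for the Bash tool.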
Paper #Paper #InstructionTuning #KnowledgeGraphs
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
👤 Chenkai Pan, Xinglong Xu, Yuhang Xu et al.
🎯 Task
Domain-specific LLM fine-tuning
💡 Idea
A shared hierarchy of L1 concepts, L2 relations, and L3 reasoning chains drives both SFT data and benchmarks; failures are traced to concept gaps or reasoning deficits and repaired with targeted data patches.
✨ Why it's interesting
Across 16 disciplines, one debug round let a 32B model beat GPT-5.4, Gemini-3-flash, and DeepSeek-v3.2.
💻 Repo
⭐ OpenRaiser/ProDa · 43 stars
Links: paper
via @Papers.Data.Code
GitHub - OpenRaiser/ProDa: Data Engineering from Raw Corpora
Paper #Paper #MultimodalAgents #ToolUse
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
👤 V Team, Wenyi Hong, Xiaotao Gu et al.
🎯 Task
Multimodal agent foundation model
💡 Idea
A native multimodal agent model with CogViT and multimodal multi-token prediction using <|image|> placeholders, plus joint RL over 30+ perception, reasoning, coding, and GUI tasks for end-to-end tool use.
✨ Why it's interesting
Scores 94.8 on Design2Code and 75.7 on AndroidWorld; RL adds +4.9 on OSWorld.
💻 Repo
⭐ zai-org/GLM-V · 2.3k stars
⭐ zai-org/ImageMining · 2.3k stars
⭐ zai-org/GLM-skills · 2.3k stars
Links: paper
via @Papers.Data.Code
GitHub - zai-org/GLM-V: GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Dataset #Dataset #RubricBasedEvaluation #PhysicianWritten
HealthBench Professional
👤 openai
🎯 Task
Clinical response evaluation
💡 Idea
Structured medical evaluation examples with conversations, physician-written responses, and scored rubric items, labeled by use case, red-teaming vs. good-faith, difficulty, and specialty.
✨ Why it's interesting
Physician answers plus rubrics enable consistent scoring of model performance across clinical use cases.
Downloads: 5.7k | Likes: 43
Links: dataset · repo
via @Papers.Data.Code
openai/healthbench-professional · Datasets at Hugging Face
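Rubric-based grading can be summarized as: add up the points of the criteria a response meets (negative points penalize harmful behavior), divide by the total achievable positive points, and clip to [0, 1]. The sketch follows the scoring described for the original HealthBench; that HealthBench Professional scores exactly this way is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricItem:
    criterion: str
    points: float  # positive for desired behavior, negative for harmful behavior
    met: bool      # grader's judgment for this response


def rubric_score(items: List[RubricItem]) -> float:
    """Earned points over maximum achievable positive points, clipped to [0, 1]."""
    max_points = sum(i.points for i in items if i.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positive criterion")
    earned = sum(i.points for i in items if i.met)
    return min(1.0, max(0.0, earned / max_points))
```

A response that meets a +5 criterion but also triggers a -4 criterion, on a rubric worth 8 positive points, scores (5 - 4) / 8 = 0.125.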
💻 Repo #Repo #Gbnf #LlamaCpp
Structured CoT
👤 andthattoo
🎯 Task
Reasoning token compression for code generation
💡 Idea
At inference time, the model's thinking is constrained to short structured fields such as GOAL/APPROACH/EDGE or GOAL/STATE/ALGO/EDGE/VERIFY, after which the model generates code normally. This cuts verbose CoT, and the repo compares free vs. constrained runs.
✨ Why it's interesting
No training needed; 22.4× fewer think tokens on HumanEval+ and +14 pp pass@1 on LiveCodeBench.
💻 Repo
⭐ andthattoo/structured-cot · 196 stars (+154 in 3 days)
Python
Links: paper
via @Papers.Data.Code
GitHub - andthattoo/structured-cot: Structured Chain-of-Thought
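The constraint can be pictured as a grammar over the think block. The sketch builds a GBNF grammar string of the kind llama.cpp accepts via its `--grammar-file` option, plus a regex check that a generated block contains exactly the expected fields in order. The field names come from the post; the function names and the one-line-per-field layout are assumptions, not the repo's actual grammars.

```python
import re
from typing import Dict, Sequence


def build_gbnf(fields: Sequence[str]) -> str:
    """GBNF grammar forcing one 'FIELD: text' line per field, in order."""
    root = "root ::= " + " ".join(f'"{f}: " line' for f in fields)
    return root + '\nline ::= [^\\n]+ "\\n"'


def parse_structured_think(text: str, fields: Sequence[str]) -> Dict[str, str]:
    """Return {field: value} if text matches the field layout, else raise."""
    pattern = "".join(rf"{re.escape(f)}: ([^\n]+)\n" for f in fields)
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError("think block does not follow the required fields")
    return dict(zip(fields, match.groups()))


if __name__ == "__main__":
    print(build_gbnf(["GOAL", "APPROACH", "EDGE"]))
```

Constraining decoding with the grammar guarantees the shape up front, while the parser is useful for checking runs that were generated without constraints.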
Paper #Paper #LLM #MultiAgentSystems #AgentOrchestration
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
👤 Zhengxu Yu, Yu Fu, Zhiyuan He et al.
🎯 Task
Multi-agent organization and coordination
💡 Idea
A Talent-Container architecture separates agent identity from runtime, a Talent Market recruits verified agents on demand, and E2R tree search plans, executes, and reviews tasks with formal guarantees.
✨ Why it's interesting
84.67% success on PRDBench, beating the prior SOTA by 15.48 points.
💻 Repo
⭐ 1mancompany/OneManCompany · 119 stars
Links: paper
via @Papers.Data.Code
GitHub - 1mancompany/OneManCompany: Build Your Agent Company with OMC
Weekly Digest | Apr 25 – May 02
#WeeklyDigest
Papers
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video fine-tuning ▶ boosts 3D-consistency PSNR
→ Learn more...
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
#MultimodalAgents #ToolUse #ReinforcementLearning
Multimodal agent model ▶ tool use and GUI interaction
→ Learn more...
Recursive Multi-Agent Systems
#MultiAgentSystems #Reasoning #LatentSpace
RecursiveMAS latent recursion ▶ +8.3% accuracy, fewer tokens
→ Learn more...
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings ▶ unified multimodal understanding and generation
→ Learn more...
Step-Audio-R1.5 Technical Report
#AudioReasoning #Rlhf #SpokenDialogue
Rubric-guided audio RLHF ▶ improves multi-turn spoken dialogue
→ Learn more...
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
#MultiAgentSystems #AgentOrchestration #TaskPlanning
OMC multi-agent orchestration ▶ 84.67% on PRDBench
→ Learn more...
💻 Repos
andthattoo/structured-cot ⭐
#Gbnf #LlamaCpp #CodeBenchmarks
Grammar-constrained CoT ▶ 22.4× fewer think tokens
→ Learn more...
deepseek-ai/TileKernels ⭐
#MixtureOfExperts #Quantization #GpuKernels
TileLang GPU kernels ▶ LLM ops near hardware limits
→ Learn more...
antirez/llama.cpp-deepseek-v4-flash ⭐
#Cpp #Gguf #Quantization
llama.cpp DeepSeek v4 Flash ▶ local MacBook inference
→ Learn more...
Datasets
H³D: High-quality Holistic 3D Editing Dataset
#3DEditing #InstructionFollowing #PartLevel
3D editing dataset ▶ trains part-level 3D editors
→ Learn more...
MathNet v0 – Olympiad Math Reasoning & Retrieval
#CompetitionMath #Multimodal #Retrieval
Multilingual Olympiad math dataset ▶ reasoning and retrieval benchmark
→ Learn more...
➡️ Tomorrow: Computer Vision Monthly
via @Papers.Data.Code
Monthly: Computer Vision | Apr 03 – May 03
#MonthlyDigest #CV
Papers
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
#TextToVideo #ReinforcementLearning #3DConsistency
RL text-to-video tuning ▶ boosts 3D-consistency PSNR
→ Learn more...
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
#MultimodalLearning #ImageGeneration #DiffusionModels
Discrete diffusion LLM ▶ unifies multimodal understanding and generation
→ Learn more...
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
#MultimodalLearning #ImageGeneration #VisionLanguageModels
Direct patch embeddings ▶ unified multimodal understanding and generation
→ Learn more...
Vista4D: Video Reshooting with 4D Point Clouds
#NovelViewSynthesis #VideoDiffusion #3DReconstruction
4D point cloud reshooting ▶ best camera and 3D consistency
→ Learn more...
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
#VideoGeneration #DiffusionTransformers #HumanObjectInteraction
Diffusion HOI video synthesis ▶ improves stability and contact realism
→ Learn more...
💻 Repos
facex-engine/facex ⭐
#FaceVerification #Webassembly #CpuInference
Local face embeddings ▶ browser CPU verification
→ Learn more...
Datasets
Sleep Health & Daily Performance Dataset
#Classification #Regression #HealthConditions
Synthetic sleep health dataset ▶ benchmarks 3 prediction tasks
→ Learn more...
⚡ Trends
▸ Unified multimodal models increasingly merge visual understanding, generation, and editing end to end.
▸ Video generation methods add explicit 3D or geometry grounding for consistency.
▸ Specialized architectural priors target realism in controllable human-centric and camera-centric video synthesis.
TL;DR
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
RL adds strong 3D consistency to video generation without architecture or inference changes
⭐ facex-engine/facex ⭐
Tiny local face verification runs fast in the browser and on CPU
💡 Vision models are converging toward unified, geometry-grounded, practically deployable generation systems.
via @Papers.Data.Code
