Paper #Paper #TextToImage #FewStepGeneration
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
👤 Chenxi Zhao, Chen Zhu, Xiaokun Feng et al.
🎯 Task
One-step text-to-image generation
💡 Idea
The authors argue that one-step generation needs text features with strong discriminability and disentanglement. They use an LLM-based text encoder with these properties and adapt MeanFlow time conditioning and training for text-conditioned generation.
✨ Why it's interesting
It is the first reported MeanFlow extension to text and reaches 0.90 GenEval in 4 steps.
💻 Repo
AMAP-ML/EMF | 85 stars
paper
via @Papers.Data.Code
GitHub
GitHub - AMAP-ML/EMF: [2026 CVPR]Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
💻 Repo #Repo #AgentSkills #Dspy
intertwine/dspy-agent-skills | 148 stars (+148 in 3 days)
Python
🎯 Task
DSPy agent skills for coding agents
💡 Idea
The repo packages five DSPy skills covering fundamentals, evaluation, GEPA optimization, RLM, and an advanced workflow. It provides spec-compliant docs, runnable examples, dual-agent installation, and validation tests.
✨ Why it's interesting
Includes 80 validation tests and committed example gains up to +24.23 points after GEPA.
via @Papers.Data.Code
Dataset #Dataset #DocumentParsing #OCR
ParseBench
Creator: llamaindex
🎯 Task
Document parsing evaluation
💡 Idea
It provides human-verified pages and rule-based evaluations for five parsing capabilities, with task-specific metrics and source documents in PDF, JPG, and PNG formats.
✨ Why it's interesting
Covers 2,078 pages from 1,211 documents with 169,011 test rules across five dimensions.
Size: 2,078 pages, 1,211 documents, 169,011 test rules
Downloads: 12.6k | Likes: 68
dataset
via @Papers.Data.Code
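To make "rule-based evaluation" concrete, here is a minimal sketch of how a parser's text output could be scored against a list of test rules. The `Rule` schema and the two rule kinds (`contains`, `order`) are assumptions made for illustration; they are not ParseBench's actual rule format or metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical test rule; not the ParseBench schema."""
    kind: str    # "contains" or "order" (illustrative rule kinds)
    args: tuple  # strings the rule refers to


def check(rule: Rule, text: str) -> bool:
    """Return True if the parsed text satisfies the rule."""
    if rule.kind == "contains":
        return rule.args[0] in text
    if rule.kind == "order":
        first, second = rule.args
        i, j = text.find(first), text.find(second)
        # Both strings must be present and appear in the stated order.
        return 0 <= i < j
    raise ValueError(f"unknown rule kind: {rule.kind}")


def pass_rate(rules: list, text: str) -> float:
    """Fraction of rules the parsed text passes (0.0 for an empty rule list)."""
    if not rules:
        return 0.0
    return sum(check(r, text) for r in rules) / len(rules)
```

A per-page pass rate like this could then be averaged per capability; how ParseBench itself aggregates its rules is not stated in the post.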
💻 Repo #Repo #LLM #AIAgents #TelegramBots
cosmicstack-labs/mercury-agent | 476 stars (+476 in 3 days)
TypeScript
🎯 Task
Permission-aware AI assistant
💡 Idea
Mercury runs as a 24/7 agent across CLI and Telegram, using permission-hardened tools, folder-scoped access, command blocklists, and approval flows. It also supports daily token budgets, provider fallback, scheduling, and markdown-defined personality files.
✨ Why it's interesting
It asks before acting and includes 31 built-in tools with daemon and Telegram support.
via @Papers.Data.Code
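The combination of folder-scoped access, a command blocklist, and an approval step can be sketched as below. This is an illustration in Python, not Mercury's TypeScript code; `BLOCKLIST`, `path_allowed`, and `decide` are hypothetical names, and the substring-based blocklist is a deliberately simple assumption.

```python
from pathlib import Path
from typing import Callable

# Hypothetical blocklist entries; a real agent would likely parse commands properly.
BLOCKLIST = ("rm -rf", "sudo ", "mkfs")


def path_allowed(path: str, allowed_root: str) -> bool:
    """Folder-scoped access: allow only paths inside allowed_root after resolution."""
    root = Path(allowed_root).resolve()
    target = Path(path).resolve()  # collapses ".." so traversal cannot escape the root
    return target == root or root in target.parents


def decide(command: str, approve: Callable[[str], bool]) -> str:
    """Return "blocked", "run", or "denied" for a shell command."""
    if any(pattern in command for pattern in BLOCKLIST):
        return "blocked"  # blocklisted commands never reach the approval step
    return "run" if approve(command) else "denied"
```

Here `approve` stands in for whatever asks the user, such as a CLI prompt or a Telegram message. Checking the blocklist first means a hasty approval cannot override it.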
Paper #Paper #AutonomousDriving #TrajectoryPrediction
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
👤 Jinghui Lu, Jiayi Guan, Zhijian Huang et al.
🎯 Task
Autonomous driving trajectory planning
💡 Idea
OneVL trains compact latent tokens with two auxiliary decoders: one reconstructs text chain-of-thought, and one predicts future visual tokens as a world model. At inference, the decoders are removed and all latents are prefilled in a single pass.
✨ Why it's interesting
It is the first latent CoT method reported to outperform explicit CoT across four benchmarks.
paper
via @Papers.Data.Code
Paper #Paper #CV #VideoGeneration #DiffusionTransformers
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
👤 Xiangyang Luo, Xiaozhe Xin, Tao Feng et al.
🎯 Task
Human-object interaction video synthesis
💡 Idea
It adds a human-aware MoE for hand and face regions and jointly trains RGB video with an auxiliary HOI structure stream to inject interaction geometry priors. The HOI branch is removed at inference for zero-overhead RGB generation.
✨ Why it's interesting
It significantly outperforms prior methods in structural stability and interaction realism.
💻 Repo
luoxyhappy/CoInteract | 44 stars
paper
via @Papers.Data.Code
💻 Repo #Repo #LLM #PromptEngineering #TechnicalWriting
yzhao062/agent-style | 203 stars (+203 in 3 days)
Python
🎯 Task
LLM writing style control
💡 Idea
It packages 21 writing rules, adapters for tools like Claude Code, Codex, Cursor, Copilot, and Aider, plus a review command that audits drafts against the same rules.
✨ Why it's interesting
Benchmark results report fewer style violations: a 45% drop on Claude Opus 4.7 and GPT-5.4, and an 82% drop on Gemini 3 Flash.
via @Papers.Data.Code
Paper #Paper #CV #MultimodalLearning #ImageGeneration
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
👤 Inclusion AI, Tiwei Bie, Haoxing Chen et al.
🎯 Task
Unified multimodal understanding and generation
💡 Idea
The model discretizes images with a semantic SigLIP-VQ tokenizer, processes text and vision with a shared block-level masked diffusion MoE backbone, and reconstructs images with a distilled diffusion decoder. It also supports interleaved generation and reasoning.
✨ Why it's interesting
It matches specialized VLMs on understanding while showing strong image generation and editing.
💻 Repo
inclusionAI/LLaDA2.0-Uni | 98 stars
paper
via @Papers.Data.Code
Dataset #Dataset #LLM #SyntheticData #Personas
Nemotron-Personas-Korea
👤 nvidia
🎯 Task
Korean persona generation
💡 Idea
The dataset synthesizes adult Korean personas from official population statistics and public sources, covering demographics, geography, occupations, and multiple persona descriptions. It was built with NeMo Data Designer to better reflect real-world Korean population diversity.
✨ Why it's interesting
Notable as the first large-scale Korean-language persona dataset with 1M records and 7M personas.
Size: 1M records, 7M personas, 2.0 GB
Downloads: 2.0k | Likes: 46
dataset
via @Papers.Data.Code
💻 Repo #Repo #SearchEvaluation #CitationAnalysis
Geo Citation Lab
👤 yaojingang
🎯 Task
AI search citation analysis
💡 Idea
It provides 602 prompts, raw citation/search CSVs, page crawls, 72-feature citation records, and scripts to analyze search triggering, source preferences, and citation influence across three platforms.
✨ Why it's interesting
Includes 21,143 citation records, 18,151 crawled pages, and a full reproducible analysis pipeline.
💻 Repo
yaojingang/geo-citation-lab | 130 stars (+130 in 3 days)
Python
via @Papers.Data.Code
Paper #Paper #NLP #LLMAgents #WorkflowOrchestration
AgentSPEX: An Agent SPecification and EXecution Language
👤 Pengcheng Wang, Jerry Huang, Jiarui Yao et al.
🎯 Task
LLM agent workflow specification
💡 Idea
It defines agent workflows declaratively with typed steps, branching, loops, parallelism, reusable submodules, and explicit context/state management, then runs them in a harness with tools, sandboxing, checkpointing, logging, and replay.
✨ Why it's interesting
It outperformed compared baselines on 7 benchmarks and was rated more interpretable in a user study.
💻 Repo
ScaleML/AgentSPEX | 43 stars
paper
via @Papers.Data.Code
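As a rough picture of what a declarative workflow with typed steps, branching, loops, and explicit state looks like, here is a toy interpreter. The step schema (`call`, `branch`, `loop`) and the function `run_workflow` are invented for this sketch; they are not AgentSPEX syntax or its harness.

```python
from typing import Any, Callable, Dict, List

Step = Dict[str, Any]
Action = Callable[[Dict[str, Any]], Any]


def run_workflow(steps: List[Step], state: Dict[str, Any],
                 actions: Dict[str, Action]) -> List[str]:
    """Run declarative steps against a shared state dict; return the call log."""
    log: List[str] = []
    for step in steps:
        kind = step["type"]
        if kind == "call":
            # Run a named action and store its result under an explicit state key.
            state[step["out"]] = actions[step["action"]](state)
            log.append(step["action"])
        elif kind == "branch":
            chosen = step["then"] if state.get(step["if"]) else step["else"]
            log.extend(run_workflow(chosen, state, actions))
        elif kind == "loop":
            for _ in range(step["times"]):
                log.extend(run_workflow(step["body"], state, actions))
        else:
            raise ValueError(f"unknown step type: {kind}")
    return log
```

Because the workflow is plain data, the returned log is one simple basis for the logging and replay the post mentions; the real system's mechanisms are not described here.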
Weekly Digest | Apr 18 – Apr 25
#WeeklyDigest
Papers
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
#MultimodalLearning #ImageGeneration #DiffusionModels
Discrete diffusion LLM ▶ unifies understanding and generation
Learn more...
AgentSPEX: An Agent SPecification and EXecution Language
#LLMAgents #WorkflowOrchestration #ProgramSynthesis
YAML agent workflows ▶ beats baselines on 7 benchmarks
Learn more...
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
#AutonomousDriving #TrajectoryPrediction #WorldModels
Latent CoT VLA ▶ beats explicit CoT
Learn more...
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
#VideoGeneration #DiffusionTransformers #HumanObjectInteraction
CoInteract diffusion framework ▶ more stable realistic HOI videos
Learn more...
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
#TextToImage #FewStepGeneration #FlowMatching
Text-conditioned MeanFlow ▶ 0.90 GenEval in 4 steps
Learn more...
💻 Repos
cosmicstack-labs/mercury-agent
#AIAgents #TelegramBots #ToolUse
TypeScript CLI Telegram agent ▶ approval-based 24/7 tool use
Learn more...
yaojingang/geo-citation-lab
#SearchEvaluation #CitationAnalysis #WebCrawling
Citation analysis dataset ▶ studies search and citation choices
Learn more...
intertwine/dspy-agent-skills
#AgentSkills #Dspy #Gepa
DSPy agent skills ▶ coding workflows with tests
Learn more...
Datasets
ParseBench
#DocumentParsing #OCR #LayoutDetection
ParseBench dataset ▶ evaluates enterprise document parsers
Learn more...
Nemotron-Personas-Korea
#SyntheticData #Personas #Korean
Synthetic Korean persona dataset ▶ model training and evaluation
Learn more...
➡️ Tomorrow: NLP & LLM Monthly
via @Papers.Data.Code
Monthly: NLP & LLM | April 2026
#MonthlyDigest #NLPAndLLM
Papers
AgentSPEX: An Agent SPecification and EXecution Language
#LLMAgents #WorkflowOrchestration #ProgramSynthesis
YAML agent workflow language ▶ beats baselines on 7 benchmarks
Learn more...
💻 Repos
cosmicstack-labs/mercury-agent
#AIAgents #TelegramBots #ToolUse
TypeScript AI agent ▶ CLI Telegram with approval actions
Learn more...
yzhao062/agent-style
#PromptEngineering #TechnicalWriting #AIAgents
Writing ruleset CLI ▶ reduces style violations
Learn more...
Datasets
Nemotron-Personas-Korea
#SyntheticData #Personas #Korean
Korean persona dataset ▶ trains and evaluates persona models
Learn more...
⚡ Trends
▸ Agent tooling emphasizes explicit workflows, permissions, and human approval safeguards.
▸ Recent LLM agent systems package reusable controls for execution and output quality.
▸ Synthetic, structured persona data is expanding beyond English into localized demographics.
🧠 TL;DR
AgentSPEX: An Agent SPecification and EXecution Language
Declarative agent language improves benchmark performance and interpretability with executable workflows.
cosmicstack-labs/mercury-agent
Practical permission-aware agent offers approvals, daemon mode, Telegram, and 31 tools.
💡 LLM tooling is shifting toward controllable, operationally safe agent systems.
via @Papers.Data.Code
Paper #Paper #Robotics #HumanoidControl #WorldModels
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
👤 Boyu Chen, Yi Chen, Lu Qiu et al.
🎯 Task
Human-to-humanoid policy learning and world modeling
💡 Idea
Tri-branch visual-action-fusion tokenizer with cross-reconstruction maps heterogeneous actions into shared discrete tokens; actions predict vision and vision reconstructs actions to capture embodiment-agnostic physical intent.
✨ Why it's interesting
Achieves SOTA data efficiency, robust OOD generalization, and zero-shot task transfer on sim and real humanoids.
💻 Repo
xpeng-robotics/UniT | 37 stars
paper
via @Papers.Data.Code
Dataset #Dataset #3DEditing #InstructionFollowing
H³D: High-quality Holistic 3D Editing Dataset
👤 ART-3D
🎯 Task
Instruction-following 3D editing
💡 Idea
~102.7K records across 10 shards: each sample has before/after 3D SLAT latents, one aligned 518×518 RGB view per side, and an edit prompt for deletion, addition, modification, scale, material, color, or global style.
✨ Why it's interesting
Paired latent+image edits across 7 types enable training and evaluation of part-level 3D editors.
Size: 102,704 records across 10 shards, ~54.5 GB total
Downloads: 441 | Likes: 10
dataset
via @Papers.Data.Code
🔥 Repo #Repo #MixtureOfExperts #Quantization
Tile Kernels
👤 deepseek-ai
🎯 Task
LLM GPU kernel optimization
💡 Idea
Optimized TileLang kernels for LLM ops, including top-k MoE gating/routing, FP8/FP4/E5M6 quantization, batched transpose, Engram, and Manifold HyperConnection, plus trainable torch.autograd.Function wrappers for higher-level layers.
✨ Why it's interesting
Authors say most kernels approach hardware limits for compute intensity and memory bandwidth.
💻 Repo
deepseek-ai/TileKernels | 1.2k stars (+1.1k in 3 days)
Python
via @Papers.Data.Code
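For readers unfamiliar with top-k MoE gating, the operation such a kernel accelerates can be written as a slow pure-Python reference for a single token. This sketch assumes one common variant (softmax over only the selected logits) and is not TileKernels code; `topk_gate` is a hypothetical name.

```python
import math
from typing import List, Tuple


def topk_gate(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalize their weights.

    Assumed variant: softmax over the selected logits only.
    Ties keep the lower expert index first because sorted() is stable.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = order[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]
```

A GPU kernel would do the same selection and normalization for many tokens in parallel; the routing step then sends each token to its chosen experts with these weights.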
Paper #Paper #CV #NovelViewSynthesis #VideoDiffusion
Vista4D: Video Reshooting with 4D Point Clouds
👤 Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca et al.
🎯 Task
Video reshooting
💡 Idea
4D-grounded point clouds with temporally persistent static pixels guide a video diffusion model, plus training on noisy reconstructed multiview data to preserve seen content and improve camera control under real-world artifacts.
✨ Why it's interesting
Best camera/3D consistency; user study wins 67.06% preservation, 68.17% camera, 77.38% fidelity.
💻 Repo
Eyeline-Labs/Vista4D | 88 stars
paper
via @Papers.Data.Code
GitHub
GitHub - Eyeline-Labs/Vista4D: Official code, models, and data for Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight)
Paper #Paper #LLM #TimeSeries #VisionLanguageModels
LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
👤 Yueyang Ding, HaoPeng Zhang, Rui Dai et al.
🎯 Task
Time series reasoning
💡 Idea
Dual-view VLM input uses a time-series plot plus an index-value table for precise numerical grounding, then curriculum fine-tunes across L1-L3 reasoning levels on the 83k-sample HiTSR dataset.
✨ Why it's interesting
Best OOD results: 86.8% L1, 75.6% local L2, 97.5% global L2, 67.0% L3 accuracy.
💻 Repo
RainingNovember/LLaTiSA | 76 stars
paper
via @Papers.Data.Code
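The table half of the dual-view input is easy to picture: next to the plot image, the model receives the exact values as text. Below is a minimal sketch of building such an index-value table; the column layout and the `precision` parameter are assumptions, not the paper's format, and the plot view is left out.

```python
from typing import Sequence


def index_value_table(values: Sequence[float], precision: int = 2) -> str:
    """Render a series as an "index | value" text table, one row per time step."""
    lines = ["index | value"]
    for i, v in enumerate(values):
        lines.append(f"{i} | {v:.{precision}f}")
    return "\n".join(lines)


print(index_value_table([1.0, 2.5, 2.25]))
```

Pairing this text with a rendered plot gives the model both the overall shape and the exact numbers, which is the grounding idea the post describes.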
Dataset #Dataset #Classification #Regression
Sleep Health & Daily Performance Dataset
👤 mohankrishnathalla
🎯 Task
Sleep health prediction
💡 Idea
100K records, 32 columns, and 3 targets spanning regression, multiclass, and binary tasks. Structured daily snapshots cover sleep metrics, behaviors, mental state, cognitive outcomes, 12 occupations, and 15 countries with no missing values.
✨ Why it's interesting
100K rows + 3 targets enable benchmarkable sleep, risk, and cognition models from beginner to expert level.
Size: 100K records, 32 columns, 14.3 MB
Downloads: 2.9k | Likes: 49
dataset
via @Papers.Data.Code
