Upsample Anything

Upsample Anything is a universal, training-free upsampler based on lightweight test-time optimization. No code, but it's a relevant paper.
Review: https://t.ly/7LE6G
Paper: https://lnkd.in/dsUfdtih
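No code accompanies the paper, but the announced recipe (training-free upsampling via lightweight per-image test-time optimization) can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' method: it optimizes a high-resolution map so that average-pooling it reproduces the low-resolution input, with an edge-aware smoothness prior taken from a high-resolution guidance image.

```python
import numpy as np

def tto_upsample(f_lr, guide, scale=4, iters=200, lr=0.5, lam=0.1):
    """Toy test-time-optimized upsampling (illustrative only).

    f_lr:  (h, w) low-res feature map
    guide: (h*scale, w*scale) high-res guidance image in [0, 1]
    """
    h, w = f_lr.shape
    # initialize with nearest-neighbor upsampling
    f = np.kron(f_lr, np.ones((scale, scale)))
    # edge-aware weights: small across strong guidance gradients
    gx = np.abs(np.diff(guide, axis=1, append=guide[:, -1:]))
    gy = np.abs(np.diff(guide, axis=0, append=guide[-1:, :]))
    wx = np.exp(-10.0 * gx)
    wy = np.exp(-10.0 * gy)
    for _ in range(iters):
        # data term: average-pooled f should match f_lr
        pooled = f.reshape(h, scale, w, scale).mean(axis=(1, 3))
        resid = pooled - f_lr
        grad = np.kron(resid, np.ones((scale, scale))) / scale**2
        # smoothness term: guidance-weighted discrete Laplacian
        dx = np.diff(f, axis=1, append=f[:, -1:]) * wx
        dy = np.diff(f, axis=0, append=f[-1:, :]) * wy
        lap = (np.diff(dx, axis=1, prepend=dx[:, :1])
               + np.diff(dy, axis=0, prepend=dy[:1, :]))
        f -= lr * (grad - lam * lap)
    return f
```

With `lam=0` the optimizer reduces to exact average-pool consistency; the guidance prior only shapes how detail is distributed within each low-res cell.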
Single Synthetic Image per Class

MIT unveils Linear Gradient Matching (H/T Torralba), a novel distillation method that uses a single synthetic image per class to train linear classifiers (and more). Repo available.
Review: https://t.ly/dD3un
Paper: arxiv.org/pdf/2511.16674
Project: linear-gradient-matching.github.io/
Repo: github.com/GeorgeCazenavette/linear-gradient-matching
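The repo above holds the real implementation. As a hedged toy illustration of the gradient-matching idea only, the sketch below optimizes one synthetic vector per class so that the gradient it induces on a fixed linear head matches the gradient computed from real data. The squared-loss head, the Gaussian data, and all names are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def head_grad(W, X, Y):
    """Gradient of the squared-loss linear head 0.5*||XW^T - Y||^2 / n w.r.t. W."""
    R = X @ W.T - Y              # residuals, (n, K)
    return R.T @ X / len(X)      # (K, d)

# toy "real" data: two Gaussian classes in d dimensions
d, K, n = 8, 2, 200
X_real = np.concatenate([rng.normal(-1.0, 1.0, (n, d)),
                         rng.normal(+1.0, 1.0, (n, d))])
Y_real = np.repeat(np.eye(K), n, axis=0)

W = rng.normal(0.0, 0.1, (K, d))         # fixed probe classifier
G_real = head_grad(W, X_real, Y_real)    # target gradient

# one synthetic sample per class, optimized by gradient matching
X_syn = rng.normal(0.0, 0.1, (K, d))
Y_syn = np.eye(K)

def match_loss(X_syn):
    return 0.5 * np.sum((head_grad(W, X_syn, Y_syn) - G_real) ** 2)

loss0 = match_loss(X_syn)
for _ in range(500):
    R = X_syn @ W.T - Y_syn              # (K, K)
    D = R.T @ X_syn / K - G_real         # gradient mismatch, (K, d)
    # analytic gradient of match_loss w.r.t. X_syn (chain rule)
    grad_X = (X_syn @ D.T @ W + R @ D) / K
    X_syn -= 0.1 * grad_X
```

After optimization, the K synthetic vectors reproduce (approximately) the same head gradient as the full real dataset, which is the property the distilled images exploit.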
EfficientSAM3 is out

Bristol announces EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation, which transfers capability from SAM3 to lightweight students. Code coming (in sync with the SAM3 release).
Review: https://t.ly/bfXP2
Paper: arxiv.org/pdf/2511.15833
Project: simonzeng7108.github.io/efficientsam3/
Repo: github.com/SimonZeng7108/efficientsam3
Cloud4D in time

Cloud4D reconstructs physically realistic 3D cloud fields from ground-based cameras at 25 m spatial and 5 s temporal resolution. Repo coming; data released.
Review: https://t.ly/w7Zly
Paper: arxiv.org/pdf/2511.19431
Project: cloud4d.jacob-lin.com/
Data: https://drive.google.com/drive/folders/1QU_0kIUXIVt8h3uqygBeaF3Gvr_L5SdX?usp=drive_link
Repo: TBA
MotionV2V: Editing Motion in Video

Google unveils motion edits, a new approach to editing videos by controlling the change in motion from the original to the edited video using diffusion models. Impressive results. Repo to be released soon.
Review: https://t.ly/s0sIT
Paper: https://arxiv.org/pdf/2511.20640
Project: https://ryanndagreat.github.io/MotionV2V/
Repo: https://github.com/RyannDaGreat/MotionV2V
Smell Like Vision Spirit

New York Smells is a novel large-scale dataset of paired vision and olfaction captured in the wild, enabling the new task of cross-modal learning between smell and sight. With the lights out, it's less dangerous. Dataset available.
Review: https://t.ly/Ycn_B
Paper: arxiv.org/pdf/2511.20544
Project: smell.cs.columbia.edu/
Seeing without Pixels

Is it possible to perceive a video's content without seeing its pixels, just from the camera trajectory? DeepMind (+ UTexas) is the first to systematically investigate this seemingly implausible question.
Review: https://t.ly/Ymd1c
Paper: arxiv.org/pdf/2511.21681
Project: sites.google.com/view/seeing-without-pixels
Instance-Level Video Generation

InstanceV is the first video generation framework designed specifically for instance-level control at the architectural level. Code & data announced.
Review: https://t.ly/y_TBT
Paper: arxiv.org/pdf/2511.23146
Project: aliothchen.github.io/projects/InstanceV/
Repo: TBA
3D Point Motion Editing

Edit-by-Track enables precise video motion editing via 3D point tracks. By specifying desired 3D trajectories, users can seamlessly control joint camera and object motion, remove objects, and transfer motion between videos. No code announced, but relevant.
Review: https://t.ly/GJHJ5
Paper: arxiv.org/pdf/2512.02015
Project: edit-by-track.github.io/
Native Unified Multimodal

Meta unveils a novel unified multimodal model (UMM) that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This unified representation space enables SOTA end-to-end processing of images and videos for both understanding and generation. Code under legal review.
Review: https://t.ly/7wmKP
Paper: https://lnkd.in/djT4WGEU
Project: https://tuna-ai.org/
Repo: github.com/wren93/tuna
SOTA Generative SLP

Stable Signer is a new sign language production (SLP) generative model. It redefines SLP as a hierarchical, end-to-end generation task that includes only text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Repo with data.
Review: https://t.ly/yKZhn
Paper: arxiv.org/pdf/2512.04048
Project: stablesigner.github.io/
Data: github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025
TTSC for 3D Generation

SpaceControl is the new SOTA training-free, test-time method for explicit spatial control of 3D generation. Repo announced.
Review: https://t.ly/1zrah
Paper: https://lnkd.in/dEWh3vep
Project: https://lnkd.in/dScftUmm
Repo: TBA
Layered PSD Diffusion

OmniPSD produces layered PSD files with transparent alpha channels, separating text, foreground elements, and background into clean RGBA layers that can be edited directly in design tools. Online demo.
Review: https://t.ly/YNRAC
Paper: arxiv.org/pdf/2512.09247
Project: showlab.github.io/OmniPSD/
Demo: https://www.lovart.ai/it
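For context on what clean RGBA layers buy you: separated layers recompose into the flat image with the standard Porter-Duff "over" operator. A minimal, generic compositing sketch (not OmniPSD code):

```python
import numpy as np

def over(fg, bg):
    """Porter-Duff "over": composite straight-alpha RGBA fg onto RGBA bg.

    fg, bg: float arrays (..., 4) with RGB in [0, 1] and alpha in channel 3.
    """
    a_f, a_b = fg[..., 3:4], bg[..., 3:4]
    a_out = a_f + a_b * (1.0 - a_f)
    # un-premultiply, guarding fully transparent pixels
    rgb = np.where(
        a_out > 0,
        (fg[..., :3] * a_f + bg[..., :3] * a_b * (1.0 - a_f))
        / np.maximum(a_out, 1e-8),
        0.0,
    )
    return np.concatenate([rgb, a_out], axis=-1)

def flatten(layers):
    """Composite RGBA layers in order: background first, text/foreground last."""
    out = layers[0]
    for layer in layers[1:]:
        out = over(layer, out)
    return out
```

Because each layer keeps its own alpha, editing one layer (moving text, recoloring the background) and re-flattening gives a consistent result, which is the point of layer-wise generation.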
Pixel Art Volumetric Rendering

Voxify3D is a novel differentiable two-stage framework bridging 3D mesh optimization with 2D pixel-art supervision. Repo announced.
Review: https://t.ly/qPyNl
Paper: https://lnkd.in/du5ikJGN
Project: https://lnkd.in/dpiAjj5m
Repo: TBA
MoCapAnything is out

MoCapAnything is a novel reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware IK fitting. No code announced.
Review: https://t.ly/_Tw6t
Paper: arxiv.org/pdf/2512.10881
Project: animotionlab.github.io/MoCapAnything
MatAnyone 2 is out!

MatAnyone 2 is the most advanced human video matting framework: it preserves fine details by avoiding segmentation-like boundaries, while also showing enhanced robustness under challenging real-world conditions. Repo & dataset announced.
Review: https://t.ly/vxOBO
Paper: arxiv.org/pdf/2512.11782
Project: pq-yang.github.io/projects/MatAnyone2
Repo: github.com/pq-yang/MatAnyone2
SOTA Zero-Shot Stereo Matching

Fast-FoundationStereo by #Nvidia is a novel family of architectures that achieves, for the first time, strong zero-shot generalization at real-time frame rates via divide-and-conquer acceleration. Code & data announced.
Review: https://t.ly/XD6pO
Paper: https://lnkd.in/d9_YKW2A
Project: https://lnkd.in/dKDxm7EX
Repo: https://lnkd.in/dR4-PdsW
DriverGaze360: Driver SOTA

DriverGaze360 is a large-scale 360° field-of-view driver attention dataset containing ~1M gaze-labeled frames. Code & dataset announced.
Review: https://t.ly/ZcoUw
Paper: arxiv.org/pdf/2512.14266
Project: av.dfki.de/drivergaze360/
Repo: github.com/dfki-av/drivergaze360
Data: av.dfki.de/drivergaze360/dataset
FlexAvatar: 3D Heads

TUM introduces FlexAvatar, a novel method for creating high-quality, complete 3D head avatars from a single image. Code announced.
Review: https://t.ly/Rkdtd
Paper: arxiv.org/pdf/2512.15599
Project: tobias-kirschstein.github.io/flexavatar/
Repo: TBA
Depth Any Panoramas

DAP is the new SOTA foundation model for panoramic depth estimation, released with a large-scale dataset. Data & repo under the MIT license.
Review: https://t.ly/LaUmd
Paper: arxiv.org/pdf/2512.16913
Project: https://lnkd.in/dvqNV9jx
Repo: https://lnkd.in/dmNzhb-7
Demo: https://lnkd.in/dDwjMF3u