GSTAR: Gaussian Surface Tracking
ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking of dynamic scenes while handling topology changes. Code announced.
Review: https://t.ly/udpMq
Paper: arxiv.org/pdf/2501.10283
Project: chengwei-zheng.github.io/GSTAR/
Repo: TBA
Diffusion Video Inpainting
#Alibaba unveils a technical report on DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structure. Code & weights released under Apache.
Review: https://t.ly/7rEll
Paper: arxiv.org/pdf/2501.10018
Project: lixiaowen-xw.github.io/DiffuEraser-page/
Repo: github.com/lixiaowen-xw/DiffuEraser
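For context, here is a minimal sketch of the image-level primitive DiffuEraser builds on, plain Stable Diffusion inpainting via the Hugging Face diffusers library. The checkpoint ID and file names are placeholders, and this is the generic single-image pipeline, not DiffuEraser's video method:

# Generic Stable Diffusion inpainting sketch (not DiffuEraser itself).
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",  # any SD inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("frame.png")  # placeholder input frame
mask = Image.open("mask.png")    # white pixels mark the region to fill
result = pipe(prompt="clean background", image=image, mask_image=mask).images[0]
result.save("inpainted.png")

DiffuEraser's contribution is making this kind of generative fill temporally coherent across a whole video rather than per frame.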
#Nvidia Foundation Zero-Shot Stereo
Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale synthetic training dataset (1M stereo pairs) featuring high diversity and photorealism. Code, model & dataset to be released.
Review: https://t.ly/rfBr5
Paper: arxiv.org/pdf/2501.09898
Project: nvlabs.github.io/FoundationStereo/
Repo: github.com/NVlabs/FoundationStereo/tree/master
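As background, once a stereo matcher predicts disparity, depth follows from textbook rectified-stereo geometry, depth = focal_length * baseline / disparity. This is the standard relation, not FoundationStereo's code:

# Rectified stereo: depth = f * B / d.
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """disparity_px: (H, W) disparities in pixels (0 = invalid match)."""
    depth = np.full_like(disparity_px, np.inf, dtype=np.float64)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Example: f = 720 px, baseline = 12 cm, disparity = 36 px -> 2.4 m.
print(disparity_to_depth(np.array([[36.0]]), 720.0, 0.12))  # [[2.4]]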
[SOTA] Long-Video Depth Anything
ByteDance unveils Video Depth Anything: high-quality, consistent depth estimation in SUPER-long videos (several minutes or more) without sacrificing efficiency. Based on Depth Anything V2 with a novel, efficient spatial-temporal head. Repo available under Apache 2.0.
Review: https://t.ly/Q4ZZd
Paper: arxiv.org/pdf/2501.12375
Project: https://lnkd.in/dKNwJzbM
Repo: https://lnkd.in/ddfwwpCj
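To make "consistent" concrete: per-frame monocular depth tends to flicker because each frame's scale is arbitrary. A naive flicker score, my own simplification for illustration and not the paper's evaluation protocol, could look like this:

import numpy as np

def flicker_score(depths):
    """depths: (T, H, W) positive per-frame depth maps.
    Normalizes each frame by its median (per-frame scale is arbitrary
    in monocular depth), then averages the frame-to-frame absolute
    change. Lower means temporally smoother video depth."""
    d = depths / np.median(depths, axis=(1, 2), keepdims=True)
    return float(np.abs(np.diff(d, axis=0)).mean())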
Time-Aware Point Tracking
Chrono: a feature backbone designed specifically for point tracking, with built-in temporal awareness. Long-term temporal context enables precise prediction even without refinement stages. Code announced.
Review: https://t.ly/XAL7G
Paper: arxiv.org/pdf/2501.12218
Project: cvlab-kaist.github.io/Chrono/
Repo: github.com/cvlab-kaist/Chrono
EMO2: Audio-Driven Avatar
Alibaba previews a novel audio-driven talking-head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results, but no code.
Review: https://t.ly/x8slQ
Paper: arxiv.org/pdf/2501.10687
Project: humanaigc.github.io/emote-portrait-alive-2/
Repo: not announced
A-Life with Foundation Models
A super-team unveils ASAL, a new paradigm for Artificial Life research, spanning a diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0.
Review: https://t.ly/7SZ8A
Paper: arxiv.org/pdf/2412.17799
Project: http://pub.sakana.ai/asal/
Repo: https://lnkd.in/dP5yxKtw
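For a feel of the simplest substrate in that list, here is one update step of Conway's Game of Life. This is the classic textbook rule set, not ASAL's own code:

import numpy as np

def life_step(grid):
    """grid: (H, W) array of 0/1 cells, toroidal boundary."""
    # Sum the 8 neighbors via wrap-around shifts.
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    alive = grid.astype(bool)
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((n == 3) | (alive & (n == 2))).astype(np.uint8)

ASAL's point is that foundation models can search over the parameters and initial states of substrates like this one to discover interesting behaviors automatically.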
The code of DynOMo is out
DynOMo is a novel model that tracks any point in a dynamic scene over time through 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input.
Review: https://t.ly/t5pCf
Paper: https://lnkd.in/dwhzz4_t
Repo: github.com/dvl-tum/DynOMo
Project: https://lnkd.in/dMyku2HW
SOTA Point-Trajectory Segmentation
VGG Oxford unveils a novel loss for segmenting objects in videos based on their motion and NO other form of supervision! The network is trained with long-term point trajectories as a supervisory signal that complements optical flow. New SOTA!
Review: https://t.ly/8Bsbt
Paper: https://arxiv.org/pdf/2501.12392
Code: https://github.com/karazijal/lrtl
Project: www.robots.ox.ac.uk/~vgg/research/lrtl/
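To make the trajectory-as-supervision idea concrete, here is a toy version: points lying on the same long-term track should keep the same segmentation label across frames. Shapes and names are hypothetical, a sketch of the principle rather than the paper's actual loss:

import numpy as np

def trajectory_consistency_loss(masks, tracks):
    """masks:  (T, H, W) soft foreground probabilities per frame.
    tracks: list of (T, 2) integer arrays of (y, x) positions per track.
    Penalizes the variance of the mask value sampled along each track,
    so a point cannot flip between foreground and background over time."""
    loss = 0.0
    for track in tracks:
        t = np.arange(len(track))
        labels = masks[t, track[:, 0], track[:, 1]]  # mask value along the track
        loss += labels.var()
    return loss / len(tracks)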
MatAnyone: Human Matting
MatAnyone is a novel approach to human video matting that supports target assignment. Stable tracking in long videos, even with complex/ambiguous backgrounds. Code & Hugging Face demo announced.
Review: https://t.ly/NVXsT
Paper: arxiv.org/pdf/2501.14677
Project: pq-yang.github.io/projects/MatAnyone
Repo: TBA
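What a matting model's alpha output buys you is the classic compositing equation I = alpha*F + (1 - alpha)*B: the per-frame matte lets you paste the subject onto any new background. A minimal sketch of that standard equation (generic matting math, not MatAnyone's code):

import numpy as np

def composite(fg, bg, alpha):
    """fg, bg: (H, W, 3) float images in [0, 1]; alpha: (H, W) matte."""
    a = alpha[..., None]  # broadcast the matte over the color channels
    return a * fg + (1.0 - a) * bg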
[SOTA] Visual Grounding VOS
ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to RVOS (referring video object segmentation). Code & models to be released soon.
Review: https://t.ly/SDFy9
Paper: arxiv.org/pdf/2501.14607
Project: isee-laboratory.github.io/ReferDINO/
Repo: github.com/iSEE-Laboratory/ReferDINO
Relightable Full-Body Avatars
#Meta unveils the first approach to jointly model the relightable appearance of the body, face, and hands of drivable avatars.
Review: https://t.ly/kx9gf
Paper: arxiv.org/pdf/2501.14726
Project: neuralbodies.github.io/RFGCA
Generative Human Mesh Recovery
GenHMR is a novel generative framework that reformulates monocular HMR (human mesh recovery) as an image-conditioned generative task, explicitly modeling and mitigating uncertainty in the 2D-to-3D mapping process. Impressive results, but no code announced.
Review: https://t.ly/Rrzpj
Paper: https://arxiv.org/pdf/2412.14444
Project: m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
Everyone's social feed is flooded with unnecessary opinions about DeepSeek. Your wish:
Anonymous Poll
37%: STOP posting about it!
63%: Keep posting, we want more!
AI-Driven Document Conversion
Docling, by IBM, is the ALL-in-ONE open-source solution for documents, parsing several popular formats into a unified, richly structured representation. Powered by SOTA models for layout (DocLayNet) and table structure (TableFormer), it runs efficiently on low-cost hardware. Code under MIT.
Review: https://t.ly/nSCfT
Paper: https://lnkd.in/dc5Kpc2F
Repo: https://lnkd.in/d9gvw9bt
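Docling's documented quickstart is only a few lines. This sketch follows the project's README at the time of writing (the input URL is a placeholder, and the API may have evolved since):

from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # placeholder: a file path or URL
converter = DocumentConverter()
result = converter.convert(source)

# Export the unified representation, e.g. as Markdown.
print(result.document.export_to_markdown())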
SOTA Zero-Shot Multi-View
MVGD by #TOYOTA is the SOTA method for generating images and scale-consistent depth maps from novel viewpoints, given an arbitrary number of posed input views. A novel diffusion-based architecture capable of direct pixel-level generation. Code announced.
Review: https://t.ly/_ecKl
Paper: arxiv.org/pdf/2501.18804
Project: mvgd.github.io/
Repo: TBA
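The geometry that "scale-consistent depth from posed views" rests on is standard pinhole back-projection: a depth map plus intrinsics and pose lifts pixels to world-space points that all views must agree on. A background sketch of that textbook camera model, not MVGD's code:

import numpy as np

def backproject(depth, K, cam_to_world):
    """depth: (H, W); K: (3, 3) intrinsics; cam_to_world: (4, 4) pose.
    Returns (H, W, 3) world-space points."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]  # pixel rows (v) and columns (u)
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1)
    pts_cam = rays * depth.reshape(1, -1)          # scale rays by depth
    pts = cam_to_world @ np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    return pts[:3].T.reshape(H, W, 3)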
MambaGlue: SOTA Feature Matching
MambaGlue is a hybrid neural network combining the Mamba and Transformer architectures to match local features. Source code announced, to be released.
Review: https://shorturl.at/LxDG1
Paper: arxiv.org/pdf/2502.00462
Repo: https://lnkd.in/dAujfGZQ
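The baseline that learned matchers like MambaGlue improve on is plain mutual nearest-neighbor matching over local descriptors. A generic sketch of that classic baseline, not the paper's method:

import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """desc_a: (N, D), desc_b: (M, D) L2-normalized descriptors.
    Returns (K, 2) index pairs that are each other's nearest neighbor."""
    sim = desc_a @ desc_b.T         # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)      # best match in B for each A
    nn_ba = sim.argmax(axis=0)      # best match in A for each B
    idx_a = np.arange(len(desc_a))
    mutual = nn_ba[nn_ab] == idx_a  # keep only mutual agreements
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)

Learned matchers replace the raw argmax with attention over both descriptor sets, which is where architectures like Mamba and Transformers come in.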
Real-Time Differentiable Ray Tracing
Radiant Foam is a novel scene representation that leverages a decades-old, efficient volumetric-mesh ray-tracing algorithm (largely overlooked in recent research). It performs like Gaussian Splatting, without the constraints of rasterization. Code announced.
Review: https://shorturl.at/26U06
Paper: https://arxiv.org/pdf/2502.01157
Project: https://radfoam.github.io/
Repo: https://github.com/theialab/radfoam
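Whether the scene is rasterized splats or ray-traced foam cells, the per-ray quantity being evaluated is the classic emission-absorption volume-rendering integral. A textbook sketch of that integral, not the Radiant Foam renderer:

import numpy as np

def render_ray(sigmas, colors, deltas):
    """sigmas: (S,) densities; colors: (S, 3); deltas: (S,) segment
    lengths along one ray. Returns the accumulated RGB color."""
    alphas = 1.0 - np.exp(-sigmas * deltas)  # per-sample opacity
    # Transmittance: how much light survives to reach each sample.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)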
VideoJAM: #META's Video Model (SOTA)
#META's VideoJAM: the new SOTA (by a large margin) in motion coherence for video generation, much better than Sora! A strong motion prior that plugs into any video-gen model. Impressive results, no code announced.
Review: https://shorturl.at/id7Bt
Paper: https://arxiv.org/pdf/2502.02492
Project: https://hila-chefer.github.io/videojam-paper.github.io/
3D Dynamic Garments
UCLA introduces Dress-1-to-3, a novel pipeline that reconstructs physics-plausible, simulation-ready separated garments with sewing patterns, together with humans, from a single in-the-wild image.
Review: https://t.ly/qciHV
Paper: arxiv.org/pdf/2502.03449
Project: dress-1-to-3.github.io