Depth Any Camera (SOTA)
DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to handle cameras with widely varying FoVs, including large-FoV fisheye and 360° cameras. Code announced (not available yet). A generic ERP-to-perspective sketch follows below.
Review https://t.ly/1qz4F
Paper arxiv.org/pdf/2501.02464
Project yuliangguo.github.io/depth-any-camera/
Repo github.com/yuliangguo/depth_any_camera
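To give a feel for the underlying idea (illustrative only, not the DAC pipeline): a perspective-trained depth model can be applied to a 360° equirectangular panorama by sampling perspective crops from it. Function names and parameters below are assumptions for this sketch.

```python
import numpy as np
import cv2

def erp_to_perspective(erp, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0, out_hw=(518, 518)):
    """Sample a pinhole-camera crop from an equirectangular (ERP) panorama."""
    H, W = out_hw
    f = 0.5 * W / np.tan(np.deg2rad(fov_deg) / 2)             # focal length of the crop (px)
    # Pixel grid -> unit rays in the crop camera (x right, y down, z forward)
    xs, ys = np.meshgrid(np.arange(W) - W / 2 + 0.5, np.arange(H) - H / 2 + 0.5)
    rays = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate rays toward the requested viewing direction
    yaw, pitch = np.deg2rad(yaw_deg), np.deg2rad(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)], [0, 1, 0], [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(pitch), -np.sin(pitch)], [0, np.sin(pitch), np.cos(pitch)]])
    rays = rays @ (Ry @ Rx).T
    # Rays -> longitude/latitude -> ERP pixel coordinates
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
    eh, ew = erp.shape[:2]
    u = (lon / (2 * np.pi) + 0.5) * ew
    v = (lat / np.pi + 0.5) * eh
    return cv2.remap(erp, u.astype(np.float32), v.astype(np.float32),
                     cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)
```

The crop can then be fed to any perspective-trained metric depth model; DAC's actual method is more involved, so treat this purely as intuition.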
Uncommon Objects in #3D
#META releases uCO3D, a new object-centric dataset for 3D AI: the largest publicly available collection of HD videos of objects with 3D annotations, ensuring full 360° coverage. Code & data released under CC-BY 4.0.
Review https://t.ly/Z_tvA
Paper https://arxiv.org/pdf/2501.07574
Project https://uco3d.github.io/
Repo github.com/facebookresearch/uco3d
Universal Detector-Free Matching
MatchAnything: a novel detector-free universal matcher that generalizes to unseen real-world single- and cross-modality domains, with the same weights for everything. Code announced, to be released.
Review https://t.ly/sx92L
Paper https://lnkd.in/dWwRwGyY
Project https://lnkd.in/dCwb2Yte
Repo https://lnkd.in/dnUXYzQ5
Help: Looking for Outstanding Speakers
Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only "hardcore" technical talks, no commercial content at all. Please comment here with name, topic, and affiliation (e.g., Paul Gascoigne, Computer Vision & Football, Scotland Team).
Guaranteed tickets & more for the suggestions that become invited speakers ;)
Omni-RGPT: SOTA MLLM Understanding
#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.
Review https://t.ly/KHnQ7
Paper arxiv.org/pdf/2501.08326
Project miranheo.github.io/omni-rgpt/
Repo TBA soon
GAGA: Group Any Gaussians
GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated.
Review https://t.ly/Nk_jT
Paper www.gaga.gallery/static/pdf/Gaga.pdf
Project www.gaga.gallery/
Repo github.com/weijielyu/Gaga
Free Book: LLM Foundations
A completely free book, just released on arXiv, outlining the basic concepts of #LLMs and related techniques with a focus on the foundational aspects.
- Chapter 1: basics of pre-training
- Chapter 2: generative models & LLMs
- Chapter 3: prompting methods
- Chapter 4: alignment methods
If you have some background in ML and a working understanding of Transformers, the book will read smoothly. Even without that prior knowledge it remains accessible, since each chapter is self-contained.
Review https://t.ly/9LGCa
Book https://lnkd.in/d3VkswZf
GSTAR: Gaussian Surface Tracking
ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking of dynamic scenes that handles topology changes. Code announced.
Review https://t.ly/udpMq
Paper arxiv.org/pdf/2501.10283
Project chengwei-zheng.github.io/GSTAR/
Repo TBA
Diffusion Video Inpainting
#Alibaba unveils a technical report on DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structure. Code & weights released under Apache 2.0. A generic per-frame inpainting sketch follows below.
Review https://t.ly/7rEll
Paper arxiv.org/pdf/2501.10018
Project lixiaowen-xw.github.io/DiffuEraser-page/
Repo github.com/lixiaowen-xw/DiffuEraser
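For orientation only, a minimal per-frame baseline of the underlying idea (filling masked regions with a diffusion prior) using the Hugging Face diffusers library. This is NOT the DiffuEraser API, it has none of the paper's temporal-consistency machinery, and the checkpoint name is just an example.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Generic Stable Diffusion inpainting pipeline (checkpoint name is illustrative).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

def inpaint_frames(frame_paths, mask_paths, prompt="clean background"):
    results = []
    for f, m in zip(frame_paths, mask_paths):
        frame = Image.open(f).convert("RGB").resize((512, 512))
        mask = Image.open(m).convert("L").resize((512, 512))   # white = region to fill
        results.append(pipe(prompt=prompt, image=frame, mask_image=mask).images[0])
    return results
```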
#Nvidia Foundation ZS-Stereo
Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale synthetic training dataset (1M stereo pairs) featuring large diversity and high photorealism. Code, model & dataset to be released. A quick disparity-to-depth refresher follows below.
Review https://t.ly/rfBr5
Paper arxiv.org/pdf/2501.09898
Project nvlabs.github.io/FoundationStereo/
Repo github.com/NVlabs/FoundationStereo/tree/master
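As a quick refresher (generic rectified-stereo geometry, nothing FoundationStereo-specific): once a stereo network predicts a disparity map, metric depth follows from depth = f · B / disparity.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to depth (meters) for a rectified pair.
    focal_px: focal length in pixels; baseline_m: camera baseline in meters."""
    return focal_px * baseline_m / np.maximum(disparity_px, eps)

# Example: 10 px disparity, f = 800 px, 12 cm baseline  ->  9.6 m
print(disparity_to_depth(np.array([[10.0]]), focal_px=800.0, baseline_m=0.12))
```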
[SOTA] Long-Video Depth Anything
ByteDance unveils Video Depth Anything: high-quality, consistent depth estimation on super-long videos (several minutes or more) without sacrificing efficiency. Built on Depth Anything V2 with a novel, efficient spatial-temporal head. Repo available under Apache 2.0. A naive per-frame baseline sketch follows below.
Review https://t.ly/Q4ZZd
Paper arxiv.org/pdf/2501.12375
Project https://lnkd.in/dKNwJzbM
Repo https://lnkd.in/ddfwwpCj
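For contrast, a hedged sketch of the naive per-frame baseline this work improves on: running Depth Anything V2 independently on each frame (via the Hugging Face transformers depth-estimation pipeline) yields depth maps that flicker over time, which is what the spatial-temporal head is meant to fix. The model id below is assumed to be the HF release of DA-V2 Small.

```python
from transformers import pipeline
from PIL import Image

# Per-frame monocular depth; no temporal consistency whatsoever.
depth = pipeline(task="depth-estimation",
                 model="depth-anything/Depth-Anything-V2-Small-hf")

def per_frame_depth(frame_paths):
    # One independent prediction per frame; expect flicker across frames.
    return [depth(Image.open(p))["depth"] for p in frame_paths]
```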
Time-Aware Point Tracking
Chrono: a feature backbone specifically designed for point tracking, with built-in temporal awareness. Long-term temporal context enables precise prediction even without refinement stages. Code announced.
Review https://t.ly/XAL7G
Paper arxiv.org/pdf/2501.12218
Project cvlab-kaist.github.io/Chrono/
Repo github.com/cvlab-kaist/Chrono
EMO2: Audio-Driven Avatar
Alibaba previews a novel audio-driven talking-head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results, but no code.
Review https://t.ly/x8slQ
Paper arxiv.org/pdf/2501.10687
Project humanaigc.github.io/emote-portrait-alive-2/
Repo not released
A-Life with Foundation Models
A super team unveils ASAL, a new paradigm for Artificial Life research covering a diverse range of ALife substrates, including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0. A tiny Game of Life sketch follows below.
Review https://t.ly/7SZ8A
Paper arxiv.org/pdf/2412.17799
Project http://pub.sakana.ai/asal/
Repo https://lnkd.in/dP5yxKtw
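To make one of the named substrates concrete, here is a minimal, generic Game of Life step on a toroidal grid. It is only an illustration of an ALife substrate, not code from the ASAL repo or its foundation-model-driven search loop.

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    # Count the 8 neighbours of every cell with wrap-around (toroidal) edges.
    neighbours = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(grid.dtype)

world = (np.random.rand(64, 64) < 0.2).astype(np.uint8)   # random initial state
for _ in range(100):
    world = life_step(world)
```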
The code of DynOMo is out
DynOMo is a novel model that tracks any point in a dynamic scene over time via 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input.
Review https://t.ly/t5pCf
Paper https://lnkd.in/dwhzz4_t
Repo github.com/dvl-tum/DynOMo
Project https://lnkd.in/dMyku2HW
SOTA Point-Based Video Segmentation
VGG Oxford unveils a novel loss for segmenting objects in videos based on their motion and NO other form of supervision! The network is trained with long-term point trajectories as a supervisory signal to complement optical flow. New SOTA!
Review https://t.ly/8Bsbt
Paper https://arxiv.org/pdf/2501.12392
Code https://github.com/karazijal/lrtl
Project www.robots.ox.ac.uk/~vgg/research/lrtl/
MatAnyone: Human Matting
MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos, even with complex/ambiguous backgrounds. Code & Hugging Face demo announced.
Review https://t.ly/NVXsT
Paper arxiv.org/pdf/2501.14677
Project pq-yang.github.io/projects/MatAnyone
Repo TBA
[SOTA] Visual Grounding VOS
ReferDINO is the first end-to-end approach to adapting foundational visual grounding models to referring video object segmentation (RVOS). Code & models to be released soon.
Review https://t.ly/SDFy9
Paper arxiv.org/pdf/2501.14607
Project isee-laboratory.github.io/ReferDINO/
Repo github.com/iSEE-Laboratory/ReferDINO
Relightable Full-Body Avatars
#Meta unveils the first approach to jointly model the relightable appearance of the body, face, and hands of drivable avatars.
Review https://t.ly/kx9gf
Paper arxiv.org/pdf/2501.14726
Project neuralbodies.github.io/RFGCA
Generative Human Mesh Recovery
GenHMR is a novel generative framework that reformulates monocular human mesh recovery (HMR) as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results, but no code announced.
Review https://t.ly/Rrzpj
Paper https://arxiv.org/pdf/2412.14444
Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html