What is your favorite source for the AI updates?
Final Results
LinkedIn: 32%
Instagram: 4%
Reddit: 3%
Telegram: 52%
Others (comment here: https://t.ly/chQWq): 9%
AI with Papers - Artificial Intelligence & Deep Learning pinned «What is your favorite source for the AI updates?»
SOTA Probabilistic Tracking
ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial.
Review: https://t.ly/YY_PH
Paper: https://arxiv.org/pdf/2501.03220
Project: michaelszj.github.io/protracker/
Code: github.com/Michaelszj/pro-tracker
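Probabilistic trackers generally keep an uncertainty alongside each point estimate and fuse multiple noisy cues (e.g., optical flow vs. a keypoint predictor) according to how much each is trusted. A minimal inverse-variance fusion sketch in NumPy, purely illustrative and not taken from the ProTracker code:

```python
import numpy as np

def fuse_estimates(points, variances):
    """Fuse noisy 2D estimates of the same track point by
    inverse-variance weighting (lower variance = more trusted).

    points:    (N, 2) array of position estimates
    variances: (N,) array of per-estimate variances
    Returns the fused position and its reduced variance.
    """
    points = np.asarray(points, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # inverse-variance weights
    fused = (points * w[:, None]).sum(axis=0) / w.sum()
    fused_var = 1.0 / w.sum()                      # fusing always reduces variance
    return fused, fused_var

# A confident estimate and a noisier one: the result leans toward the former.
p, v = fuse_estimates([[10.0, 20.0], [14.0, 24.0]], [1.0, 4.0])
```

The fused variance `1 / (1/v1 + 1/v2)` is smaller than either input variance, which is why combining independent cues pays off.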
World-Space Ego 3D Hands
Imperial College unveils HaWoR, a novel world-space 3D hand motion estimation method for egocentric videos. New SOTA on both camera pose estimation and hand motion reconstruction. Code under CC Attribution-NC-ND 4.0 International.
Review: https://t.ly/ozJn7
Paper: arxiv.org/pdf/2501.02973
Project: hawor-project.github.io/
Code: github.com/ThunderVVV/HaWoR
"Nuclear" AI vs. Hyper-Cheap Inference
What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
Portable Training Workstation: 24%
Nuclear energy for AI training: 35%
Cheaper inference-only devices: 33%
Cloud-intensive inference-only: 9%
FIFA 3D Human Pose
#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations released.
Review: https://t.ly/kvGVQ
Paper: arxiv.org/pdf/2501.02771
Project: https://lnkd.in/d5hFWpY2
Dataset: https://lnkd.in/dAphJ9WA
Depth Any Camera (SOTA)
DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to handle cameras with varying FoVs (including large fisheye and 360°). Code announced (not available yet).
Review: https://t.ly/1qz4F
Paper: arxiv.org/pdf/2501.02464
Project: yuliangguo.github.io/depth-any-camera/
Repo: github.com/yuliangguo/depth_any_camera
Uncommon Objects in #3D
#META releases uCO3D, a new object-centric dataset for 3D AI: the largest publicly available collection of HD videos of objects with 3D annotations, ensuring full 360° coverage. Code & data under CCA 4.0.
Review: https://t.ly/Z_tvA
Paper: https://arxiv.org/pdf/2501.07574
Project: https://uco3d.github.io/
Repo: github.com/facebookresearch/uco3d
Universal Detector-Free Matching
MatchAnything: a novel detector-free universal matcher across unseen real-world single- and cross-modality domains. The same weights work for everything. Code announced, to be released.
Review: https://t.ly/sx92L
Paper: https://lnkd.in/dWwRwGyY
Project: https://lnkd.in/dCwb2Yte
Repo: https://lnkd.in/dnUXYzQ5
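Detector-free matchers ultimately output correspondences between dense feature descriptors. The classic mutual-nearest-neighbor filter below illustrates the core correspondence idea in NumPy; it is a generic sketch, not MatchAnything's actual pipeline:

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return index pairs (i, j) where descriptor a_i and descriptor b_j
    are each other's nearest neighbor: a standard, symmetric match filter."""
    # Pairwise squared Euclidean distances between the two descriptor sets.
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    nn_ab = d2.argmin(axis=1)   # best match in B for each A descriptor
    nn_ba = d2.argmin(axis=0)   # best match in A for each B descriptor
    # Keep only pairs where the choice is mutual.
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Two toy descriptor sets whose rows correspond cross-wise.
a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[1.1, 1.0], [0.1, 0.0]])
matches = mutual_nearest_neighbors(a, b)
```

Mutuality rejects one-sided matches, which is why it remains a common sanity filter even around learned matchers.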
Help: Looking for Outstanding Speakers
Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only "hardcore" technical talks, nothing commercial. Please comment here with name, topic, and affiliation (e.g., Paul Gascoigne, Computer Vision & Football, Scotland Team).
Guaranteed tickets & more for the suggestions that become invited speakers ;)
Omni-RGPT: SOTA MLLM Understanding
#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.
Review: https://t.ly/KHnQ7
Paper: arxiv.org/pdf/2501.08326
Project: miranheo.github.io/omni-rgpt/
Repo: TBA soon
GAGA: Group Any Gaussians
GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated.
Review: https://t.ly/Nk_jT
Paper: www.gaga.gallery/static/pdf/Gaga.pdf
Project: www.gaga.gallery/
Repo: github.com/weijielyu/Gaga
Free Book: LLM Foundations
A fully free book, just released on arXiv, outlining the basic concepts of #LLMs and related techniques with a focus on the foundational aspects.
Chapter 1: basics of pre-training
Chapter 2: generative models & LLMs
Chapter 3: prompting methods
Chapter 4: alignment methods
If you have a background in ML and some understanding of topics like Transformers, the book will read smoothly. Even without that prior knowledge it is still perfectly fine, because each chapter is self-contained.
Review: https://t.ly/9LGCa
Book: https://lnkd.in/d3VkswZf
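At its core, the pre-training of Chapter 1 is next-token prediction. A toy count-based bigram model (an illustration of the idea, not code from the book) captures the loop in a few lines:

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies: the simplest next-token predictor."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def predict(model, token):
    """Return the continuation most frequently seen after `token`."""
    return model[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
nxt = predict(model, "the")   # "cat" follows "the" twice, "mat" once
```

Real LLM pre-training replaces the counting table with a neural network trained by gradient descent, but the objective (predict the next token from context) is the same.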
GSTAR: Gaussian Surface Tracking
ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking of dynamic scenes while handling topology changes. Code announced.
Review: https://t.ly/udpMq
Paper: arxiv.org/pdf/2501.10283
Project: chengwei-zheng.github.io/GSTAR/
Repo: TBA
Diffusion Video Inpainting
#Alibaba unveils a technical report on DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structures. Code & weights released under Apache.
Review: https://t.ly/7rEll
Paper: arxiv.org/pdf/2501.10018
Project: lixiaowen-xw.github.io/DiffuEraser-page/
Repo: github.com/lixiaowen-xw/DiffuEraser
#Nvidia Foundation ZS-Stereo
Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale synthetic training dataset (1M stereo pairs) featuring large diversity and high photorealism. Code, model & dataset to be released.
Review: https://t.ly/rfBr5
Paper: arxiv.org/pdf/2501.09898
Project: nvlabs.github.io/FoundationStereo/
Repo: github.com/NVlabs/FoundationStereo/tree/master
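Stereo networks like FoundationStereo predict disparity; metric depth then follows from the standard pinhole-stereo relation Z = f · B / d (focal length times baseline over disparity). A generic conversion helper with made-up camera values, not the paper's code:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to metric depth (meters)
    using the pinhole-stereo relation Z = f * B / d.

    eps guards against division by zero where disparity vanishes
    (i.e., points at infinity)."""
    disparity = np.asarray(disparity, dtype=float)
    return focal_px * baseline_m / np.maximum(disparity, eps)

# Hypothetical rig: 720 px focal length, 12 cm baseline.
depth = disparity_to_depth([[36.0, 72.0]], focal_px=720.0, baseline_m=0.12)
```

The inverse relation means small disparity errors on far objects translate into large depth errors, which is one reason zero-shot generalization in stereo is hard.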
[SOTA] Long-Video Depth Anything
ByteDance unveils Video Depth Anything: high-quality, consistent depth estimation on super-long videos (over several minutes) without sacrificing efficiency. Based on Depth Anything V2 with a novel, efficient spatial-temporal head. Repo available under Apache 2.0.
Review: https://t.ly/Q4ZZd
Paper: arxiv.org/pdf/2501.12375
Project: https://lnkd.in/dKNwJzbM
Repo: https://lnkd.in/ddfwwpCj
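Temporal consistency is the hard part of video depth. As a point of contrast, the crudest possible baseline is an exponential moving average over per-frame predictions; this is emphatically not the paper's spatial-temporal head, just a naive reference:

```python
import numpy as np

def ema_smooth_depth(frames, alpha=0.8):
    """Exponentially smooth a sequence of per-frame depth maps to damp
    frame-to-frame flicker. alpha is the weight on the running estimate;
    higher alpha = smoother but laggier."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    out, state = [], frames[0]
    for f in frames:
        state = alpha * state + (1 - alpha) * f
        out.append(state)
    return out

smoothed = ema_smooth_depth([[[1.0]], [[2.0]]])
```

An EMA blurs genuine depth changes along with the flicker (and ignores camera motion entirely), which is why learned temporal heads are needed for real consistency.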
Time-Aware Point Tracking
Chrono: a feature backbone specifically designed for point tracking with built-in temporal awareness. Long-term temporal context enables precise prediction even without refinement stages. Code announced.
Review: https://t.ly/XAL7G
Paper: arxiv.org/pdf/2501.12218
Project: cvlab-kaist.github.io/Chrono/
Repo: github.com/cvlab-kaist/Chrono
EMO2: Audio-Driven Avatar
Alibaba previews a novel audio-driven talking-head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results, but no code.
Review: https://t.ly/x8slQ
Paper: arxiv.org/pdf/2501.10687
Project: humanaigc.github.io/emote-portrait-alive-2/
Repo: not released
A-Life with Foundation Models
A super team unveils ASAL, a new paradigm for Artificial Life research, covering a diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0.
Review: https://t.ly/7SZ8A
Paper: arxiv.org/pdf/2412.17799
Project: http://pub.sakana.ai/asal/
Repo: https://lnkd.in/dP5yxKtw
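Of the substrates listed, Conway's Game of Life is the easiest to reproduce. One standard update step (independent of the ASAL codebase) in NumPy:

```python
import numpy as np

def life_step(grid):
    """One Game of Life update on a toroidal grid: live cells survive
    with 2-3 live neighbors, dead cells come alive with exactly 3."""
    # Sum the 8 neighbors via wrap-around shifts of the grid.
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1
nxt = life_step(blinker)
```

Substrates like Lenia generalize exactly this pattern, replacing the binary neighbor count with continuous convolution kernels and growth functions.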