🐕 Gaze-LLE: Neural Gaze 🐕
👉Gaze-LLE: a novel transformer framework that streamlines gaze-target estimation by leveraging features from a frozen DINOv2 encoder. Code & models under MIT 💙 (minimal feature-extraction sketch below)
👉Review https://t.ly/SadoF
👉Paper arxiv.org/pdf/2412.09586
👉Repo github.com/fkryan/gazelle
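👉A minimal sketch of the pattern behind Gaze-LLE: freeze a DINOv2 backbone, extract patch features, and learn only a lightweight head on top. The torch.hub DINOv2 entry point is real; GazeHead and its shapes are illustrative, not the repo's actual API.
```python
# Hedged sketch: frozen DINOv2 patch features + a small trainable head.
# GazeHead is illustrative, not Gaze-LLE's real interface.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False  # encoder stays frozen; only the head would train

class GazeHead(nn.Module):
    """Tiny head mapping ViT patch tokens to a coarse gaze-target heatmap."""
    def __init__(self, dim=768, patches_per_side=16):
        super().__init__()
        self.s = patches_per_side
        self.proj = nn.Linear(dim, 1)

    def forward(self, patch_tokens):          # (B, N, dim)
        logits = self.proj(patch_tokens)      # (B, N, 1)
        return logits.view(-1, 1, self.s, self.s)

head = GazeHead()
img = torch.randn(1, 3, 224, 224)             # 224 / 14 = 16 patches per side
with torch.no_grad():
    feats = backbone.forward_features(img)["x_norm_patchtokens"]
print(head(feats).shape)                       # torch.Size([1, 1, 16, 16])
```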
🫶 Dynamic Cam-4D Hands 🫶
👉Imperial College London unveils Dyn-HaMR, the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Code announced under MIT 💙 (world-frame sketch below)
👉Review https://t.ly/h5vV7
👉Paper arxiv.org/pdf/2412.12861
👉Project dyn-hamr.github.io/
👉Repo github.com/ZhengdiYu/Dyn-HaMR
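👉The crux of "global" motion from a moving camera is composing per-frame, camera-frame hand poses with camera-to-world extrinsics (e.g., from SLAM). A minimal numpy sketch of that bookkeeping; illustrative, not Dyn-HaMR's actual API.
```python
# Hedged sketch: world_T_hand = world_T_cam @ cam_T_hand, per frame.
import numpy as np

def to_world(R_c2w, t_c2w, R_hand_cam, t_hand_cam):
    """Compose SE(3) transforms to express the hand in the world frame."""
    return R_c2w @ R_hand_cam, R_c2w @ t_hand_cam + t_c2w

# One frame: camera yawed 90 degrees about the vertical axis, 1 m along x.
R_c2w = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
t_c2w = np.array([1., 0., 0.])
R_hand_cam = np.eye(3)                # hand orientation in the camera frame
t_hand_cam = np.array([0., 0., 0.5])  # hand 0.5 m in front of the camera
R_w, t_w = to_world(R_c2w, t_c2w, R_hand_cam, t_hand_cam)
print(t_w)                            # [1.  0.  0.5]
```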
🍄 Open-MLLMs Self-Driving 🍄
👉OpenEMMA: a novel open-source end-to-end autonomous-driving framework built on MLLMs with Chain-of-Thought reasoning. It shows effectiveness, generalizability, and robustness across a variety of challenging driving scenarios. Code released under Apache 2.0 💙 (prompt sketch below)
👉Review https://t.ly/waLZI
👉Paper https://arxiv.org/pdf/2412.15208
👉Code https://github.com/taco-group/OpenEMMA
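👉OpenEMMA prompts an MLLM to reason step by step before predicting a trajectory. A hedged sketch of what such a Chain-of-Thought driving prompt could look like; the template below is illustrative, not the repo's actual prompt.
```python
# Hedged sketch: building a Chain-of-Thought prompt for a driving MLLM.
def build_cot_prompt(speed_mps, past_waypoints):
    history = ", ".join(f"({x:.1f}, {y:.1f})" for x, y in past_waypoints)
    return (
        "You are a driving assistant looking at the front-camera image.\n"
        f"Ego speed: {speed_mps:.1f} m/s. Past waypoints (x, y): {history}.\n"
        "Think step by step:\n"
        "1. Describe the scene and the critical objects.\n"
        "2. Infer the intent of each critical object.\n"
        "3. Decide a high-level maneuver (keep lane, slow down, turn...).\n"
        "Finally, output the next 10 waypoints as (x, y) pairs in meters."
    )

print(build_cot_prompt(8.3, [(0.0, 0.0), (2.1, 0.1), (4.3, 0.2)]))
```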
🔄️ Orient Anything in 3D 🔄️
👉Orient Anything is a novel, robust image-based object-orientation estimation model. Trained on 2M rendered, labeled images, it achieves strong zero-shot generalization in the wild. Code released 💙 (angle-conversion sketch below)
👉Review https://t.ly/ro5ep
👉Paper arxiv.org/pdf/2412.18605
👉Project orient-anything.github.io/
👉Code https://lnkd.in/d_3k6Nxz
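👉Orientation predictions are typically angles; turning azimuth/polar angles into a 3D direction vector is plain spherical-to-Cartesian math. A hedged sketch; the z-up convention here is an assumption, not necessarily the paper's.
```python
# Hedged sketch: (azimuth, polar) -> unit 3D direction vector (z-up assumed).
import math

def angles_to_direction(azimuth_deg, polar_deg):
    az = math.radians(azimuth_deg)  # rotation around the vertical axis
    po = math.radians(polar_deg)    # elevation above the horizontal plane
    return (math.cos(po) * math.cos(az),
            math.cos(po) * math.sin(az),
            math.sin(po))

print(angles_to_direction(90.0, 0.0))  # ~(0.0, 1.0, 0.0): level, facing +y
```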
⭐TOP 10 Papers you loved - 2024⭐
👉Here is the list of my posts you liked the most in 2024. Thank you all 💙
𝐏𝐚𝐩𝐞𝐫𝐬:
⭐"Look Ma, no markers"
⭐T-Rex 2 Detector
⭐Models at Any Resolution
👉The full list with links: https://t.ly/GvQVy
🌳 HD Video Object Insertion 🌳
👉VideoAnydoor is a novel zero-shot #AI framework for video object insertion with high-fidelity detail preservation and precise motion control. All-in-one: video virtual try-on (VTON), face swapping, logo insertion, multi-region editing, etc.
👉Review https://t.ly/hyvRq
👉Paper arxiv.org/pdf/2501.01427
👉Project videoanydoor.github.io/
👉Repo TBA
⭐ Poll Alert!! ⭐
[EDIT] see below
What is your favorite source for the AI updates?
Final results:
Linkedin: 32%
Instagram: 4%
Reddit: 3%
Telegram: 52%
Others: 9% (comment here: https://t.ly/chQWq)
🥮 SOTA Probabilistic Tracking 🥮
👉ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial 💙 (fusion sketch below)
👉Review https://t.ly/YY_PH
👉Paper https://arxiv.org/pdf/2501.03220
👉Project michaelszj.github.io/protracker/
👉Code github.com/Michaelszj/pro-tracker
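👉The classic building block behind probabilistic tracking is fusing noisy position estimates by inverse-variance (product-of-Gaussians) weighting. A minimal sketch of that idea; illustrative, not ProTracker's actual code.
```python
# Hedged sketch: product-of-Gaussians fusion of two noisy 2D point estimates.
import numpy as np

def fuse(mu_a, var_a, mu_b, var_b):
    """Inverse-variance weighting: more certain estimates count more."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * mu_a + w_b * mu_b) / (w_a + w_b), 1.0 / (w_a + w_b)

# Optical-flow estimate (confident) vs. long-term keypoint estimate (noisy).
mu, var = fuse(np.array([100.0, 50.0]), 1.0,
               np.array([104.0, 52.0]), 4.0)
print(mu, var)  # [100.8  50.4] 0.8 -> pulled toward the confident estimate
```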
🧤World-Space Ego 3D Hands🧤
👉Imperial College London unveils HaWoR, a novel world-space 3D hand motion estimation framework for egocentric videos. The new SOTA on both camera pose estimation & hand motion reconstruction. Code under CC Attribution-NC-ND 4.0 International 💙 (metric sketch below)
👉Review https://t.ly/ozJn7
👉Paper arxiv.org/pdf/2501.02973
👉Project hawor-project.github.io/
👉Code github.com/ThunderVVV/HaWoR
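👉Since HaWoR is benchmarked on camera pose estimation, the standard yardstick is Absolute Trajectory Error. A minimal RMSE sketch; it assumes the trajectories are already aligned, whereas real evaluations first align them (e.g., with a similarity transform).
```python
# Hedged sketch: Absolute Trajectory Error (RMSE) over camera positions.
import numpy as np

def ate_rmse(traj_est, traj_gt):
    """Root-mean-square of per-frame camera-position errors, in meters."""
    err = np.linalg.norm(traj_est - traj_gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

est = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [2.0, 0.1, 0.0]])
gt  = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(ate_rmse(est, gt))  # ~0.0816
```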
🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥
⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll - final results:
🤲Portable Training Workstation: 23%
⚛️Nuclear energy for AI training: 35%
🖲️Cheaper only-inference devices: 33%
💰Cloud-intensive only-inference: 9%
⚽ FIFA 3D Human Pose ⚽
👉#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released 💙
👉Review https://t.ly/kvGVQ
👉Paper arxiv.org/pdf/2501.02771
👉Project https://lnkd.in/d5hFWpY2
👉Dataset https://lnkd.in/dAphJ9WA
🔥 Depth Any Camera (SOTA) 🔥
👉DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cameras with varying FoVs (including large fisheye & 360°). Code announced (not available yet) 💙 (projection sketch below)
👉Review https://t.ly/1qz4F
👉Paper arxiv.org/pdf/2501.02464
👉Project yuliangguo.github.io/depth-any-camera/
👉Repo github.com/yuliangguo/depth_any_camera
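👉Handling fisheye & 360° inputs means reasoning in ray space instead of a fixed pinhole grid; mapping equirectangular pixels to unit rays is the standard first step. A hedged sketch; the axis conventions are assumptions, not necessarily DAC's.
```python
# Hedged sketch: equirectangular (360°) pixel -> unit ray direction.
import numpy as np

def erp_pixel_to_ray(u, v, width, height):
    lon = (u / width - 0.5) * 2.0 * np.pi  # longitude in [-pi, pi]
    lat = (0.5 - v / height) * np.pi       # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])  # unit-norm by construction

print(erp_pixel_to_ray(960, 480, 1920, 960))  # image center -> [0. 0. 1.]
```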
❤️🔥 Uncommon Objects in #3D ❤️🔥
👉#META releases uCO3D, a new object-centric dataset for 3D AI: the largest publicly available collection of HD object videos with 3D annotations and full 360° coverage. Code & data under CC Attribution 4.0 💙
👉Review https://t.ly/Z_tvA
👉Paper https://arxiv.org/pdf/2501.07574
👉Project https://uco3d.github.io/
👉Repo github.com/facebookresearch/uco3d
🏆Universal Detector-Free Match🏆
👉MatchAnything: a novel detector-free universal matcher across unseen real-world single- and cross-modality domains, with the same weights for everything. Code announced, to be released 💙 (matching sketch below)
👉Review https://t.ly/sx92L
👉Paper https://lnkd.in/dWwRwGyY
👉Project https://lnkd.in/dCwb2Yte
👉Repo https://lnkd.in/dnUXYzQ5
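👉Whatever the backbone, a matcher's last step is producing correspondences between descriptors; the simplest baseline for that step is mutual nearest-neighbor matching. A hedged sketch of the baseline, not MatchAnything's coarse-to-fine pipeline.
```python
# Hedged sketch: mutual nearest-neighbor matching of L2-normalized descriptors.
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Return pairs (i, j) where a_i and b_j are each other's nearest match."""
    sim = desc_a @ desc_b.T     # cosine similarity for normalized descriptors
    nn_ab = sim.argmax(axis=1)  # best b for each a
    nn_ba = sim.argmax(axis=0)  # best a for each b
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 128))
a /= np.linalg.norm(a, axis=1, keepdims=True)
b = a[[2, 0, 4]] + 0.01 * rng.normal(size=(3, 128))  # noisy copies of a2,a0,a4
print(mutual_nn_matches(a, b))  # [(0, 1), (2, 0), (4, 2)]
```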
🆘 Help: Looking for Outstanding Speakers 🆘
👉Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only “hardcore” technical talks, nothing commercial at all. Please comment here with name, topic, and affiliation (e.g.: Paul Gascoigne, Computer Vision & Football, Scotland Team).
⭐Guaranteed tickets & more for the suggestions that will become invited speakers ;)
🧞‍♂️ Omni-RGPT: SOTA MLLM Understanding 🧞‍♂️
👉#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. The new SOTA on image/video-based commonsense reasoning (region-token sketch below).
👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
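👉Region-level MLLMs must turn a spatial region into something the language model can attend to; a generic recipe is mask-pooling visual features into a region embedding. A hedged sketch of that generic idea, not necessarily Omni-RGPT's exact mechanism.
```python
# Hedged sketch: mask-pool visual features into one region token.
import torch

def region_token(feat_map, mask):
    """feat_map: (C, H, W) features; mask: (H, W) binary region mask."""
    m = mask.float().flatten()                           # (H*W,)
    f = feat_map.flatten(1)                              # (C, H*W)
    return (f * m).sum(dim=1) / m.sum().clamp(min=1.0)   # (C,) region mean

feats = torch.randn(256, 24, 24)                 # e.g., ViT patch features
mask = torch.zeros(24, 24)
mask[4:10, 6:12] = 1                             # a box-shaped region
print(region_token(feats, mask).shape)           # torch.Size([256])
```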
🔥 GAGA: Group Any Gaussians 🔥
👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging the inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated 💙 (voting sketch below)
👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga
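👉Turning inconsistent per-view 2D masks into consistent 3D groups is, at its core, a voting problem: each 3D primitive collects the mask labels of the views it projects into and keeps the majority. A hedged sketch of that idea (it assumes mask IDs were already matched across views); illustrative, not GAGA's actual code.
```python
# Hedged sketch: majority-vote label assignment for 3D primitives.
from collections import Counter

def vote_labels(observations):
    """observations[i] = list of mask IDs primitive i received across views."""
    return [Counter(o).most_common(1)[0][0] if o else -1 for o in observations]

# Three primitives seen in up to four views; one view disagrees for the first.
obs = [["chair", "chair", "chair", "table"],
       ["table", "table"],
       []]                      # never visible -> unlabeled (-1)
print(vote_labels(obs))         # ['chair', 'table', -1]
```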
🎁Free Book: LLM Foundations🎁
👉A fully free book, just released on arXiv, outlining the basic concepts of #LLMs and related techniques with a focus on the foundational aspects:
✅Chapter 1: basics of pre-training
✅Chapter 2: gen-models & LLMs
✅Chapter 3: prompting methods
✅Chapter 4: alignment methods
👉If you have some background in ML and a working understanding of concepts like Transformers, this book will read smoothly. Even without that prior knowledge it is still perfectly approachable, since each chapter is self-contained.
👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf