AniGS: Single Pic Animatable Avatar
#Alibaba unveils AniGS: given a single human image as input, it rebuilds a high-fidelity 3D avatar in a canonical pose, usable for both photorealistic rendering and real-time animation. Source code announced, to be released.
Review: https://t.ly/4yfzn
Paper: arxiv.org/pdf/2412.02684
Project: lingtengqiu.github.io/2024/AniGS/
Repo: github.com/aigc3d/AniGS
GigaHands: Massive #3D Hands
A novel, massive #3D bimanual-activities dataset: 34 hours of activities, 14k hand-motion clips paired with 84k text annotations, and 183M+ unique hand images.
Review: https://t.ly/SA0HG
Paper: www.arxiv.org/pdf/2412.04244
Repo: github.com/brown-ivl/gigahands
Project: ivl.cs.brown.edu/research/gigahands.html
Track4Gen: Diffusion + Tracking
Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results, but no code announced.
Review: https://t.ly/9ujhc
Paper: arxiv.org/pdf/2412.06016
Project: hyeonho99.github.io/track4gen/
Gallery: hyeonho99.github.io/track4gen/full.html
4D Neural Templates
#Stanford unveils Neural Templates, generating HQ temporal object intrinsics for several natural phenomena and enabling the sampling and controllable rendering of these dynamic objects from any viewpoint, at any time of their lifespan. A novel vision task is born.
Review: https://t.ly/ka_Qf
Paper: https://arxiv.org/pdf/2412.05278
Project: https://chen-geng.com/rose4d#toi
Gaze-LLE: Neural Gaze
Gaze-LLE: a novel transformer framework that streamlines gaze-target estimation by leveraging features from a frozen DINOv2 encoder. Code & models under MIT.
Review: https://t.ly/SadoF
Paper: arxiv.org/pdf/2412.09586
Repo: github.com/fkryan/gazelle
Dynamic Cam-4D Hands
Imperial College unveils Dyn-HaMR, the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Code announced under MIT.
Review: https://t.ly/h5vV7
Paper: arxiv.org/pdf/2412.12861
Project: dyn-hamr.github.io/
Repo: github.com/ZhengdiYu/Dyn-HaMR
Open-MLLMs Self-Driving
OpenEMMA: a novel open-source end-to-end driving framework based on MLLMs (via Chain-of-Thought reasoning), showing effectiveness, generalizability, and robustness across a variety of challenging driving scenarios. Code released under Apache 2.0.
Review: https://t.ly/waLZI
Paper: https://arxiv.org/pdf/2412.15208
Code: https://github.com/taco-group/OpenEMMA
Orient Anything in 3D
Orient Anything is a novel, robust image-based object-orientation estimation model. By training on 2M rendered, labeled images, it achieves strong zero-shot generalization in the wild. Code released.
Review: https://t.ly/ro5ep
Paper: arxiv.org/pdf/2412.18605
Project: orient-anything.github.io/
Code: https://lnkd.in/d_3k6Nxz
TOP 10 Papers You Loved - 2024
Here is the list of my posts you liked the most in 2024. Thank you all!
Papers:
- "Look Ma, no markers"
- T-Rex 2 Detector
- Models at Any Resolution
The full list with links: https://t.ly/GvQVy
HD Video Object Insertion
VideoAnydoor is a novel zero-shot video object insertion #AI with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.
Review: https://t.ly/hyvRq
Paper: arxiv.org/pdf/2501.01427
Project: videoanydoor.github.io/
Repo: TBA
Poll Alert!!
[EDIT] see below
What is your favorite source for AI updates?
Final Results:
32% Linkedin
4% Instagram
3% Reddit
52% Telegram
9% Others (comment here: https://t.ly/chQWq)
AI with Papers - Artificial Intelligence & Deep Learning pinned «What is your favorite source for AI updates?»
SOTA Probabilistic Tracking
ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial.
Review: https://t.ly/YY_PH
Paper: https://arxiv.org/pdf/2501.03220
Project: michaelszj.github.io/protracker/
Code: github.com/Michaelszj/pro-tracker
World-Space Ego 3D Hands
Imperial College unveils HaWoR, a novel world-space 3D hand-motion estimation approach for egocentric videos. The new SOTA on both camera pose estimation & hand-motion reconstruction. Code under Attribution-NC-ND 4.0 International.
Review: https://t.ly/ozJn7
Paper: arxiv.org/pdf/2501.02973
Project: hawor-project.github.io/
Code: github.com/ThunderVVV/HaWoR
"Nuclear" AI vs. Hyper-Cheap Inference
What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
23% Portable Training Workstation
35% Nuclear energy for AI training
33% Cheaper only-inference devices
9% Cloud-intensive only-inference
FIFA 3D Human Pose
#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released.
Review: https://t.ly/kvGVQ
Paper: arxiv.org/pdf/2501.02771
Project: https://lnkd.in/d5hFWpY2
Dataset: https://lnkd.in/dAphJ9WA
Depth Any Camera (SOTA)
DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cameras with varying FoVs (including large fisheye & 360°). Code announced (not available yet).
Review: https://t.ly/1qz4F
Paper: arxiv.org/pdf/2501.02464
Project: yuliangguo.github.io/depth-any-camera/
Repo: github.com/yuliangguo/depth_any_camera
Uncommon Objects in #3D
#META releases uCO3D, a new object-centric dataset for 3D AI: the largest publicly available collection of HD object videos with 3D annotations and full 360° coverage. Code & data under CCA 4.0.
Review: https://t.ly/Z_tvA
Paper: https://arxiv.org/pdf/2501.07574
Project: https://uco3d.github.io/
Repo: github.com/facebookresearch/uco3d
Universal Detector-Free Match
MatchAnything: a novel detector-free universal matcher across unseen real-world single- and cross-modality domains, with the same weights for everything. Code announced, to be released.
Review: https://t.ly/sx92L
Paper: https://lnkd.in/dWwRwGyY
Project: https://lnkd.in/dCwb2Yte
Repo: https://lnkd.in/dnUXYzQ5