⚔️SAMURAI: SAM 2 for Tracking⚔️
👉The University of Washington unveils SAMURAI, an enhanced adaptation of SAM 2 with motion-aware memory, specifically designed for visual object tracking. New SOTA! Code under Apache 2.0💙 Toy sketch below👇
👉Review https://t.ly/yGU0P
👉Paper https://arxiv.org/pdf/2411.11922
👉Repo https://github.com/yangchris11/samurai
👉Project https://yangchris11.github.io/samurai/
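👉A hedged sketch of the paper's motion-aware memory idea (my reading, not the official code): a constant-velocity Kalman filter predicts the target box, and SAM 2's candidate masks are re-ranked by blending mask confidence with IoU against that prediction. The `alpha` weight and the KF setup are assumptions:
```python
# Hedged sketch of SAMURAI-style motion-aware scoring (not the official code).
import numpy as np

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda bx: (bx[2] - bx[0]) * (bx[3] - bx[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

class BoxKF:
    """Constant-velocity Kalman filter over (cx, cy, w, h)."""
    def __init__(self, box):
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        self.x = np.array([cx, cy, box[2] - box[0], box[3] - box[1], 0, 0, 0, 0], float)
        self.P = np.eye(8) * 10.0
        self.F = np.eye(8); self.F[:4, 4:] = np.eye(4)   # positions += velocities
        self.H = np.eye(4, 8)                            # observe box only
        self.Q, self.R = np.eye(8) * 1e-2, np.eye(4) * 1e-1

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        cx, cy, w, h = self.x[:4]
        return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

    def update(self, box):
        z = np.array([(box[0] + box[2]) / 2, (box[1] + box[3]) / 2,
                      box[2] - box[0], box[3] - box[1]])
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x += K @ (z - self.H @ self.x)
        self.P = (np.eye(8) - K @ self.H) @ self.P

def select_mask(kf, candidates, alpha=0.25):
    """candidates: list of (box, sam2_confidence). alpha is an assumed weight."""
    pred = kf.predict()
    scores = [alpha * iou(box, pred) + (1 - alpha) * conf for box, conf in candidates]
    best = int(np.argmax(scores))
    kf.update(candidates[best][0])
    return best

# The motion prior rescues the lower-confidence but motion-consistent mask:
kf = BoxKF((10, 10, 50, 50))
print(select_mask(kf, [((12, 11, 52, 49), 0.7), ((80, 80, 120, 120), 0.9)]))  # 0
```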
🦖DINO-X: Unified Obj-Centric LVM🦖
👉Unified vision model for Open-World Detection, Segmentation, Phrase Grounding, Visual Counting, Pose, Prompt-Free Detection/Recognition, Dense Caption, & more. Demo & API announced 💙
👉Review https://t.ly/CSQon
👉Paper https://lnkd.in/dc44ZM8v
👉Project https://lnkd.in/dehKJVvC
👉Repo https://lnkd.in/df8Kb6iz
🌎All Languages Matter: LMMs vs. 100 Lang.🌎
👉ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & Dataset 💙
👉Review https://t.ly/VsoJB
👉Paper https://lnkd.in/ddVVZfi2
👉Project https://lnkd.in/dpssaeRq
👉Code https://lnkd.in/dnbaJJE4
👉Dataset https://lnkd.in/drw-_95v
🦙 EdgeCape: SOTA Agnostic Pose 🦙
👉EdgeCape: new SOTA in Category-Agnostic Pose Estimation (CAPE): finding keypoints across diverse object categories using only one or a few annotated support images. Source code released💙 Toy matching sketch below👇
👉Review https://t.ly/4TpAs
👉Paper https://arxiv.org/pdf/2411.16665
👉Project https://orhir.github.io/edge_cape/
👉Code https://github.com/orhir/EdgeCape
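👉For intuition, a hedged sketch of the generic CAPE matching baseline (not EdgeCape itself, which additionally predicts skeleton edge weights to refine localization): correlate support-keypoint descriptors against the query feature map and take the argmax:
```python
# Hedged sketch of a generic CAPE-style matching baseline (not EdgeCape's method).
import torch
import torch.nn.functional as F

def match_keypoints(support_feats, support_kpts, query_feats):
    """
    support_feats: [C, H, W] features of the support image (e.g. from a frozen backbone)
    support_kpts:  [K, 2] keypoint coords normalized to [-1, 1] as (x, y)
    query_feats:   [C, H, W] features of the query image
    returns:       [K, 2] predicted keypoint coords in query feature-grid units
    """
    C, H, W = query_feats.shape
    # Sample one descriptor per support keypoint via bilinear interpolation.
    grid = support_kpts.view(1, -1, 1, 2)                      # [1, K, 1, 2]
    desc = F.grid_sample(support_feats[None], grid,
                         align_corners=False)[0, :, :, 0].T    # [K, C]
    # Cosine-similarity heatmaps between descriptors and query features.
    q = F.normalize(query_feats.reshape(C, -1), dim=0)         # [C, H*W]
    d = F.normalize(desc, dim=1)                               # [K, C]
    heat = d @ q                                               # [K, H*W]
    idx = heat.argmax(dim=1)
    return torch.stack([idx % W, idx // W], dim=1).float()     # (x, y) per keypoint

# Toy usage with random features and 5 support keypoints:
sup, qry = torch.randn(256, 32, 32), torch.randn(256, 32, 32)
kpts = torch.rand(5, 2) * 2 - 1
print(match_keypoints(sup, kpts, qry))
```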
🛟 StableAnimator: ID-aware Humans 🛟
👉StableAnimator: the first e2e ID-preserving diffusion model for HQ human videos without any post-processing. Input: a single image + a sequence of poses. Insane results!
👉Review https://t.ly/JDtL3
👉Paper https://arxiv.org/pdf/2411.17697
👉Project francis-rings.github.io/StableAnimator/
👉Code github.com/Francis-Rings/StableAnimator
🧶SOTA track-by-propagation🧶
👉SambaMOTR is a novel e2e model (based on Samba) that exploits long-range dependencies and interactions between tracklets to handle complex motion patterns & occlusions. Code in Jan. '25 💙 Toy propagation sketch below👇
👉Review https://t.ly/QSQ8L
👉Paper arxiv.org/pdf/2410.01806
👉Project sambamotr.github.io/
👉Repo https://lnkd.in/dRDX6nk2
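👉A hedged toy of track-by-propagation (a GRUCell stands in for Samba's state-space block; the real SambaMOTR is end-to-end with learned queries): tracklet memories are updated each frame and matched to detections by cosine similarity:
```python
# Hedged toy of track-by-propagation; a GRUCell is a stand-in for Samba.
import torch
from scipy.optimize import linear_sum_assignment

D = 64
propagate = torch.nn.GRUCell(D, D)  # updates each tracklet's memory per frame

def step(track_mem, det_embs, sim_thresh=0.3):
    """track_mem: [N, D] tracklet states; det_embs: [M, D] detection embeddings."""
    if len(track_mem):
        sim = torch.nn.functional.cosine_similarity(
            track_mem[:, None], det_embs[None, :], dim=-1)      # [N, M]
        rows, cols = linear_sum_assignment(-sim.detach().numpy())
        matched = {r: c for r, c in zip(rows, cols) if sim[r, c] > sim_thresh}
    else:
        matched = {}
    # Propagate matched tracklets with their new observation; keep others as-is.
    new_mem = track_mem.clone()
    for r, c in matched.items():
        new_mem[r] = propagate(det_embs[c][None], track_mem[r][None])[0]
    # Unmatched detections start new tracklets.
    births = [det_embs[c] for c in range(len(det_embs)) if c not in matched.values()]
    return torch.cat([new_mem, torch.stack(births)]) if births else new_mem

mem = torch.zeros(0, D)
for _ in range(3):                      # three fake frames, 4 detections each
    mem = step(mem, torch.randn(4, D))
print(mem.shape)
```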
👺HiFiVFS: Extreme Face Swapping👺
👉HiFiVFS: HQ face-swapping videos even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, no code announced😢
👉Review https://t.ly/ea8dU
👉Paper https://arxiv.org/pdf/2411.18293
👉Project https://cxcx1996.github.io/HiFiVFS
🔥Video Depth without Video Models🔥
👉RollingDepth: turning a single-image latent diffusion model (LDM) into the new SOTA video depth estimator. It works better than dedicated video-depth models 🤯 Code under Apache💙 Snippet-alignment sketch below👇
👉Review https://t.ly/R4LqS
👉Paper https://arxiv.org/pdf/2411.19189
👉Project https://rollingdepth.github.io/
👉Repo https://github.com/prs-eth/rollingdepth
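👉A hedged sketch of the alignment step that makes this possible: per-snippet depth is only defined up to scale & shift, so overlapping snippets are registered by least squares (the paper uses a robust joint optimization; this is the simplified pairwise version):
```python
# Hedged sketch of aligning overlapping depth snippets (simplified).
import numpy as np

def fit_scale_shift(src, ref):
    """Least-squares s, t such that s * src + t ≈ ref (both 1-D arrays)."""
    A = np.stack([src, np.ones_like(src)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return s, t

# Two fake 3-frame snippets that overlap on one frame (frame index 2).
rng = np.random.default_rng(0)
true_depth = rng.uniform(1, 5, size=(5, 100))       # 5 frames, 100 px each
snip_a = 2.0 * true_depth[0:3] + 0.5                # frames 0-2, own scale/shift
snip_b = 0.7 * true_depth[2:5] - 0.1                # frames 2-4

# Align snippet B to snippet A on the shared frame, then rescale all of B.
s, t = fit_scale_shift(snip_b[0], snip_a[2])
snip_b_aligned = s * snip_b + t
print(np.allclose(snip_b_aligned[0], snip_a[2]))    # True: overlap now agrees
```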
⚽Universal Soccer Foundation Model⚽
👉Universal Soccer Video Understanding: SoccerReplay-1988 - the largest multi-modal soccer dataset - and MatchVision - the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released💙
👉Review https://t.ly/-X90B
👉Paper https://arxiv.org/pdf/2412.01820
👉Project https://jyrao.github.io/UniSoccer/
👉Repo https://github.com/jyrao/UniSoccer
🌈Motion Prompting Video Generation🌈
👉DeepMind unveils Motion Prompting, a novel video generation model conditioned on spatio-temporally sparse or dense motion trajectories. Amazing results, but no code announced 😢 Trajectory-encoding sketch below👇
👉Review https://t.ly/VyKbv
👉Paper arxiv.org/pdf/2412.02700
👉Project motion-prompting.github.io
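👉One plausible way (an assumption, not necessarily the paper's exact embedding) to turn sparse point tracks into a conditioning tensor a ControlNet-style branch can consume:
```python
# Hedged sketch: rasterizing sparse point tracks into per-frame conditioning maps.
import numpy as np

def rasterize_tracks(tracks, T, H, W):
    """
    tracks: [N, T, 2] float (x, y) positions per frame, NaN when not visible.
    returns: [T, 2, H, W] maps with a visibility flag and a normalized track id.
    """
    cond = np.zeros((T, 2, H, W), dtype=np.float32)
    for tid, track in enumerate(tracks):
        for t, (x, y) in enumerate(track):
            if np.isnan(x):
                continue                      # point occluded in this frame
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < W and 0 <= yi < H:
                cond[t, 0, yi, xi] = 1.0                      # visibility
                cond[t, 1, yi, xi] = (tid + 1) / len(tracks)  # which track
    return cond

# Toy: one track moving diagonally across an 8-frame, 64x64 clip.
track = np.stack([np.linspace(4, 60, 8), np.linspace(4, 60, 8)], axis=1)
cond = rasterize_tracks(track[None], T=8, H=64, W=64)
print(cond.shape, cond.sum())   # (8, 2, 64, 64) 16.0
```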
🦘AniGS: Single Pic Animatable Avatar🦘
👉#Alibaba unveils AniGS: given a single human image as input, it reconstructs a Hi-Fi 3D avatar in a canonical pose, usable for both photorealistic rendering & real-time animation. Source code announced, to be released💙
👉Review https://t.ly/4yfzn
👉Paper arxiv.org/pdf/2412.02684
👉Project lingtengqiu.github.io/2024/AniGS/
👉Repo github.com/aigc3d/AniGS
🧤GigaHands: Massive #3D Hands🧤
👉Novel massive #3D bimanual activities dataset: 34 hours of activities, 14k hand motion clips paired with 84k text annotations, 183M+ unique hand images
👉Review https://t.ly/SA0HG
👉Paper www.arxiv.org/pdf/2412.04244
👉Repo github.com/brown-ivl/gigahands
👉Project ivl.cs.brown.edu/research/gigahands.html
🦢 Track4Gen: Diffusion + Tracking 🦢
👉Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results but no code announced😢 Loss sketch below👇
👉Review https://t.ly/9ujhc
👉Paper arxiv.org/pdf/2412.06016
👉Project hyeonho99.github.io/track4gen/
👉Gallery hyeonho99.github.io/track4gen/full.html
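👉A hedged sketch of the combined objective (the λ weight and the feature sampling are assumptions, not the authors' exact recipe): the usual denoising loss plus a correspondence term pulling diffusion features at tracked points together across frames:
```python
# Hedged sketch of a Track4Gen-style combined objective.
import torch
import torch.nn.functional as F

def tracking_loss(feats, tracks):
    """
    feats:  [T, C, H, W] intermediate diffusion features, one map per frame
    tracks: [N, T, 2] point positions normalized to [-1, 1]
    Pulls each point's feature toward its own track mean across frames.
    """
    T = feats.shape[0]
    grid = tracks.permute(1, 0, 2).view(T, -1, 1, 2)            # [T, N, 1, 2]
    pt = F.grid_sample(feats, grid, align_corners=False)        # [T, C, N, 1]
    pt = F.normalize(pt[..., 0].permute(0, 2, 1), dim=-1)       # [T, N, C]
    anchor = pt.mean(dim=0, keepdim=True)                       # track centroid
    return (1 - F.cosine_similarity(pt, anchor, dim=-1)).mean()

# Toy shapes: combined loss = diffusion MSE + lambda * tracking loss.
noise, noise_pred = torch.randn(2, 8, 4, 32, 32)
feats = torch.randn(8, 64, 32, 32, requires_grad=True)
tracks = torch.rand(16, 8, 2) * 2 - 1
lam = 0.5                                                       # assumed weight
loss = F.mse_loss(noise_pred, noise) + lam * tracking_loss(feats, tracks)
loss.backward()
```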
🌹 4D Neural Templates 🌹
👉#Stanford unveils Neural Templates, generating HQ temporal object intrinsics for several natural phenomena and enabling the sampling and controllable rendering of these dynamic objects from any viewpoint, at any time in their lifespan. A novel vision task is born💙
👉Review https://t.ly/ka_Qf
👉Paper https://arxiv.org/pdf/2412.05278
👉Project https://chen-geng.com/rose4d#toi
🐕 Gaze-LLE: Neural Gaze 🐕
👉Gaze-LLE: novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. Code & models under MIT 💙 Structural sketch below👇
👉Review https://t.ly/SadoF
👉Paper arxiv.org/pdf/2412.09586
👉Repo github.com/fkryan/gazelle
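👉A hedged structural sketch of a Gaze-LLE-style head (dims & layer counts are assumptions; see the official repo for the real model): frozen DINOv2 features, a learned head-position embedding, a small transformer, and a heatmap decoder:
```python
# Hedged structural sketch, not the official Gaze-LLE implementation.
import torch
import torch.nn as nn

class GazeHead(nn.Module):
    def __init__(self, dim=256, grid=16):
        super().__init__()
        self.proj = nn.Conv2d(768, dim, 1)               # project frozen DINOv2 feats
        self.head_pos = nn.Parameter(torch.zeros(dim))   # learned head-position token
        enc = nn.TransformerEncoderLayer(dim, 8, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(enc, num_layers=3)
        self.decode = nn.Linear(dim, 1)                  # per-patch gaze heatmap logit
        self.grid = grid

    def forward(self, dino_feats, head_xy):
        """dino_feats: [B, 768, 16, 16] (frozen); head_xy: [B, 2] patch coords."""
        x = self.proj(dino_feats).flatten(2).transpose(1, 2)   # [B, tokens, dim]
        idx = head_xy[:, 1] * self.grid + head_xy[:, 0]        # flat patch index
        x[torch.arange(len(x)), idx] += self.head_pos          # mark the person
        x = self.blocks(x)
        return self.decode(x).view(-1, self.grid, self.grid)   # gaze heatmap

with torch.no_grad():                       # toy forward pass
    feats = torch.randn(1, 768, 16, 16)     # stand-in for frozen DINOv2 output
    print(GazeHead()(feats, torch.tensor([[4, 7]])).shape)     # [1, 16, 16]
```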
🫶 Dynamic Cam-4D Hands 🫶
👉Imperial College London unveils Dyn-HaMR, the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Code announced under MIT💙 Coordinate-composition sketch below👇
👉Review https://t.ly/h5vV7
👉Paper arxiv.org/pdf/2412.12861
👉Project dyn-hamr.github.io/
👉Repo github.com/ZhengdiYu/Dyn-HaMR
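👉The core coordinate bookkeeping, as a hedged illustration (not the authors' code): world-frame hand motion is the camera-to-world pose (e.g., from SLAM) composed with the camera-frame hand pose:
```python
# Hedged illustration: world hand pose = camera-to-world ∘ camera-frame hand pose.
import numpy as np

def se3(R, t):
    """Build a 4x4 rigid transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Per-frame camera-to-world poses (e.g., recovered by SLAM on a moving camera).
T_wc = [se3(rot_z(0.1 * k), np.array([0.2 * k, 0, 0])) for k in range(4)]
# Per-frame hand poses estimated in the *camera* frame (e.g., by an HMR model).
T_ch = [se3(np.eye(3), np.array([0, 0, 0.5])) for _ in range(4)]

# Compose: a hand that is static in the camera frame still moves in the world
# frame, because the camera itself moves.
T_wh = [Twc @ Tch for Twc, Tch in zip(T_wc, T_ch)]
print(np.round([T[:3, 3] for T in T_wh], 3))   # world hand trajectory
```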
🍄 Open-MLLMs Self-Driving 🍄
👉OpenEMMA: a novel open-source e2e autonomous-driving framework based on MLLMs (via Chain-of-Thought reasoning). Effective, generalizable, and robust across a variety of challenging driving scenarios. Code released under Apache 2.0💙 Pipeline sketch below👇
👉Review https://t.ly/waLZI
👉Paper https://arxiv.org/pdf/2412.15208
👉Code https://github.com/taco-group/OpenEMMA
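👉A hedged sketch of an OpenEMMA-style chain-of-thought driving query; `query_mllm` is a hypothetical stand-in, NOT the project's actual API (the real repo wires this to concrete MLLM backends):
```python
# Hedged sketch of a CoT driving query; `query_mllm` is hypothetical.
PROMPT = """You are a driving agent. Given the front-camera frames and the
ego vehicle's recent speeds/curvatures, first describe the scene, then list
critical objects, then reason about the right maneuver, and finally output
future waypoints as (x, y) pairs in meters, one per line."""

def plan_trajectory(frames, ego_history, query_mllm):
    """frames: list of images; ego_history: list of (speed, curvature) tuples."""
    context = f"Ego history (speed m/s, curvature 1/m): {ego_history}"
    answer = query_mllm(images=frames, text=PROMPT + "\n" + context)
    # Keep only lines that parse as "(x, y)" waypoints; the CoT text is skipped.
    waypoints = []
    for line in answer.splitlines():
        parts = [p.strip() for p in line.strip().strip("()").split(",")]
        try:
            waypoints.append((float(parts[0]), float(parts[1])))
        except (ValueError, IndexError):
            continue
    return waypoints

# Toy run with a canned "model" that returns reasoning followed by waypoints:
fake = lambda images, text: "The road curves left.\n(1.0, 0.1)\n(2.0, 0.4)"
print(plan_trajectory([], [(8.3, 0.01)], fake))   # [(1.0, 0.1), (2.0, 0.4)]
```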
🔄️ Orient Anything in 3D 🔄️
👉Orient Anything is a novel, robust image-based object orientation estimation model. By training on 2M rendered labeled images, it achieves strong zero-shot generalization in the wild. Code released💙 Toy angle-head sketch below👇
👉Review https://t.ly/ro5ep
👉Paper arxiv.org/pdf/2412.18605
👉Project orient-anything.github.io/
👉Code https://lnkd.in/d_3k6Nxz
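👉A hedged toy of a distribution-over-angles head (the paper predicts probability distributions over orientation angles; the bin count and backbone here are assumptions):
```python
# Hedged toy of an orientation head; see the official code for the real model.
import math
import torch
import torch.nn as nn

class AzimuthHead(nn.Module):
    def __init__(self, feat_dim=768, bins=360):
        super().__init__()
        self.fc = nn.Linear(feat_dim, bins)
        self.register_buffer("angles", torch.arange(bins) * (2 * math.pi / bins))

    def forward(self, feats):
        p = self.fc(feats).softmax(dim=-1)          # distribution over angle bins
        # Circular expectation: average unit vectors, then take atan2.
        sin = (p * self.angles.sin()).sum(-1)
        cos = (p * self.angles.cos()).sum(-1)
        return torch.atan2(sin, cos)                # azimuth in radians

head = AzimuthHead()
feats = torch.randn(2, 768)      # stand-in for image-encoder features
print(head(feats))               # one azimuth estimate per image
```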
⭐TOP 10 Papers you loved - 2024⭐
👉Here is the list of my posts you liked the most in 2024, thank you all 💙
𝐏𝐚𝐩𝐞𝐫𝐬:
⭐"Look Ma, no markers"
⭐T-Rex 2 Detector
⭐Models at Any Resolution
👉The full list with links: https://t.ly/GvQVy
🌳 HD Video Object Insertion 🌳
👉VideoAnydoor is a novel zero-shot video object insertion #AI with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.
👉Review https://t.ly/hyvRq
👉Paper arxiv.org/pdf/2501.01427
👉Project videoanydoor.github.io/
👉Repo TBA