All Languages Matter: LMMs vs. 100 Languages
ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & dataset released.
Review https://t.ly/VsoJB
Paper https://lnkd.in/ddVVZfi2
Project https://lnkd.in/dpssaeRq
Code https://lnkd.in/dnbaJJE4
Dataset https://lnkd.in/drw-_95v
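For readers who want to see what "assessing in a standardized way" can look like in practice, below is a minimal, hedged sketch of a per-language VQA accuracy loop. The sample schema and the `model.answer(...)` call are assumptions for illustration, not the official ALM-Bench evaluation harness.

```python
# Hedged sketch of a per-language evaluation loop of the kind a multilingual
# multimodal benchmark implies. The sample fields and model.answer() are
# hypothetical, not the ALM-Bench API.
from collections import defaultdict

def evaluate_per_language(model, samples):
    """samples: iterable of dicts with 'language', 'image', 'question', 'answer'."""
    correct, total = defaultdict(int), defaultdict(int)
    for s in samples:
        pred = model.answer(image=s["image"], question=s["question"])  # hypothetical call
        total[s["language"]] += 1
        correct[s["language"]] += int(pred.strip().lower() == s["answer"].strip().lower())
    # Per-language accuracy makes it easy to spot low-resource languages lagging behind.
    return {lang: correct[lang] / total[lang] for lang in total}
```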
EdgeCape: SOTA Category-Agnostic Pose
EdgeCape: new SOTA in Category-Agnostic Pose Estimation (CAPE): finding keypoints across diverse object categories using only one or a few annotated support images. Source code released.
Review https://t.ly/4TpAs
Paper https://arxiv.org/pdf/2411.16665
Project https://orhir.github.io/edge_cape/
Code https://github.com/orhir/EdgeCape
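To make the CAPE setting concrete, here is a hedged sketch of the simplest possible baseline: locating support-annotated keypoints in a query image by cosine-similarity matching of frozen backbone features. This illustrates the task only; it is not EdgeCape's edge/graph-based method, and all shapes and names are illustrative.

```python
# Minimal sketch of the CAPE task setup (not EdgeCape's actual method):
# keypoints annotated on a support image are located in a query image by
# matching frozen backbone features. Tensors and shapes are illustrative.
import torch
import torch.nn.functional as F

def match_support_keypoints(support_feats, query_feats, support_kpts):
    """support_feats, query_feats: [C, H, W] feature maps from any frozen backbone.
    support_kpts: [K, 2] keypoint coordinates (x, y) in feature-map units.
    Returns [K, 2] predicted keypoint coordinates on the query feature map."""
    C, H, W = query_feats.shape
    # Sample a feature vector at each annotated support keypoint.
    kpt_feats = support_feats[:, support_kpts[:, 1], support_kpts[:, 0]]   # [C, K]
    # Cosine similarity between each keypoint feature and every query location.
    q = F.normalize(query_feats.reshape(C, -1), dim=0)                     # [C, H*W]
    k = F.normalize(kpt_feats, dim=0)                                      # [C, K]
    sim = k.t() @ q                                                        # [K, H*W]
    idx = sim.argmax(dim=1)                                                # best match per keypoint
    return torch.stack([idx % W, idx // W], dim=1)                         # [K, 2] as (x, y)

# Toy usage with random features and two annotated keypoints.
sf, qf = torch.randn(256, 32, 32), torch.randn(256, 32, 32)
print(match_support_keypoints(sf, qf, torch.tensor([[5, 7], [20, 11]])))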
StableAnimator: ID-Aware Human Animation
StableAnimator: the first end-to-end ID-preserving video diffusion framework for HQ human animation, with no post-processing. Input: a single image + a sequence of poses. Insane results!
Review https://t.ly/JDtL3
Paper https://arxiv.org/pdf/2411.17697
Project francis-rings.github.io/StableAnimator/
Code github.com/Francis-Rings/StableAnimator
SOTA Track-by-Propagation
SambaMOTR: a novel e2e tracking model (based on Samba) that captures long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code expected in Jan. '25.
Review https://t.ly/QSQ8L
Paper arxiv.org/pdf/2410.01806
Project sambamotr.github.io/
Repo https://lnkd.in/dRDX6nk2
HiFiVFS: Extreme Face Swapping
HiFiVFS: HQ face-swapping videos even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, but no code announced.
Review https://t.ly/ea8dU
Paper https://arxiv.org/pdf/2411.18293
Project https://cxcx1996.github.io/HiFiVFS
Video Depth without Video Models
RollingDepth: turning a single-image latent diffusion model (LDM) into the new SOTA video depth estimator. It works better than dedicated video-depth models. Code released under the Apache license.
Review https://t.ly/R4LqS
Paper https://arxiv.org/pdf/2411.19189
Project https://rollingdepth.github.io/
Repo https://github.com/prs-eth/rollingdepth
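As a rough intuition for how a per-frame depth model can be stretched to video, here is a hedged sketch of overlapping-window ("rolling") inference with simple averaging of overlapping estimates. The real RollingDepth additionally works on multi-frame snippets and performs global alignment; `predict_depth` below is a hypothetical stand-in for any monocular depth estimator.

```python
# Hedged sketch of overlapping-window video depth (the general idea, not
# RollingDepth's exact algorithm). predict_depth is a placeholder.
import numpy as np

def predict_depth(frame):
    # Placeholder: a real model (e.g. a diffusion-based estimator) goes here.
    return np.random.rand(*frame.shape[:2]).astype(np.float32)

def rolling_video_depth(frames, window=3, stride=1):
    """Accumulate per-frame depth predictions from overlapping windows,
    then average the overlapping estimates for temporal smoothing."""
    n = len(frames)
    acc = np.zeros((n,) + frames[0].shape[:2], dtype=np.float32)
    cnt = np.zeros(n, dtype=np.float32)
    for start in range(0, max(n - window + 1, 1), stride):
        for t in range(start, min(start + window, n)):
            acc[t] += predict_depth(frames[t])
            cnt[t] += 1
    return acc / cnt[:, None, None]

frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(10)]
depth = rolling_video_depth(frames)
print(depth.shape)  # (10, 64, 64)
```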
Universal Soccer Foundation Model
Universal Soccer Video Understanding: SoccerReplay-1988, the largest multi-modal soccer dataset, and MatchVision, the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released.
Review https://t.ly/-X90B
Paper https://arxiv.org/pdf/2412.01820
Project https://jyrao.github.io/UniSoccer/
Repo https://github.com/jyrao/UniSoccer
Motion Prompting Video Generation
DeepMind unveils Motion Prompting: a novel ControlNet-based video generation model conditioned on spatio-temporally sparse or dense motion trajectories. Amazing results, but no code announced.
Review https://t.ly/VyKbv
Paper arxiv.org/pdf/2412.02700
Project motion-prompting.github.io
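To illustrate what "conditioning on motion trajectories" can mean in practice, here is a hedged sketch that rasterizes sparse point tracks into per-frame conditioning maps of the kind a ControlNet-style adapter could consume. The encoding (one Gaussian channel per track) and all shapes are assumptions, not the paper's exact representation.

```python
# Hedged sketch: rasterize sparse point trajectories into per-frame
# conditioning maps. Shapes and the Gaussian-blob encoding are illustrative.
import torch

def rasterize_tracks(tracks, H, W, sigma=2.0):
    """tracks: [T, N, 2] pixel coordinates (x, y) of N tracked points over T frames.
    Returns [T, N, H, W] soft location maps, one channel per track."""
    T, N, _ = tracks.shape
    ys = torch.arange(H).view(1, 1, H, 1)
    xs = torch.arange(W).view(1, 1, 1, W)
    cx = tracks[..., 0].view(T, N, 1, 1)
    cy = tracks[..., 1].view(T, N, 1, 1)
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

# Two points drifting right over 8 frames on a 64x64 grid.
t = torch.arange(8).float()
tracks = torch.stack([
    torch.stack([10 + 2 * t, torch.full_like(t, 20)], dim=-1),
    torch.stack([30 + 2 * t, torch.full_like(t, 40)], dim=-1),
], dim=1)                       # [8, 2, 2]
cond = rasterize_tracks(tracks, 64, 64)
print(cond.shape)               # torch.Size([8, 2, 64, 64])
```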
AniGS: Single-Pic Animatable Avatar
#Alibaba unveils AniGS: given a single human image as input, it rebuilds a Hi-Fi 3D avatar in a canonical pose that can be used for both photorealistic rendering & real-time animation. Source code announced, to be released.
Review https://t.ly/4yfzn
Paper arxiv.org/pdf/2412.02684
Project lingtengqiu.github.io/2024/AniGS/
Repo github.com/aigc3d/AniGS
GigaHands: Massive #3D Hands
A novel, massive #3D bimanual-activities dataset: 34 hours of activities, 14k hand-motion clips paired with 84k text annotations, 183M+ unique hand images.
Review https://t.ly/SA0HG
Paper www.arxiv.org/pdf/2412.04244
Repo github.com/brown-ivl/gigahands
Project ivl.cs.brown.edu/research/gigahands.html
Track4Gen: Diffusion + Tracking
Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results, but no code announced.
Review https://t.ly/9ujhc
Paper arxiv.org/pdf/2412.06016
Project hyeonho99.github.io/track4gen/
Gallery hyeonho99.github.io/track4gen/full.html
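The objective described above can be pictured as a weighted sum of a denoising loss and a feature-correspondence loss on tracked points. The hedged sketch below shows that structure only; the correspondence term, the feature sampling, and the weighting are simplified assumptions, not Track4Gen's actual loss.

```python
# Hedged sketch of a diffusion loss combined with a point-tracking term on
# intermediate features. The tracking term and weighting are hypothetical.
import torch
import torch.nn.functional as F

def combined_loss(noise_pred, noise_target, feats, tracks, lam=0.5):
    """noise_pred / noise_target: [B, T, C, H, W] denoiser output vs. added noise.
    feats: [B, T, Cf, Hf, Wf] intermediate diffusion features.
    tracks: [B, T, N, 2] ground-truth point tracks in feature-map coordinates."""
    diffusion_loss = F.mse_loss(noise_pred, noise_target)
    tracking_loss = feature_correspondence_loss(feats, tracks)
    return diffusion_loss + lam * tracking_loss

def feature_correspondence_loss(feats, tracks):
    # Pull features of the same physical point together across frames: sample
    # each track's feature per frame and penalise deviation from the track's
    # mean feature (a simple proxy for a real tracking loss).
    B, T, C, H, W = feats.shape
    x = tracks[..., 0].round().clamp(0, W - 1).long()        # [B, T, N]
    y = tracks[..., 1].round().clamp(0, H - 1).long()
    b = torch.arange(B).view(B, 1, 1)
    t = torch.arange(T).view(1, T, 1)
    sampled = feats[b, t, :, y, x]                           # [B, T, N, C]
    return ((sampled - sampled.mean(dim=1, keepdim=True)) ** 2).mean()

# Toy call with random tensors.
loss = combined_loss(torch.randn(1, 4, 3, 8, 8), torch.randn(1, 4, 3, 8, 8),
                     torch.randn(1, 4, 16, 8, 8), torch.rand(1, 4, 5, 2) * 8)
print(loss.item())
```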
4D Neural Templates
#Stanford unveils Neural Templates: generating HQ temporal object intrinsics for several natural phenomena and enabling the sampling and controllable rendering of these dynamic objects from any viewpoint, at any time in their lifespan. A novel vision task is born.
Review https://t.ly/ka_Qf
Paper https://arxiv.org/pdf/2412.05278
Project https://chen-geng.com/rose4d#toi
Gaze-LLE: Neural Gaze
Gaze-LLE: a novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. Code & models under MIT.
Review https://t.ly/SadoF
Paper arxiv.org/pdf/2412.09586
Repo github.com/fkryan/gazelle
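Since the framework builds on a frozen DINOv2 backbone, here is a hedged sketch of that part of the setup: loading DINOv2 from torch.hub, freezing it, and extracting a patch-token feature map. The small gaze decoder trained on top is not shown; the input size and layer choice are illustrative, and the first run needs internet access for torch.hub.

```python
# Hedged sketch of the frozen-backbone setup: extract DINOv2 patch features
# with no gradients into the encoder; only a small (hypothetical) gaze decoder
# would be trained on top.
import torch

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()
for p in backbone.parameters():          # freeze the encoder
    p.requires_grad = False

img = torch.randn(1, 3, 448, 448)        # 448 = 32 x 14, divisible by the patch size
with torch.no_grad():
    # Patch-token feature map, reshaped to [B, C, H/14, W/14].
    (feats,) = backbone.get_intermediate_layers(img, n=1, reshape=True)
print(feats.shape)                        # e.g. torch.Size([1, 384, 32, 32])
```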
Dynamic Cam-4D Hands
Imperial College unveils Dyn-HaMR: the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Code announced under MIT.
Review https://t.ly/h5vV7
Paper arxiv.org/pdf/2412.12861
Project dyn-hamr.github.io/
Repo github.com/ZhengdiYu/Dyn-HaMR
Open-MLLMs Self-Driving
OpenEMMA: a novel open-source e2e autonomous-driving framework based on MLLMs (via Chain-of-Thought reasoning), showing effectiveness, generalizability, and robustness across a variety of challenging driving scenarios. Code released under Apache 2.0.
Review https://t.ly/waLZI
Paper https://arxiv.org/pdf/2412.15208
Code https://github.com/taco-group/OpenEMMA
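The Chain-of-Thought angle is easiest to see as a prompting pattern: describe the scene, reason step by step, then emit a structured plan. The hedged sketch below illustrates that pattern with a hypothetical `query_mllm` stub and illustrative prompt wording; it is not OpenEMMA's actual prompt or API.

```python
# Hedged sketch of chain-of-thought prompting for an e2e-driving MLLM.
# query_mllm is a stub standing in for whatever multimodal backend is used.
PROMPT = """You are a driving assistant looking at the front camera image.
Think step by step:
1. List the critical objects (vehicles, pedestrians, signs) and their positions.
2. Describe the road layout and the ego vehicle's current lane.
3. Decide on a safe manoeuvre for the next 3 seconds.
Finally, output one line starting with PLAN: <speed_m_s>, <curvature_1_m>."""

def query_mllm(image: str, prompt: str) -> str:
    # Stub: a real implementation would call an MLLM (open model or API).
    return "The scene is clear.\nPLAN: 5.0, 0.0"

def plan_from_frame(image_path: str) -> str:
    response = query_mllm(image=image_path, prompt=PROMPT)
    # Keep only the final structured line; the free-form reasoning is discarded.
    plan_lines = [l for l in response.splitlines() if l.startswith("PLAN:")]
    return plan_lines[-1] if plan_lines else "PLAN: 0.0, 0.0"

print(plan_from_frame("frame_000.jpg"))
```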
Orient Anything in 3D
Orient Anything is a novel, robust image-based object orientation estimation model. By training on 2M rendered, labeled images, it achieves strong zero-shot generalization in the wild. Code released.
Review https://t.ly/ro5ep
Paper arxiv.org/pdf/2412.18605
Project orient-anything.github.io/
Code https://lnkd.in/d_3k6Nxz
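A common way to frame orientation estimation is classification over discretized angle bins followed by a soft-argmax readout. The hedged sketch below shows such a head on top of pooled image features; it is illustrative only and not the released Orient Anything architecture or weights.

```python
# Hedged sketch of an angle-bin orientation head (illustrative design only).
import torch
import torch.nn as nn

class OrientationHead(nn.Module):
    def __init__(self, feat_dim=768, azimuth_bins=360, elevation_bins=180):
        super().__init__()
        self.azimuth = nn.Linear(feat_dim, azimuth_bins)      # 1-degree bins over 0..359
        self.elevation = nn.Linear(feat_dim, elevation_bins)  # 1-degree bins over -90..89

    def forward(self, feats):
        # feats: [B, feat_dim] pooled image features from any vision backbone.
        az = self.azimuth(feats).softmax(dim=-1)
        el = self.elevation(feats).softmax(dim=-1)
        # Expected angle from the predicted distribution (soft-argmax readout);
        # this sketch ignores the circular wrap-around of azimuth.
        az_deg = (az * torch.arange(az.shape[-1])).sum(dim=-1)
        el_deg = (el * torch.arange(el.shape[-1])).sum(dim=-1) - 90
        return az_deg, el_deg

head = OrientationHead()
print(head(torch.randn(2, 768)))
```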
TOP 10 Papers You Loved - 2024
Here is the list of my posts you liked the most in 2024; thank you all!
Papers:
- "Look Ma, no markers"
- T-Rex 2 Detector
- Models at Any Resolution
The full list with links: https://t.ly/GvQVy
HD Video Object Insertion
VideoAnydoor: a novel zero-shot video object-insertion #AI with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.
Review https://t.ly/hyvRq
Paper arxiv.org/pdf/2501.01427
Project videoanydoor.github.io/
Repo TBA
Poll Alert!!
[EDIT] see below
What is your favorite source for the AI updates? (final results)
Linkedin: 32%
Instagram: 4%
Reddit: 3%
Telegram: 52%
Others: 9% (comment here: https://t.ly/chQWq)