AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🌎 All Languages Matter: LMMs vs. 100 Languages 🌎

👉 ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & dataset released 💙

👉 Review https://t.ly/VsoJB
👉 Paper https://lnkd.in/ddVVZfi2
👉 Project https://lnkd.in/dpssaeRq
👉 Code https://lnkd.in/dnbaJJE4
👉 Dataset https://lnkd.in/drw-_95v
โค3๐Ÿ‘1๐Ÿ‘1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦙 EdgeCape: SOTA Agnostic Pose 🦙

👉 EdgeCape: new SOTA in Category-Agnostic Pose Estimation (CAPE): finding keypoints across diverse object categories using only one or a few annotated support images. Source code released 💙

👉 Review https://t.ly/4TpAs
👉 Paper https://arxiv.org/pdf/2411.16665
👉 Project https://orhir.github.io/edge_cape/
👉 Code https://github.com/orhir/EdgeCape
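This is not EdgeCape's actual graph-based method, but the generic CAPE recipe the post describes (locate the support image's annotated keypoints in a query image by feature matching) can be sketched in a few lines. The feature maps below are random placeholders standing in for real backbone features:

```python
import numpy as np

def cape_match(support_feats, support_kpts, query_feats):
    """Locate each support keypoint in the query image by cosine
    similarity between the support keypoint's feature vector and
    every location of the query feature map.
    support_feats / query_feats: (H, W, C) feature maps
    support_kpts: list of (row, col) annotated keypoints
    Returns one (row, col) match per keypoint."""
    H, W, C = query_feats.shape
    q = query_feats.reshape(-1, C)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    matches = []
    for (r, c) in support_kpts:
        v = support_feats[r, c]
        v = v / np.linalg.norm(v)
        sim = q @ v                       # cosine similarity per location
        best = int(np.argmax(sim))
        matches.append((best // W, best % W))
    return matches

# Sanity check: with identical support and query maps,
# every keypoint should match its own location.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
kpts = [(1, 2), (5, 7)]
print(cape_match(feats, kpts, feats))   # → [(1, 2), (5, 7)]
```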
🛟 StableAnimator: ID-aware Humans 🛟

👉 StableAnimator: the first e2e ID-preserving video diffusion framework for HQ videos without any post-processing. Input: a single image + a sequence of poses. Insane results!

👉 Review https://t.ly/JDtL3
👉 Paper https://arxiv.org/pdf/2411.17697
👉 Project francis-rings.github.io/StableAnimator/
👉 Code github.com/Francis-Rings/StableAnimator
๐Ÿ‘12โค3๐Ÿคฏ2๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🧶 SOTA Track-by-Propagation 🧶

👉 SambaMOTR: a novel e2e model (based on Samba) that leverages long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code in Jan. '25 💙

👉 Review https://t.ly/QSQ8L
👉 Paper arxiv.org/pdf/2410.01806
👉 Project sambamotr.github.io/
👉 Repo https://lnkd.in/dRDX6nk2
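Track-by-propagation itself is easy to illustrate: each tracklet predicts its next-frame state, and detections are associated to those predictions. The sketch below swaps SambaMOTR's learned Samba sequence model for a constant-velocity toy and uses greedy, distance-gated association; all of it is illustrative, none of it the paper's implementation:

```python
import numpy as np

class Tracklet:
    """Toy tracklet state: position + velocity, propagated each frame
    (a stand-in for SambaMOTR's learned sequence model)."""
    def __init__(self, tid, pos):
        self.tid = tid
        self.pos = np.asarray(pos, float)
        self.vel = np.zeros(2)

    def propagate(self):
        # Predict where the tracklet should be in the next frame.
        return self.pos + self.vel

    def update(self, det):
        det = np.asarray(det, float)
        self.vel = det - self.pos
        self.pos = det

def step(tracklets, detections, gate=5.0):
    """Greedily associate propagated tracklets to detections,
    rejecting matches farther than `gate`."""
    free = [np.asarray(d, float) for d in detections]
    for t in tracklets:
        pred = t.propagate()
        if not free:
            break
        dists = [np.linalg.norm(pred - f) for f in free]
        i = int(np.argmin(dists))
        if dists[i] < gate:
            t.update(free.pop(i))
    return tracklets

# Two targets moving right at 1 unit/frame.
tracks = [Tracklet(0, (0, 0)), Tracklet(1, (10, 0))]
step(tracks, [(1, 0), (11, 0)])
step(tracks, [(2, 0), (12, 0)])
print([t.pos.tolist() for t in tracks])   # → [[2.0, 0.0], [12.0, 0.0]]
```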
โค5๐Ÿ”ฅ2๐Ÿคฏ1
This media is not supported in your browser
VIEW IN TELEGRAM
👺 HiFiVFS: Extreme Face Swapping 👺

👉 HiFiVFS: HQ face-swapping videos even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, but no code announced 😢

👉 Review https://t.ly/ea8dU
👉 Paper https://arxiv.org/pdf/2411.18293
👉 Project https://cxcx1996.github.io/HiFiVFS
🔥 Video Depth without Video Models 🔥

👉 RollingDepth: turning a single-image latent diffusion model (LDM) into the new SOTA depth estimator. It works better than dedicated video-depth models 🤯 Code under Apache 💙

👉 Review https://t.ly/R4LqS
👉 Paper https://arxiv.org/pdf/2411.19189
👉 Project https://rollingdepth.github.io/
👉 Repo https://github.com/prs-eth/rollingdepth
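One reason single-image LDM depth needs extra machinery for video: each prediction is only defined up to an affine (scale/shift) ambiguity, so overlapping snippets must be aligned before they can be merged into one consistent video. The closed-form least-squares alignment below is a schematic stand-in for that step, not the paper's actual solver:

```python
import numpy as np

def align_overlap(ref, other):
    """Find scale s and shift t such that s*other + t best fits `ref`
    (in the least-squares sense) on the frames two depth snippets
    share. Schematic: a RollingDepth-style pipeline needs some such
    alignment, but this is not the paper's optimizer."""
    a, b = np.ravel(ref), np.ravel(other)
    ac, bc = a - a.mean(), b - b.mean()
    s = (ac @ bc) / (bc @ bc)          # closed-form least-squares scale
    t = a.mean() - s * b.mean()        # matching shift
    return s, t

# Toy check: ref is exactly 2 * other, so s = 2 and t = 0.
s, t = align_overlap(np.array([2.0, 4.0, 6.0]), np.array([1.0, 2.0, 3.0]))
print(s, t)   # → 2.0 0.0
```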
⚽ Universal Soccer Foundation Model ⚽

👉 Universal Soccer Video Understanding: SoccerReplay-1988, the largest multi-modal soccer dataset, and MatchVision, the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released 💙

👉 Review https://t.ly/-X90B
👉 Paper https://arxiv.org/pdf/2412.01820
👉 Project https://jyrao.github.io/UniSoccer/
👉 Repo https://github.com/jyrao/UniSoccer
🌈 Motion Prompting Video Generation 🌈

👉 DeepMind unveils Motion Prompting: a novel ControlNet-based video generation model conditioned on spatio-temporally sparse or dense motion trajectories. Amazing results, but no code announced 😢

👉 Review https://t.ly/VyKbv
👉 Paper arxiv.org/pdf/2412.02700
👉 Project motion-prompting.github.io
🦘 AniGS: Single-Pic Animatable Avatar 🦘

👉 #Alibaba unveils AniGS: given a single human image as input, it rebuilds a Hi-Fi 3D avatar in a canonical pose, usable for both photorealistic rendering & real-time animation. Source code announced, to be released 💙

👉 Review https://t.ly/4yfzn
👉 Paper arxiv.org/pdf/2412.02684
👉 Project lingtengqiu.github.io/2024/AniGS/
👉 Repo github.com/aigc3d/AniGS
1โค11๐Ÿ”ฅ7๐Ÿ‘3๐Ÿคฉ2๐Ÿ‘1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🧤 GigaHands: Massive #3D Hands 🧤

👉 A novel massive #3D bimanual-activities dataset: 34 hours of activities, 14k hand-motion clips paired with 84k text annotations, 183M+ unique hand images

👉 Review https://t.ly/SA0HG
👉 Paper www.arxiv.org/pdf/2412.04244
👉 Repo github.com/brown-ivl/gigahands
👉 Project ivl.cs.brown.edu/research/gigahands.html
โค7๐Ÿ‘1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦢 Track4Gen: Diffusion + Tracking 🦢

👉 Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results, but no code announced 😢

👉 Review https://t.ly/9ujhc
👉 Paper arxiv.org/pdf/2412.06016
👉 Project hyeonho99.github.io/track4gen/
👉 Gallery hyeonho99.github.io/track4gen/full.html
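The combination the post describes (diffusion loss plus a point-tracking supervision term on the diffusion features) reduces, schematically, to a weighted sum of two terms. Everything below, from the arrays to the weight `lam`, is an illustrative placeholder, not the paper's loss:

```python
import numpy as np

def track4gen_loss(noise_pred, noise, feat_a, feat_b, lam=0.5):
    """Schematic Track4Gen-style objective: the standard diffusion
    denoising MSE plus a tracking term pulling together the diffusion
    features of corresponding (tracked) points in two frames."""
    diffusion = np.mean((noise_pred - noise) ** 2)   # denoising loss
    tracking = np.mean((feat_a - feat_b) ** 2)       # correspondence loss
    return diffusion + lam * tracking

# Toy numbers: diffusion term = 1.0, tracking term = 4.0, lam = 0.5.
loss = track4gen_loss(np.zeros(4), np.ones(4), np.zeros(4), np.full(4, 2.0))
print(loss)   # → 3.0
```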
โค3๐Ÿ”ฅ3๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🌹 4D Neural Templates 🌹

👉 #Stanford unveils Neural Templates: generating HQ temporal object intrinsics for several natural phenomena and enabling the sampling and controllable rendering of these dynamic objects from any viewpoint, at any time of their lifespan. A novel task in vision is born 💙

👉 Review https://t.ly/ka_Qf
👉 Paper https://arxiv.org/pdf/2412.05278
👉 Project https://chen-geng.com/rose4d#toi
๐Ÿ”ฅ8โค2โšก1๐Ÿ‘1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ• Gaze-LLE: Neural Gaze ๐Ÿ•

๐Ÿ‘‰Gaze-LLE: novel transformer framework that streamlines gaze target by leveraging features from frozen DINOv2 encoder. Code & models under MIT ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/SadoF
๐Ÿ‘‰Paper arxiv.org/pdf/2412.09586
๐Ÿ‘‰Repo github.com/fkryan/gazelle
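The frozen-encoder recipe is easy to sketch: keep the backbone fixed and train only a small head on its features. Below, a linear probe with a spatial softmax stands in for Gaze-LLE's actual transformer head, and random arrays stand in for DINOv2 features:

```python
import numpy as np

def gaze_head(frozen_feats, w):
    """Lightweight decoder on top of frozen backbone features: a
    linear-probe stand-in for Gaze-LLE's small transformer head.
    frozen_feats: (H, W, C) map from a frozen encoder (e.g. DINOv2);
    w: (C,) learned weights. Returns an (H, W) gaze heatmap summing to 1."""
    logits = frozen_feats @ w                # (H, W) per-location scores
    e = np.exp(logits - logits.max())        # softmax over all locations
    return e / e.sum()

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 4, 8))   # placeholder for frozen features
heat = gaze_head(feats, rng.normal(size=8))
print(heat.shape, round(float(heat.sum()), 6))   # → (4, 4) 1.0
```

Only `w` would be trained in this setup; the encoder's weights never receive gradients, which is what keeps the method cheap.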
🫶 Dynamic Cam-4D Hands 🫶

👉 Imperial College unveils Dyn-HaMR: the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Code announced under MIT 💙

👉 Review https://t.ly/h5vV7
👉 Paper arxiv.org/pdf/2412.12861
👉 Project dyn-hamr.github.io/
👉 Repo github.com/ZhengdiYu/Dyn-HaMR
๐Ÿ„ Open-MLLMs Self-Driving ๐Ÿ„

๐Ÿ‘‰OpenEMMA: a novel open-source e2e framework based on MLLMs (via Chain-of-Thought reasoning). Effectiveness, generalizability, and robustness across a variety of challenging driving scenarios. Code released under Apache 2.0๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/waLZI
๐Ÿ‘‰Paper https://arxiv.org/pdf/2412.15208
๐Ÿ‘‰Code https://github.com/taco-group/OpenEMMA
โค12๐Ÿ‘5๐Ÿ”ฅ5๐Ÿ‘1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ”„๏ธ Orient Anything in 3D ๐Ÿ”„๏ธ
๏ธ
๐Ÿ‘‰Orient Anything is a novel robust image-based object orientation estimation model. By training on 2M rendered labeled images, it achieves strong zero-shot generalization in the wild. Code released๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/ro5ep
๐Ÿ‘‰Paper arxiv.org/pdf/2412.18605
๐Ÿ‘‰Project orient-anything.github.io/
๐Ÿ‘‰Code https://lnkd.in/d_3k6Nxz
๐Ÿ‘9โค7๐Ÿ”ฅ3โšก1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
โญTOP 10 Papers you loved - 2024โญ

๐Ÿ‘‰Here the list of my posts you liked the most in 2024, thank you all ๐Ÿ’™

๐๐š๐ฉ๐ž๐ซ๐ฌ:
โญ"Look Ma, no markers"
โญT-Rex 2 Detector
โญModels at Any Resolution

๐Ÿ‘‰The full list with links: https://t.ly/GvQVy
โค12๐Ÿ”ฅ4๐Ÿ‘1๐Ÿคฉ1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🌳 HD Video Object Insertion 🌳

👉 VideoAnydoor: a novel zero-shot video object-insertion #AI with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.

👉 Review https://t.ly/hyvRq
👉 Paper arxiv.org/pdf/2501.01427
👉 Project videoanydoor.github.io/
👉 Repo TBA
๐Ÿ”ฅ8โค2๐Ÿ’ฉ2๐Ÿ‘1๐Ÿคฉ1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
โญ Poll Alert!! โญ

[EDIT] see below
โค3๐Ÿ‘2๐Ÿ”ฅ1
What is your favorite source for AI updates? Final Results:
LinkedIn: 32%
Instagram: 4%
Reddit: 3%
Telegram: 52%
๐Ÿ‘11๐Ÿ”ฅ2โค1๐Ÿ˜1