AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🧶SOTA track-by-propagation🧶

👉SambaMOTR is a novel e2e model (based on Samba) that models long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code expected in January 2025 💙

👉Review https://t.ly/QSQ8L
👉Paper arxiv.org/pdf/2410.01806
👉Project sambamotr.github.io/
👉Repo https://lnkd.in/dRDX6nk2
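For readers new to state-space models: Samba is described by its authors as a set-of-sequences model that synchronizes the selective state spaces of multiple tracklets. The sketch below is not SambaMOTR code; it only illustrates, under simplifying assumptions, the basic linear recurrence h_t = a ⊙ h_{t-1} + b ⊙ x_t, y_t = c · h_t that lets a per-tracklet state carry information across many frames. The helper `ssm_scan` and the fixed parameters a, b, c are hypothetical; selective models make them input-dependent, and the cross-tracklet synchronization is omitted.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Run a diagonal linear state-space model over one sequence.

    x: (T, D) per-frame inputs for one tracklet (hypothetical features).
    a, b, c: (D,) fixed diagonal parameters (hypothetical; selective SSMs
    such as Mamba/Samba compute them from the input instead).
    Returns y: (T,) scalar outputs, one per frame.
    """
    h = np.zeros(x.shape[1])
    ys = []
    for x_t in x:
        h = a * h + b * x_t       # decay the old state, inject the new observation
        ys.append(float(c @ h))   # read out the current state
    return np.array(ys)

# A single impulse at frame 0, then silence.
x = np.zeros((5, 2))
x[0] = 1.0
y = ssm_scan(x, a=np.array([0.9, 0.5]), b=np.ones(2), c=np.ones(2))
print(y)
```

The impulse at frame 0 keeps influencing later outputs, decaying at rates set by `a`; this long memory at linear cost is what makes SSMs attractive for long-range tracklet modeling.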
👺HiFiVFS: Extreme Face Swapping👺

👉HiFiVFS: high-quality video face swapping, even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, but no code announced 😢

👉Review https://t.ly/ea8dU
👉Paper https://arxiv.org/pdf/2411.18293
👉Project https://cxcx1996.github.io/HiFiVFS
🔥Video Depth without Video Models🔥

👉RollingDepth turns a single-image latent diffusion model (LDM) into a new SOTA video depth estimator. It works better than dedicated video depth models 🤯 Code under Apache 💙

👉Review https://t.ly/R4LqS
👉Paper https://arxiv.org/pdf/2411.19189
👉Project https://rollingdepth.github.io/
👉Repo https://github.com/prs-eth/rollingdepth
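RollingDepth predicts depth for short video snippets and then registers the per-snippet predictions into one consistent video. A standard building block for that kind of registration, shown here as an assumption rather than the authors' exact procedure, is a least-squares fit of a scale s and shift t so that s·src + t matches a reference depth map on the frames two snippets share. `align_scale_shift` is a hypothetical helper name.

```python
import numpy as np

def align_scale_shift(src, ref, mask=None):
    """Least-squares scale s and shift t minimizing ||s*src + t - ref||^2.

    src, ref: depth maps of the same shape (e.g. an overlapping frame of two
    snippets). mask: optional boolean array selecting valid pixels.
    """
    if mask is None:
        mask = np.ones_like(src, dtype=bool)
    x = src[mask].ravel()
    y = ref[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix columns: [depth, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Toy check: a snippet predicted at a different scale/offset is recovered.
rng = np.random.default_rng(0)
ref = rng.uniform(1.0, 5.0, size=(4, 4))
src = (ref - 0.5) / 2.0          # so that ref = 2 * src + 0.5
s, t = align_scale_shift(src, ref)
aligned = s * src + t
```

A real registration would solve for all snippets jointly and use a robust loss; the closed-form pairwise fit only shows why affine-ambiguous depth snippets can be stitched at all.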
⚽Universal Soccer Foundation Model⚽

👉Universal Soccer Video Understanding: SoccerReplay-1988, the largest multi-modal soccer dataset, and MatchVision, the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released 💙

👉Review https://t.ly/-X90B
👉Paper https://arxiv.org/pdf/2412.01820
👉Project https://jyrao.github.io/UniSoccer/
👉Repo https://github.com/jyrao/UniSoccer
🌈Motion Prompting Video Generation🌈

👉DeepMind unveils Motion Prompting, a ControlNet-based video generation model conditioned on spatio-temporally sparse or dense motion trajectories. Amazing results, but no code announced 😢

👉Review https://t.ly/VyKbv
👉Paper arxiv.org/pdf/2412.02700
👉Project motion-prompting.github.io
🦘AniGS: Single Pic Animatable Avatar🦘

👉#Alibaba unveils AniGS: given a single human image as input, it reconstructs a high-fidelity 3D avatar in a canonical pose, which can be used for both photorealistic rendering & real-time animation. Source code announced, to be released 💙

👉Review https://t.ly/4yfzn
👉Paper arxiv.org/pdf/2412.02684
👉Project lingtengqiu.github.io/2024/AniGS/
👉Repo github.com/aigc3d/AniGS
🧤GigaHands: Massive #3D Hands🧤

👉A novel massive #3D dataset of bimanual activities: 34 hours of activities, 14k hand-motion clips paired with 84k text annotations, and 183M+ unique hand images

👉Review https://t.ly/SA0HG
👉Paper www.arxiv.org/pdf/2412.04244
👉Repo github.com/brown-ivl/gigahands
👉Project ivl.cs.brown.edu/research/gigahands.html
🦢 Track4Gen: Diffusion + Tracking 🦢

👉Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results, but no code announced 😢

👉Review https://t.ly/9ujhc
👉Paper arxiv.org/pdf/2412.06016
👉Project hyeonho99.github.io/track4gen/
👉Gallery hyeonho99.github.io/track4gen/full.html
🌹 4D Neural Templates 🌹

👉#Stanford unveils Neural Templates, which generate HQ temporal object intrinsics for several natural phenomena and enable the sampling and controllable rendering of these dynamic objects from any viewpoint, at any time of their lifespan. A novel task in vision is born 💙

👉Review https://t.ly/ka_Qf
👉Paper https://arxiv.org/pdf/2412.05278
👉Project https://chen-geng.com/rose4d#toi
Gaze-LLE: Neural Gaze

👉Gaze-LLE: a novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. Code & models under MIT 💙

👉Review https://t.ly/SadoF
👉Paper arxiv.org/pdf/2412.09586
👉Repo github.com/fkryan/gazelle
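Gaze-target estimators in this line of work typically decode a heatmap over the scene image. Assuming such a heatmap output (the array shape and the helper `heatmap_to_point` are hypothetical, not the Gaze-LLE API), the predicted gaze target can be read off as the normalized location of the heatmap peak:

```python
import numpy as np

def heatmap_to_point(heatmap):
    """Return the (x, y) location of the heatmap peak, normalized to [0, 1].

    heatmap: (H, W) array of non-negative scores (hypothetical model output).
    Pixel centers are used, so a peak at column j maps to (j + 0.5) / W.
    """
    h, w = heatmap.shape
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (col + 0.5) / w, (row + 0.5) / h

hm = np.zeros((64, 64))
hm[16, 48] = 1.0  # peak at row 16, column 48
x, y = heatmap_to_point(hm)
```

Normalizing to [0, 1] keeps the prediction independent of the heatmap resolution, so it can be rescaled to the original image size afterwards.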
🫶 Dynamic Cam-4D Hands 🫶

👉Imperial College unveils Dyn-HaMR, the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild. Code announced under MIT 💙

👉Review https://t.ly/h5vV7
👉Paper arxiv.org/pdf/2412.12861
👉Project dyn-hamr.github.io/
👉Repo github.com/ZhengdiYu/Dyn-HaMR
Open-MLLMs Self-Driving

👉OpenEMMA: a novel open-source e2e autonomous-driving framework based on MLLMs, using Chain-of-Thought reasoning. It shows effectiveness, generalizability, and robustness across a variety of challenging driving scenarios. Code released under Apache 2.0 💙

👉Review https://t.ly/waLZI
👉Paper https://arxiv.org/pdf/2412.15208
👉Code https://github.com/taco-group/OpenEMMA
🔄️ Orient Anything in 3D 🔄️

👉Orient Anything is a novel, robust image-based model for object orientation estimation. Trained on 2M rendered, labeled images, it achieves strong zero-shot generalization in the wild. Code released 💙

👉Review https://t.ly/ro5ep
👉Paper arxiv.org/pdf/2412.18605
👉Project orient-anything.github.io/
👉Code https://lnkd.in/d_3k6Nxz
๐Ÿ‘9โค7๐Ÿ”ฅ3โšก1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
⭐TOP 10 Papers you loved - 2024⭐

👉Here is the list of my posts you liked the most in 2024, thank you all 💙

𝐏𝐚𝐩𝐞𝐫𝐬:
⭐"Look Ma, no markers"
⭐T-Rex 2 Detector
⭐Models at Any Resolution

👉The full list with links: https://t.ly/GvQVy
🌳 HD Video Object Insertion 🌳

👉VideoAnydoor is a novel zero-shot video object insertion #AI with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.

👉Review https://t.ly/hyvRq
👉Paper arxiv.org/pdf/2501.01427
👉Project videoanydoor.github.io/
👉Repo TBA
⭐ Poll Alert!! ⭐

[EDIT] see below
What is your favorite source for the AI updates?
Final Results
32% LinkedIn
4% Instagram
3% Reddit
52% Telegram
๐Ÿ‘11๐Ÿ”ฅ2โค1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🥮 SOTA probabilistic tracking 🥮

👉ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial 💙

👉Review https://t.ly/YY_PH
👉Paper https://arxiv.org/pdf/2501.03220
👉Project michaelszj.github.io/protracker/
👉Code github.com/Michaelszj/pro-tracker
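The post's tagline is probabilistic tracking. One textbook way to combine several noisy estimates of the same point position, used here only as an illustrative assumption and not as ProTracker's actual algorithm, is maximum-likelihood fusion of independent Gaussian estimates, which reduces to an inverse-variance weighted mean. The function name and the input values are hypothetical.

```python
import numpy as np

def fuse_gaussian_estimates(means, variances):
    """Maximum-likelihood fusion of independent isotropic Gaussian estimates.

    means: (K, 2) candidate (x, y) positions of one tracked point.
    variances: (K,) per-candidate variances (lower = more trusted).
    Returns the inverse-variance weighted mean and its variance.
    """
    means = np.asarray(means, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # precision of each estimate
    fused_mean = (w[:, None] * means).sum(axis=0) / w.sum()
    fused_var = 1.0 / w.sum()
    return fused_mean, fused_var

# Two hypothetical estimates of the same point (e.g. one from chained flow,
# one from a long-range feature match); the more certain one dominates.
mean, var = fuse_gaussian_estimates([[10.0, 20.0], [14.0, 20.0]], [1.0, 3.0])
```

With variances 1 and 3, the first estimate gets three times the weight, so the fused x lands at 11.0 instead of the midpoint 12.0, and the fused variance (0.75) is lower than either input variance.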
🧤World-Space Ego 3D Hands🧤

👉Imperial College unveils HaWoR, a novel method for world-space 3D hand motion estimation from egocentric videos. The new SOTA on both camera pose estimation & hand motion reconstruction. Code under CC Attribution-NC-ND 4.0 Int. 💙

👉Review https://t.ly/ozJn7
👉Paper arxiv.org/pdf/2501.02973
👉Project hawor-project.github.io/
👉Code github.com/ThunderVVV/HaWoR