AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI, with papers. Fresh daily updates on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
โญTOP 10 Papers you loved - 2024โญ

๐Ÿ‘‰Here the list of my posts you liked the most in 2024, thank you all ๐Ÿ’™

๐๐š๐ฉ๐ž๐ซ๐ฌ:
โญ"Look Ma, no markers"
โญT-Rex 2 Detector
โญModels at Any Resolution

๐Ÿ‘‰The full list with links: https://t.ly/GvQVy
โค12๐Ÿ”ฅ4๐Ÿ‘1๐Ÿคฉ1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🌳 HD Video Object Insertion 🌳

👉 VideoAnydoor is a novel zero-shot #AI framework for video object insertion with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.

👉 Review https://t.ly/hyvRq
👉 Paper arxiv.org/pdf/2501.01427
👉 Project videoanydoor.github.io/
👉 Repo TBA
โญ Poll Alert!! โญ

[EDIT] see below
โค3๐Ÿ‘2๐Ÿ”ฅ1
What is your favorite source for AI updates?
Final Results
LinkedIn: 32%
Instagram: 4%
Reddit: 3%
Telegram: 52%
๐Ÿ‘11๐Ÿ”ฅ2โค1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🥮 SOTA Probabilistic Tracking 🥮

👉 ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial 💙

👉 Review https://t.ly/YY_PH
👉 Paper https://arxiv.org/pdf/2501.03220
👉 Project michaelszj.github.io/protracker/
👉 Code github.com/Michaelszj/pro-tracker
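For intuition on the "probabilistic" part, here is a minimal sketch of inverse-variance fusion, a generic way to merge several noisy predictions of the same tracked point into one confidence-weighted estimate. It illustrates the general idea only, not ProTracker's actual formulation; the function name and numbers are made up for the example.

```python
import numpy as np

def fuse_predictions(points, variances):
    """Fuse several 2D predictions of the same point via inverse-variance weighting.

    points:    (N, 2) candidate (x, y) locations for one track
    variances: (N,) per-prediction uncertainty (larger = less trusted)
    """
    w = 1.0 / np.asarray(variances, dtype=float)          # confidence weights
    fused = (w[:, None] * np.asarray(points)).sum(axis=0) / w.sum()
    fused_var = 1.0 / w.sum()                             # fused estimate is more certain
    return fused, fused_var

# Example: an optical-flow guess and a keypoint guess for the same point
pts = [(120.0, 64.0), (123.0, 66.0)]
var = [4.0, 1.0]                                          # keypoint guess is more confident
print(fuse_predictions(pts, var))
```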
โค6๐Ÿ”ฅ5๐Ÿ‘2๐Ÿคฉ2๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧤 World-Space Ego 3D Hands 🧤

👉 Imperial College unveils HaWoR, a novel world-space 3D hand motion estimation method for egocentric videos. The new SOTA for both camera pose estimation & hand motion reconstruction. Code under Attribution-NC-ND 4.0 International 💙

👉 Review https://t.ly/ozJn7
👉 Paper arxiv.org/pdf/2501.02973
👉 Project hawor-project.github.io/
👉 Code github.com/ThunderVVV/HaWoR
๐Ÿ”ฅ "Nuclear" AI vs. Hyper-Cheap Inference ๐Ÿ”ฅ

โญ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Free to comment :)
Anonymous Poll
23%
๐ŸคฒPortabile Training Workstation
35%
โš›๏ธNuclear energy for AI training
33%
๐Ÿ–ฒ๏ธCheaper Only-inference devices
9%
๐Ÿ’ฐCloud-intensive Only-inference
๐Ÿ‘4โค1๐Ÿ”ฅ1๐Ÿคฏ1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
⚽ FIFA 3D Human Pose ⚽

👉 #FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released 💙

👉 Review https://t.ly/kvGVQ
👉 Paper arxiv.org/pdf/2501.02771
👉 Project https://lnkd.in/d5hFWpY2
👉 Dataset https://lnkd.in/dAphJ9WA
🔥 Depth Any Camera (SOTA) 🔥

👉 DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cameras with varying FoVs (including large fisheye & 360°). Code announced (not available yet) 💙

👉 Review https://t.ly/1qz4F
👉 Paper arxiv.org/pdf/2501.02464
👉 Project yuliangguo.github.io/depth-any-camera/
👉 Repo github.com/yuliangguo/depth_any_camera
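Why the FoV matters: turning a predicted depth map into 3D points needs intrinsics derived from the camera's field of view, so the same depth map unprojects to a very different frustum at 60° versus 120°. Below is a generic pinhole back-projection sketch (a model that breaks down exactly for the fisheye/360° cameras DAC targets); it is an illustration, not DAC's code.

```python
import numpy as np

def backproject(depth, fov_deg):
    """Unproject a metric depth map to 3D points with a simple pinhole model.

    depth:   (H, W) metric depth along the optical axis
    fov_deg: horizontal field of view; a wider FoV means a shorter focal length,
             so the same depth map spreads over a wider 3D frustum.
    """
    h, w = depth.shape
    fx = (w / 2.0) / np.tan(np.deg2rad(fov_deg) / 2.0)    # focal length in pixels
    cx, cy = w / 2.0, h / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fx * depth                             # assume square pixels
    return np.stack([x, y, depth], axis=-1)               # (H, W, 3) point map

pts_narrow = backproject(np.ones((480, 640)), fov_deg=60)   # narrow lens
pts_wide = backproject(np.ones((480, 640)), fov_deg=120)    # wide / fisheye-like
```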
๐Ÿ‘12๐Ÿ”ฅ5๐Ÿคฉ4โค2๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
โค๏ธโ€๐Ÿ”ฅ Uncommon object in #3D โค๏ธโ€๐Ÿ”ฅ

๐Ÿ‘‰#META releases uCO3D, a new object-centric dataset for 3D AI. The largest publicly-available collection of HD videos of objects with 3D annotations that ensures full-360โ—ฆ coverage. Code & data under CCA 4.0๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/Z_tvA
๐Ÿ‘‰Paper https://arxiv.org/pdf/2501.07574
๐Ÿ‘‰Project https://uco3d.github.io/
๐Ÿ‘‰Repo github.com/facebookresearch/uco3d
โค11โšก2๐Ÿ˜2๐Ÿ‘1๐Ÿ‘1๐Ÿคฉ1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ†Universal Detector-Free Match๐Ÿ†

๐Ÿ‘‰MatchAnything: novel detector-free universal matcher across unseen real-world single/cross-modality domains. Same weights for everything. Code announced, to be released ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/sx92L
๐Ÿ‘‰Paper https://lnkd.in/dWwRwGyY
๐Ÿ‘‰Project https://lnkd.in/dCwb2Yte
๐Ÿ‘‰Repo https://lnkd.in/dnUXYzQ5
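As background on "detector-free": instead of matching sparse keypoints, these methods correlate dense features from the two images and keep mutual nearest neighbours. The sketch below shows that generic mutual-NN step on random descriptors; it is not MatchAnything's architecture, just the underlying matching primitive.

```python
import numpy as np

def mutual_nn_matches(feat_a, feat_b):
    """Match dense descriptors by mutual nearest neighbours (cosine similarity).

    feat_a: (Na, D) L2-normalized descriptors from image A (e.g. a flattened feature map)
    feat_b: (Nb, D) L2-normalized descriptors from image B
    Returns index pairs (i, j) that are each other's best match.
    """
    sim = feat_a @ feat_b.T                   # (Na, Nb) similarity matrix
    nn_ab = sim.argmax(axis=1)                # best B index for each A descriptor
    nn_ba = sim.argmax(axis=0)                # best A index for each B descriptor
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

a = np.random.randn(100, 128); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = np.random.randn(120, 128); b /= np.linalg.norm(b, axis=1, keepdims=True)
print(len(mutual_nn_matches(a, b)))           # number of mutually consistent matches
```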
โค8๐Ÿคฏ7๐Ÿ”ฅ4๐Ÿ‘3โšก1๐Ÿคฉ1๐Ÿ˜1๐Ÿพ1
🆘 Help: Looking for Outstanding Speakers 🆘

👉 Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only "hardcore" technical talks, no commercial content at all. Please comment here with name, topic and affiliation (e.g.: Paul Gascoigne, Computer Vision & Football, Scotland Team).

⭐ Guaranteed tickets & more for the suggestions that become invited speakers ;)
โค5๐Ÿ”ฅ4๐Ÿ‘3
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿงžโ€โ™‚๏ธOmni-RGPT: SOTA MLLM Understanding๐Ÿงžโ€โ™‚๏ธ

๐Ÿ‘‰ #NVIDIA presents Omni-RGPT, MLLM for region-level comprehension for both images & videos. New SOTA on image/video-based commonsense reasoning.

๐Ÿ‘‰Review https://t.ly/KHnQ7
๐Ÿ‘‰Paper arxiv.org/pdf/2501.08326
๐Ÿ‘‰Project miranheo.github.io/omni-rgpt/
๐Ÿ‘‰Repo TBA soon
🔥 GAGA: Group Any Gaussians 🔥

👉 GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated 💙

👉 Review https://t.ly/Nk_jT
👉 Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉 Project www.gaga.gallery/
👉 Repo github.com/weijielyu/Gaga
๐ŸŽFree Book: LLM Foundations๐ŸŽ

๐Ÿ‘‰A fully free book just released on arXiv to outline the basic concepts of #LLMs and related techniques with a focus on the foundational aspects.

โœ…Chapter 1: basics of pre-training
โœ…Chapter 2: gen-models & LLMs
โœ…Chapter 3: prompting methods
โœ…Chapter 4: alignment methods

๐Ÿ‘‰If you have any background in ML, along with a certain understanding of stuff like Transformers, this book will be "smooth". However, even without this prior knowledge, it is still perfectly fine because the contents of each chapter are self-contained.

๐Ÿ‘‰Review https://t.ly/9LGCa
๐Ÿ‘‰Book https://lnkd.in/d3VkswZf
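As a taste of Chapter 1, here is a toy next-token-prediction step in PyTorch: embed the tokens, run them through a causally masked Transformer layer, and minimize cross-entropy against the sequence shifted by one. A minimal sketch for intuition only, not code from the book; the vocabulary, sizes and data are dummy values.

```python
import torch
import torch.nn as nn

# Toy causal language model: embedding + one masked Transformer layer + LM head.
vocab, d_model = 1000, 64
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
head = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (2, 16))            # (batch, sequence) of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]      # predict the next token
causal = nn.Transformer.generate_square_subsequent_mask(inputs.size(1))

hidden = layer(embed(inputs), src_mask=causal)       # no peeking at future tokens
logits = head(hidden)                                # (batch, seq-1, vocab)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()                                      # gradient of one pre-training step
print(loss.item())
```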
โค17๐Ÿ”ฅ6๐Ÿ‘3๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ„โ€โ™€๏ธ GSTAR: Gaussian Surface Tracking ๐Ÿ„โ€โ™€๏ธ

๐Ÿ‘‰ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/udpMq
๐Ÿ‘‰Paper arxiv.org/pdf/2501.10283
๐Ÿ‘‰Project chengwei-zheng.github.io/GSTAR/
๐Ÿ‘‰Repo TBA
🧽 Diffusion Video Inpainting 🧽

👉 #Alibaba unveils a technical report on DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structure. Code & weights released under Apache 💙

👉 Review https://t.ly/7rEll
👉 Paper arxiv.org/pdf/2501.10018
👉 Project lixiaowen-xw.github.io/DiffuEraser-page/
👉 Repo github.com/lixiaowen-xw/DiffuEraser
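For a feel of the underlying mechanism, the snippet below runs single-image diffusion inpainting with Hugging Face diffusers (here the Stable Diffusion 2 inpainting checkpoint). This is not the DiffuEraser pipeline, the file names are placeholders, and a video model like DiffuEraser additionally has to keep the filled regions temporally coherent across frames.

```python
# Single-image diffusion inpainting as a rough illustration of the idea
# DiffuEraser extends to video; NOT the DiffuEraser pipeline itself.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("frame.png").convert("RGB")       # one video frame (placeholder path)
mask = Image.open("mask.png").convert("L")           # white = region to fill (placeholder path)
result = pipe(prompt="clean background", image=frame, mask_image=mask).images[0]
result.save("inpainted.png")
```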
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉 Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be released 💙

👉 Review https://t.ly/rfBr5
👉 Paper arxiv.org/pdf/2501.09898
👉 Project nvlabs.github.io/FoundationStereo/
👉 Repo github.com/NVlabs/FoundationStereo/tree/master
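Once a stereo network predicts disparity, metric depth follows from the classic relation depth = f · B / d (focal length in pixels times baseline over disparity). The conversion sketch below is generic, not FoundationStereo code; the focal length and baseline are made-up example values.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Classic rectified-stereo relation: depth = f * B / d.

    disparity_px: (H, W) predicted disparity in pixels (e.g. from a stereo network)
    focal_px:     focal length of the rectified pair, in pixels
    baseline_m:   distance between the two cameras, in meters
    """
    d = np.maximum(disparity_px, 1e-6)        # avoid division by zero
    return focal_px * baseline_m / d          # metric depth in meters

disp = np.full((480, 640), 32.0)              # dummy disparity map
depth = disparity_to_depth(disp, focal_px=720.0, baseline_m=0.12)
print(depth[0, 0])                            # 720 * 0.12 / 32 = 2.7 m
```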
โค6๐Ÿ”ฅ6๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 [SOTA] Long-Video Depth Anything 🔥

👉 ByteDance unveils Video Depth Anything: high-quality, consistent depth estimation on SUPER-long videos (over several minutes) without sacrificing efficiency. Built on Depth Anything V2 with a novel, efficient spatial-temporal head. Repo available under Apache 2.0 💙

👉 Review https://t.ly/Q4ZZd
👉 Paper arxiv.org/pdf/2501.12375
👉 Project https://lnkd.in/dKNwJzbM
👉 Repo https://lnkd.in/ddfwwpCj
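For context, the naive route to video depth is per-frame prediction, which flickers; simple temporal smoothing reduces the flicker but blurs real motion. The toy exponential-moving-average sketch below illustrates that trade-off; it is not the paper's method, which instead learns a spatial-temporal head on top of Depth Anything V2.

```python
import numpy as np

def smooth_depth(per_frame_depth, alpha=0.8):
    """Toy temporal smoothing (EMA) over independently predicted depth maps.

    per_frame_depth: (T, H, W) depth maps predicted frame by frame
    alpha:           how much of the previous smoothed frame to keep
    """
    out = np.empty_like(per_frame_depth)
    out[0] = per_frame_depth[0]
    for t in range(1, len(per_frame_depth)):
        out[t] = alpha * out[t - 1] + (1 - alpha) * per_frame_depth[t]
    return out

depth_stack = np.random.rand(100, 240, 320)    # stand-in for per-frame predictions
smoothed = smooth_depth(depth_stack)           # less flicker, but lags behind true motion
```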