AI with Papers - Artificial Intelligence & Deep Learning
14.6K subscribers
95 photos
235 videos
11 files
1.25K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🥮 SOTA probabilistic tracking 🥮

👉ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial💙

👉Review https://t.ly/YY_PH
👉Paper https://arxiv.org/pdf/2501.03220
👉Project michaelszj.github.io/protracker/
👉Code github.com/Michaelszj/pro-tracker
🧤World-Space Ego 3D Hands🧤

👉Imperial College unveils HaWoR, a novel method for world-space 3D hand motion estimation from egocentric videos. The new SOTA on both camera pose estimation & hand motion reconstruction. Code under Attribution-NC-ND 4.0 Int.💙

👉Review https://t.ly/ozJn7
👉Paper arxiv.org/pdf/2501.02973
👉Project hawor-project.github.io/
👉Code github.com/ThunderVVV/HaWoR
🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
24%
🤲Portable Training Workstation
35%
⚛️Nuclear energy for AI training
33%
🖲️Cheaper Only-inference devices
9%
💰Cloud-intensive Only-inference
⚽ FIFA 3D Human Pose ⚽

👉#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released 💙

👉Review https://t.ly/kvGVQ
👉Paper arxiv.org/pdf/2501.02771
👉Project https://lnkd.in/d5hFWpY2
👉Dataset https://lnkd.in/dAphJ9WA
🔥 Depth Any Camera (SOTA) 🔥

👉DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cams with varying FoVs (including large fisheye & 360°). Code announced (not available yet)💙

👉Review https://t.ly/1qz4F
👉Paper arxiv.org/pdf/2501.02464
👉Project yuliangguo.github.io/depth-any-camera/
👉Repo github.com/yuliangguo/depth_any_camera
โค๏ธโ€๐Ÿ”ฅ Uncommon object in #3D โค๏ธโ€๐Ÿ”ฅ

๐Ÿ‘‰#META releases uCO3D, a new object-centric dataset for 3D AI. The largest publicly-available collection of HD videos of objects with 3D annotations that ensures full-360โ—ฆ coverage. Code & data under CCA 4.0๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/Z_tvA
๐Ÿ‘‰Paper https://arxiv.org/pdf/2501.07574
๐Ÿ‘‰Project https://uco3d.github.io/
๐Ÿ‘‰Repo github.com/facebookresearch/uco3d
🏆Universal Detector-Free Match🏆

👉MatchAnything: novel detector-free universal matcher across unseen real-world single/cross-modality domains. Same weights for everything. Code announced, to be released 💙

👉Review https://t.ly/sx92L
👉Paper https://lnkd.in/dWwRwGyY
👉Project https://lnkd.in/dCwb2Yte
👉Repo https://lnkd.in/dnUXYzQ5
🆘 Help: Looking for Outstanding Speakers 🆘

👉Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only “hardcore” technical talks, nothing commercial at all. Please comment here with name, topic and affiliation (e.g., Paul Gascoigne, Computer Vision & Football, Scotland Team).

⭐Guaranteed tickets & more for the suggestions that will become invited speakers ;)
🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image/video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
🔥 GAGA: Group Any Gaussians 🔥

👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated💙

👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga
🎁Free Book: LLM Foundations🎁

👉A fully free book just released on arXiv that outlines the basic concepts of #LLMs and related techniques, with a focus on the foundational aspects.

✅Chapter 1: basics of pre-training
✅Chapter 2: gen-models & LLMs
✅Chapter 3: prompting methods
✅Chapter 4: alignment methods

👉If you have some background in ML, along with a certain understanding of stuff like Transformers, this book will be "smooth". However, even without this prior knowledge, it is still perfectly fine because the contents of each chapter are self-contained.

👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf
๐Ÿ„โ€โ™€๏ธ GSTAR: Gaussian Surface Tracking ๐Ÿ„โ€โ™€๏ธ

๐Ÿ‘‰ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/udpMq
๐Ÿ‘‰Paper arxiv.org/pdf/2501.10283
๐Ÿ‘‰Project chengwei-zheng.github.io/GSTAR/
๐Ÿ‘‰Repo TBA
🧽 Diffusion Video Inpainting 🧽

👉#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structures. Code & weights released under Apache💙

👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. In addition, a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
🔥 [SOTA] Long-Video Depth Anything 🔥

👉ByteDance unveils Video Depth Anything: HQ, consistent depth estimation in SUPER-long videos (over several minutes) without sacrificing efficiency. Based on Depth Anything V2 with a novel efficient spatial-temporal head. Repo available under Apache 2.0💙

👉Review https://t.ly/Q4ZZd
👉Paper arxiv.org/pdf/2501.12375
👉Project https://lnkd.in/dKNwJzbM
👉Repo https://lnkd.in/ddfwwpCj
🧵Time-Aware Pts-Tracking🧵

👉Chrono: a feature backbone specifically designed for point tracking with built-in temporal awareness. It captures long-term temporal context, enabling precise prediction even without the refinements. Code announced💙

👉Review https://t.ly/XAL7G
👉Paper arxiv.org/pdf/2501.12218
👉Project cvlab-kaist.github.io/Chrono/
👉Repo github.com/cvlab-kaist/Chrono
🎤EMO2: Audio-Driven Avatar🎤

👉Alibaba previews a novel audio-driven talking head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results but no code 🥺

👉Review https://t.ly/x8slQ
👉Paper arxiv.org/pdf/2501.10687
👉Project humanaigc.github.io/emote-portrait-alive-2/
👉Repo 🥺
🦠 A-Life with Foundation Models 🦠

👉A super team unveils ASAL, a new paradigm for Artificial Life research. Covers a diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0💙

👉Review https://t.ly/7SZ8A
👉Paper arxiv.org/pdf/2412.17799
👉Project http://pub.sakana.ai/asal/
👉Repo https://lnkd.in/dP5yxKtw