AI with Papers - Artificial Intelligence & Deep Learning
14.8K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ”₯ "Nuclear" AI vs. Hyper-Cheap Inference πŸ”₯

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Free to comment :)
Anonymous Poll
24%
🀲Portable Training Workstation
35%
βš›οΈNuclear energy for AI training
33%
πŸ–²οΈCheaper Only-inference devices
9%
πŸ’°Cloud-intensive Only-inference
⚽ FIFA 3D Human Pose ⚽

πŸ‘‰#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations released πŸ’™

πŸ‘‰Review https://t.ly/kvGVQ
πŸ‘‰Paper arxiv.org/pdf/2501.02771
πŸ‘‰Project https://lnkd.in/d5hFWpY2
πŸ‘‰Dataset https://lnkd.in/dAphJ9WA
πŸ”₯ Depth Any Camera (SOTA) πŸ”₯

πŸ‘‰DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cams with varying FoVs (including large fisheye & 360°). Code announced (not available yet)πŸ’™

πŸ‘‰Review https://t.ly/1qz4F
πŸ‘‰Paper arxiv.org/pdf/2501.02464
πŸ‘‰Project yuliangguo.github.io/depth-any-camera/
πŸ‘‰Repo github.com/yuliangguo/depth_any_camera
❀️‍πŸ”₯ Uncommon object in #3D ❀️‍πŸ”₯

πŸ‘‰#META releases uCO3D, a new object-centric dataset for 3D AI. The largest publicly-available collection of HD videos of objects with 3D annotations that ensures full-360° coverage. Code & data under CCA 4.0πŸ’™

πŸ‘‰Review https://t.ly/Z_tvA
πŸ‘‰Paper https://arxiv.org/pdf/2501.07574
πŸ‘‰Project https://uco3d.github.io/
πŸ‘‰Repo github.com/facebookresearch/uco3d
πŸ†Universal Detector-Free MatchπŸ†

πŸ‘‰MatchAnything: novel detector-free universal matcher across unseen real-world single/cross-modality domains. Same weights for everything. Code announced, to be released πŸ’™

πŸ‘‰Review https://t.ly/sx92L
πŸ‘‰Paper https://lnkd.in/dWwRwGyY
πŸ‘‰Project https://lnkd.in/dCwb2Yte
πŸ‘‰Repo https://lnkd.in/dnUXYzQ5
πŸ†˜ Help: Looking for Outstanding Speakers πŸ†˜

πŸ‘‰Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only "hardcore" technical talks, no commercial content at all. Please comment here with name, topic and affiliation (e.g.: Paul Gascoigne, Computer Vision & Football, Scotland Team).

⭐Guaranteed tickets & more if your suggestion becomes an invited speaker ;)
πŸ§žβ€β™‚οΈOmni-RGPT: SOTA MLLM UnderstandingπŸ§žβ€β™‚οΈ

πŸ‘‰ #NVIDIA presents Omni-RGPT, MLLM for region-level comprehension for both images & videos. New SOTA on image/video-based commonsense reasoning.

πŸ‘‰Review https://t.ly/KHnQ7
πŸ‘‰Paper arxiv.org/pdf/2501.08326
πŸ‘‰Project miranheo.github.io/omni-rgpt/
πŸ‘‰Repo TBA soon
πŸ”₯ GAGA: Group Any Gaussians πŸ”₯

πŸ‘‰GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updatedπŸ’™

πŸ‘‰Review https://t.ly/Nk_jT
πŸ‘‰Paper www.gaga.gallery/static/pdf/Gaga.pdf
πŸ‘‰Project www.gaga.gallery/
πŸ‘‰Repo github.com/weijielyu/Gaga
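The core difficulty Gaga tackles is that zero-shot 2D segmenters assign inconsistent mask ids from frame to frame. As a toy illustration only (Gaga itself associates masks through the shared 3D Gaussians, not raw 2D overlap; all names here are hypothetical), a minimal greedy grouping by mask IoU looks like this:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def group_masks(frames, thresh=0.5):
    """Greedily assign a shared group id to masks that overlap across frames.

    `frames` is a list of lists of boolean masks (one list per frame).
    A mask joins the best-overlapping existing group if IoU >= thresh,
    otherwise it starts a new group.
    """
    groups = []   # one representative mask per group
    labels = []   # per frame: group id for each mask
    for masks in frames:
        ids = []
        for m in masks:
            scores = [iou(m, g) for g in groups]
            if scores and max(scores) >= thresh:
                ids.append(int(np.argmax(scores)))
            else:
                groups.append(m)
                ids.append(len(groups) - 1)
        labels.append(ids)
    return labels
```

In the real method the grouping signal comes from 3D: masks that project from the same Gaussians get the same id, which is far more robust than 2D overlap under viewpoint change.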
🎁Free Book: LLM Foundations🎁

πŸ‘‰A fully free book just released on arXiv to outline the basic concepts of #LLMs and related techniques with a focus on the foundational aspects.

βœ…Chapter 1: basics of pre-training
βœ…Chapter 2: gen-models & LLMs
βœ…Chapter 3: prompting methods
βœ…Chapter 4: alignment methods

πŸ‘‰If you have some ML background and a working understanding of Transformers, the book will read smoothly. Even without that prior knowledge it is perfectly approachable, since each chapter is self-contained.

πŸ‘‰Review https://t.ly/9LGCa
πŸ‘‰Book https://lnkd.in/d3VkswZf
πŸ„β€β™€οΈ GSTAR: Gaussian Surface Tracking πŸ„β€β™€οΈ

πŸ‘‰ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announcedπŸ’™

πŸ‘‰Review https://t.ly/udpMq
πŸ‘‰Paper arxiv.org/pdf/2501.10283
πŸ‘‰Project chengwei-zheng.github.io/GSTAR/
πŸ‘‰Repo TBA
🧽 Diffusion Video Inpainting 🧽

πŸ‘‰#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on stable diffusion, designed to fill masked regions with greater details and more coherent structures. Code & weights released under ApacheπŸ’™

πŸ‘‰Review https://t.ly/7rEll
πŸ‘‰Paper arxiv.org/pdf/2501.10018
πŸ‘‰Project lixiaowen-xw.github.io/DiffuEraser-page/
πŸ‘‰Repo github.com/lixiaowen-xw/DiffuEraser
🌈 #Nvidia Foundation ZS-Stereo 🌈

πŸ‘‰Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. In addition, a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be releasedπŸ’™

πŸ‘‰Review https://t.ly/rfBr5
πŸ‘‰Paper arxiv.org/pdf/2501.09898
πŸ‘‰Project nvlabs.github.io/FoundationStereo/
πŸ‘‰Repo github.com/NVlabs/FoundationStereo/tree/master
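Stereo models like this predict disparity; turning that into metric depth uses the standard pinhole relation depth = focal_px · baseline / disparity. A minimal sketch (the focal length and baseline values below are placeholders, not from the paper):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to metric depth (meters).

    depth = focal_px * baseline_m / disparity
    Zero or negative disparities are treated as invalid (depth 0).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical rig: 720 px focal length, 12 cm baseline.
disp = np.array([[8.0, 16.0], [0.0, 32.0]])
depth = disparity_to_depth(disp, focal_px=720.0, baseline_m=0.12)
# 720 * 0.12 = 86.4 -> depths 10.8 m, 5.4 m, 0 (invalid), 2.7 m
```

This is why a strong zero-shot disparity network directly yields metric depth once the camera calibration is known.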
πŸ”₯ [SOTA] Long-Video Depth Anything πŸ”₯

πŸ‘‰ByteDance unveils Video Depth Anything: HQ, consistent depth estimation in SUPER-long videos (over several minutes) without sacrificing efficiency. Based on Depth Anything V2 with a novel efficient spatial-temporal head. Repo available under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/Q4ZZd
πŸ‘‰Paper arxiv.org/pdf/2501.12375
πŸ‘‰Project https://lnkd.in/dKNwJzbM
πŸ‘‰Repo https://lnkd.in/ddfwwpCj
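Why is temporal consistency hard? Monocular depth is only defined up to a per-frame affine ambiguity, so independent per-frame predictions flicker. A classic baseline (not the paper's method, which trains a spatial-temporal head instead) is to least-squares align each frame's scale and shift to a reference frame:

```python
import numpy as np

def align_scale_shift(depth_cur, depth_ref):
    """Fit scale/shift mapping depth_cur onto depth_ref (least squares).

    Resolves the per-frame affine ambiguity of monocular depth so that
    consecutive frames live on a common scale.
    """
    x = depth_cur.ravel()
    y = depth_ref.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return scale * depth_cur + shift
```

Per-pair alignment like this drifts over minutes-long videos, which is exactly the regime the post says Video Depth Anything targets with a learned temporal head.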
🧡Time-Aware Pts-Tracking🧡

πŸ‘‰Chrono: feature backbone specifically designed for point tracking with built-in temporal awareness. Long-term temporal context, enabling precise prediction even without the refinements. Code announcedπŸ’™

πŸ‘‰Review https://t.ly/XAL7G
πŸ‘‰Paper arxiv.org/pdf/2501.12218
πŸ‘‰Project cvlab-kaist.github.io/Chrono/
πŸ‘‰Repo github.com/cvlab-kaist/Chrono
🎀EMO2: Audio-Driven Avatar🎀

πŸ‘‰Alibaba previews a novel audio-driven talking head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results but no code πŸ₯Ί

πŸ‘‰Review https://t.ly/x8slQ
πŸ‘‰Paper arxiv.org/pdf/2501.10687
πŸ‘‰Project humanaigc.github.io/emote-portrait-alive-2/
πŸ‘‰Repo πŸ₯Ί
🦠A-Life with Foundation Models🦠

πŸ‘‰A super team unveils ASAL, a new paradigm for Artificial Life research. A diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/7SZ8A
πŸ‘‰Paper arxiv.org/pdf/2412.17799
πŸ‘‰Project http://pub.sakana.ai/asal/
πŸ‘‰Repo https://lnkd.in/dP5yxKtw
πŸ”₯ The code of DynOMo is out πŸ”₯

πŸ‘‰DynOMo is a novel model able to track any point in a dynamic scene over time through 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input

πŸ‘‰Review https://t.ly/t5pCf
πŸ‘‰Paper https://lnkd.in/dwhzz4_t
πŸ‘‰Repo github.com/dvl-tum/DynOMo
πŸ‘‰Project https://lnkd.in/dMyku2HW
πŸͺ†SOTA Points SegmentationπŸͺ†

πŸ‘‰VGG Oxford unveils a novel loss to segment objects in videos based on their motion and NO other forms of supervision! The network is trained using long-term point trajectories as a supervisory signal to complement optical flow. New SOTA!

πŸ‘‰Review https://t.ly/8Bsbt
πŸ‘‰Paper https://arxiv.org/pdf/2501.12392
πŸ‘‰Code https://github.com/karazijal/lrtl
πŸ‘‰Project www.robots.ox.ac.uk/~vgg/research/lrtl/
🎨MatAnyone: Human Matting🎨

πŸ‘‰MatAnyone is a novel approach for human video matting that supports the target assignment. Stable tracking in long videos even with complex/ambiguous BGs. Code & πŸ€—-Demo announcedπŸ’™

πŸ‘‰Review https://t.ly/NVXsT
πŸ‘‰Paper arxiv.org/pdf/2501.14677
πŸ‘‰Project pq-yang.github.io/projects/MatAnyone
πŸ‘‰Repo TBA
πŸ¦•[SOTA] Visual Grounding VOSπŸ¦•

πŸ‘‰ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to RVOS (referring video object segmentation). Code & models to be released soonπŸ’™

πŸ‘‰Review https://t.ly/SDFy9
πŸ‘‰Paper arxiv.org/pdf/2501.14607
πŸ‘‰Project isee-laboratory.github.io/ReferDINO/
πŸ‘‰Repo github.com/iSEE-Laboratory/ReferDINO