AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
🔥 GAGA: Group Any Gaussians 🔥

👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated💙

👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga
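👉 For illustration only, a toy Python sketch of the cross-view association idea (not the official GAGA code, and all inputs/names here are hypothetical): view-local 2D mask IDs get merged into global groups whenever the same 3D points, e.g. Gaussian centers, project inside them across views.

```python
# Toy sketch (not GAGA): merge inconsistent, view-local 2D mask IDs into
# global groups by checking which masks the same 3D points fall into per view.
import numpy as np

def associate_masks(masks, projections):
    """masks: per-view (H, W) arrays of view-local instance IDs (0 = background).
    projections: per-view (N, 2) integer pixel coords (u, v) of N shared 3D points.
    Returns a global group label per 3D point (-1 if never covered by a mask)."""
    n_points = projections[0].shape[0]
    hits = [[] for _ in range(n_points)]            # (view, local_id) votes per point
    for view, (mask, uv) in enumerate(zip(masks, projections)):
        h, w = mask.shape
        for p, (u, v) in enumerate(uv):
            if 0 <= v < h and 0 <= u < w and mask[v, u] > 0:
                hits[p].append((view, int(mask[v, u])))

    # Greedy consensus via union-find: masks covering the same 3D point get merged.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving
            x = parent[x]
        return x

    for votes in hits:
        for a, b in zip(votes, votes[1:]):
            parent[find(a)] = find(b)

    groups, labels = {}, np.full(n_points, -1)
    for p, votes in enumerate(hits):
        if votes:
            labels[p] = groups.setdefault(find(votes[0]), len(groups))
    return labels
```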
🎁Free Book: LLM Foundations🎁

👉A fully free book, just released on arXiv, outlining the basic concepts of #LLMs and related techniques, with a focus on the foundational aspects.

Chapter 1: basics of pre-training
Chapter 2: gen-models & LLMs
Chapter 3: prompting methods
Chapter 4: alignment methods

👉If you have some background in ML, along with a basic understanding of concepts like Transformers, this book will be a smooth read. Even without that prior knowledge it is still perfectly fine, since each chapter is self-contained.

👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf
🏄‍♀️ GSTAR: Gaussian Surface Tracking 🏄‍♀️

👉ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced💙

👉Review https://t.ly/udpMq
👉Paper arxiv.org/pdf/2501.10283
👉Project chengwei-zheng.github.io/GSTAR/
👉Repo TBA
🧽 Diffusion Video Inpainting 🧽

👉#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structures. Code & weights released under Apache💙

👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser
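👉 For context, a minimal sketch of diffusion-based inpainting at the image level with 🤗 diffusers. This is NOT the DiffuEraser API (DiffuEraser extends the idea to temporally coherent video); the model ID and file names are assumptions.

```python
# Minimal diffusion-inpainting sketch with Hugging Face diffusers (not DiffuEraser).
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",       # assumed model ID
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

image = Image.open("frame.png").convert("RGB")          # frame to clean up
mask = Image.open("mask.png").convert("L")              # white = region to fill

result = pipe(prompt="clean background", image=image, mask_image=mask).images[0]
result.save("inpainted.png")
```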
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. It comes with a large-scale synthetic training dataset (1M stereo pairs) featuring high diversity and photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
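👉 As a point of reference, a classical baseline for the same task (not FoundationStereo): dense disparity from a rectified stereo pair with OpenCV's semi-global matching. File names and parameters are placeholders.

```python
# Classical stereo baseline: semi-global block matching with OpenCV.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,            # smoothness penalties
    P2=32 * 5 * 5,
)
disparity = sgbm.compute(left, right).astype("float32") / 16.0  # output is fixed-point

# depth = fx * baseline / disparity, given a calibrated focal length and baseline
```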
🔥 [SOTA] Long-Video Depth Anything 🔥

👉ByteDance unveils Video Depth Anything: HQ, consistent depth estimation in SUPER-long videos (over several minutes) without sacrificing efficiency. Based on Depth Anything V2 with a novel efficient spatial-temporal head. Repo available under Apache 2.0💙

👉Review https://t.ly/Q4ZZd
👉Paper arxiv.org/pdf/2501.12375
👉Project https://lnkd.in/dKNwJzbM
👉Repo https://lnkd.in/ddfwwpCj
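👉 A rough sketch of the naive per-frame baseline the paper improves on: monocular depth with Depth Anything V2 via 🤗 transformers, run frame by frame with no temporal consistency. The model ID is an assumption; Video Depth Anything adds the spatial-temporal head on top of this backbone.

```python
# Per-frame monocular depth baseline (no temporal smoothing) with transformers.
import cv2
from PIL import Image
from transformers import pipeline

depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

cap = cv2.VideoCapture("input.mp4")
depth_maps = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    depth_maps.append(depth(frame)["predicted_depth"])   # independent per-frame prediction
cap.release()
```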
🧵Time-Aware Pts-Tracking🧵

👉Chrono: a feature backbone specifically designed for point tracking with built-in temporal awareness. It captures long-term temporal context, enabling precise prediction even without refinement stages. Code announced💙

👉Review https://t.ly/XAL7G
👉Paper arxiv.org/pdf/2501.12218
👉Project cvlab-kaist.github.io/Chrono/
👉Repo github.com/cvlab-kaist/Chrono
🎤EMO2: Audio-Driven Avatar🎤

👉Alibaba previews a novel audio-driven talking head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results but no code 🥺

👉Review https://t.ly/x8slQ
👉Paper arxiv.org/pdf/2501.10687
👉Project humanaigc.github.io/emote-portrait-alive-2/
👉Repo 🥺
🦠A-Life with Foundation Models🦠

👉A super team unveils ASAL, a new paradigm for Artificial Life research. A diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0💙

👉Review https://t.ly/7SZ8A
👉Paper arxiv.org/pdf/2412.17799
👉Project http://pub.sakana.ai/asal/
👉Repo https://lnkd.in/dP5yxKtw
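👉 One of the ALife substrates mentioned above, as a tiny standalone sketch (not the ASAL code): Conway's Game of Life implemented with a 2D convolution.

```python
# Game of Life: count neighbors with a convolution, then apply the update rule.
import numpy as np
from scipy.signal import convolve2d

KERNEL = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])

def step(grid):
    """One Game of Life update on a binary grid with wrap-around borders."""
    neighbors = convolve2d(grid, KERNEL, mode="same", boundary="wrap")
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)

grid = (np.random.rand(64, 64) < 0.2).astype(np.uint8)   # random initial state
for _ in range(100):
    grid = step(grid)
```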
🔥 The code of DynOMo is out 🔥

👉DynOMo is a novel model able to track any point in a dynamic scene over time via 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input.

👉Review https://t.ly/t5pCf
👉Paper https://lnkd.in/dwhzz4_t
👉Repo github.com/dvl-tum/DynOMo
👉Project https://lnkd.in/dMyku2HW
🪆SOTA Points Segmentation🪆

👉VGG Oxford unveils a novel loss to segment objects in videos based on their motion and NO other form of supervision! The network is trained using long-term point trajectories as a supervisory signal to complement optical flow. New SOTA!

👉Review https://t.ly/8Bsbt
👉Paper https://arxiv.org/pdf/2501.12392
👉Code https://github.com/karazijal/lrtl
👉Project www.robots.ox.ac.uk/~vgg/research/lrtl/
🎨MatAnyone: Human Matting🎨

👉MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos even with complex/ambiguous backgrounds. Code & 🤗-Demo announced💙

👉Review https://t.ly/NVXsT
👉Paper arxiv.org/pdf/2501.14677
👉Project pq-yang.github.io/projects/MatAnyone
👉Repo TBA
🦕[SOTA] Visual Grounding VOS🦕

👉ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to referring video object segmentation (RVOS). Code & models to be released soon💙

👉Review https://t.ly/SDFy9
👉Paper arxiv.org/pdf/2501.14607
👉Project isee-laboratory.github.io/ReferDINO/
👉Repo github.com/iSEE-Laboratory/ReferDINO
☀️ Relightable Full-Body Avatars ☀️

👉#Meta unveils the first approach ever to jointly model the relightable appearance of the body, face, and hands of drivable avatars.

👉Review https://t.ly/kx9gf
👉Paper arxiv.org/pdf/2501.14726
👉Project neuralbodies.github.io/RFGCA
🌅 Generative Human Mesh Recovery 🌅

👉GenHMR is a novel generative framework that reformulates monocular human mesh recovery (HMR) as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results but no code announced 🥺

👉Review https://t.ly/Rrzpj
👉Paper https://arxiv.org/pdf/2412.14444
👉Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
Everyone's social feed is flooded with unnecessary opinions about DeepSeek. Your wish:
Anonymous Poll
37%
🛑 STOP posting about it!
63%
🟩 Keep posting, we want more!
💎AI-driven Docs Conversion💎

👉Docling, by IBM, is the ALL-in-ONE, open-source solution for documents: it parses several popular formats into a unified, richly structured representation. Powered by SOTA models for layout (DocLayNet) and table structure (TableFormer), it runs efficiently on low-cost hardware. Code under MIT💙

👉Review https://t.ly/nSCfT
👉Paper https://lnkd.in/dc5Kpc2F
👉Repo https://lnkd.in/d9gvw9bt
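👉 Minimal usage sketch, following the DocumentConverter entry point documented in the repo (double-check the README for the current API; the input file name is a placeholder):

```python
# Convert a document and export the unified representation as Markdown.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")        # PDF, DOCX, PPTX, HTML, images, ...
print(result.document.export_to_markdown())     # richly structured output
```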
🈯 SOTA 0-Shot Multi-View 🈯

👉MVGD by #TOYOTA is the SOTA method that generates images and scale-consistent depth maps from novel viewpoints given an arbitrary number of posed input views. It's a novel diffusion-based architecture capable of direct pixel-level generation. Code announced 💙

👉Review https://t.ly/_ecKl
👉Paper arxiv.org/pdf/2501.18804
👉Project mvgd.github.io/
👉Repo TBA
🐙MambaGlue: SOTA feats. matching🐙

👉MambaGlue is a hybrid neural network combining the Mamba and Transformer architectures to match local features. Source code announced, to be released💙

👉Review https://shorturl.at/LxDG1
👉Paper arxiv.org/pdf/2502.00462
👉Repo https://lnkd.in/dAujfGZQ
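👉 For comparison, a classical baseline for the same task (not MambaGlue): local feature matching with ORB descriptors and brute-force Hamming matching in OpenCV. File names are placeholders.

```python
# Classical local-feature matching baseline with OpenCV.
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)  # top-50 matches
cv2.imwrite("matches.png", vis)
```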