AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
🔥 GAGA: Group Any Gaussians 🔥

👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated💙

👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga
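👉 For illustration only, a toy Python sketch of the cross-view association idea (not the official GAGA code, and all inputs/names here are hypothetical): view-local 2D mask IDs get merged into global groups whenever the same 3D points, e.g. Gaussian centers, project inside them across views.

```python
# Toy sketch (not GAGA): merge inconsistent, view-local 2D mask IDs into
# global groups by checking which masks the same 3D points fall into per view.
import numpy as np

def associate_masks(masks, projections):
    """masks: per-view (H, W) arrays of view-local instance IDs (0 = background).
    projections: per-view (N, 2) integer pixel coords (u, v) of N shared 3D points.
    Returns a global group label per 3D point (-1 if never covered by a mask)."""
    n_points = projections[0].shape[0]
    hits = [[] for _ in range(n_points)]            # (view, local_id) votes per point
    for view, (mask, uv) in enumerate(zip(masks, projections)):
        h, w = mask.shape
        for p, (u, v) in enumerate(uv):
            if 0 <= v < h and 0 <= u < w and mask[v, u] > 0:
                hits[p].append((view, int(mask[v, u])))

    # Greedy consensus via union-find: masks covering the same 3D point get merged.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving
            x = parent[x]
        return x

    for votes in hits:
        for a, b in zip(votes, votes[1:]):
            parent[find(a)] = find(b)

    groups, labels = {}, np.full(n_points, -1)
    for p, votes in enumerate(hits):
        if votes:
            labels[p] = groups.setdefault(find(votes[0]), len(groups))
    return labels
```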
🎁Free Book: LLM Foundations🎁

👉A fully free book, just released on arXiv, outlining the basic concepts of #LLMs and related techniques, with a focus on the foundational aspects.

Chapter 1: basics of pre-training
Chapter 2: gen-models & LLMs
Chapter 3: prompting methods
Chapter 4: alignment methods

👉If you have some background in ML, along with a basic understanding of concepts like Transformers, this book will be a smooth read. Even without that prior knowledge it is still perfectly fine, since each chapter is self-contained.

👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf
🏄‍♀️ GSTAR: Gaussian Surface Tracking 🏄‍♀️

👉ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced💙

👉Review https://t.ly/udpMq
👉Paper arxiv.org/pdf/2501.10283
👉Project chengwei-zheng.github.io/GSTAR/
👉Repo TBA
🧽 Diffusion Video Inpainting 🧽

👉#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structures. Code & weights released under Apache💙

👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser
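👉 For context, a minimal sketch of diffusion-based inpainting at the image level with 🤗 diffusers. This is NOT the DiffuEraser API (DiffuEraser extends the idea to temporally coherent video); the model ID and file names are assumptions.

```python
# Minimal diffusion-inpainting sketch with Hugging Face diffusers (not DiffuEraser).
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",       # assumed model ID
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

image = Image.open("frame.png").convert("RGB")          # frame to clean up
mask = Image.open("mask.png").convert("L")              # white = region to fill

result = pipe(prompt="clean background", image=image, mask_image=mask).images[0]
result.save("inpainted.png")
```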
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. It comes with a large-scale synthetic training dataset (1M stereo pairs) featuring high diversity and photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
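👉 As a point of reference, a classical baseline for the same task (not FoundationStereo): dense disparity from a rectified stereo pair with OpenCV's semi-global matching. File names and parameters are placeholders.

```python
# Classical stereo baseline: semi-global block matching with OpenCV.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,            # smoothness penalties
    P2=32 * 5 * 5,
)
disparity = sgbm.compute(left, right).astype("float32") / 16.0  # output is fixed-point

# depth = fx * baseline / disparity, given a calibrated focal length and baseline
```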
🔥 [SOTA] Long-Video Depth Anything 🔥

👉ByteDance unveils Video Depth Anything: HQ, consistent depth estimation in SUPER-long videos (over several minutes) without sacrificing efficiency. Based on Depth Anything V2 with a novel efficient spatial-temporal head. Repo available under Apache 2.0💙

👉Review https://t.ly/Q4ZZd
👉Paper arxiv.org/pdf/2501.12375
👉Project https://lnkd.in/dKNwJzbM
👉Repo https://lnkd.in/ddfwwpCj
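👉 A rough sketch of the naive per-frame baseline the paper improves on: monocular depth with Depth Anything V2 via 🤗 transformers, run frame by frame with no temporal consistency. The model ID is an assumption; Video Depth Anything adds the spatial-temporal head on top of this backbone.

```python
# Per-frame monocular depth baseline (no temporal smoothing) with transformers.
import cv2
from PIL import Image
from transformers import pipeline

depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

cap = cv2.VideoCapture("input.mp4")
depth_maps = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    depth_maps.append(depth(frame)["predicted_depth"])   # independent per-frame prediction
cap.release()
```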
🧵Time-Aware Pts-Tracking🧵

👉Chrono: a feature backbone specifically designed for point tracking with built-in temporal awareness. It captures long-term temporal context, enabling precise prediction even without refinement stages. Code announced💙

👉Review https://t.ly/XAL7G
👉Paper arxiv.org/pdf/2501.12218
👉Project cvlab-kaist.github.io/Chrono/
👉Repo github.com/cvlab-kaist/Chrono
🎤EMO2: Audio-Driven Avatar🎤

👉Alibaba previews a novel audio-driven talking head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results but no code 🥺

👉Review https://t.ly/x8slQ
👉Paper arxiv.org/pdf/2501.10687
👉Project humanaigc.github.io/emote-portrait-alive-2/
👉Repo 🥺
🦠A-Life with Foundation Models🦠

👉A super team unveils ASAL, a new paradigm for Artificial Life research. A diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0💙

👉Review https://t.ly/7SZ8A
👉Paper arxiv.org/pdf/2412.17799
👉Project http://pub.sakana.ai/asal/
👉Repo https://lnkd.in/dP5yxKtw
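👉 One of the ALife substrates mentioned above, as a tiny standalone sketch (not the ASAL code): Conway's Game of Life implemented with a 2D convolution.

```python
# Game of Life: count neighbors with a convolution, then apply the update rule.
import numpy as np
from scipy.signal import convolve2d

KERNEL = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])

def step(grid):
    """One Game of Life update on a binary grid with wrap-around borders."""
    neighbors = convolve2d(grid, KERNEL, mode="same", boundary="wrap")
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)

grid = (np.random.rand(64, 64) < 0.2).astype(np.uint8)   # random initial state
for _ in range(100):
    grid = step(grid)
```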
🔥 The code of DynOMo is out 🔥

👉DynOMo is a novel model able to track any point in a dynamic scene over time via 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input.

👉Review https://t.ly/t5pCf
👉Paper https://lnkd.in/dwhzz4_t
👉Repo github.com/dvl-tum/DynOMo
👉Project https://lnkd.in/dMyku2HW
🪆SOTA Points Segmentation🪆

👉VGG Oxford unveils a novel loss to segment objects in videos based on their motion and NO other form of supervision! The network is trained using long-term point trajectories as a supervisory signal to complement optical flow. New SOTA!

👉Review https://t.ly/8Bsbt
👉Paper https://arxiv.org/pdf/2501.12392
👉Code https://github.com/karazijal/lrtl
👉Project www.robots.ox.ac.uk/~vgg/research/lrtl/
🎨MatAnyone: Human Matting🎨

👉MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos even with complex/ambiguous backgrounds. Code & 🤗-Demo announced💙

👉Review https://t.ly/NVXsT
👉Paper arxiv.org/pdf/2501.14677
👉Project pq-yang.github.io/projects/MatAnyone
👉Repo TBA
🦕[SOTA] Visual Grounding VOS🦕

👉ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to referring video object segmentation (RVOS). Code & models to be released soon💙

👉Review https://t.ly/SDFy9
👉Paper arxiv.org/pdf/2501.14607
👉Project isee-laboratory.github.io/ReferDINO/
👉Repo github.com/iSEE-Laboratory/ReferDINO
☀️ Relightable Full-Body Avatars ☀️

👉#Meta unveils the first approach ever to jointly model the relightable appearance of the body, face, and hands of drivable avatars.

👉Review https://t.ly/kx9gf
👉Paper arxiv.org/pdf/2501.14726
👉Project neuralbodies.github.io/RFGCA
🌅 Generative Human Mesh Recovery 🌅

👉GenHMR is a novel generative framework that reformulates monocular human mesh recovery (HMR) as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results but no code announced 🥺

👉Review https://t.ly/Rrzpj
👉Paper https://arxiv.org/pdf/2412.14444
👉Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
Everyone's social feed is flooded with unnecessary opinions about DeepSeek. Your wish:
Anonymous Poll
37%
🛑 STOP posting about it!
63%
🟩 Keep posting, we want more!
💎AI-driven Docs Conversion💎

👉Docling, by IBM, is the ALL-in-ONE, open-source solution for documents: it parses several popular formats into a unified, richly structured representation. Powered by SOTA models for layout (DocLayNet) and table structure (TableFormer), it runs efficiently on low-cost hardware. Code under MIT💙

👉Review https://t.ly/nSCfT
👉Paper https://lnkd.in/dc5Kpc2F
👉Repo https://lnkd.in/d9gvw9bt
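👉 Minimal usage sketch, following the DocumentConverter entry point documented in the repo (double-check the README for the current API; the input file name is a placeholder):

```python
# Convert a document and export the unified representation as Markdown.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")        # PDF, DOCX, PPTX, HTML, images, ...
print(result.document.export_to_markdown())     # richly structured output
```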
🈯 SOTA 0-Shot Multi-View 🈯

👉MVGD by #TOYOTA is the SOTA method that generates images and scale-consistent depth maps from novel viewpoints given an arbitrary number of posed input views. It's a novel diffusion-based architecture capable of direct pixel-level generation. Code announced 💙

👉Review https://t.ly/_ecKl
👉Paper arxiv.org/pdf/2501.18804
👉Project mvgd.github.io/
👉Repo TBA
🐙MambaGlue: SOTA feats. matching🐙

👉MambaGlue is a hybrid neural network combining the Mamba and Transformer architectures to match local features. Source code announced, to be released💙

👉Review https://shorturl.at/LxDG1
👉Paper arxiv.org/pdf/2502.00462
👉Repo https://lnkd.in/dAujfGZQ
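👉 For comparison, a classical baseline for the same task (not MambaGlue): local feature matching with ORB descriptors and brute-force Hamming matching in OpenCV. File names are placeholders.

```python
# Classical local-feature matching baseline with OpenCV.
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)  # top-50 matches
cv2.imwrite("matches.png", vis)
```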