AI with Papers - Artificial Intelligence & Deep Learning
All the AI, with papers. Fresh updates every day on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
๐Ÿ„โ€โ™€๏ธ GSTAR: Gaussian Surface Tracking ๐Ÿ„โ€โ™€๏ธ

๐Ÿ‘‰ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/udpMq
๐Ÿ‘‰Paper arxiv.org/pdf/2501.10283
๐Ÿ‘‰Project chengwei-zheng.github.io/GSTAR/
๐Ÿ‘‰Repo TBA
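
👉A toy sketch of the Gaussian-on-mesh idea: it assumes each Gaussian is anchored to a mesh face through fixed barycentric weights, so tracked vertices drag the Gaussians along. This binding scheme is my illustration of the general approach, not GSTAR's exact formulation.

```python
import numpy as np

def gaussian_centers(vertices, faces, barycentric):
    """One Gaussian per mesh face, placed as a barycentric combination
    of its (tracked) vertex positions; as the mesh deforms frame to
    frame, the Gaussians follow it, yielding 3D tracks.
    vertices:    (V, 3) vertex positions at one frame
    faces:       (F, 3) vertex indices per triangle
    barycentric: (F, 3) per-face weights, rows summing to 1
    """
    tri = vertices[faces]                             # (F, 3, 3) corners
    return np.einsum("fc,fcd->fd", barycentric, tri)  # (F, 3) centers

# Toy usage: one triangle tracked across two frames.
verts_t0 = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
verts_t1 = verts_t0 + 0.1                             # the surface moved
faces = np.array([[0, 1, 2]])
bary = np.ones((1, 3)) / 3.0                          # Gaussian at centroid
print(gaussian_centers(verts_t0, faces, bary))        # [[0.333 0.333 0.]]
print(gaussian_centers(verts_t1, faces, bary))        # center follows motion
```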
🧽 Diffusion Video Inpainting 🧽

👉#Alibaba unveils a technical report on DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structure (naive per-frame baseline below for contrast). Code & weights released under Apache💙

👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser
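
👉For contrast, a naive per-frame baseline with the stock diffusers inpainting pipeline. This is NOT DiffuEraser's own API; it is the flickery, temporally inconsistent baseline that video-specific models like DiffuEraser improve upon.

```python
# Naive per-frame video inpainting with Stable Diffusion -- no temporal
# propagation, so results flicker; that gap is what DiffuEraser targets.
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

def inpaint_frames(frames, masks, prompt=""):
    """frames/masks: lists of PIL images; white mask pixels get filled.
    Each frame is handled independently of its neighbors."""
    return [
        pipe(prompt=prompt, image=f, mask_image=m).images[0]
        for f, m in zip(frames, masks)
    ]
```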
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization (disparity-to-depth snippet below). It comes with a large-scale synthetic training dataset (1M stereo pairs) featuring high diversity and photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
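
👉Once a stereo model predicts disparity, metric depth falls out of the standard pinhole relation depth = focal_px × baseline / disparity. The snippet below is that generic conversion; the model call is a placeholder, not FoundationStereo's real API.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """disparity: (H, W) in pixels; returns metric depth in meters."""
    return focal_px * baseline_m / np.maximum(disparity, eps)

# disparity = stereo_model(left_img, right_img)   # hypothetical call
disparity = np.full((480, 640), 32.0)             # pretend prediction
depth = disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.12)
print(depth[0, 0])                                # 700 * 0.12 / 32 = 2.625 m
```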
โค6๐Ÿ”ฅ6๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 [SOTA] Long-Video Depth Anything 🔥

👉ByteDance unveils Video Depth Anything: high-quality, consistent depth estimation on SUPER-long videos (several minutes or more) without sacrificing efficiency. Based on Depth Anything V2, with a novel efficient spatial-temporal head (windowed-inference sketch below). Repo available under Apache 2.0💙

👉Review https://t.ly/Q4ZZd
👉Paper arxiv.org/pdf/2501.12375
👉Project https://lnkd.in/dKNwJzbM
👉Repo https://lnkd.in/ddfwwpCj
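
👉A hedged sketch of long-video depth inference via sliding windows with overlap averaging. The paper's actual key-frame/stitching strategy differs, and `model` here is a hypothetical per-window predictor, not the released API.

```python
import numpy as np

def depth_over_video(frames, model, window=32, overlap=8):
    """frames: (T, H, W, 3). Runs `model` on overlapping windows and
    averages predictions where windows overlap, reducing seams."""
    T = frames.shape[0]
    depth = np.zeros(frames.shape[:3])
    weight = np.zeros((T, 1, 1))
    step = window - overlap
    for start in range(0, max(T - overlap, 1), step):
        end = min(start + window, T)
        depth[start:end] += model(frames[start:end])  # (end-start, H, W)
        weight[start:end] += 1.0
    return depth / np.maximum(weight, 1e-8)
```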
🧵Time-Aware Point Tracking🧵

👉Chrono: a feature backbone specifically designed for point tracking, with built-in temporal awareness. It captures long-term temporal context, enabling precise prediction even without refinement stages (correlation-tracking sketch below). Code announced💙

👉Review https://t.ly/XAL7G
👉Paper arxiv.org/pdf/2501.12218
👉Project cvlab-kaist.github.io/Chrono/
👉Repo github.com/cvlab-kaist/Chrono
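
👉A minimal sketch of correlation-based tracking on top of a feature backbone: take the query point's descriptor on frame 0 and follow the argmax of cosine similarity per frame. Chrono's temporal backbone and training are more involved; the random features below stand in for real backbone output.

```python
import torch
import torch.nn.functional as F

def track_point(feats, query_xy):
    """feats: (T, C, H, W) backbone feature maps; query_xy: (x, y) on
    frame 0 in feature-map coords. Returns one (x, y) per frame."""
    feats = F.normalize(feats, dim=1)          # cosine-ready features
    x, y = query_xy
    q = feats[0, :, y, x]                      # (C,) query descriptor
    sims = torch.einsum("c,tchw->thw", q, feats)
    best = sims.flatten(1).argmax(dim=1)       # best match per frame
    W = feats.shape[-1]
    return [(int(i % W), int(i // W)) for i in best]

feats = torch.randn(8, 64, 32, 32)             # placeholder backbone output
print(track_point(feats, (10, 12)))
```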
โค5๐Ÿ”ฅ5๐Ÿ‘3๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🎤EMO2: Audio-Driven Avatar🎤

👉Alibaba previews a novel audio-driven talking-head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results, but no code 🥺

👉Review https://t.ly/x8slQ
👉Paper arxiv.org/pdf/2501.10687
👉Project humanaigc.github.io/emote-portrait-alive-2/
👉Repo 🥺
โค6๐Ÿคฏ6๐Ÿ‘2๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦠A-Life with Foundation Models🦠

👉A super team unveils ASAL, a new paradigm for Artificial Life research. It spans a diverse range of ALife substrates, including Boids, Particle Life, Game of Life (step sketched below), Lenia & Neural Cellular Automata. Code under Apache 2.0💙

👉Review https://t.ly/7SZ8A
👉Paper arxiv.org/pdf/2412.17799
👉Project http://pub.sakana.ai/asal/
👉Repo https://lnkd.in/dP5yxKtw
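
👉Among the named substrates, Game of Life is the easiest to write down. A textbook NumPy step function (the generic cellular automaton, not ASAL's own implementation):

```python
import numpy as np

def life_step(grid):
    """grid: (H, W) array of 0/1 cells with toroidal wrap-around."""
    # Count each cell's 8 neighbors via wrapped shifts.
    n = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(grid.dtype)

rng = np.random.default_rng(0)
world = rng.integers(0, 2, size=(64, 64))
for _ in range(10):
    world = life_step(world)                   # one generation per call
```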
โค11โšก2๐Ÿคฉ2
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 The code of DynOMo is out 🔥

👉DynOMo is a novel model able to track any point in a dynamic scene over time via 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input.

👉Review https://t.ly/t5pCf
👉Paper https://lnkd.in/dwhzz4_t
👉Repo github.com/dvl-tum/DynOMo
👉Project https://lnkd.in/dMyku2HW
🪆SOTA Points Segmentation🪆

👉VGG Oxford unveils a novel loss to segment objects in videos based on their motion and NO other form of supervision! The network is trained using long-term point trajectories as a supervisory signal that complements optical flow (loss sketch below). New SOTA!

👉Review https://t.ly/8Bsbt
👉Paper https://arxiv.org/pdf/2501.12392
👉Code https://github.com/karazijal/lrtl
👉Project www.robots.ox.ac.uk/~vgg/research/lrtl/
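
👉A hedged sketch of what a trajectory-based supervisory signal can look like: points riding the same long-term trajectory should keep the same mask label over time. This variance penalty is my simplification for illustration, not the paper's exact loss.

```python
import torch

def trajectory_consistency_loss(masks, tracks, vis):
    """masks:  (T, H, W) predicted foreground probabilities.
    tracks: (N, T, 2) integer (x, y) point positions per frame.
    vis:    (N, T) float visibility flags (1 = visible).
    Penalizes the temporal variance of each trajectory's mask value,
    so one physical point keeps one label across the video."""
    N, T, _ = tracks.shape
    t_idx = torch.arange(T).expand(N, T)
    probs = masks[t_idx, tracks[..., 1], tracks[..., 0]]      # (N, T)
    denom = vis.sum(1, keepdim=True).clamp(min=1)
    mean = (probs * vis).sum(1, keepdim=True) / denom
    return (((probs - mean) ** 2) * vis).sum() / vis.sum().clamp(min=1)
```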
🎨MatAnyone: Human Matting🎨

👉MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos, even with complex/ambiguous backgrounds. Code & 🤗-Demo announced💙

👉Review https://t.ly/NVXsT
👉Paper arxiv.org/pdf/2501.14677
👉Project pq-yang.github.io/projects/MatAnyone
👉Repo TBA
โค15๐Ÿ‘2๐Ÿคฉ2๐Ÿ‘1๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦕[SOTA] Visual Grounding VOS🦕

👉ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to referring video object segmentation (RVOS). Code & models to be released soon💙

👉Review https://t.ly/SDFy9
👉Paper arxiv.org/pdf/2501.14607
👉Project isee-laboratory.github.io/ReferDINO/
👉Repo github.com/iSEE-Laboratory/ReferDINO
☀️ Relightable Full-Body Avatars ☀️

👉#Meta unveils the first approach ever to jointly model the relightable appearance of the body, face, and hands of drivable avatars.

👉Review https://t.ly/kx9gf
👉Paper arxiv.org/pdf/2501.14726
👉Project neuralbodies.github.io/RFGCA
โค3๐Ÿ‘3๐Ÿ”ฅ3โšก1๐Ÿคฏ1๐Ÿ˜ข1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🌅 Generative Human Mesh Recovery 🌅

👉GenHMR is a novel generative framework that reformulates monocular human mesh recovery (HMR) as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results, but no code announced 🥺

👉Review https://t.ly/Rrzpj
👉Paper https://arxiv.org/pdf/2412.14444
👉Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
Everyone's social feed is flooded with unnecessary opinions about DeepSeek. Your wish:
Anonymous Poll
37%
🛑 STOP posting about it!
63%
🟩 Keep posting, we want more!
💎AI-driven Docs Conversion💎

👉Docling, by IBM, is the ALL-in-ONE, open-source solution for documents, parsing several popular formats into a unified, richly structured representation (usage below). Powered by SOTA models for layout (DocLayNet) and table structure (TableFormer), it runs efficiently on low-cost hardware. Code under MIT💙

👉Review https://t.ly/nSCfT
👉Paper https://lnkd.in/dc5Kpc2F
👉Repo https://lnkd.in/d9gvw9bt
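
👉Basic usage, matching the repo's quickstart (PyPI package `docling`; worth re-checking the README in case the API moves):

```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")       # PDF, DOCX, PPTX, HTML, ...
print(result.document.export_to_markdown())    # unified structured output
```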
โค18๐Ÿ‘8๐Ÿ”ฅ1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🈯 SOTA 0-Shot Multi-View 🈯

👉MVGD by #TOYOTA is the new SOTA method for generating images and scale-consistent depth maps from novel viewpoints, given an arbitrary number of posed input views. It's a novel diffusion-based architecture capable of direct pixel-level generation. Code announced 💙

👉Review https://t.ly/_ecKl
👉Paper arxiv.org/pdf/2501.18804
👉Project mvgd.github.io/
👉Repo TBA
๐Ÿ™MambaGlue: SOTA feats. matching๐Ÿ™

๐Ÿ‘‰MambaGlue is a hybrid neural network combining the Mamba and the Transformer architectures to match local features. Source Code announced, to be released๐Ÿ’™

๐Ÿ‘‰Review https://shorturl.at/LxDG1
๐Ÿ‘‰Paper arxiv.org/pdf/2502.00462
๐Ÿ‘‰Repo https://lnkd.in/dAujfGZQ
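
👉For context, the classical mutual-nearest-neighbor matcher that learned matchers like MambaGlue improve upon; a generic baseline, not MambaGlue's code.

```python
import torch
import torch.nn.functional as F

def mutual_nn_match(desc_a, desc_b):
    """desc_a: (Na, D), desc_b: (Nb, D) L2-normalized descriptors.
    Returns (K, 2) index pairs that are each other's nearest neighbor."""
    sim = desc_a @ desc_b.t()                  # (Na, Nb) cosine similarity
    nn_ab = sim.argmax(dim=1)                  # best b for each a
    nn_ba = sim.argmax(dim=0)                  # best a for each b
    a_idx = torch.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == a_idx             # keep mutual agreements only
    return torch.stack([a_idx[mutual], nn_ab[mutual]], dim=1)

a = F.normalize(torch.randn(512, 256), dim=1)
b = F.normalize(torch.randn(480, 256), dim=1)
print(mutual_nn_match(a, b).shape)             # (K, 2) matched index pairs
```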
🛸Real-Time Differentiable Ray Tracing🛸

👉Radiant Foam is a novel scene representation that leverages a decades-old, efficient volumetric-mesh ray-tracing algorithm, largely overlooked in recent research. It performs like Gaussian Splatting, without the constraints of rasterization (compositing sketch below). Code announced💙

👉Review https://shorturl.at/26U06
👉Paper https://arxiv.org/pdf/2502.01157
👉Project https://radfoam.github.io/
👉Repo https://github.com/theialab/radfoam
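
👉Whatever the primitive (splatted Gaussians or ray-traced foam cells), the pixel color comes from the same front-to-back compositing rule, C = Σᵢ Tᵢ αᵢ cᵢ with Tᵢ = Πⱼ₍ⱼ₎₍₎ (1 − αⱼ) over the samples in front. A generic sketch of that rule, not Radiant Foam's mesh traversal:

```python
import numpy as np

def composite_along_ray(alphas, colors):
    """alphas: (K,) per-sample opacity, ordered front to back.
    colors: (K, 3) per-sample RGB. Returns the final pixel color."""
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas                   # contribution of each sample
    return weights @ colors

alphas = np.array([0.3, 0.5, 0.9])
colors = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
print(composite_along_ray(alphas, colors))    # [0.3 0.35 0.315]
```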
🔥 VideoJAM: #META's Video Model (SOTA) 🔥

👉#META's VideoJAM: the new SOTA (by a large margin) in motion coherence for video generation, much better than SORA! It adds a strong motion prior to any video-gen model. Impressive results, no code announced🥲

👉Review https://shorturl.at/id7Bt
👉Paper https://arxiv.org/pdf/2502.02492
👉Project https://hila-chefer.github.io/videojam-paper.github.io/
👗3D Dynamic Garments👗

👉UCLA introduces Dress-1-to-3, a novel pipeline that reconstructs physics-plausible, simulation-ready separated garments with sewing patterns, together with the human, from a single in-the-wild image.

👉Review https://t.ly/qciHV
👉Paper arxiv.org/pdf/2502.03449
👉Project dress-1-to-3.github.io