AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
96 photos
238 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🦧 #Nvidia Describe Anything 🦧

👉Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, dataset available, and live demo on 🤗

👉Review https://t.ly/la4JD
👉Paper https://lnkd.in/dZh82xtV
👉Project https://lnkd.in/dcv9V2ZF
👉Repo https://lnkd.in/dJB9Ehtb
🤗Demo https://lnkd.in/dXDb2MWU
🔥10👍51
📍Moving Points -> Depth📍

👉KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (via off-the-shelf point tracking models). Repo & Demo to be released💙

👉Review https://t.ly/qA2P5
👉Paper https://lnkd.in/dpXDaQtM
👉Project https://lnkd.in/d9qWYsjP
👉Repo https://lnkd.in/dZEMDiJh
8🔥3👍1👏1
🌼SOTA Textured 3D-Guided VTON🌼

👉#ALIBABA unveils 3DV-TON, a novel diffusion model for HQ and temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released💙

👉Review https://t.ly/0tjdC
👉Paper https://lnkd.in/dFseYSXz
👉Project https://lnkd.in/djtqzrzs
👉Repo TBA
🤯9👍74🔥2👏1
🍏#Nvidia Dynamic Pose 🍏

👉Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license💙

👉Review https://t.ly/wrcb0
👉Paper https://lnkd.in/dycGjAyy
👉Project https://lnkd.in/dDZ2Ej_Q
🤗Data https://lnkd.in/d8yUSB7m
🔥4👍21🤯1😍1
🔥 S3MOT: SOTA 3D MOT 🔥

👉S3MOT: Selective-State-Space model-based MOT that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & Weights to be released under MIT license💙

👉Review https://t.ly/H_JPv
👉Paper https://arxiv.org/pdf/2504.18068
👉Repo https://github.com/bytepioneerX/s3mot
🔥7😍2👍1
🔥 Diffusion Model <-> Depth 🔥

👉ETH & CMU on how to turn a single-image latent diffusion model (LDM) into the SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0 and HF demo available💙

👉Review https://t.ly/sP9ma
👉Paper arxiv.org/pdf/2411.19189
👉Project rollingdepth.github.io/
👉Repo github.com/prs-eth/rollingdepth
🤗Demo huggingface.co/spaces/prs-eth/rollingdepth
12🔥6👍3👏1
🩷Dance vs. #ComputerVision🩷

👉Saint-Etienne University proposes a new 3D human body pose estimation pipeline for dance analysis. Project page w/ results and interactive demo released💙

👉Review https://t.ly/JEdM3
👉Paper arxiv.org/pdf/2505.07249
👉Project https://lnkd.in/dD5dsMv5
9👍1🔥1
🧞‍♀️GENMO: Generalist Human Motion 🧞‍♀️

👉#Nvidia presents GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. It supports conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment🥲

👉Review https://t.ly/Q5T_Y
👉Paper https://lnkd.in/ds36BY49
👉Project https://lnkd.in/dAYHhuFU
🔥133👍2😢1😍1
Dear friends,
I’m truly sorry for being away from the group for so long. I know: no updates so far while AI is running faster than the speed of light.

I’m going through a very difficult time in my life and I need some space to heal. This spare-time project (but important for a lot of people here) needs energy and commitment I don’t have right now. I’m sorry, be patient. I’ll be back.

Love u all,
Alessandro.
393👍28😢27
Hi everybody,
I took a few weeks to take a breath from a lot of stuff: I dedicated all my mental energy to keep working and all my spare time to taking care of myself. Although I'm still not OK (BTW, my health was/is always good), I feel it's time to come back and support this wonderful community in this journey. I feel the responsibility of that; time to get in the ring.

I'm very sorry for being out so long, but sometimes life hits really hard. I got incredible support from unknown people all around the world. It's amazing.

Thanks again, you rock!
Alessandro.
1185👍16🔥14👏5😢2🍾2💩1
🦖 DINOv3 is out 🦖

👉#Meta unveils DINOv3! A novel foundation model outperforming the previous SOTAs in computer vision. Code & weights released under DINOv3 License💙

👉Review https://t.ly/-S3ZL
👉Paper https://t.ly/ervOT
👉Project https://lnkd.in/dHFf3esd
👉Repo https://lnkd.in/dPxhDxAq
🤗HF https://lnkd.in/dWGudY2i
38🔥11👍2😍1🍾1
🤖 Impact of SuperHuman AI 🤖

👉The nonprofit AI Futures Project unveils a (dystopian) scenario about what super-AI might look like. A forecast running from today to bio-engineered human-like creatures. A fascinating speculation about the future with the "slow-down" and "race" scenarios. Enjoy 💙

👉Review https://t.ly/EgmfJ
👉Project https://ai-2027.com/
6🤯2🔥1🤣1
🏓TOTNet: Occlusion-aware Tracking🏓

👉TOTNet: novel Temporal Occlusion Tracking Network that leverages 3D-convs, visibility-weighted loss, & occlusion augmentation to improve performance under occlusions. Code & Data under MIT💙

👉Review https://t.ly/Q0jAf
👉Paper https://lnkd.in/dUYsa-GC
👉Repo https://lnkd.in/d3QGUHYb
🔥104👍1😍1
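A quick sketch of what a "visibility-weighted loss" like the one mentioned in the TOTNet post could look like. This is NOT the paper's formulation: the function name, the three visibility labels, and the weight values are all assumptions made for illustration. The idea shown is simply that errors on occluded frames count more than errors on visible ones.

```python
from typing import Mapping, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical labels and weights (not taken from the paper):
# 0 = fully visible, 1 = partially occluded, 2 = fully occluded.
EXAMPLE_WEIGHTS = {0: 1.0, 1: 2.0, 2: 3.0}


def visibility_weighted_loss(
    predictions: Sequence[Point],
    targets: Sequence[Point],
    visibility: Sequence[int],
    weights: Mapping[int, float],
) -> float:
    """Weighted mean of squared 2D position errors.

    Each frame's squared error is scaled by the weight assigned to its
    visibility label, then normalised by the sum of the weights used.
    """
    if not (len(predictions) == len(targets) == len(visibility)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    weight_sum = 0.0
    for (px, py), (tx, ty), vis in zip(predictions, targets, visibility):
        w = weights[vis]
        total += w * ((px - tx) ** 2 + (py - ty) ** 2)
        weight_sum += w

    # An empty batch (or all-zero weights) contributes no loss.
    return total / weight_sum if weight_sum > 0 else 0.0


# Two frames: a perfect prediction on a visible frame and a
# one-pixel miss on a fully occluded frame.
loss = visibility_weighted_loss(
    predictions=[(0.0, 0.0), (1.0, 0.0)],
    targets=[(0.0, 0.0), (0.0, 0.0)],
    visibility=[0, 2],
    weights=EXAMPLE_WEIGHTS,
)
```

With uniform weights the same two frames would average to 0.5; with the example weights the occluded miss dominates and the loss rises to 0.75.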
🔀Feed-Forward 4D video🔀

👉4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. It produces HQ dynamic point clouds and enables downstream tasks such as novel-view video synthesis with strong generalizability. Code/Data announced 💙

👉Review https://t.ly/SpkD-
👉Paper arxiv.org/pdf/2508.13154
👉Project https://4dnex.github.io/
👉Repo github.com/3DTopia/4DNeX
👉Data https://lnkd.in/dh4_3Ghf
👉Demo https://lnkd.in/dztyzwgg
9🔥7👍1
🌈DAViD: Synthetic Depth-Normal-Segmentation🌈

👉#Microsoft's DAViD: 100% synthetic dataset/models for human Depth, Normals & Segmentation. Dataset available, models & runtime under MIT💙

👉Review https://t.ly/-SlO_
👉Paper https://lnkd.in/eCmMXpTg
👉Project https://lnkd.in/eurCSWkm
👉Repo https://lnkd.in/e7PWFgP2
👍64🔥2🤩1
👠 OmniTry: Virtual Try-On Anything 👠

👉OmniTry: unified framework that extends VTON beyond garments to encompass any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released💙

👉Review https://t.ly/wMBGQ
👉Paper https://lnkd.in/dQe9MchS
👉Project https://omnitry.github.io/
👉Repo https://lnkd.in/d3QwAXY2
🤗Demo https://lnkd.in/duUcZpVA
🔥144😢1🤩1
📡 ROVR Open Dataset is out 📡

👉A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released for academic (free) & commercial use💙

👉Review https://t.ly/iDcvg
👉Paper https://arxiv.org/pdf/2508.13977
👉Project https://xiandaguo.net/ROVR-Open-Dataset
11🔥3👏1
🧉 YOPO: SOTA 9-DoF Pose🧉

👉Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF estimation as a natural extension of 2D detection. A practical solution for mono-RGB, category-level, multi-obj pose estimation. Code & models announced (coming)💙

👉Review https://t.ly/cf_Cl
👉Paper https://arxiv.org/pdf/2508.14965
👉Project mikigom.github.io/YOPO-project-page/
👉Repo TBA
6🔥1🤩1
🔬Intern-S1: SOTA MM-MoE 🔬

👉Intern-S1: an MM-MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA for professional tasks, such as molecular synthesis planning, reaction condition prediction, etc. Models available under Apache 2.0💙

👉Review https://t.ly/3l5UW
👉Paper arxiv.org/pdf/2508.15763
👉Repo github.com/InternLM/Intern-S1
🤗HF huggingface.co/internlm/Intern-S1
6🔥1
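To put the Intern-S1 numbers in perspective, here is a small back-of-the-envelope calculation using only the figures quoted in the post. It assumes "activated" has the usual MoE meaning (parameters used per token), and it treats "2.5T+" as a lower bound; the variable and function names are just for illustration.

```python
# Figures quoted in the post (billions of parameters, trillions of tokens).
ACTIVATED_PARAMS_B = 28.0
TOTAL_PARAMS_B = 241.0
PRETRAIN_TOKENS_T = 5.0
SCIENTIFIC_TOKENS_T = 2.5  # the post says "2.5T+", so this is a lower bound


def share(part: float, whole: float) -> float:
    """Return part / whole as a fraction, rejecting a non-positive whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Fraction of the model's parameters that are active for a given token.
activated_share = share(ACTIVATED_PARAMS_B, TOTAL_PARAMS_B)

# Minimum fraction of the continued pre-training data that is scientific.
scientific_share = share(SCIENTIFIC_TOKENS_T, PRETRAIN_TOKENS_T)

print(f"Activated parameters: {activated_share:.1%} of the total")
print(f"Scientific tokens: at least {scientific_share:.0%} of the 5T")
```

So only about 11.6% of the parameters are active at a time, while at least half of the continued pre-training tokens come from scientific domains.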