AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ”₯ #Apple Co-Motion is out! πŸ”₯

πŸ‘‰Apple unveils a novel approach for detecting & tracking detailed 3D poses of multiple people from a single monocular stream. Temporally coherent predictions in crowded scenes with hard poses & occlusions. New SOTA, 10x faster! Code & models released for research onlyπŸ’™ (a toy tracking sketch follows the links below)

πŸ‘‰Review https://t.ly/-86CO
πŸ‘‰Paper https://lnkd.in/dQsVGY7q
πŸ‘‰Repo https://lnkd.in/dh7j7N89
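
For context, a common baseline for multi-person 3D pose tracking is frame-to-frame association of per-frame detections. Below is a minimal, hypothetical sketch of that baseline (Hungarian matching on mean per-joint distance). It is for illustration only and is not Apple's CoMotion method, which predicts temporally coherent tracks directly; the function name and the 0.5 m gate are assumptions.

```python
# Toy multi-person 3D pose association (NOT the CoMotion method; illustration only).
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_poses(prev_poses, curr_poses, max_mean_dist=0.5):
    """Match 3D poses across frames by mean per-joint Euclidean distance (meters).

    prev_poses, curr_poses: lists of (J, 3) arrays of joint positions.
    Returns (prev_idx, curr_idx) pairs whose cost is under the distance gate.
    """
    if not prev_poses or not curr_poses:
        return []
    cost = np.array([[np.linalg.norm(p - c, axis=-1).mean() for c in curr_poses]
                     for p in prev_poses])
    rows, cols = linear_sum_assignment(cost)          # Hungarian matching
    return [(int(i), int(j)) for i, j in zip(rows, cols) if cost[i, j] < max_mean_dist]

# Tiny usage example: two people, each moving slightly between frames.
rng = np.random.default_rng(0)
people_t0 = [rng.normal(size=(17, 3)), rng.normal(size=(17, 3)) + 2.0]
people_t1 = [p + 0.05 for p in people_t0]             # small motion
print(associate_poses(people_t0, people_t1))          # [(0, 0), (1, 1)]
```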
πŸ‘7🀣6❀5πŸ”₯2😍1
This media is not supported in your browser
VIEW IN TELEGRAM
🧊TAP in Persistent 3D Geometry🧊

πŸ‘‰TAPIP3D is the new SOTA for long-term 3D point tracking in monocular RGB/RGB-D video. It represents videos as camera-stabilized spatio-temporal feature clouds, leveraging depth & camera motion to lift 2D video features into a 3D world space where camera motion is effectively canceled. Code under ApacheπŸ’™ (a minimal unprojection sketch follows the links below)

πŸ‘‰Review https://t.ly/oooMy
πŸ‘‰Paper https://lnkd.in/d8uqjdE4
πŸ‘‰Project https://tapip3d.github.io/
πŸ‘‰Repo https://lnkd.in/dsvHP_8u
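
A minimal sketch of the lifting step described above: unproject a pixel with its depth into camera coordinates, then into a fixed world frame via the camera-to-world pose, so features from different frames live in a camera-stabilized space. This is standard pinhole geometry, not TAPIP3D's actual code; the variable names are illustrative.

```python
# Lift 2D pixels + depth into a camera-stabilized world frame (pinhole model).
import numpy as np

def lift_to_world(uv, depth, K, T_world_cam):
    """uv: (N, 2) pixel coords, depth: (N,) metric depth,
    K: (3, 3) intrinsics, T_world_cam: (4, 4) camera-to-world pose."""
    ones = np.ones((uv.shape[0], 1))
    pix = np.concatenate([uv, ones], axis=1)             # homogeneous pixels (N, 3)
    rays = (np.linalg.inv(K) @ pix.T).T                   # back-project to viewing rays
    pts_cam = rays * depth[:, None]                       # scale by depth -> camera frame
    pts_cam_h = np.concatenate([pts_cam, ones], axis=1)   # (N, 4)
    return (T_world_cam @ pts_cam_h.T).T[:, :3]           # world frame (N, 3)

# Example: the same static world point, observed from two camera positions,
# lands on (approximately) the same world coordinate -> camera motion is canceled.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
T0 = np.eye(4)                                            # frame 0: camera at the origin
T1 = np.eye(4); T1[0, 3] = 0.1                            # frame 1: camera moved 10 cm along x
print(lift_to_world(np.array([[320., 240.]]), np.array([2.0]), K, T0))  # ~[0, 0, 2]
print(lift_to_world(np.array([[295., 240.]]), np.array([2.0]), K, T1))  # ~[0, 0, 2]
```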
🦧 #Nvidia Describe Anything 🦧

πŸ‘‰Nvidia unveils Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, dataset available, and live demo on πŸ€—

πŸ‘‰Review https://t.ly/la4JD
πŸ‘‰Paper https://lnkd.in/dZh82xtV
πŸ‘‰Project https://lnkd.in/dcv9V2ZF
πŸ‘‰Repo https://lnkd.in/dJB9Ehtb
πŸ€—Demo https://lnkd.in/dXDb2MWU
πŸ”₯10πŸ‘5❀1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ“Moving Points -> DepthπŸ“

πŸ‘‰KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (from off-the-shelf point tracking models). Repo & demo to be releasedπŸ’™ (a toy motion-parallax sketch follows the links below)

πŸ‘‰Review https://t.ly/qA2P5
πŸ‘‰Paper https://lnkd.in/dpXDaQtM
πŸ‘‰Project https://lnkd.in/d9qWYsjP
πŸ‘‰Repo https://lnkd.in/dZEMDiJh
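
To build intuition for why 2D track motion carries depth information (motion parallax: under camera translation, nearer points move more in the image), here is a toy numpy sketch that ranks points by depth from their track displacements. It illustrates the cue only and is not Seurat's method; the scene, camera, and numbers are made up.

```python
# Toy motion-parallax demo: under a translating camera, image displacement ~ 1/depth,
# so 2D track motion alone already induces a relative depth ordering. Illustration only.
import numpy as np

f = 500.0                                   # focal length in pixels
depths = np.array([2.0, 4.0, 8.0, 16.0])    # true depths of four static points (m)
xs = np.array([0.5, -0.3, 0.8, -0.6])       # lateral positions of the points (m)

def project(x, z, cam_x):
    """Project a point at (x, 0, z) for a camera translated by cam_x along x."""
    return f * (x - cam_x) / z

u0 = project(xs, depths, cam_x=0.0)         # pixel positions at frame 0
u1 = project(xs, depths, cam_x=0.2)         # camera moved 20 cm to the right
displacement = np.abs(u1 - u0)              # 2D track motion between the frames

# Larger displacement -> closer point: ordering matches the true depth ordering.
print(np.argsort(-displacement))            # [0 1 2 3]
print(np.argsort(depths))                   # [0 1 2 3]
```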
🌼SOTA Textured 3D-Guided VTON🌼

πŸ‘‰#ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code & benchmark to be releasedπŸ’™

πŸ‘‰Review https://t.ly/0tjdC
πŸ‘‰Paper https://lnkd.in/dFseYSXz
πŸ‘‰Project https://lnkd.in/djtqzrzs
πŸ‘‰Repo TBA
🀯9πŸ‘7❀4πŸ”₯2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🍏#Nvidia Dynamic Pose 🍏

πŸ‘‰Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia licenseπŸ’™

πŸ‘‰Review https://t.ly/wrcb0
πŸ‘‰Paper https://lnkd.in/dycGjAyy
πŸ‘‰Project https://lnkd.in/dDZ2Ej_Q
πŸ€—Data https://lnkd.in/d8yUSB7m
πŸ”₯4πŸ‘2❀1🀯1😍1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”₯ S3MOT: SOTA 3D MOT πŸ”₯

πŸ‘‰S3MOT: a Selective State Space model-based MOT framework that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & weights to be released under the MIT licenseπŸ’™

πŸ‘‰Review https://t.ly/H_JPv
πŸ‘‰Paper https://arxiv.org/pdf/2504.18068
πŸ‘‰Repo https://github.com/bytepioneerX/s3mot
πŸ”₯ Diffusion Model <-> Depth πŸ”₯

πŸ‘‰ETH & CMU show how to turn a single-image latent diffusion model (LDM) into a SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0 and HF demo availableπŸ’™ (a toy snippet-alignment sketch follows the links below)

πŸ‘‰Review https://t.ly/sP9ma
πŸ‘‰Paper arxiv.org/pdf/2411.19189
πŸ‘‰Project rollingdepth.github.io/
πŸ‘‰Repo github.com/prs-eth/rollingdepth
πŸ€—Demo huggingface.co/spaces/prs-eth/rollingdepth
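
The general recipe behind this kind of result: run a single-image (affine-invariant) depth model on overlapping snippets of the video, then align each snippet to its neighbor with a least-squares scale & shift fitted on the overlapping frames. The sketch below illustrates that alignment idea with a fake predictor; it is not RollingDepth's exact algorithm, and all names and numbers are placeholders.

```python
# Toy "video depth from a single-image model": predict per-snippet depth up to an
# unknown scale/shift, then align consecutive snippets on their overlapping frames.
# Illustration of the idea only, not the RollingDepth algorithm.
import numpy as np

rng = np.random.default_rng(0)
true_depth = rng.uniform(1.0, 5.0, size=(12, 4))      # 12 frames, 4 "pixels" each

def fake_mono_predictor(frames):
    """Stand-in for a single-image depth model: correct up to scale & shift + noise."""
    s, t = rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)
    return s * frames + t + rng.normal(0, 0.01, frames.shape)

def align_scale_shift(src, ref):
    """Least-squares (scale, shift) mapping src -> ref."""
    A = np.stack([src.ravel(), np.ones(src.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
    return s, t

window, overlap = 4, 2
aligned = None
for start in range(0, true_depth.shape[0] - window + 1, window - overlap):
    pred = fake_mono_predictor(true_depth[start:start + window])
    if aligned is None:
        aligned = pred
    else:
        # Fit scale/shift on the frames shared with what we already have, then append.
        s, t = align_scale_shift(pred[:overlap], aligned[-overlap:])
        aligned = np.concatenate([aligned, s * pred[overlap:] + t], axis=0)

print(aligned.shape)   # (12, 4): one temporally consistent depth sequence per pixel
```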
🩷Dance vs. #ComputerVision🩷

πŸ‘‰The University of Saint-Etienne proposes a new 3D human body pose estimation pipeline for dance analysis. Project page with results and interactive demo releasedπŸ’™

πŸ‘‰Review https://t.ly/JEdM3
πŸ‘‰Paper arxiv.org/pdf/2505.07249
πŸ‘‰Project https://lnkd.in/dD5dsMv5
❀9πŸ‘1πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ§žβ€β™€οΈGENMO: Generalist Human Motion πŸ§žβ€β™€οΈ

πŸ‘‰#Nvidia presents GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the momentπŸ₯²

πŸ‘‰Review https://t.ly/Q5T_Y
πŸ‘‰Paper https://lnkd.in/ds36BY49
πŸ‘‰Project https://lnkd.in/dAYHhuFU
Dear friends,
I’m truly sorry for being away from the group for so long. I know: no updates in a long while, and AI is running faster than the speed of light.

I’m going through a very difficult time in my life and I need some space to heal. This spare-time project (but important for a lot of people here) needs energy and commitment I don’t have right now. I’m sorry, be patient. I’ll be back.

Love u all,
Alessandro.
Hi everybody,
I took a few weeks to catch my breath from a lot of stuff: I dedicated all my mental energy to keeping work going, and all my spare time to taking care of myself. Although I'm still not OK (BTW, my health was/is always good), I feel it's time to come back and support this wonderful community on this journey. I feel the responsibility of that; time to get back in the ring.

I'm very sorry for being away so long, but sometimes life hits really hard. I've received incredible support from strangers from all around the world. It's amazing.

Thanks again, you rock!
Alessandro.
πŸ¦– DINOv3 is out πŸ¦–

πŸ‘‰#Meta unveils DINOv3! A novel foundation model outperforming the previous SOTAs in computer vision. Code & weights released under DINOv3 LicenseπŸ’™

πŸ‘‰Review https://t.ly/-S3ZL
πŸ‘‰Paper https://t.ly/ervOT
πŸ‘‰Project https://lnkd.in/dHFf3esd
πŸ‘‰Repo https://lnkd.in/dPxhDxAq
πŸ€—HF https://lnkd.in/dWGudY2i
πŸ€– Impact of SuperHuman AI πŸ€–

πŸ‘‰The non-profit AI Futures Project unveils a (dystopian) scenario of what superhuman AI might look like: a forecast from today to bio-engineered human-like creatures. A fascinating speculation on the future, with both "slow-down" and "race" scenarios. Enjoy πŸ’™

πŸ‘‰Review https://t.ly/EgmfJ
πŸ‘‰Project https://ai-2027.com/
πŸ“TOTNet: Occlusion-aware TrackingπŸ“

πŸ‘‰TOTNet: a novel Temporal Occlusion Tracking Network that leverages 3D convolutions, a visibility-weighted loss, & occlusion augmentation to improve tracking performance under occlusion. Code & data under MITπŸ’™ (a minimal loss sketch follows the links below)

πŸ‘‰Review https://t.ly/Q0jAf
πŸ‘‰Paper https://lnkd.in/dUYsa-GC
πŸ‘‰Repo https://lnkd.in/d3QGUHYb
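
A minimal sketch of what a visibility-weighted tracking loss typically looks like: per-frame position errors are down-weighted (or ignored) when the target is occluded, so occluded frames don't dominate training. This is a generic formulation for illustration, not necessarily TOTNet's exact loss.

```python
# Generic visibility-weighted regression loss (illustration, not TOTNet's exact loss).
import torch

def visibility_weighted_loss(pred_xy, gt_xy, visibility, eps=1e-8):
    """pred_xy, gt_xy: (B, T, 2) predicted / ground-truth positions per frame.
    visibility: (B, T) weights in [0, 1]; 0 = fully occluded frame."""
    per_frame_err = ((pred_xy - gt_xy) ** 2).sum(dim=-1)       # squared pixel error
    return (visibility * per_frame_err).sum() / (visibility.sum() + eps)

# Usage: occluded frames (visibility 0) contribute nothing to the loss.
pred = torch.zeros(1, 3, 2)
gt = torch.tensor([[[0., 0.], [10., 10.], [1., 1.]]])
vis = torch.tensor([[1., 0., 1.]])
print(visibility_weighted_loss(pred, gt, vis))   # ~tensor(1.): the occluded frame is ignored
```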
πŸ”€Feed-Forward 4D videoπŸ”€

πŸ‘‰4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. HQ dynamic point clouds & downstream tasks such as novel-view video synthesis with strong generalizability. Code/Data announced πŸ’™

πŸ‘‰Review https://t.ly/SpkD-
πŸ‘‰Paper arxiv.org/pdf/2508.13154
πŸ‘‰Project https://4dnex.github.io/
πŸ‘‰Repo github.com/3DTopia/4DNeX
πŸ‘‰Data https://lnkd.in/dh4_3Ghf
πŸ‘‰Demo https://lnkd.in/dztyzwgg
🌈DAViD: Synthetic Depth-Normal-Segmentation🌈

πŸ‘‰#Microsoft's DAViD: 100% synthetic dataset/models for human Depth, Normals & Segmentation. Dataset available, models & runtime under MITπŸ’™

πŸ‘‰Review https://t.ly/-SlO_
πŸ‘‰Paper https://lnkd.in/eCmMXpTg
πŸ‘‰Project https://lnkd.in/eurCSWkm
πŸ‘‰Repo https://lnkd.in/e7PWFgP2
πŸ‘6❀3πŸ”₯2🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘  OmniTry: Virtual Try-On Anything πŸ‘ 

πŸ‘‰OmniTry: a unified framework that extends VTON beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark releasedπŸ’™

πŸ‘‰Review https://t.ly/wMBGQ
πŸ‘‰Paper https://lnkd.in/dQe9MchS
πŸ‘‰Project https://omnitry.github.io/
πŸ‘‰Repo https://lnkd.in/d3QwAXY2
πŸ€—Demo https://lnkd.in/duUcZpVA
πŸ“‘ ROVR Open Dataset is out πŸ“‘

πŸ‘‰A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released for academic use (free) & commercial useπŸ’™

πŸ‘‰Review https://t.ly/iDcvg
πŸ‘‰Paper https://arxiv.org/pdf/2508.13977
πŸ‘‰Project https://xiandaguo.net/ROVR-Open-Dataset