AI with Papers - Artificial Intelligence & Deep Learning
15.1K subscribers
98 photos
243 videos
12 files
1.29K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ“TOTNet: Occlusion-aware TrackingπŸ“

πŸ‘‰TOTNet: novel Temporal Occlusion Tracking Network that leverages 3D-convs, visibility-weighted loss, & occlusion augmentation to improve performance under occlusions. Code & Data under MITπŸ’™

πŸ‘‰Review https://t.ly/Q0jAf
πŸ‘‰Paper https://lnkd.in/dUYsa-GC
πŸ‘‰Repo https://lnkd.in/d3QGUHYb
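The visibility-weighted idea is easy to sketch: down-weight (rather than discard) frames where the target is occluded. A minimal NumPy sketch assuming a per-frame L2 position error; the `occluded_weight` hyper-parameter is a hypothetical stand-in, not a value from the paper:

```python
import numpy as np

def visibility_weighted_loss(pred, target, visible, occluded_weight=0.2):
    """Per-frame L2 tracking error, down-weighting occluded frames.

    pred, target: (T, 2) predicted / ground-truth positions.
    visible: (T,) bool mask, True where the object is unoccluded.
    occluded_weight: relative weight for occluded frames (assumed value).
    """
    err = np.linalg.norm(pred - target, axis=1)   # (T,) per-frame error
    w = np.where(visible, 1.0, occluded_weight)   # soft visibility weights
    return float((w * err).sum() / w.sum())       # weighted mean error
```

Occluded frames still contribute gradient signal, just less of it, which is the intuition behind training trackers that stay stable through occlusions.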
🔀Feed-Forward 4D Video🔀

👉4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. It produces HQ dynamic point clouds and supports downstream tasks such as novel-view video synthesis with strong generalizability. Code/data announced💙

👉Review https://t.ly/SpkD-
👉Paper arxiv.org/pdf/2508.13154
👉Project https://4dnex.github.io/
👉Repo github.com/3DTopia/4DNeX
👉Data https://lnkd.in/dh4_3Ghf
👉Demo https://lnkd.in/dztyzwgg
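For intuition on what a dynamic point-cloud representation involves, here is the textbook pinhole unprojection that lifts a (predicted) depth map into 3D points; this is a generic sketch, not 4DNeX's actual pipeline:

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to an (H*W, 3) point cloud with a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx                       # back-project along rays
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

Doing this per frame on predicted depth gives a sequence of point clouds, i.e. a crude 4D (3D + time) scene representation.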
🌈DAViD: Synthetic Depth-Normal-Segmentation🌈

👉#Microsoft's DAViD: a 100% synthetic dataset & models for human Depth, Normals & Segmentation. Dataset available; models & runtime under MIT💙

👉Review https://t.ly/-SlO_
👉Paper https://lnkd.in/eCmMXpTg
👉Project https://lnkd.in/eurCSWkm
👉Repo https://lnkd.in/e7PWFgP2
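Depth and normals are tightly coupled targets: a common baseline derives per-pixel normals from a depth map by finite differences. A generic sketch (not DAViD's method), assuming depth in the same units as pixel spacing:

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate per-pixel surface normals from a depth map via finite differences."""
    dzdy, dzdx = np.gradient(depth)                         # depth gradients along y, x
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)    # unit normals, (H, W, 3)
```

A flat depth map yields normals pointing straight at the camera, which is a quick sanity check for any depth/normal pair in a dataset.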
πŸ‘7❀4πŸ”₯2🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
👠 OmniTry: Virtual Try-On Anything 👠

👉OmniTry: a unified framework that extends VTON beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released💙

👉Review https://t.ly/wMBGQ
👉Paper https://lnkd.in/dQe9MchS
👉Project https://omnitry.github.io/
👉Repo https://lnkd.in/d3QwAXY2
🤗Demo https://lnkd.in/duUcZpVA
📑 ROVR Open Dataset is out 📑

👉A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released free for academic use, with commercial licensing available💙

👉Review https://t.ly/iDcvg
👉Paper https://arxiv.org/pdf/2508.13977
👉Project https://xiandaguo.net/ROVR-Open-Dataset
🧉 YOPO: SOTA 9-DoF Pose 🧉

👉Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF pose estimation as a natural extension of 2D detection. A practical solution for mono-RGB, category-level, multi-object pose estimation. Code & models announced (coming)💙

👉Review https://t.ly/cf_Cl
👉Paper https://arxiv.org/pdf/2508.14965
👉Project mikigom.github.io/YOPO-project-page/
👉Repo TBA
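"Category-level 9-DoF" means 3 DoF rotation + 3 DoF translation + 3 DoF anisotropic scale, so one model handles unseen instances within a category. A minimal sketch of composing those parameters into a 4×4 object-to-camera transform (a generic convention, not YOPO's code):

```python
import numpy as np

def pose_9dof(R, t, s):
    """Compose a category-level 9-DoF pose into a homogeneous transform.

    R: (3, 3) rotation; t: (3,) translation; s: (3,) per-axis scale.
    """
    T = np.eye(4)
    T[:3, :3] = R @ np.diag(s)  # scale the canonical object, then rotate
    T[:3, 3] = t                # place it in camera space
    return T
```

The extra 3 scale parameters are what distinguish category-level pose from classic instance-level 6-DoF estimation.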
🔬Intern-S1: SOTA MM-MoE🔬

👉Intern-S1: a multimodal MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA on professional tasks such as molecular synthesis planning, reaction condition prediction, etc. Models available under Apache 2.0💙

👉Review https://t.ly/3l5UW
👉Paper arxiv.org/pdf/2508.15763
👉Repo github.com/InternLM/Intern-S1
🤗HF huggingface.co/internlm/Intern-S1
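The 28B-activated / 241B-total split comes from sparse MoE routing: each token only runs through its top-k experts. A toy NumPy sketch of that routing step (generic MoE, not Intern-S1's implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Sparse MoE layer: route token x to its top-k experts by router score,
    then mix their outputs with renormalized softmax weights."""
    logits = x @ gate_w                   # (num_experts,) router scores
    top = np.argsort(logits)[-k:]         # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                          # softmax over selected experts only
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```

Only k expert forward passes execute per token, which is why the activated parameter count is far below the total.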
🫔ATLAS: SOTA Human Model🫔

👉#META presents ATLAS, a novel high-fidelity body model learned from 600K high-res scans captured with 240 synchronized cameras. Code announced, to be released💙

👉Review https://t.ly/0hHud
👉Paper arxiv.org/pdf/2508.15767
👉Project jindapark.github.io/projects/atlas/
👉Repo TBA
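Statistical body models of this kind are typically linear in per-subject shape coefficients learned from scans. A SMPL-style sketch for intuition only; ATLAS's actual parameterization may differ:

```python
import numpy as np

def body_shape(template, shape_dirs, betas):
    """Linear statistical body model (SMPL-style sketch).

    template: (V, 3) mean mesh vertices.
    shape_dirs: (V, 3, B) PCA shape directions learned from scans.
    betas: (B,) per-subject shape coefficients.
    """
    return template + shape_dirs @ betas  # (V, 3) personalized vertices
```

The scan corpus is what the PCA-style `shape_dirs` basis would be fit to; more and higher-resolution scans generally mean a more expressive basis.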
🧀Diffusive Hand from Signs🧀

👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign language data. It captures motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released💙

👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA
🏎️ VROOM: F1 Reconstruction 🏎️

👉Berkeley unveils VROOM, the first attempt at reconstructing 3D models of #Formula1 circuits using only onboard camera footage from racecars. It tackles extreme challenges from noise & speed. Repo released💙

👉Review https://t.ly/uuHdT
👉Paper arxiv.org/pdf/2508.17172
👉Repo github.com/yajatyadav/vroom
👉Project varun-bharadwaj.github.io/vroom/
🥶 OmniHuman-1.5 🥶

👉#ByteDance proposes a novel framework designed to generate character animations that are not only physically plausible but also semantically coherent and expressive, in step with the speech's rhythm, prosody and semantic content. Impressive results but no code 🥺

👉Review https://t.ly/CnRmX
👉Paper arxiv.org/pdf/2508.19209
👉Project omnihuman-lab.github.io/v1_5/
👉Repo 🥺
⚽SoccerNet 2025 results!⚽

👉The SoccerNet 2025 Challenges are the open benchmark dedicated to advancing computer vision research in football video understanding. Repo available💙

👉Review https://t.ly/MfHKg
👉Paper https://arxiv.org/pdf/2508.19182
👉Project https://www.soccer-net.org/
👉Repo https://github.com/SoccerNet
🌹ROSE: Remove Objects & Effects🌹

👉Fixes an object's effects on its environment: shadows, reflections, light, translucency and mirrors. Model, demo & dataset available via Hugging Face💙

👉Review https://t.ly/_KFM0
👉Paper https://lnkd.in/dNcTXQAE
👉Project https://lnkd.in/dFGmYT5h
👉Model https://lnkd.in/dhTT-VkN
👉Demo https://lnkd.in/dimgXZT6
👉Data https://lnkd.in/da7Jv667
❀15πŸ‘3😍2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‰ Dress-up & Dance πŸ‰

πŸ‘‰Novel diffusion framework that generates HQ 5-second-long 24 FPS VTON videos at 1152Γ—720 of a user wearing desired garments while moving in accordance with a given reference video. Impressive results but no repoπŸ₯Ί

πŸ‘‰Review https://t.ly/7NeTL
πŸ‘‰Paper arxiv.org/pdf/2508.21070
πŸ‘‰Project immortalco.github.io/DressAndDance/
πŸ‘‰Repo πŸ₯Ί
🌈 Multi-View 3D Tracking 🌈

👉MVTracker is the first data-driven multi-view 3D point tracker for tracking arbitrary 3D points across multiple cameras. Repo available💙

👉Review https://t.ly/rISMR
👉Paper arxiv.org/pdf/2508.21060
👉Project https://lnkd.in/drHtAmRC
👉Repo https://lnkd.in/d4k8mg3B
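Multi-view point tracking builds on classical triangulation: given 2D observations of the same point in calibrated cameras, recover its 3D position. A textbook linear-DLT sketch (standard geometry, not MVTracker's learned tracker):

```python
import numpy as np

def triangulate(points_2d, projections):
    """Triangulate one 3D point from 2D observations in multiple calibrated
    views via linear DLT (homogeneous least squares)."""
    A = []
    for (u, v), P in zip(points_2d, projections):  # P: (3, 4) projection matrix
        A.append(u * P[2] - P[0])                  # each view adds two rows
        A.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]                                     # null-space solution
    return X[:3] / X[3]                            # dehomogenize
```

A learned tracker effectively replaces the "given 2D observations" step, handling occlusion and appearance change before geometry takes over.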
❤️‍🔥PHD: Personalized 3D Humans❤️‍🔥

👉ETH & #Meta unveil PHD, a novel approach for personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information. Code & models to be released💙

👉Review https://t.ly/IeRhH
👉Paper https://arxiv.org/pdf/2508.21257
👉Project https://phd-pose.github.io/
👉Repo TBA
🪴 Pixie: Physics from Pixels 🪴

👉UPenn + MIT unveil Pixie: a neural net trained to map pretrained visual features (i.e., CLIP) to dense material fields of physical properties in a single forward pass, enabling real-time physics simulations. Repo & dataset under MIT license💙

👉Review https://t.ly/1W0n5
👉Paper https://lnkd.in/dsHAHDqM
👉Project https://lnkd.in/dwrHRbRc
👉Repo https://lnkd.in/dy7bvjsK
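The core idea, a small head mapping frozen visual features to per-point material parameters, can be sketched as a two-layer MLP. Layer sizes and the choice of outputs here are illustrative assumptions, not Pixie's architecture:

```python
import numpy as np

def material_head(feats, w1, b1, w2, b2):
    """Map per-point visual features (N, D) to material parameters (N, M)
    in a single forward pass: a two-layer MLP with ReLU."""
    h = np.maximum(feats @ w1 + b1, 0.0)  # hidden activations
    return h @ w2 + b2                    # e.g., density, stiffness, damping
```

Because inference is one feed-forward pass per point, the predicted material field can feed a physics simulator without per-scene optimization.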
❀5πŸ‘2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
🫛TMR: Few-Shot Template Matching🫛

👉POSTECH unveils TMR, a novel and simple template-matching detector for few-shot pattern detection, achieving strong (SOTA) results on diverse datasets. A new dataset (RPINE) is released; repo soon💙

👉Review https://t.ly/WWAcL
👉Paper https://lnkd.in/dJbSu5vk
👉Project https://lnkd.in/dwcDnHHQ
👉Repo https://lnkd.in/dp7aw8Cs
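The classic baseline behind template-matching detectors is normalized cross-correlation over sliding windows. A textbook sketch of that baseline (not TMR itself, which is a learned detector):

```python
import numpy as np

def match_template(image, template):
    """Return the top-left (y, x) of the best normalized-cross-correlation
    match of `template` inside `image` (both 2D grayscale arrays)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()            # zero-mean template
    best, pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            p = image[y:y+th, x:x+tw] - image[y:y+th, x:x+tw].mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, pos = score, (y, x)
    return pos
```

Few-shot methods like TMR aim to keep this simplicity while being robust to the appearance variation that raw NCC cannot handle.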
🧬 OpenVision 2 is out! 🧬

👉UCSC releases OpenVision2: a novel family of generative pretrained visual encoders that removes the text encoder and contrastive loss, training with caption-only supervision. Fully open, Apache 2.0💙

👉Review https://t.ly/Oma3w
👉Paper https://arxiv.org/pdf/2509.01644
👉Project https://ucsc-vlaa.github.io/OpenVision2/
👉Repo https://github.com/UCSC-VLAA/OpenVision
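"Caption-only supervision" means the training signal is just token-level cross-entropy on the caption decoded from image features, with no contrastive text tower. A minimal NumPy sketch of that loss (illustrative, not OpenVision2's training code):

```python
import numpy as np

def caption_loss(logits, tokens):
    """Mean next-token cross-entropy of a caption under decoder logits.

    logits: (T, V) decoder outputs; tokens: (T,) ground-truth caption ids.
    """
    z = logits - logits.max(axis=1, keepdims=True)            # stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(tokens)), tokens].mean())
```

Dropping the contrastive branch removes the text encoder entirely, which is the source of the training-cost savings the release highlights.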