AI with Papers - Artificial Intelligence & Deep Learning
17.1K subscribers
159 photos
277 videos
14 files
1.45K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
📱3D Human-Object Contact📱

👉Pi-HOC by CMU + NREC is a novel single-pass, instance-aware framework for dense 3D semantic contact prediction of all human-object pairs. Repo announced💙

👉Review https://t.ly/TAgG1
👉Paper https://arxiv.org/pdf/2604.12923
👉Project https://pi-hoc.github.io/
👉Repo https://github.com/SravanChittupalli/Pi-HOC
🐞GCT 3D Reconstruction🐞

πŸ‘‰ANT unveils LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. Repo under A-NC 4.0 InternationalπŸ’™

πŸ‘‰Review https://t.ly/ExodA
πŸ‘‰Paper https://arxiv.org/pdf/2604.14141
πŸ‘‰Project https://arxiv.org/pdf/2604.14141
πŸ‘‰Repo github.com/robbyant/lingbot-map
👩‍🦰Deformable 3D Hair👩‍🦰

👉Xi'an Jiaotong University unveils a novel method that reconstructs decoupled 3D Gaussian head avatars from a single input image: effortless hairstyle transfer with natural dynamic hair motion. Code announced💙

👉Review https://t.ly/kWZdd
👉Paper https://arxiv.org/pdf/2604.14782
👉Project yuansun-xjtu.github.io/CompHairHead.io/
👉Repo yuansun-xjtu.github.io/CompHairHead.io/
❀6πŸ”₯3πŸ‘1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🌗Mobile Ultra-detailed Avatars🌗

👉Given skeletal poses and a virtual camera as inputs, MUA by Max Planck Institute produces photorealistic renderings and hyper-detailed geometry of animatable clothed humans. Repo announced💙

👉Review https://t.ly/QPCy6
👉Paper https://arxiv.org/pdf/2604.18583
👉Project https://vcai.mpi-inf.mpg.de/projects/MUA/
👉Repo TBA
❀11πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
🎈Face Anything 4D (SOTA)🎈

πŸ‘‰A novel unified 4D facial reconstruction and dense tracking from image sequences: new SOTA in facial single-image and mono-video depth estimation, dense 4D reconstruction, and 3D point tracking. Repo & Dataset announcedπŸ’™

πŸ‘‰Review https://t.ly/zItie
πŸ‘‰Paper https://arxiv.org/pdf/2604.19702
πŸ‘‰Project kocasariumut.github.io/FaceAnything
πŸ‘‰Repo TBA
❀5πŸ”₯2πŸ‘1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
💙 PY4AI 2026: here we are! 💙

👉The third edition of our conference is official! Speaker list and (free) tickets: https://t.ly/L4_52
❀10πŸ‘1🀯1😒1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🛒 Reshoot-Anything is out 🛒

👉Reshoot-Anything reshoots dynamic monocular videos under novel camera trajectories. Code under Apache 2.0 💙

👉Review https://t.ly/MIqAc
👉Paper https://arxiv.org/pdf/2604.21776
👉Project adithyaiyer1999.github.io/reshoot-anything/
👉Repo github.com/morphicfilms/video-to-video
❀5πŸ”₯4πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧘‍♀️Holistic Shot Boundary Detection🧘‍♀️

👉OmniShotCut detects shot changes in videos from diverse sources (anime, vlog, game, shorts, sports, screen recording, etc.) and recognizes sudden jumps and transitions (dissolve, fade, wipe, etc.) using a Shot-Query-based Video Transformer. Repo, demo & benchmark💙

👉Review https://t.ly/sTi7N
👉Paper https://arxiv.org/pdf/2604.24762
👉Project uva-computer-vision-lab.github.io/OmniShotCut_website/
👉Repo github.com/UVA-Computer-Vision-Lab/OmniShotCut
Syn4D: Multiview Synthetic 4D Dataset

👉Syn4D is a novel multi-view synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations💙

👉Review https://t.ly/SL1mk
👉Paper https://arxiv.org/pdf/2605.05207
👉Project https://jzr99.github.io/Syn4D/
👉Repo https://github.com/jzr99/Syn4D
👉Data huggingface.co/datasets/Syn4D/Syn4D_RGBD/tree/main
❀7πŸ”₯5πŸ‘2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🦄Unified Correspondence Transformer🦄

👉UniCorrn is the first correspondence model with shared weights that unifies 2D-2D, 2D-3D, and 3D-3D geometric matching with a transformer. CC BY-NC-SA 4.0💙

👉Review https://t.ly/2OBdq
👉Paper https://arxiv.org/pdf/2605.04044
👉Project https://neu-vi.github.io/UniCorrn/
👉Repo https://github.com/neu-vi/UniCorrn
Count Anything, Any Granularity

👉Open-world counting framed as multi-grained counting, where visual exemplars specify the target appearance and fine-grained text specifies the intended semantic granularity across five explicit levels. Repo/Data under Apache💙

👉Review https://t.ly/nqz80
👉Paper https://lnkd.in/dp7khTRU
👉Project https://lnkd.in/d_jfX_Yn
👉Repo https://lnkd.in/dkTRGZkG
👉Data https://lnkd.in/dB83jRyT
1❀15πŸ‘6πŸ‘2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
🪔Latent Decoding Pixel Diffusion🪔

👉PiD by Nvidia is a plug-and-play diffusion decoder that replaces VAE/RAE decoders, turning latent representations directly into super-resolved pixels in a single pass. Repo under Apache 2.0💙

👉Review https://t.ly/y19mA
👉Paper https://lnkd.in/duVC25C2
👉Project https://lnkd.in/dW6TkzCB
👉Repo https://lnkd.in/dnGdgKRr
❀8πŸ”₯6πŸ‘1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🔍 Nvidia Locate Anything 🔍

👉Diverse localization tasks under a unified vision-language model, including document understanding, GUI grounding, dense detection, and OCR. Repo released💙

👉Review https://t.ly/PvwFo
👉Paper https://lnkd.in/dWfNpzPZ
👉Project https://lnkd.in/dM89BX-8
👉Repo https://lnkd.in/dC4KCQSM
❀13πŸ”₯13πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🕷️Human Universal Grasping🕷️

👉HUG is a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera.

👉Review https://t.ly/VG1Eu
👉Paper https://arxiv.org/pdf/2606.17054
👉Repo https://github.com/KevinyWu/hug
👉Project https://grasping.io/
❀10πŸ”₯4πŸ‘1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🔊VolHuMe - Volumetric Human Meshes🔊

👉VolHuMe (H/T @Martinella_94) is a novel, high-resolution large-scale dataset of volumetric human meshes with complete 4D GT: multi-view RGB-D, textured meshes, dense point clouds, normal maps, rigged assets, garment segmentation, and SMPL-X fittings in one dataset. Insane💙

👉Review https://t.ly/b5vxy
👉Paper https://arxiv.org/pdf/2606.23062
👉Project giuli13.github.io/volhume-website/#
👉Repo TBA soon
❀4πŸ”₯2⚑1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
👋 Hi everyone!

Over the past few weeks, the number of join requests has increased dramatically, which unfortunately also means a much higher number of spam accounts and bots (in the last few days, around five hundred have been cut off).

To help me distinguish real people from fake profiles - and avoid rejecting genuine requests by mistake - I'd really appreciate it if your profile includes:
📷 A real profile photo
👤 Your full name (or something reasonably identifiable)
💬 If you contact me, please use English if possible.

I don't speak Russian, Arabic, or Chinese, so if your profile and messages are only in those languages, it's very difficult for me to tell whether you're a real person or an automated account. Thank you for your understanding and for helping keep this damn community welcoming and spam-free!

With love,
Alessandro 😈
❀18πŸ‘14⚑2πŸ”₯1
Media is too big
VIEW IN TELEGRAM
OctoSense: Open Sensing

👉OctoSense is an open-source sensor platform with stereo RGB and event cameras, LiDAR, a thermal camera, an inertial measurement unit, RTK-corrected global positioning system, and proprioception.

👉Review https://t.ly/oFN8L
👉Paper https://lnkd.in/dM3zpyju
👉Project https://lnkd.in/ddrQ3uJ6
👉Repo https://lnkd.in/dhSDjSfG
❀11πŸ”₯5πŸ’©3
This media is not supported in your browser
VIEW IN TELEGRAM
🛸PriorEye: Geospatial Self-Driving🛸

👉MRG (Oxford) introduces geospatial visual priors to leverage street-level images in autonomous driving. Consistent improvement in performance. Repo under Apache💙

👉Review https://t.ly/7Jgav
👉Paper https://lnkd.in/dYeD2m7n
👉Project https://lnkd.in/dWJvNemr
👉Repo https://lnkd.in/dNExGGtx