3D Human-Object Contact
Pi-HOC by CMU + NREC is a novel single-pass, instance-aware framework for dense 3D semantic contact prediction of all human-object pairs. Repo announced
Review https://t.ly/TAgG1
Paper https://arxiv.org/pdf/2604.12923
Project https://pi-hoc.github.io/
Repo https://github.com/SravanChittupalli/Pi-HOC

GCT 3D Reconstruction
ANT unveils LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. Repo under A-NC 4.0 International
Review https://t.ly/ExodA
Paper https://arxiv.org/pdf/2604.14141
Project https://arxiv.org/pdf/2604.14141
Repo github.com/robbyant/lingbot-map

Deformable 3D Hair
Xi'an Jiaotong University unveils a novel method that reconstructs decoupled 3D Gaussian head avatars from a single input image: effortless hairstyle transfer with natural dynamic hair motion. Code announced
Review https://t.ly/kWZdd
Paper https://arxiv.org/pdf/2604.14782
Project yuansun-xjtu.github.io/CompHairHead.io/
Repo yuansun-xjtu.github.io/CompHairHead.io/

Mobile Ultra-detailed Avatars
Given skeletal poses and a virtual camera as inputs, MUA by Max Planck Institute produces photorealistic renderings and hyper-detailed geometry of animatable clothed humans. Repo announced
Review https://t.ly/QPCy6
Paper https://arxiv.org/pdf/2604.18583
Project https://vcai.mpi-inf.mpg.de/projects/MUA/
Repo TBA

Face Anything 4D (SOTA)
A novel unified approach to 4D facial reconstruction and dense tracking from image sequences: new SOTA in facial single-image and mono-video depth estimation, dense 4D reconstruction, and 3D point tracking. Repo & Dataset announced
Review https://t.ly/zItie
Paper https://arxiv.org/pdf/2604.19702
Project kocasariumut.github.io/FaceAnything
Repo TBA

PY4AI 2026: here we are!
The third edition of our conference is official! Speaker list and (free) tickets: https://t.ly/L4_52

Reshoot-Anything is out
Reshoot-Anything reshoots dynamic monocular videos under novel camera trajectories. Code under Apache 2.0
Review https://t.ly/MIqAc
Paper https://arxiv.org/pdf/2604.21776
Project adithyaiyer1999.github.io/reshoot-anything/
Repo github.com/morphicfilms/video-to-video

Holistic Shot Boundary Detection
OmniShotCut detects shot changes in videos from diverse sources (anime, vlogs, games, shorts, sports, screen recordings, etc.) and recognizes sudden jumps and transitions (dissolve, fade, wipe, etc.) using a Shot-Query-based Video Transformer. Repo, demo & benchmark
Review https://t.ly/sTi7N
Paper https://arxiv.org/pdf/2604.24762
Project uva-computer-vision-lab.github.io/OmniShotCut_website/
Repo github.com/UVA-Computer-Vision-Lab/OmniShotCut

Syn4D: Multiview Synthetic 4D Dataset
Syn4D is a novel multi-view synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations
Review https://t.ly/SL1mk
Paper https://arxiv.org/pdf/2605.05207
Project https://jzr99.github.io/Syn4D/
Repo https://github.com/jzr99/Syn4D
Data huggingface.co/datasets/Syn4D/Syn4D_RGBD/tree/main

About the frequency of posting in the channel:
Anonymous Poll
62%: 1 per day is great
38%: a few posts per day (such as breaking news with fewer details) would be better

Unified Correspondence Transformer
UniCorrn is the first correspondence model with shared weights that unifies 2D-2D, 2D-3D, and 3D-3D geometric matching with a transformer. CC BY-NC-SA 4.0
Review https://t.ly/2OBdq
Paper https://arxiv.org/pdf/2605.04044
Project https://neu-vi.github.io/UniCorrn/
Repo https://github.com/neu-vi/UniCorrn

Count Anything, Any Granularity
Open-world counting is reformulated as multi-grained counting, where visual exemplars specify target appearance and fine-grained text specifies the intended semantic granularity across five explicit levels. Repo/Data under Apache
Review https://t.ly/nqz80
Paper https://lnkd.in/dp7khTRU
Project https://lnkd.in/d_jfX_Yn
Repo https://lnkd.in/dkTRGZkG
Data https://lnkd.in/dB83jRyT

Latent Decoding Pixel Diffusion
PiD by Nvidia is a plug-and-play diffusion decoder that replaces VAE/RAE decoders, turning latent representations directly into super-resolved pixels in a single pass. Repo under Apache 2.0
Review https://t.ly/y19mA
Paper https://lnkd.in/duVC25C2
Project https://lnkd.in/dW6TkzCB
Repo https://lnkd.in/dnGdgKRr

Nvidia Locate Anything
Diverse localization tasks under a unified vision-language model, including document understanding, GUI grounding, dense detection, and OCR. Repo released
Review https://t.ly/PvwFo
Paper https://lnkd.in/dWfNpzPZ
Project https://lnkd.in/dM89BX-8
Repo https://lnkd.in/dC4KCQSM

Human Universal Grasping
HUG is a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera.
Review https://t.ly/VG1Eu
Paper https://arxiv.org/pdf/2606.17054
Repo https://github.com/KevinyWu/hug
Project https://grasping.io/

VolHuMe - Volumetric Human Meshes
VolHuMe (H/T @Martinella_94) is a novel, high-resolution large-scale dataset of volumetric human meshes with complete 4D GT: multi-view RGB-D, textured meshes, dense point clouds, normal maps, rigged assets, garment segmentation, and SMPL-X fittings in one dataset. Insane
Review https://t.ly/b5vxy
Paper https://arxiv.org/pdf/2606.23062
Project giuli13.github.io/volhume-website/#
Repo TBA soon

Hi everyone!
Over the past few weeks, the number of join requests has increased dramatically, which unfortunately also means far more spam and bots (around five hundred have been removed in the last few days).
To help me distinguish real people from fake profiles - and avoid rejecting genuine requests by mistake - I'd really appreciate it if your profile included:
- A real profile photo
- Your full name (or something reasonably identifiable)
- If you contact me, please use English if possible.
I don't speak Russian, Arabic, or Chinese, so if your profile and messages are only in those languages, it's very difficult for me to tell whether you're a real person or an automated account. Thank you for your understanding and for helping keep this damn community welcoming and spam-free!
With love,
Alessandro

OctoSense: Open Sensing
OctoSense is an open-source sensor platform with stereo RGB and event cameras, LiDAR, a thermal camera, an inertial measurement unit, RTK-corrected global positioning system, and proprioception.
Review https://t.ly/oFN8L
Paper https://lnkd.in/dM3zpyju
Project https://lnkd.in/ddrQ3uJ6
Repo https://lnkd.in/dhSDjSfG

PriorEye: Geospatial Self-Driving
MRG (Oxford) introduces geospatial visual priors that leverage street-level images in autonomous driving, with consistent performance improvements. Repo under Apache
Review https://t.ly/7Jgav
Paper https://lnkd.in/dYeD2m7n
Project https://lnkd.in/dWJvNemr
Repo https://lnkd.in/dNExGGtx