🔥 GaussianGPT 3D GSC🔥
From TUM, GaussianGPT: transformer-based generation of 3D Gaussians via next-token prediction, producing full, complex 3D indoor scenes. Repo announced.
Review https://t.ly/bj-lL
Paper arxiv.org/pdf/2603.26661
Project nicolasvonluetzow.github.io/GaussianGPT/
Repo TBA
HandX: Scaling Hand Motion
HandX is a unified foundation spanning data, annotation, and evaluation, built around a novel large-scale dataset of bimanual and dexterous hand motions with fine-grained textual annotations (around 6M frames). Repo available.
Review https://t.ly/1nGxw
Paper https://arxiv.org/pdf/2603.28766
Project https://handx-project.github.io/
Repo github.com/handx-project/HandX
SOTA Training-Free In-Context Segmentation
INSID3 is the new SOTA training-free approach to in-context segmentation: given an in-context example, it segments concepts at varying granularities using only frozen DINOv3 features. Repo under Apache 2.0.
Review https://t.ly/NVWHN
Paper arxiv.org/pdf/2603.28480
Project visinf.github.io/INSID3/
Repo github.com/visinf/INSID3
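For intuition only: the simplest training-free baseline for in-context segmentation with frozen features is prototype matching. Average the reference patch features inside the example mask, then mark every target patch whose cosine similarity to that prototype clears a threshold. This is a generic sketch, not INSID3's actual method; the function names, the 0.5 threshold and the toy 2-D "features" are assumptions (real DINOv3 patch features are high-dimensional).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prototype_segment(ref_feats, ref_mask, tgt_feats, threshold=0.5):
    """Hypothetical helper: segment target patches that resemble the masked reference patches.

    ref_feats / tgt_feats: per-patch feature vectors (e.g. from a frozen backbone).
    ref_mask: 0/1 flags marking foreground patches in the reference (in-context) image.
    Returns a 0/1 mask over the target patches.
    """
    fg = [f for f, m in zip(ref_feats, ref_mask) if m]
    if not fg:
        raise ValueError("reference mask has no foreground patches")
    dim = len(fg[0])
    # Foreground prototype = mean of the masked reference features.
    proto = [sum(f[i] for f in fg) / len(fg) for i in range(dim)]
    # No training: just threshold the similarity to the prototype.
    return [1 if cosine(f, proto) >= threshold else 0 for f in tgt_feats]


if __name__ == "__main__":
    ref = [[1.0, 0.0], [0.0, 1.0]]
    mask = [1, 0]
    tgt = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
    print(prototype_segment(ref, mask, tgt))
```

Varying the threshold (or matching against several prototypes) is one crude way to trade granularity for recall in such a baseline.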
🪬Camera Raw Image Generation🪬
RawGen by #Samsung is a generative approach that learns the complex distribution of raw sensor data directly, enabling high-fidelity raw generation from either text descriptions or standard sRGB images across arbitrary camera sensors. Generate the linear raw image once, then apply any ISP operation. Repo announced.
Review https://t.ly/_QVKP
Paper https://arxiv.org/pdf/2604.00093
Project https://dy112.github.io/rawgen-page/
Repo TBA
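Why a linear raw output is handy: classic ISP steps such as black-level subtraction, white balance and tone/gamma encoding all act on linear sensor values, so one raw image can be re-rendered with different settings. The toy pipeline below is a generic illustration, not RawGen's code; it assumes the raw data is already demosaiced into RGB triplets, uses a plain power-law gamma instead of a real tone curve, and the black/white levels and gains are hypothetical.

```python
def simple_isp(raw, black_level, white_level, wb_gains, gamma=2.2):
    """Toy ISP for already-demosaiced linear raw RGB pixels (hypothetical helper).

    raw: list of (r, g, b) sensor values.
    wb_gains: per-channel white-balance multipliers (gr, gg, gb).
    Returns display-referred 8-bit (r, g, b) tuples.
    """
    if white_level <= black_level:
        raise ValueError("white_level must exceed black_level")
    scale = white_level - black_level
    out = []
    for pixel in raw:
        rgb = []
        for value, gain in zip(pixel, wb_gains):
            # 1) Black-level subtraction and normalization to [0, 1].
            v = max(0.0, (value - black_level) / scale)
            # 2) White balance, then clip highlights.
            v = min(1.0, v * gain)
            # 3) Power-law gamma encoding (stand-in for a real tone curve / sRGB OETF).
            rgb.append(round(255 * v ** (1.0 / gamma)))
        out.append(tuple(rgb))
    return out


if __name__ == "__main__":
    raw = [(100, 100, 100)]
    # Same linear raw, two different "ISP settings".
    print(simple_isp(raw, 0, 255, (2.0, 1.0, 0.5), gamma=1.0))
    print(simple_isp(raw, 0, 255, (1.0, 1.0, 1.0)))
```

Because nothing is baked in before step 1, swapping gains or the tone curve only means re-running this function on the same raw data.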
If you had to invest $1B TODAY in a frontier technology for the next decade, would you pick space, agentic AI, quantum, or frugal GPUs? Vote here: https://t.ly/hSx6i
Video Object Deletion
Void by Netflix is a novel video object removal framework designed to perform physically plausible inpainting in very complex scenarios. Repo under Apache 2.0.
Review https://t.ly/cMVny
Paper https://arxiv.org/pdf/2604.02296
Project https://void-model.github.io/
Repo https://github.com/Netflix/void-model
🔥Vanast: VTON w/ Human Animation🔥
SNU unveils Vanast, a novel unified framework that generates garment-transferred human animation videos directly from a single human image, a garment image, and a pose-guidance clip. Repo announced.
Review https://t.ly/c0t79
Paper arxiv.org/pdf/2604.04934
Project hyunsoocha.github.io/vanast/
Repo github.com/snuvclab/vanast
🔥BoxerNet: SOTA 2D->3D BBs🔥
Boxer by META: a transformer-based network that lifts 2D bounding-box (BB) proposals into 3D, followed by multi-view fusion and geometric filtering to produce globally consistent, de-duplicated 3D BBs in metric world space. Repo under A-NC 4.0 International.
Review https://t.ly/mlmV1
Paper https://arxiv.org/pdf/2604.05212
Project facebookresearch.github.io/boxer/
Repo github.com/facebookresearch/boxer
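For intuition on the de-duplication step: a common geometric filter for detections merged from several views is greedy non-maximum suppression (NMS) with 3D IoU. The sketch below is a generic stand-in, not Boxer's actual filtering; it assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax) (real 3D BBs are usually oriented, e.g. with a yaw angle), and the 0.5 IoU threshold is a hypothetical choice.

```python
def _volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    return inter / (_volume(a) + _volume(b) - inter)


def dedup_boxes(boxes, scores, iou_thresh=0.5):
    """Greedy 3D NMS: keep the highest-scoring box, drop overlapping duplicates.

    Returns the indices of kept boxes, in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two near-identical detections of one object (e.g. from two views) plus a distinct one.
    boxes = [(0, 0, 0, 1, 1, 1), (0.1, 0, 0, 1.1, 1, 1), (5, 5, 5, 6, 6, 6)]
    print(dedup_boxes(boxes, [0.9, 0.8, 0.7]))
```

Greedy NMS is the simplest option; fusing the suppressed duplicates (e.g. score-weighted averaging) instead of discarding them is a common refinement.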
Here's the preview; the full clip from the official source comes tomorrow :)
1.1M Metric VTON Dataset
Google's Fit-Inclusive Try-on: a large-scale virtual try-on (VTO) dataset comprising over 1.13M try-on image triplets, accompanied by precise body and garment measurements. Repo & dataset announced.
Review https://t.ly/cs-pt
Paper arxiv.org/pdf/2604.08526
Project johannakarras.github.io/FIT/
Repo TBA
6D Object Pose w/ Deformation
DeSOPE by Xidian & #MagicLeap is a novel large-scale dataset for 6DoF pose of deformed objects, with 665K pose annotations produced via a semi-automatic pipeline. Repo & dataset announced.
Review https://t.ly/M5VgX
Paper https://arxiv.org/pdf/2604.06720
Project https://desope-6d.github.io/
Repo TBA
🔥SOTA 3D Detection in the wild🔥
WildDet3D is a novel unified, geometry-aware architecture for 3D detection that natively accepts text, point, and box prompts and can incorporate auxiliary depth signals at inference time. New SOTA! Repo, models and iPhone version available.
Review https://t.ly/8NxBN
Paper arxiv.org/pdf/2604.08626
Project allenai.github.io/WildDet3D/
Repo github.com/allenai/WildDet3D
🧴OmniShow Content Creation🧴
OmniShow is the new SOTA in content creation, with industry-grade performance. Impressive results, best experienced with audio. Repo announced.
Review https://t.ly/Pm-7U
Paper arxiv.org/pdf/2604.11804
Project correr-zhou.github.io/OmniShow/
Repo github.com/Correr-Zhou/OmniShow
Interactive Objects from EgoVideo
EgoFun3D by Simon Fraser University introduces a coordinated task, dataset, and benchmark for modeling interactive 3D objects from egocentric videos. Repo (TBA), demo & dataset available.
Review https://t.ly/YhGN7
Paper arxiv.org/pdf/2604.11038
Project 3dlg-hcvc.github.io/EgoFun3D/
Repo github.com/3dlg-hcvc/EgoFun3D
Demo bc79fea884062374b3.gradio.live/
3D Human-Object Contact
Pi-HOC by CMU + NREC is a novel single-pass, instance-aware framework for dense 3D semantic contact prediction of all human-object pairs. Repo announced.
Review https://t.ly/TAgG1
Paper https://arxiv.org/pdf/2604.12923
Project https://pi-hoc.github.io/
Repo https://github.com/SravanChittupalli/Pi-HOC
GCT 3D Reconstruction
ANT unveils LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. Repo under A-NC 4.0 International.
Review https://t.ly/ExodA
Paper https://arxiv.org/pdf/2604.14141
Project https://arxiv.org/pdf/2604.14141
Repo github.com/robbyant/lingbot-map
👩‍🦰Deformable 3D Hair👩‍🦰
Xi’an Jiaotong University unveils a novel method that reconstructs decoupled 3D Gaussian head avatars from a single input image, enabling effortless hairstyle transfer with natural, dynamic hair motion. Code announced.
Review https://t.ly/kWZdd
Paper https://arxiv.org/pdf/2604.14782
Project yuansun-xjtu.github.io/CompHairHead.io/
Repo yuansun-xjtu.github.io/CompHairHead.io/
Mobile Ultra-detailed Avatars
Given skeletal poses and a virtual camera as inputs, MUA by the Max Planck Institute produces photorealistic renderings and hyper-detailed geometry of animatable clothed humans. Repo announced.
Review https://t.ly/QPCy6
Paper https://arxiv.org/pdf/2604.18583
Project https://vcai.mpi-inf.mpg.de/projects/MUA/
Repo TBA
Face Anything 4D (SOTA)
A novel unified approach to 4D facial reconstruction and dense tracking from image sequences, setting a new SOTA in single-image and monocular-video facial depth estimation, dense 4D reconstruction, and 3D point tracking. Repo & dataset announced.
Review https://t.ly/zItie
Paper https://arxiv.org/pdf/2604.19702
Project kocasariumut.github.io/FaceAnything
Repo TBA
PY4AI 2026: here we are!
The third edition of our conference is official! Speaker list and (free) tickets: https://t.ly/L4_52