TOTNet: Occlusion-aware Tracking
TOTNet: a novel Temporal Occlusion Tracking Network that leverages 3D convolutions, a visibility-weighted loss, and occlusion augmentation to improve tracking under occlusion. Code & data released under MIT. (A minimal loss sketch follows below.)
Review: https://t.ly/Q0jAf
Paper: https://lnkd.in/dUYsa-GC
Repo: https://lnkd.in/d3QGUHYb
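
To make the loss idea concrete, here is a minimal PyTorch sketch of a visibility-weighted tracking loss: frames where the target is occluded get a larger weight, so the network is penalized more for losing it there. The weighting scheme, shapes, and names are illustrative assumptions, not TOTNet's exact formulation.

```python
import torch

def visibility_weighted_loss(pred, target, visibility, occluded_weight=2.0):
    """Per-frame L2 tracking loss, reweighted by visibility.

    pred, target: (B, T, 2) predicted / ground-truth positions.
    visibility:   (B, T) in [0, 1], 1 = fully visible.
    Occluded frames (low visibility) are up-weighted, so the model
    cannot ignore the target while it is hidden. (Illustrative
    weighting only -- not the paper's exact loss.)
    """
    per_frame = ((pred - target) ** 2).sum(dim=-1)                # (B, T)
    weights = 1.0 + (occluded_weight - 1.0) * (1.0 - visibility)  # (B, T)
    return (weights * per_frame).mean()
```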

Feed-Forward 4D Video
4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. It produces high-quality dynamic point clouds and supports downstream tasks such as novel-view video synthesis with strong generalizability. Code & data announced.
Review: https://t.ly/SpkD-
Paper: arxiv.org/pdf/2508.13154
Project: https://4dnex.github.io/
Repo: github.com/3DTopia/4DNeX
Data: https://lnkd.in/dh4_3Ghf
Demo: https://lnkd.in/dztyzwgg

DAViD: Synthetic Depth, Normals & Segmentation
#Microsoft's DAViD: a 100% synthetic dataset and models for human depth, normal, and segmentation estimation. Dataset available; models & runtime under MIT.
Review: https://t.ly/-SlO_
Paper: https://lnkd.in/eCmMXpTg
Project: https://lnkd.in/eurCSWkm
Repo: https://lnkd.in/e7PWFgP2

OmniTry: Virtual Try-On Anything
OmniTry: a unified framework that extends virtual try-on (VTON) beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released.
Review: https://t.ly/wMBGQ
Paper: https://lnkd.in/dQe9MchS
Project: https://omnitry.github.io/
Repo: https://lnkd.in/d3QwAXY2
Demo: https://lnkd.in/duUcZpVA

ROVR Open Dataset is out
A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released free for academic use, with commercial licensing available.
Review: https://t.ly/iDcvg
Paper: https://arxiv.org/pdf/2508.13977
Project: https://xiandaguo.net/ROVR-Open-Dataset

YOPO: SOTA 9-DoF Pose
Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF pose estimation as a natural extension of 2D detection. A practical solution for monocular-RGB, category-level, multi-object pose estimation. Code & models announced (coming). (A rough sketch of the idea follows below.)
Review: https://t.ly/cf_Cl
Paper: https://arxiv.org/pdf/2508.14965
Project: mikigom.github.io/YOPO-project-page/
Repo: TBA
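
As a rough sketch of what "query-based 9-DoF as an extension of 2D detection" can look like: learned object queries attend to image features, and each query decodes a class plus 9 DoF (3 rotation, 3 translation, 3 size). All dimensions and layer choices here are assumptions for illustration, not YOPO's actual architecture.

```python
import torch.nn as nn

class QueryPoseHead(nn.Module):
    """DETR-style query decoding with a 9-DoF regression head.
    (Hypothetical sketch; not YOPO's published design.)"""
    def __init__(self, num_queries=100, dim=256, num_classes=10):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.cls_head = nn.Linear(dim, num_classes)
        self.pose_head = nn.Linear(dim, 9)   # rot(3) + trans(3) + size(3)

    def forward(self, feats):                # feats: (B, HW, dim) image tokens
        q = self.queries.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        h = self.decoder(q, feats)           # (B, num_queries, dim)
        return self.cls_head(h), self.pose_head(h)
```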

Intern-S1: SOTA MM-MoE
Intern-S1: a multimodal MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ from scientific domains. New SOTA on professional tasks such as molecular synthesis planning and reaction condition prediction. Models available under Apache 2.0. (A toy routing sketch follows below.)
Review: https://t.ly/3l5UW
Paper: arxiv.org/pdf/2508.15763
Repo: github.com/InternLM/Intern-S1
HF: huggingface.co/internlm/Intern-S1
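
The 28B-of-241B figure is the usual MoE property: a router sends each token through only k of E expert MLPs, so most weights stay idle per token. A toy PyTorch sketch of top-k routing (toy sizes and plain softmax gating; not Intern-S1's real configuration):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer with top-k routing."""
    def __init__(self, dim=64, hidden=256, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                          nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):                       # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)  # (tokens, num_experts)
        topv, topi = gates.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):              # each token visits k experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():                  # only routed tokens run here
                    out[mask] += topv[mask, slot].unsqueeze(1) * expert(x[mask])
        return out
```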

ATLAS: SOTA Human Model
#META presents ATLAS, a novel high-fidelity body model learned from 600k high-resolution scans captured with 240 synchronized cameras. Code announced, to be released.
Review: https://t.ly/0hHud
Paper: arxiv.org/pdf/2508.15767
Project: jindapark.github.io/projects/atlas/
Repo: TBA

Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign-language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review: https://t.ly/HonX_
Paper: https://arxiv.org/pdf/2508.15902
Project: https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data: drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo: TBA

VROOM: F1 Reconstruction
Berkeley unveils VROOM, the first attempt to reconstruct 3D models of #Formula1 circuits using only onboard camera footage from racecars, despite extreme challenges from noise & speed. Repo released.
Review: https://t.ly/uuHdT
Paper: arxiv.org/pdf/2508.17172
Repo: github.com/yajatyadav/vroom
Project: varun-bharadwaj.github.io/vroom/

OmniHuman-1.5
#ByteDance proposes a novel framework for generating character animations that are not only physically plausible but also semantically coherent and expressive, consistent with the speech's rhythm, prosody and semantic content. Impressive results, but no code.
Review: https://t.ly/CnRmX
Paper: arxiv.org/pdf/2508.19209
Project: omnihuman-lab.github.io/v1_5/
Repo: not released

SoccerNet 2025 results!
The SoccerNet 2025 Challenges are the open benchmarking effort dedicated to advancing computer vision research in football video understanding. Repo available.
Review: https://t.ly/MfHKg
Paper: https://arxiv.org/pdf/2508.19182
Project: https://www.soccer-net.org/
Repo: https://github.com/SoccerNet

ROSE: Remove Objects & Effects
Fixes an object's effects on the environment: shadows, reflections, light, translucency and mirrors. Model, demo & dataset available via Hugging Face.
Review: https://t.ly/_KFM0
Paper: https://lnkd.in/dNcTXQAE
Project: https://lnkd.in/dFGmYT5h
Model: https://lnkd.in/dhTT-VkN
Demo: https://lnkd.in/dimgXZT6
Data: https://lnkd.in/da7Jv667

Dress-up & Dance
A novel diffusion framework that generates high-quality 5-second, 24 FPS VTON videos at 1152×720 of a user wearing desired garments while moving in accordance with a given reference video. Impressive results, but no repo.
Review: https://t.ly/7NeTL
Paper: arxiv.org/pdf/2508.21070
Project: immortalco.github.io/DressAndDance/
Repo: not released

Multi-View 3D Tracking
MVTracker is the first data-driven multi-view 3D point tracker, following arbitrary 3D points across multiple cameras. Repo available.
Review: https://t.ly/rISMR
Paper: arxiv.org/pdf/2508.21060
Project: https://lnkd.in/drHtAmRC
Repo: https://lnkd.in/d4k8mg3B

PHD: Personalized 3D Humans
ETH & #Meta unveil PHD, a novel approach to personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information. Code & models to be released.
Review: https://t.ly/IeRhH
Paper: https://arxiv.org/pdf/2508.21257
Project: https://phd-pose.github.io/
Repo: TBA

Pixie: Physics from Pixels
UPenn + MIT unveil Pixie: a neural network that maps pretrained visual features (e.g., CLIP) to dense material fields of physical properties in a single forward pass, enabling real-time physics simulation. Repo & dataset under MIT license. (A minimal sketch of the idea follows below.)
Review: https://t.ly/1W0n5
Paper: https://lnkd.in/dsHAHDqM
Project: https://lnkd.in/dwrHRbRc
Repo: https://lnkd.in/dy7bvjsK
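
In spirit, that amounts to a small regression head over frozen visual features. A minimal PyTorch sketch; the output parametrization below (density, Young's modulus, Poisson ratio, material class) is an assumption for illustration, not Pixie's exact field definition.

```python
import torch.nn as nn

class MaterialFieldHead(nn.Module):
    """Regress per-point physical material parameters from frozen
    pretrained visual features (hypothetical parametrization)."""
    def __init__(self, feat_dim=768, num_materials=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 + num_materials),  # 3 continuous + class logits
        )

    def forward(self, feats):            # feats: (num_points, feat_dim)
        out = self.mlp(feats)
        density = out[:, 0].exp()        # exp keeps physical params positive
        youngs = out[:, 1].exp()
        poisson = out[:, 2].sigmoid() * 0.5  # valid range is ~[0, 0.5)
        logits = out[:, 3:]              # discrete material class
        return density, youngs, poisson, logits
```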

TMR: Few-Shot Template Matching
POSTECH unveils TMR, a novel and simple template-matching detector for few-shot pattern detection, achieving SOTA results on diverse datasets. A new dataset (RPINE) is released; repo coming soon. (A sketch of the classic template-matching baseline follows below.)
Review: https://t.ly/WWAcL
Paper: https://lnkd.in/dJbSu5vk
Project: https://lnkd.in/dwcDnHHQ
Repo: https://lnkd.in/dp7aw8Cs
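
For context, the classic template-matching baseline that detectors like TMR build on scores a template against every image location, which maps neatly onto a convolution. A simplified PyTorch sketch (zero-mean template, cosine-style normalization; not TMR's detector):

```python
import torch
import torch.nn.functional as F

def template_match(image, template, eps=1e-6):
    """Slide a template over an image and score each location.

    image:    (1, C, H, W) image or feature map.
    template: (1, C, h, w) template crop.
    Returns a (1, 1, H-h+1, W-w+1) similarity map; peaks = matches.
    """
    t = template - template.mean()
    t = t / (t.norm() + eps)                    # unit-norm, zero-mean template
    corr = F.conv2d(image, t)                   # cross-correlation with image
    ones = torch.ones_like(t)
    patch_norm = F.conv2d(image ** 2, ones).clamp_min(eps).sqrt()
    return corr / patch_norm                    # normalize by patch energy
```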

OpenVision 2 is out!
UCSC releases OpenVision 2: a novel family of generative pretrained visual encoders that removes the text encoder and contrastive loss, training with caption-only supervision. Fully open, Apache 2.0. (A minimal sketch of the training signal follows below.)
Review: https://t.ly/Oma3w
Paper: https://arxiv.org/pdf/2509.01644
Project: https://ucsc-vlaa.github.io/OpenVision2/
Repo: https://github.com/UCSC-VLAA/OpenVision
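
With the text encoder and contrastive loss gone, the training signal reduces to next-token cross-entropy on captions conditioned on visual tokens. A minimal PyTorch sketch under assumed module interfaces (the encoder/decoder call signatures here are hypothetical):

```python
import torch.nn.functional as F

def caption_only_loss(vision_encoder, text_decoder, images, caption_ids):
    """Caption-only supervision: no text encoder, no contrastive pairs.

    images:      (B, 3, H, W) input batch.
    caption_ids: (B, L) tokenized captions.
    Assumes text_decoder(visual_tokens, input_ids) -> (B, L-1, vocab).
    """
    vis_tokens = vision_encoder(images)                      # (B, N, D)
    logits = text_decoder(vis_tokens, caption_ids[:, :-1])   # teacher forcing
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # predict each next token
        caption_ids[:, 1:].reshape(-1),
    )
```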