GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment.
Review https://t.ly/Q5T_Y
Paper https://lnkd.in/ds36BY49
Project https://lnkd.in/dAYHhuFU
Dear friends,
I'm truly sorry for being away from the group for so long. I know: no updates so far while AI is running faster than the speed of light.
I'm going through a very difficult time in my life and I need some space to heal. This spare-time project (but an important one for a lot of people here) needs energy and commitment I don't have right now. I'm sorry, please be patient. I'll be back.
Love u all,
Alessandro.
Hi everybody,
I took a few weeks to catch my breath from a lot of stuff: I dedicated all my mental energy to keeping my work going, and all my spare time to taking care of myself. Although I'm still not okay (by the way, my health was and is fine), I feel it's time to come back and support this wonderful community on this journey. I feel that responsibility; time to get back in the ring.
I'm very sorry for being away so long, but sometimes life hits really hard. I received incredible support from strangers all around the world. It's amazing.
Thanks again, you rock!
Alessandro.
DINOv3 is out
#Meta unveils DINOv3! A novel foundation model outperforming the previous SOTA across computer vision tasks. Code & weights released under the DINOv3 License.
Review https://t.ly/-S3ZL
Paper https://t.ly/ervOT
Project https://lnkd.in/dHFf3esd
Repo https://lnkd.in/dPxhDxAq
HF https://lnkd.in/dWGudY2i
Impact of Superhuman AI
The nonprofit AI Futures Project unveils a (dystopian) scenario of what superhuman AI might look like: a forecast from today to bio-engineered human-like creatures. A fascinating speculation on the future, with "slow-down" and "race" scenarios. Enjoy.
Review https://t.ly/EgmfJ
Project https://ai-2027.com/
TOTNet: Occlusion-aware Tracking
TOTNet: a novel Temporal Occlusion Tracking Network that leverages 3D convolutions, a visibility-weighted loss, and occlusion augmentation to improve performance under occlusions (a toy sketch of such a loss is below). Code & data under MIT.
Review https://t.ly/Q0jAf
Paper https://lnkd.in/dUYsa-GC
Repo https://lnkd.in/d3QGUHYb
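Illustrative only: the post names a visibility-weighted loss, so here is a minimal sketch of what such a loss can look like, assuming per-frame 2D point predictions and a binary visibility mask. The function name, weighting scheme and occluded_weight value are assumptions, not TOTNet's released code.

```python
# Toy sketch of a visibility-weighted tracking loss (assumption, not TOTNet's code):
# occluded frames are down-weighted instead of ignored, so the tracker is still
# pushed to interpolate plausible positions through occlusions.
import torch
import torch.nn.functional as F

def visibility_weighted_loss(pred_xy, gt_xy, visible, occluded_weight=0.2):
    """pred_xy, gt_xy: (B, T, 2) trajectories; visible: (B, T) with 1 = visible."""
    per_frame = F.smooth_l1_loss(pred_xy, gt_xy, reduction="none").sum(dim=-1)  # (B, T)
    weights = torch.where(visible.bool(),
                          torch.ones_like(per_frame),
                          torch.full_like(per_frame, occluded_weight))
    return (weights * per_frame).sum() / weights.sum().clamp(min=1e-6)

# Usage on random data
pred = torch.randn(2, 16, 2)
gt = torch.randn(2, 16, 2)
vis = (torch.rand(2, 16) > 0.3).float()
print(visibility_weighted_loss(pred, gt, vis))
```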
Feed-Forward 4D Video
4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model: high-quality dynamic point clouds and downstream tasks such as novel-view video synthesis with strong generalizability. Code & data announced.
Review https://t.ly/SpkD-
Paper arxiv.org/pdf/2508.13154
Project https://4dnex.github.io/
Repo github.com/3DTopia/4DNeX
Data https://lnkd.in/dh4_3Ghf
Demo https://lnkd.in/dztyzwgg
DAViD: Synthetic Depth-Normal-Segmentation
#Microsoft's DAViD: a 100% synthetic dataset and models for human depth, normals & segmentation. Dataset available; models & runtime under MIT.
Review https://t.ly/-SlO_
Paper https://lnkd.in/eCmMXpTg
Project https://lnkd.in/eurCSWkm
Repo https://lnkd.in/e7PWFgP2
OmniTry: Virtual Try-On Anything
OmniTry: a unified framework that extends virtual try-on beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released.
Review https://t.ly/wMBGQ
Paper https://lnkd.in/dQe9MchS
Project https://omnitry.github.io/
Repo https://lnkd.in/d3QwAXY2
Demo https://lnkd.in/duUcZpVA
ROVR Open Dataset is out
A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released for academic use (free) and for commercial use.
Review https://t.ly/iDcvg
Paper https://arxiv.org/pdf/2508.13977
Project https://xiandaguo.net/ROVR-Open-Dataset
YOPO: SOTA 9-DoF Pose
Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF pose estimation as a natural extension of 2D detection: a practical solution for monocular-RGB, category-level, multi-object pose estimation (a toy sketch of a 9-DoF head is below). Code & models announced (coming).
Review https://t.ly/cf_Cl
Paper https://arxiv.org/pdf/2508.14965
Project mikigom.github.io/YOPO-project-page/
Repo TBA
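Illustrative only: a minimal sketch of what a category-level 9-DoF output head on top of a query-based (DETR-style) detector can look like, where 9-DoF means 3D size + translation + rotation. Module names, dimensions and the 6D rotation parametrization are assumptions, not YOPO's released code.

```python
# Toy 9-DoF head for per-query predictions (assumption, not YOPO's code):
# size (3) + translation (3) + rotation via the continuous 6D representation
# decoded into a 3x3 rotation matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

def rot6d_to_matrix(x6):                      # (..., 6) -> (..., 3, 3)
    a1, a2 = x6[..., :3], x6[..., 3:]
    b1 = F.normalize(a1, dim=-1)
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack([b1, b2, b3], dim=-2)

class PoseHead(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.size = nn.Linear(d_model, 3)      # category-level 3D extent
        self.trans = nn.Linear(d_model, 3)     # object center in camera frame
        self.rot = nn.Linear(d_model, 6)       # 6D rotation parameters

    def forward(self, queries):                # queries: (B, Q, d_model)
        return {
            "size": self.size(queries).exp(),  # keep extents positive
            "translation": self.trans(queries),
            "rotation": rot6d_to_matrix(self.rot(queries)),
        }

head = PoseHead()
out = head(torch.randn(2, 100, 256))
print(out["rotation"].shape)  # torch.Size([2, 100, 3, 3])
```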
Intern-S1: SOTA MM-MoE
Intern-S1: a multimodal MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA on professional tasks such as molecular synthesis planning and reaction condition prediction. Models available under Apache 2.0 (a hedged loading sketch is below).
Review https://t.ly/3l5UW
Paper arxiv.org/pdf/2508.15763
Repo github.com/InternLM/Intern-S1
HF huggingface.co/internlm/Intern-S1
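Illustrative only: a minimal sketch of pulling the released checkpoint from the Hugging Face Hub with transformers, assuming the repo ships custom modeling code usable via trust_remote_code and a plain causal-LM text interface; check the model card for the actual multimodal usage, prompt format and hardware requirements.

```python
# Hedged sketch: load the Apache-2.0 Intern-S1 checkpoint via transformers.
# Assumes a standard AutoModelForCausalLM path with trust_remote_code=True;
# the real multimodal API may differ (see huggingface.co/internlm/Intern-S1).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/Intern-S1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 241B total params: expect multi-GPU sharding
    device_map="auto",           # requires accelerate
    trust_remote_code=True,
)

prompt = "Suggest plausible reaction conditions for a Suzuki coupling."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```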
ATLAS: SOTA Human Model
#META presents ATLAS, a novel high-fidelity body model learned from 600k high-resolution scans captured with 240 synchronized cameras. Code announced, to be released.
Review https://t.ly/0hHud
Paper arxiv.org/pdf/2508.15767
Project jindapark.github.io/projects/atlas/
Repo TBA
Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign-language data, covering motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review https://t.ly/HonX_
Paper https://arxiv.org/pdf/2508.15902
Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo TBA
VROOM: F1 Reconstruction
Berkeley unveils VROOM, a first attempt at reconstructing 3D models of #Formula1 circuits using only onboard camera footage from racecars, with extreme challenges due to noise & speed. Repo released.
Review https://t.ly/uuHdT
Paper arxiv.org/pdf/2508.17172
Repo github.com/yajatyadav/vroom
Project varun-bharadwaj.github.io/vroom/
OmniHuman-1.5
#ByteDance proposes a novel framework designed to generate character animations that are not only physically plausible but also semantically coherent and expressive, consistent with the speech's rhythm, prosody and semantic content. Impressive results, but no code.
Review https://t.ly/CnRmX
Paper arxiv.org/pdf/2508.19209
Project omnihuman-lab.github.io/v1_5/
Repo not released
SoccerNet 2025 results!
SoccerNet 2025 Challenges: the open benchmarking suite dedicated to advancing computer vision research in football video understanding. Repo available.
Review https://t.ly/MfHKg
Paper https://arxiv.org/pdf/2508.19182
Project https://www.soccer-net.org/
Repo https://github.com/SoccerNet
ROSE: Remove Objects & Effects
ROSE removes objects together with their effects on the environment: shadows, reflections, light, translucency and mirrors. Model, demo & dataset available via Hugging Face.
Review https://t.ly/_KFM0
Paper https://lnkd.in/dNcTXQAE
Project https://lnkd.in/dFGmYT5h
Model https://lnkd.in/dhTT-VkN
Demo https://lnkd.in/dimgXZT6
Data https://lnkd.in/da7Jv667
Dress-up & Dance
A novel diffusion framework that generates high-quality 5-second, 24 FPS virtual try-on videos at 1152×720 of a user wearing the desired garments while moving in accordance with a given reference video. Impressive results, but no repo.
Review https://t.ly/7NeTL
Paper arxiv.org/pdf/2508.21070
Project immortalco.github.io/DressAndDance/
Repo not released
Multi-View 3D Tracking
MVTracker is the first data-driven multi-view 3D point tracker, tracking arbitrary 3D points across multiple cameras. Repo available.
Review https://t.ly/rISMR
Paper arxiv.org/pdf/2508.21060
Project https://lnkd.in/drHtAmRC
Repo https://lnkd.in/d4k8mg3B