VROOM: F1 Reconstruction
Berkeley unveils VROOM, the first attempt at reconstructing 3D models of #Formula1 circuits using only onboard camera footage from racecars. Extreme challenges due to noise & speed. Repo released.
Review https://t.ly/uuHdT
Paper arxiv.org/pdf/2508.17172
Repo github.com/yajatyadav/vroom
Project varun-bharadwaj.github.io/vroom/
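Pipelines that reconstruct 3D geometry from monocular footage typically rest on multi-view triangulation. As a minimal sketch of that core step (not VROOM's actual pipeline; the cameras and the point are made up), here is two-view DLT triangulation in NumPy:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)          # null-space of A = homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]

# Two made-up cameras: identity pose, and a 1-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])
X_true = np.array([0.5, 0.2, 4.0])

def project(P, X):
    """Pinhole projection of a 3D point to normalized image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_hat = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_hat)  # recovers X_true up to numerical precision (noise-free case)
```

Real footage adds exactly the hard parts the post mentions: motion blur, rolling shutter, and extreme speed make the correspondences noisy, so the noise-free exactness above does not hold in practice.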
OmniHuman-1.5
#ByteDance proposes a novel framework for generating character animations that are not only physically plausible but also semantically coherent and expressive, in sync with the speech's rhythm, prosody, and semantic content. Impressive results, but no code.
Review https://t.ly/CnRmX
Paper arxiv.org/pdf/2508.19209
Project omnihuman-lab.github.io/v1_5/
Repo: not released
SoccerNet 2025 results!
The SoccerNet 2025 Challenges are the open benchmarking initiative dedicated to advancing computer vision research in football video understanding. Repo available.
Review https://t.ly/MfHKg
Paper https://arxiv.org/pdf/2508.19182
Project https://www.soccer-net.org/
Repo https://github.com/SoccerNet
ROSE: Remove Objects & Effects
Fixes the object's effects on the environment: shadows, reflections, light, translucency, and mirrors. Model, Demo & Dataset available via Hugging Face.
Review https://t.ly/_KFM0
Paper https://lnkd.in/dNcTXQAE
Project https://lnkd.in/dFGmYT5h
Model https://lnkd.in/dhTT-VkN
Demo https://lnkd.in/dimgXZT6
Data https://lnkd.in/da7Jv667
Dress-up & Dance
Novel diffusion framework that generates high-quality, 5-second, 24 FPS virtual try-on (VTON) videos at 1152×720 of a user wearing desired garments while moving in accordance with a given reference video. Impressive results, but no repo.
Review https://t.ly/7NeTL
Paper arxiv.org/pdf/2508.21070
Project immortalco.github.io/DressAndDance/
Repo: not released
Multi-View 3D Tracking
MVTracker is the first data-driven multi-view 3D point tracker, tracking arbitrary 3D points across multiple cameras. Repo available.
Review https://t.ly/rISMR
Paper arxiv.org/pdf/2508.21060
Project https://lnkd.in/drHtAmRC
Repo https://lnkd.in/d4k8mg3B
PHD: Personalized 3D Humans
ETH & #Meta unveil PHD, a novel approach for personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information. Code & models to be released.
Review https://t.ly/IeRhH
Paper https://arxiv.org/pdf/2508.21257
Project https://phd-pose.github.io/
Repo TBA
Pixie: Physics from Pixels
UPenn + MIT unveil Pixie: a neural net trained to map pretrained visual features (e.g., CLIP) to dense material fields of physical properties in a single forward pass, enabling real-time physics simulations. Repo & Dataset under MIT license.
Review https://t.ly/1W0n5
Paper https://lnkd.in/dsHAHDqM
Project https://lnkd.in/dwrHRbRc
Repo https://lnkd.in/dy7bvjsK
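The core idea, a single feed-forward map from per-point visual features to material parameters, can be sketched with a toy MLP. The dimensions, the parameter names, and the untrained random weights below are all illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 768-d visual features -> 3 material parameters
# (say density, Young's modulus, Poisson ratio); all names are assumptions.
D_FEAT, D_HID, D_OUT = 768, 64, 3
W1 = rng.normal(scale=0.02, size=(D_FEAT, D_HID))
W2 = rng.normal(scale=0.02, size=(D_HID, D_OUT))

def material_field(features):
    """Single forward pass: per-point features -> per-point material params."""
    h = np.maximum(features @ W1, 0.0)   # ReLU hidden layer
    return h @ W2                        # raw (untrained) material predictions

points_feat = rng.normal(size=(4096, D_FEAT))   # e.g. one feature per voxel
materials = material_field(points_feat)
print(materials.shape)                          # (4096, 3)
```

The point of the single forward pass is speed: no per-scene optimization is needed at inference, which is what makes real-time simulation downstream plausible.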
TMR: Few-Shot Template Matching
POSTECH unveils TMR, a novel and simple template-matching detector for few-shot pattern detection, achieving strong (SOTA) results on diverse datasets. A new dataset (RPINE) is released; repo coming soon.
Review https://t.ly/WWAcL
Paper https://lnkd.in/dJbSu5vk
Project https://lnkd.in/dwcDnHHQ
Repo https://lnkd.in/dp7aw8Cs
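Classic template matching, the family TMR builds on, scores normalized cross-correlation (NCC) between a template and every image patch. A brute-force NumPy sketch of that baseline (not TMR's detector, which learns beyond plain NCC):

```python
import numpy as np

def ncc_match(image, template):
    """Brute-force normalized cross-correlation (NCC) template matching."""
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-8)
    best_score, best_pos = -np.inf, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            score = (p * t).mean()       # correlation in [-1, 1]
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score

rng = np.random.default_rng(1)
img = rng.normal(size=(32, 32))
tmpl = img[10:18, 5:13].copy()           # plant the template at (10, 5)
pos, score = ncc_match(img, tmpl)
print(pos, round(score, 3))              # (10, 5) with score ~1.0
```

Normalizing each patch makes the score invariant to local brightness and contrast, which is why NCC remains a strong few-shot baseline when only one or a few exemplars are available.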
OpenVision 2 is out!
UCSC releases OpenVision 2: a novel family of generative pretrained visual encoders that removes the text encoder and contrastive loss, training with caption-only supervision. Fully open, Apache 2.0.
Review https://t.ly/Oma3w
Paper https://arxiv.org/pdf/2509.01644
Project https://ucsc-vlaa.github.io/OpenVision2/
Repo https://github.com/UCSC-VLAA/OpenVision
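Caption-only supervision means the training signal is just token-level cross-entropy on the generated caption; there is no second encoder to contrast against. A minimal sketch of that loss (shapes, vocab size, and the random inputs are made up for illustration):

```python
import numpy as np

def caption_loss(logits, targets):
    """Token-level cross-entropy: the whole training signal under
    caption-only supervision (no text encoder, no contrastive term)."""
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 1000))      # 16 caption tokens, toy vocab of 1000
targets = rng.integers(0, 1000, size=16)  # ground-truth caption token ids
loss = caption_loss(logits, targets)
print(round(float(loss), 3))              # near log(1000) for random logits
```

Dropping the contrastive branch removes the need for large negative batches and a whole text tower, which is a big part of the efficiency argument for this family.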
#DoubleDragon with #AI
What would Double Dragon look like in real life? Each character has been transformed with #AI to capture their style, fighting spirit, and charisma, as if they had stepped right out of the game's streets into the real world. AUDIO ON. Damn romantic.
#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse #LLM
Post https://t.ly/0IpER
Channel http://www.youtube.com/@iaiaoh84
Promptable Human Mesh
PromptHMR is a promptable human pose/shape (HPS) estimation method that processes images with spatial or semantic prompts. It uses "side information" readily available from vision-language models or user input to improve the accuracy and robustness of 3D HPS. Code released.
Review https://t.ly/zJ7S-
Paper arxiv.org/pdf/2504.06397
Project yufu-wang.github.io/phmr-page/
Repo github.com/yufu-wang/PromptHMR
WebEyeTrack: real-time web eye tracking
WebEyeTrack is a novel framework that integrates lightweight SOTA gaze-estimation models directly in the browser, bringing deep-learning gaze estimation to the web while explicitly accounting for head pose. Source code released under the MIT license.
Review https://t.ly/Xon9h
Paper https://arxiv.org/pdf/2508.19544
Project redforestai.github.io/WebEyeTrack/
Repo github.com/RedForestAi/WebEyeTrack
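In-browser gaze systems are commonly personalized with a quick on-screen calibration that maps the model's raw gaze output to screen coordinates. A generic least-squares calibration sketch, not WebEyeTrack's code; the affine form, the dot count, and the synthetic ground truth are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ground-truth affine map from raw gaze output to screen pixels
# (for a 1920x1080 screen); purely synthetic for this sketch.
A_true = np.array([[800.0, 30.0], [-20.0, 450.0]])
b_true = np.array([960.0, 540.0])

gaze = rng.uniform(-0.5, 0.5, size=(9, 2))   # 9 calibration samples
screen = gaze @ A_true.T + b_true            # dot positions the user fixated

# Least-squares fit of the affine calibration from the 9 pairs.
X = np.hstack([gaze, np.ones((9, 1))])
theta, *_ = np.linalg.lstsq(X, screen, rcond=None)

query = np.array([0.1, -0.2])
pred = np.append(query, 1.0) @ theta         # calibrated screen coordinate
print(pred)
```

With noise-free synthetic pairs the fit is exact; real calibration data is noisy, which is why a handful of well-spread dots (here a 3x3 grid's worth) is the usual minimum.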
AI Open-Source Annotation
VisioFirm by TOELT is a fully open-source, AI-powered image annotation tool designed to accelerate labeling for computer vision tasks such as object detection, oriented bounding boxes, and segmentation. Source code released under Apache 2.0.
Review https://t.ly/MoMvv
Paper https://lnkd.in/dxTncSgv
Repo https://lnkd.in/dCWMXp3x
Friends,
I've just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me.
What about posting AI content on IG? Thoughts?
Real-Time Drag-Based Editing
The Visual AI Lab unveils Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping and inpainting, inspired by elastic object deformation. Demo and code released (unknown license).
Review https://t.ly/H5nlR
Paper https://arxiv.org/pdf/2509.04582
Project https://visual-ai.github.io/inpaint4drag/
Repo https://github.com/Visual-AI/Inpaint4Drag
Demo https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
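The warp-then-inpaint decomposition can be illustrated with a toy forward warp that moves a masked region by a user's drag displacement and leaves a hole for an inpainting model to fill. This is a deliberate simplification of the paper's bidirectional warping, with a uniform integer displacement assumed:

```python
import numpy as np

def drag_warp(img, mask, disp):
    """Toy forward warp: shift the masked region by `disp`, leaving a
    NaN hole behind for an inpainting model to fill."""
    out = img.astype(float).copy()
    dy, dx = disp
    ys, xs = np.nonzero(mask)
    out[ys, xs] = np.nan                           # vacated source pixels
    ty, tx = ys + dy, xs + dx
    ok = (ty >= 0) & (ty < img.shape[0]) & (tx >= 0) & (tx < img.shape[1])
    out[ty[ok], tx[ok]] = img[ys[ok], xs[ok]]      # dragged content
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 1:3] = True                              # user-selected region
warped = drag_warp(img, mask, (2, 2))              # drag down-right by (2, 2)
print(warped[3, 3])  # 7.0, the pixel dragged from (1, 1)
```

Splitting the edit this way is what enables real-time interaction: the warp is cheap pixel shuffling, and only the small vacated hole needs a generative model.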
Foundation Model for Red Blood Cells
RedDino from the University of Cagliari is a self-supervised foundation model designed for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it is the new SOTA in shape classification. Code & models released under Apache 2.0.
Review https://t.ly/uWAch
Paper arxiv.org/pdf/2508.08180
Code github.com/Snarci/RedDino
Models huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
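Foundation-model embeddings like these are usually evaluated with lightweight probes on frozen features. A nearest-centroid probe sketch on synthetic stand-in embeddings; no real model, embedding dimension, or RBC data is involved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for frozen 768-d foundation-model embeddings of
# two RBC shape classes (well separated on purpose); no real data here.
train_a = rng.normal(size=(50, 768)) + 2.0
train_b = rng.normal(size=(50, 768)) - 2.0
centroids = np.stack([train_a.mean(axis=0), train_b.mean(axis=0)])

def predict(embedding):
    """Nearest-centroid probe over frozen embeddings."""
    dists = np.linalg.norm(centroids - embedding, axis=1)
    return int(dists.argmin())

query = rng.normal(size=768) + 2.0       # embedding drawn near class 0
pred_class = predict(query)
print(pred_class)  # 0
```

The probe trains nothing beyond class means, so its accuracy is almost entirely a property of the frozen encoder, which is exactly what a foundation-model evaluation wants to isolate.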