YOPO: SOTA 9-DoF Pose
Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF pose estimation as a natural extension of 2D detection: a practical solution for monocular-RGB, category-level, multi-object pose estimation. Code & models announced (coming). A conceptual sketch of the query-head idea follows the links below.
Review https://t.ly/cf_Cl
Paper https://arxiv.org/pdf/2508.14965
Project mikigom.github.io/YOPO-project-page/
Repo TBA
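A conceptual sketch (PyTorch), not the authors' code: a DETR-style decoder with learned object queries whose heads regress class, a 6D rotation parametrization, translation and scale per query, to illustrate how category-level 9-DoF pose can ride on a query-based 2D-detection architecture. All module names, dimensions and the 6D rotation choice are illustrative assumptions.

import torch
import torch.nn as nn

class Query9DoFHead(nn.Module):
    """Toy DETR-style head: one 9-DoF pose (R, t, s) per learned object query."""
    def __init__(self, dim=256, num_queries=100, num_classes=6):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=6)
        self.cls_head = nn.Linear(dim, num_classes + 1)   # categories + "no object"
        self.rot_head = nn.Linear(dim, 6)                 # 6D rotation parametrization
        self.trans_head = nn.Linear(dim, 3)               # 3D translation
        self.scale_head = nn.Linear(dim, 3)               # 3D size / scale

    def forward(self, image_tokens):                      # (B, N, dim) backbone tokens
        B = image_tokens.shape[0]
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        h = self.decoder(q, image_tokens)                 # queries attend to image features
        return {"logits": self.cls_head(h), "rot6d": self.rot_head(h),
                "trans": self.trans_head(h), "scale": self.scale_head(h)}

out = Query9DoFHead()(torch.randn(2, 900, 256))           # dummy features from a backbone
print({k: v.shape for k, v in out.items()})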
Intern-S1: SOTA MM-MoE
Intern-S1: a multimodal Mixture-of-Experts (MoE) with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA on professional tasks such as molecular synthesis planning and reaction condition prediction. Models available under Apache 2.0. A minimal loading sketch follows the links below.
Review https://t.ly/3l5UW
Paper arxiv.org/pdf/2508.15763
Repo github.com/InternLM/Intern-S1
HF huggingface.co/internlm/Intern-S1
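A minimal loading sketch, assuming the checkpoint works through the standard transformers Auto classes with trust_remote_code (the multimodal path may need a dedicated processor); check the Hugging Face model card for the officially supported usage.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "internlm/Intern-S1"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto", torch_dtype="auto")

# Example scientific prompt; generation settings are illustrative
prompt = "Propose reaction conditions for a Suzuki coupling of an aryl bromide with phenylboronic acid."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))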
ATLAS: SOTA Human Model
#META presents ATLAS, a novel high-fidelity body model learned from 600k high-resolution scans captured with 240 synchronized cameras. Code announced, to be released.
Review https://t.ly/0hHud
Paper arxiv.org/pdf/2508.15767
Project jindapark.github.io/projects/atlas/
Repo TBA
Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions learned from sign-language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review https://t.ly/HonX_
Paper https://arxiv.org/pdf/2508.15902
Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo TBA
VROOM: F1 Reconstruction
Berkeley unveils VROOM, the first attempt at reconstructing 3D models of #Formula1 circuits using only onboard camera footage from racecars, despite the extreme challenges posed by noise & speed. Repo released.
Review https://t.ly/uuHdT
Paper arxiv.org/pdf/2508.17172
Repo github.com/yajatyadav/vroom
Project varun-bharadwaj.github.io/vroom/
OmniHuman-1.5
#ByteDance proposes a novel framework designed to generate character animations that are not only physically plausible but also semantically coherent and expressive, staying coherent with the speech's rhythm, prosody and semantic content. Impressive results, but no code.
Review https://t.ly/CnRmX
Paper arxiv.org/pdf/2508.19209
Project omnihuman-lab.github.io/v1_5/
Repo not released
SoccerNet 2025 results!
The SoccerNet 2025 Challenges are the open benchmarking effort dedicated to advancing computer vision research in football video understanding. Repo available.
Review https://t.ly/MfHKg
Paper https://arxiv.org/pdf/2508.19182
Project https://www.soccer-net.org/
Repo https://github.com/SoccerNet
ROSE: Remove Objects & Effects
Removes objects and fixes their effects on the environment: shadows, reflections, light, translucency and mirrors. Model, demo & dataset available via Hugging Face.
Review https://t.ly/_KFM0
Paper https://lnkd.in/dNcTXQAE
Project https://lnkd.in/dFGmYT5h
Model https://lnkd.in/dhTT-VkN
Demo https://lnkd.in/dimgXZT6
Data https://lnkd.in/da7Jv667
Dress-up & Dance
A novel diffusion framework that generates high-quality, 5-second, 24 FPS virtual try-on (VTON) videos at 1152×720 of a user wearing desired garments while moving in accordance with a given reference video. Impressive results, but no repo.
Review https://t.ly/7NeTL
Paper arxiv.org/pdf/2508.21070
Project immortalco.github.io/DressAndDance/
Repo not released
Multi-View 3D Tracking
MVTracker is the first data-driven multi-view 3D point tracker for tracking arbitrary 3D points across multiple cameras. Repo available. A toy triangulation sketch follows the links below.
Review https://t.ly/rISMR
Paper arxiv.org/pdf/2508.21060
Project https://lnkd.in/drHtAmRC
Repo https://lnkd.in/d4k8mg3B
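Not MVTracker's code, just the classical multi-view geometry behind the task: DLT triangulation of one tracked 2D point observed by calibrated cameras, showing how 2D tracks from multiple views pin down a 3D point. Camera matrices below are toy values.

import numpy as np

def triangulate(projections, points_2d):
    """projections: list of 3x4 camera matrices; points_2d: list of (u, v) pixels."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])      # standard DLT constraints
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                   # homogeneous -> Euclidean 3D point

# Two toy cameras observing the point (0, 0, 5)
K = np.diag([500.0, 500.0, 1.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.0, 0.0, 5.0, 1.0])
uv = lambda P: (P @ X_true)[:2] / (P @ X_true)[2]
print(triangulate([P1, P2], [uv(P1), uv(P2)]))   # ~ [0, 0, 5]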
PHD: Personalized 3D Humans
ETH & #Meta unveil PHD, a novel approach for personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information. Code & models to be released.
Review https://t.ly/IeRhH
Paper https://arxiv.org/pdf/2508.21257
Project https://phd-pose.github.io/
Repo TBA
Pixie: Physics from Pixels
UPenn + MIT unveil Pixie: training a neural net that maps pretrained visual features (e.g., CLIP) to dense material fields of physical properties in a single forward pass, enabling real-time physics simulations. Repo & dataset under MIT license. A conceptual sketch follows the links below.
Review https://t.ly/1W0n5
Paper https://lnkd.in/dsHAHDqM
Project https://lnkd.in/dwrHRbRc
Repo https://lnkd.in/dy7bvjsK
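A conceptual sketch (PyTorch), not the released Pixie code: a small MLP mapping dense pretrained visual features (CLIP-like) to per-point physical material parameters in one forward pass. The feature dimension, the three output parameters and their ranges are illustrative assumptions.

import torch
import torch.nn as nn

class MaterialFieldHead(nn.Module):
    """Maps per-point visual features to positive, bounded material parameters."""
    def __init__(self, feature_dim=768, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))                 # e.g. Young's modulus, Poisson ratio, density

    def forward(self, feats):                     # (N, feature_dim) features sampled in 3D
        raw = self.mlp(feats)
        E = torch.exp(raw[:, 0])                  # keep modulus positive
        nu = 0.49 * torch.sigmoid(raw[:, 1])      # Poisson ratio in (0, 0.49)
        rho = torch.exp(raw[:, 2])                # positive density
        return torch.stack([E, nu, rho], dim=-1)

feats = torch.randn(4096, 768)                    # dummy distilled features at 4096 query points
materials = MaterialFieldHead()(feats)            # (4096, 3) physical parameters
print(materials.shape)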
TMR: Few-Shot Template-Matching
POSTECH unveils TMR, a novel and simple template-matching detector for few-shot pattern detection, achieving strong (SOTA) results on diverse datasets. A new dataset (RPINE) is released; repo coming soon. A classical-baseline sketch follows the links below.
Review https://t.ly/WWAcL
Paper https://lnkd.in/dJbSu5vk
Project https://lnkd.in/dwcDnHHQ
Repo https://lnkd.in/dp7aw8Cs
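For flavor, a classical baseline rather than TMR itself: normalized cross-correlation template matching with OpenCV, the single-template matching that few-shot pattern detectors like TMR go beyond. The file names and the 0.8 threshold are placeholders.

import cv2
import numpy as np

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)        # placeholder image paths
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Score map: normalized correlation of the template at every scene location
scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)

# Keep all locations above a confidence threshold (multi-instance detection)
ys, xs = np.where(scores >= 0.8)
for x, y in zip(xs, ys):
    cv2.rectangle(scene, (x, y), (x + w, y + h), 255, 2)
cv2.imwrite("matches.png", scene)
print(f"{len(xs)} candidate matches")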
OpenVision 2 is out!
UCSC releases OpenVision 2: a novel family of generative pretrained visual encoders that removes the text encoder and contrastive loss, training with caption-only supervision. Fully open, Apache 2.0. A conceptual sketch of caption-only supervision follows the links below.
Review https://t.ly/Oma3w
Paper https://arxiv.org/pdf/2509.01644
Project https://ucsc-vlaa.github.io/OpenVision2/
Repo https://github.com/UCSC-VLAA/OpenVision
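A conceptual sketch (PyTorch), not the OpenVision 2 code: caption-only supervision, i.e. a vision encoder trained through a small autoregressive text decoder with next-token cross-entropy, with no text encoder and no contrastive loss. The toy conv "backbone", decoder depth and vocabulary size are illustrative assumptions.

import torch
import torch.nn as nn

class CaptionOnlyPretrainer(nn.Module):
    def __init__(self, dim=512, vocab=32000):
        super().__init__()
        self.vision = nn.Sequential(               # stand-in for a ViT backbone
            nn.Conv2d(3, dim, kernel_size=16, stride=16), nn.Flatten(2))
        self.tok_emb = nn.Embedding(vocab, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4)
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, images, captions):            # captions: (B, T) token ids
        vis = self.vision(images).transpose(1, 2)   # (B, num_patches, dim)
        tgt = self.tok_emb(captions[:, :-1])
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.shape[1])
        h = self.decoder(tgt, vis, tgt_mask=mask)   # caption tokens attend to image patches
        logits = self.lm_head(h)                    # predict the next caption token
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), captions[:, 1:].reshape(-1))

loss = CaptionOnlyPretrainer()(torch.randn(2, 3, 224, 224),
                               torch.randint(0, 32000, (2, 16)))
print(loss.item())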
#DoubleDragon with #AI
What would Double Dragon look like in real life? Each character has been transformed with #AI to capture their style, fighting spirit, and charisma, as if they had stepped right out of the game's streets into the real world. AUDIO ON. Damn romantic.
#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse #LLM
Post https://t.ly/0IpER
Channel http://www.youtube.com/@iaiaoh84
Promptable Human Mesh
PromptHMR is a promptable human pose/shape (HPS) estimation method that processes images with spatial or semantic prompts. It takes "side information" readily available from vision-language models or user input to improve the accuracy and robustness of 3D HPS. Code released. A conceptual prompt-conditioning sketch follows the links below.
Review https://t.ly/zJ7S-
Paper arxiv.org/pdf/2504.06397
Project yufu-wang.github.io/phmr-page/
Repo github.com/yufu-wang/PromptHMR
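A conceptual sketch (PyTorch), not the released PromptHMR code: one way to condition a pose/shape regressor on a spatial prompt, by embedding a person's bounding box as an extra token that a transformer processes together with image tokens before regressing SMPL-style pose and shape. All module names and dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class PromptableHPS(nn.Module):
    def __init__(self, dim=256, n_pose=24 * 6, n_shape=10):
        super().__init__()
        self.box_embed = nn.Linear(4, dim)          # (x1, y1, x2, y2) -> prompt token
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4)
        self.pose_head = nn.Linear(dim, n_pose)     # per-joint 6D rotations
        self.shape_head = nn.Linear(dim, n_shape)   # body shape coefficients

    def forward(self, image_tokens, boxes):         # (B, N, dim) tokens, (B, 4) box prompts
        prompt = self.box_embed(boxes).unsqueeze(1) # (B, 1, dim)
        tokens = torch.cat([prompt, image_tokens], dim=1)
        h = self.encoder(tokens)[:, 0]              # read out at the prompt token
        return self.pose_head(h), self.shape_head(h)

pose, shape = PromptableHPS()(torch.randn(2, 196, 256),
                              torch.tensor([[0.1, 0.1, 0.6, 0.9],
                                            [0.3, 0.2, 0.8, 1.0]]))
print(pose.shape, shape.shape)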
WebEyeTrack: real-time eye tracking on the web
WebEyeTrack is a novel framework that integrates lightweight SOTA gaze estimation models directly in the browser, bringing deep-learning gaze estimation to the web while explicitly accounting for head pose. Source code released under MIT license.
Review https://t.ly/Xon9h
Paper https://arxiv.org/pdf/2508.19544
Project redforestai.github.io/WebEyeTrack/
Repo github.com/RedForestAi/WebEyeTrack
AI Open-Source Annotation
VisioFirm by TOELT is a fully open-source, AI-powered image annotation tool designed to accelerate labeling for computer vision tasks like object detection, oriented bounding boxes, and segmentation. Source code released under Apache 2.0.
Review https://t.ly/MoMvv
Paper https://lnkd.in/dxTncSgv
Repo https://lnkd.in/dCWMXp3x
Friends,
I've just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me.
What about posting stuff about AI on IG? Thoughts?