UHM: Authentic Hand by Phone
Meta unveils UHM, a novel 3D high-fidelity avatarization of your own hand (yes, yours). An adaptation pipeline fits the pre-trained UHM via a phone scan. Source code released.
Review: https://t.ly/fU5rA
Paper: https://lnkd.in/dyGaiAnq
Code: https://lnkd.in/d9B_XFAA
EfficientTrain++: Efficient Foundation Visual Backbone Training
Tsinghua unveils EfficientTrain++, a simple, general, and surprisingly effective off-the-shelf approach to reducing the training time of popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer). Up to 3.0× faster on ImageNet-1K/22K without sacrificing accuracy. Source code released.
Review: https://t.ly/D8ttv
Paper: https://arxiv.org/pdf/2405.08768
Code: https://github.com/LeapLabTHU/EfficientTrain
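The core idea behind this family of curricula is to feed the network "easier" (lower-resolution, low-frequency) inputs early in training and ramp up difficulty later. A minimal sketch of such a resolution schedule; the constants (160 to 224, multiples of 32) and the linear ramp are illustrative assumptions, not EfficientTrain++'s actual schedule:

```python
def input_resolution(progress: float, r_min: int = 160,
                     r_max: int = 224, step: int = 32) -> int:
    """Map training progress in [0, 1] to an input resolution,
    rounded down to a multiple of `step` and clamped to [r_min, r_max]."""
    assert 0.0 <= progress <= 1.0
    r = r_min + progress * (r_max - r_min)
    return min(r_max, max(r_min, int(r // step) * step))

# resolutions the dataloader's resize transform would use over training
schedule = [input_resolution(p / 10) for p in range(11)]
```

In practice a function like this would drive the resize step of the training dataloader, so early epochs are cheap and only the final epochs pay for full-resolution inputs.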
EchoTracker: Tracking Echocardiography
EchoTracker: a two-fold coarse-to-fine model that tracks queried points on a tissue surface across ultrasound sequences. Source code released.
Review: https://t.ly/NyBe0
Paper: https://arxiv.org/pdf/2405.08587
Code: https://github.com/riponazad/echotracker/
Grounding DINO 1.5 Pro/Edge
Grounding DINO 1.5: a suite of advanced open-set object detection models designed to advance the "Edge" of open-set detection. Source code released under Apache 2.0.
Review: https://t.ly/kS-og
Paper: https://lnkd.in/dNakMge2
Code: https://lnkd.in/djhnQmrm
3D Shot Posture in Broadcast
Nagoya University unveils 3DSP, a soccer broadcast video dataset and the most extensive sports image dataset with 2D pose annotations to date.
Review: https://t.ly/IIMeZ
Paper: https://arxiv.org/pdf/2405.12070
Code: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset/tree/master
Diffusive Images that Sound
The University of Michigan unveils a diffusion model that generates spectrograms which look like images but can also be played as sound.
Review: https://t.ly/ADtYM
Paper: arxiv.org/pdf/2405.12221
Project: ificl.github.io/images-that-sound
Code: github.com/IFICL/images-that-sound
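The generator itself is diffusion-based, but the underlying trick (a spectrogram is just an image you can also listen to) is easy to illustrate. A minimal numpy sketch that renders any grayscale "image" as audio by treating its columns as magnitude spectra with random phase; this is a crude stand-in for proper phase recovery (e.g. Griffin-Lim), and the hop, window, and FFT size are illustrative assumptions:

```python
import numpy as np

def spectrogram_to_audio(mag, hop=128):
    """Treat `mag` (freq_bins x frames) as a magnitude spectrogram:
    assign random phase, inverse-FFT each frame, overlap-add with a window."""
    n_fft = (mag.shape[0] - 1) * 2            # assumes bins == n_fft // 2 + 1
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    frames = np.fft.irfft(mag * phase, n=n_fft, axis=0)   # (n_fft, n_frames)
    wave = np.zeros(n_fft + hop * (mag.shape[1] - 1))
    window = np.hanning(n_fft)
    for t in range(mag.shape[1]):
        wave[t * hop : t * hop + n_fft] += frames[:, t] * window
    return wave

# any grayscale image works; a random 65x100 array stands in for one here
img = np.abs(np.random.default_rng(1).standard_normal((65, 100)))
wave = spectrogram_to_audio(img)              # ready to write out as a .wav
```

The paper's contribution is making the magnitude image simultaneously look like a target picture and sound like a target prompt; the sketch above only shows the image-to-audio direction.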
ViViD: Diffusion VTON
ViViD: a novel framework that employs powerful diffusion models for video virtual try-on. Code announced, not yet released.
Review: https://t.ly/h_SyP
Paper: arxiv.org/pdf/2405.11794
Repo: https://lnkd.in/dT4_bzPw
Project: https://lnkd.in/dCK5ug4v
OmniGlue: Foundation Matcher
#Google's OmniGlue from #CVPR24: the first learnable image matcher powered by foundation models. Impressive out-of-distribution (OOD) results!
Review: https://t.ly/ezaIc
Paper: https://arxiv.org/pdf/2405.12979
Project: hwjiang1510.github.io/OmniGlue/
Code: https://github.com/google-research/omniglue/
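For context on what an image matcher does: the classical baseline that learned matchers improve on is mutual nearest-neighbour matching of local descriptors. A minimal numpy sketch of that baseline (not OmniGlue's foundation-model-guided matcher):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Mutual nearest-neighbour matching between two descriptor sets.
    Rows are descriptors; the dot product equals cosine similarity
    when rows are L2-normalized."""
    sim = desc_a @ desc_b.T
    nn_ab = sim.argmax(axis=1)      # best match in B for each descriptor in A
    nn_ba = sim.argmax(axis=0)      # best match in A for each descriptor in B
    # keep a pair only if both sides agree
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# identical descriptor sets match one-to-one
matches = mutual_nn_matches(np.eye(4), np.eye(4))
```

Learned matchers like OmniGlue replace this hard, purely local rule with a network that reasons over both images jointly, which is what drives the OOD gains.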
YOLOv10 is out
YOLOv10: novel real-time, end-to-end object detection. Code released under the GNU AGPL v3.0.
Review: https://shorturl.at/ZIHBh
Paper: arxiv.org/pdf/2405.14458
Code: https://github.com/THU-MIG/yolov10/
Unsupervised Neuromorphic Motion
Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, built on the #Prophesee Gen4 HD event camera.
Review: https://t.ly/UZzIZ
Paper: arxiv.org/pdf/2405.15209
Project: samiarja.github.io/evairborne
Repo (empty): github.com/samiarja/ev_deep_motion_segmentation
Zero-Shot Diffusive Segmentation
KAUST (+MPI) announce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source code released under the MIT license.
Review: https://t.ly/v_64K
Paper: arxiv.org/pdf/2405.16947
Project: https://lnkd.in/dcSt4dQx
Code: https://lnkd.in/dcZfM8F3
Dynamic Gaussian Fusion via 4D Motion Scaffolds
MoSca: a novel 4D Motion Scaffold framework to reconstruct and synthesize novel views of dynamic scenes from monocular in-the-wild videos!
Review: https://t.ly/nSdEL
Paper: arxiv.org/pdf/2405.17421
Code: github.com/JiahuiLei/MoSca
Project: https://lnkd.in/dkjMVcqZ
Transformer-based 4D Hands
4DHands: a novel, robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced.
Review: https://t.ly/wvG-l
Paper: arxiv.org/pdf/2405.20330
Project: 4dhands.github.io/
New 2D Landmarks SOTA
Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks, ensuring better definition alignment with no need for 3D landmark datasets. No code announced.
Review: https://t.ly/lew9a
Paper: arxiv.org/pdf/2405.19646
Project: davidcferman.github.io/FaceLift
MultiPly: In-the-Wild Multi-People
MultiPly: a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. It sets the new SOTA on publicly available datasets and on in-the-wild footage. Source code announced, coming soon.
Review: https://t.ly/_xjk_
Paper: arxiv.org/pdf/2406.01595
Project: eth-ait.github.io/MultiPly
Repo: github.com/eth-ait/MultiPly
AI and the Everything in the Whole Wide World Benchmark
Last week Yann LeCun said, in effect, that LLMs will not reach human intelligence. In his view, current #deeplearning is not ready for "general AI", and a "radical alternative" is necessary to create a "superintelligence".
Review: https://t.ly/isdxM
News: https://lnkd.in/dFraieZS
Paper: https://lnkd.in/da-7PnVT
FacET: Video Calls Change Your Expression
Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).
Review: https://t.ly/qsQmt
Paper: arxiv.org/pdf/2406.00955
Project: facet.cs.columbia.edu/
Repo (empty): github.com/stellargo/facet
UA-Track: Uncertainty-Aware MOT
UA-Track: a novel Uncertainty-Aware 3D multi-object tracking (MOT) framework that tackles the uncertainty problem from multiple aspects. Code announced, not yet released.
Review: https://t.ly/RmVSV
Paper: https://arxiv.org/pdf/2406.02147
Project: https://liautoad.github.io/ua-track-website
Universal 6D Pose/Tracking
Omni6DPose: a novel dataset for 6D object pose with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.
Review: https://t.ly/Ywgl1
Paper: arxiv.org/pdf/2406.04316
Project: https://lnkd.in/dHBvenhX
Lib: https://lnkd.in/d8Yc-KFh
SOTA Multi-Garment VTON Editing
#Google (+UWA) unveils M&M VTO, a novel mix-and-match virtual try-on that takes multiple garment images, a text description of the garment layout, and an image of a person as input. The new SOTA both qualitatively and quantitatively. Impressive results!
Review: https://t.ly/66mLN
Paper: arxiv.org/pdf/2406.04542
Project: https://mmvto.github.io