Hyper-Detailed Image Descriptions
#Google unveils ImageInWords (IIW), a carefully designed human-in-the-loop (HIL) annotation framework for curating hyper-detailed image descriptions, along with a new dataset produced by this process. A minimal loading sketch follows the links below.
Review: https://t.ly/engkl
Paper: arxiv.org/pdf/2405.02793
Repo: github.com/google/imageinwords
Project: google.github.io/imageinwords
Data: huggingface.co/datasets/google/imageinwords
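Since the dataset lives on the Hugging Face Hub, loading it with the `datasets` library should be a one-liner. A minimal sketch, assuming the `IIW-400` config name from the dataset card (verify the exact config and split names on the Hub):

```python
from datasets import load_dataset

# Config and split names are assumptions taken from the dataset card;
# check huggingface.co/datasets/google/imageinwords for the exact values.
iiw = load_dataset("google/imageinwords", name="IIW-400", split="test")

sample = iiw[0]
print(sample.keys())  # expect an image reference plus its hyper-detailed description
```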
Free-Moving Reconstruction
EPFL (+#MagicLeap) unveils a novel approach for reconstructing free-moving objects from a monocular RGB clip: free interaction with objects in front of a moving camera, without relying on any prior, optimizing the sequence globally rather than in segments. Great work, but no code announced.
Review: https://t.ly/2xhtj
Paper: arxiv.org/pdf/2405.05858
Project: haixinshi.github.io/fmov/
FeatUp: Any Model at Any Resolution
FeatUp is a task- and model-agnostic framework to restore lost spatial information in deep features. It outperforms other methods in class activation map generation, transfer learning for segmentation and depth estimation, and end-to-end training for semantic segmentation. Source code released; a usage sketch follows the links below.
Review: https://t.ly/Evq_g
Paper: https://lnkd.in/gweaN4s6
Project: https://lnkd.in/gWcGXdxt
Code: https://lnkd.in/gweq5NY4
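FeatUp is distributed with a torch.hub entry point; a minimal sketch, assuming the `mhamilton723/FeatUp` hub path and `dino16` backbone name from the project README:

```python
import torch

# Hub path and entry-point name are assumptions from the FeatUp README;
# verify against the released repo before relying on them.
upsampler = torch.hub.load("mhamilton723/FeatUp", "dino16", use_norm=True).eval()

image = torch.randn(1, 3, 224, 224)     # stand-in for a normalized RGB batch
lowres = upsampler.model(image)         # original coarse backbone features
highres = upsampler(image)              # same features, restored to high resolution
print(tuple(lowres.shape), "->", tuple(highres.shape))
```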
AniTalker: Universal Talking Humans
SJTU (+AISpeech) unveils AniTalker, a framework that turns a single static portrait and input audio into an animated talking video with naturally flowing movements.
Review: https://t.ly/MD4yX
Paper: https://arxiv.org/pdf/2405.03121
Project: https://x-lance.github.io/AniTalker/
Repo: https://github.com/X-LANCE/AniTalker
3D Human Motion from Text
Zhejiang (+ANT) unveils a novel method to generate human motions with accurate human-object interactions in 3D scenes from textual descriptions. Code announced, coming.
Review: https://t.ly/eOZnU
Paper: https://arxiv.org/pdf/2405.07784
Project: https://zju3dv.github.io/text_scene_motion/
UHM: Authentic Hand by Phone
META unveils UHM, a novel 3D high-fidelity avatarization of your own hand (yes, yours). An adaptation pipeline fits the pre-trained UHM from a phone scan. Source code released.
Review: https://t.ly/fU5rA
Paper: https://lnkd.in/dyGaiAnq
Code: https://lnkd.in/d9B_XFAA
EfficientTrain++: Efficient Foundation Visual Backbone Training
Tsinghua unveils EfficientTrain++, a simple, general, surprisingly effective, off-the-shelf approach to reduce the training time of various popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer). Up to 3.0× faster on ImageNet-1K/22K without sacrificing accuracy. Source code released; a concept sketch of the schedule follows the links below.
Review: https://t.ly/D8ttv
Paper: https://arxiv.org/pdf/2405.08768
Code: https://github.com/LeapLabTHU/EfficientTrain
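The core idea is a curriculum that feeds the network "easier" low-frequency (effectively lower-resolution) views early in training and full-resolution images only later. A minimal PyTorch sketch of such a schedule, as a concept illustration rather than the authors' implementation:

```python
import torch
import torch.nn.functional as F

def curriculum_resolution(epoch: int, total_epochs: int,
                          min_res: int = 160, max_res: int = 224) -> int:
    """Linearly grow input resolution over training. Illustrative only:
    EfficientTrain++ uses a tuned low-frequency cropping schedule."""
    t = epoch / max(total_epochs - 1, 1)
    res = int(min_res + t * (max_res - min_res))
    return max(32, (res // 32) * 32)  # keep sizes stride-friendly

def batch_for_epoch(images: torch.Tensor, epoch: int, total_epochs: int) -> torch.Tensor:
    res = curriculum_resolution(epoch, total_epochs)
    # Downsampling is a cheap proxy for keeping only low-frequency content early on.
    return F.interpolate(images, size=(res, res), mode="bilinear", align_corners=False)

images = torch.randn(8, 3, 224, 224)
for epoch in (0, 150, 299):
    print(epoch, tuple(batch_for_epoch(images, epoch, 300).shape))
```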
EchoTracker: Tracking Echocardiography
EchoTracker is a two-fold, coarse-to-fine model that facilitates tracking of queried points on a tissue surface across ultrasound sequences. Source code released; the query convention is sketched after the links below.
Review: https://t.ly/NyBe0
Paper: https://arxiv.org/pdf/2405.08587
Code: https://github.com/riponazad/echotracker/
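For readers new to point tracking: TAP-style trackers take a clip plus a set of (t, x, y) queries and return per-frame positions (and visibility) for each point. A shape-only sketch of that convention, not EchoTracker's actual API:

```python
import torch

# Shape conventions common to TAP-style point trackers; EchoTracker's real
# interface may differ, so treat this purely as an illustration.
B, T, H, W, N = 1, 32, 256, 256, 8           # batch, frames, height, width, queries
video = torch.rand(B, T, 1, H, W)            # grayscale ultrasound clip
queries = torch.tensor([[[0.0, 128.0, 96.0]] * N])  # (t, x, y) per queried point

tracks = torch.zeros(B, T, N, 2)             # expected output: (x, y) per point per frame
visibility = torch.ones(B, T, N, dtype=torch.bool)
print(tuple(tracks.shape), tuple(visibility.shape))
```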
Grounding DINO 1.5 Pro/Edge
Grounding DINO 1.5: a suite of advanced open-set object detection models designed to advance the "Edge" of open-set detection. Source code released under Apache 2.0; a workflow sketch follows the links below.
Review: https://t.ly/kS-og
Paper: https://lnkd.in/dNakMge2
Code: https://lnkd.in/djhnQmrm
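Grounding DINO 1.5 itself is served through the API client in the released repo; as a sketch of the text-prompted open-set workflow, here is the earlier open-source Grounding DINO interface, whose names are assumptions when applied to 1.5:

```python
# Text-prompted open-set detection sketch. The function names follow the earlier
# open-source Grounding DINO release; the 1.5 models are accessed via the repo's
# API client, so treat everything below as illustrative.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model("GroundingDINO_config.py", "groundingdino_weights.pth")  # placeholder paths
image_source, image = load_image("street.jpg")                             # placeholder image

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="traffic light . backpack . dog",  # free-form phrases define the open set
    box_threshold=0.35,
    text_threshold=0.25,
)
print(list(zip(phrases, logits.tolist())))
```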
3D Shot Posture in Broadcast
Nagoya University unveils 3DSP, a dataset of soccer broadcast videos; the most extensive sports image dataset with 2D pose annotations to date.
Review: https://t.ly/IIMeZ
Paper: https://arxiv.org/pdf/2405.12070
Code: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset/tree/master
Diffusive Images that Sound
The University of Michigan unveils a diffusion model able to generate spectrograms that look like images but can also be played as sound. A reconstruction sketch follows the links below.
Review: https://t.ly/ADtYM
Paper: arxiv.org/pdf/2405.12221
Project: ificl.github.io/images-that-sound
Code: github.com/IFICL/images-that-sound
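To make "playable spectrograms" concrete: once you have a magnitude spectrogram, a standard phase-retrieval step such as Griffin-Lim turns it into a waveform. A generic torchaudio sketch, not the authors' pipeline:

```python
import torch
import torchaudio

# Generic reconstruction step showing why a spectrogram "image" is playable;
# the paper's actual audio decoding may differ.
n_fft = 1024
spec = torch.rand(1, n_fft // 2 + 1, 256)   # stand-in for a generated magnitude spectrogram

griffin_lim = torchaudio.transforms.GriffinLim(n_fft=n_fft, n_iter=64)
waveform = griffin_lim(spec)                # (channels, samples)
torchaudio.save("generated.wav", waveform, sample_rate=16000)
```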
ViViD: Diffusion VTON
ViViD is a novel framework employing powerful diffusion models to tackle video virtual try-on. Code announced, but not released yet.
Review: https://t.ly/h_SyP
Paper: arxiv.org/pdf/2405.11794
Repo: https://lnkd.in/dT4_bzPw
Project: https://lnkd.in/dCK5ug4v
OmniGlue: Foundation Matcher
#Google's OmniGlue from #CVPR24: the first learnable image matcher powered by foundation models. Impressive out-of-distribution (OOD) results! A usage sketch follows the links below.
Review: https://t.ly/ezaIc
Paper: https://arxiv.org/pdf/2405.12979
Project: hwjiang1510.github.io/OmniGlue/
Code: https://github.com/google-research/omniglue/
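A usage sketch following the repo README; the model export paths are placeholders, and the constructor argument names should be verified against the released code:

```python
import numpy as np
from PIL import Image
import omniglue  # from github.com/google-research/omniglue

# Export paths below are placeholders; see the repo README for download links
# and the exact constructor arguments.
og = omniglue.OmniGlue(
    og_export="./models/og_export",
    sp_export="./models/sp_v6",
    dino_export="./models/dinov2_vitb14_pretrain.pth",
)

image0 = np.array(Image.open("view0.jpg").convert("RGB"))  # placeholder images
image1 = np.array(Image.open("view1.jpg").convert("RGB"))
match_kp0, match_kp1, confidences = og.FindMatches(image0, image1)
print(f"{len(match_kp0)} matches, mean confidence {confidences.mean():.3f}")
```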
YOLOv10 is out
YOLOv10: novel real-time, end-to-end object detection. Code released under GNU AGPL v3.0; an inference sketch follows the links below.
Review: https://shorturl.at/ZIHBh
Paper: arxiv.org/pdf/2405.14458
Code: https://github.com/THU-MIG/yolov10/
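The repo ships an Ultralytics fork, so inference should follow the familiar predict API. A sketch, assuming the `YOLOv10` class name and weight file from its README; install from the THU-MIG repo, not stock ultralytics:

```python
# Assumes the Ultralytics fork from github.com/THU-MIG/yolov10 is installed;
# the class name and weight file follow its README and are assumptions here.
from ultralytics import YOLOv10

model = YOLOv10("yolov10n.pt")                     # placeholder weights from the releases page
results = model.predict(source="street.jpg", conf=0.25)
for r in results:
    print(r.boxes.xyxy, r.boxes.cls)               # end-to-end design: no separate NMS step
```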
Unsupervised Neuromorphic Motion
Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, employing the #Prophesee Gen4 HD event camera.
Review: https://t.ly/UZzIZ
Paper: arxiv.org/pdf/2405.15209
Project: samiarja.github.io/evairborne
Repo (currently empty): github.com/samiarja/ev_deep_motion_segmentation
Zero-Shot Diffusive Segmentation
KAUST (+MPI) announce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source code released under MIT.
Review: https://t.ly/v_64K
Paper: arxiv.org/pdf/2405.16947
Project: https://lnkd.in/dcSt4dQx
Code: https://lnkd.in/dcZfM8F3
Dynamic Gaussian Fusion via 4D Motion Scaffolds
MoSca is a novel 4D Motion Scaffold approach to reconstruct and synthesize novel views of dynamic scenes from monocular videos in the wild!
Review: https://t.ly/nSdEL
Paper: arxiv.org/pdf/2405.17421
Code: github.com/JiahuiLei/MoSca
Project: https://lnkd.in/dkjMVcqZ
Transformer-based 4D Hands
4DHands is a novel and robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced.
Review: https://t.ly/wvG-l
Paper: arxiv.org/pdf/2405.20330
Project: 4dhands.github.io/
New 2D Landmarks SOTA
Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks, ensuring better definition alignment with no need for 3D landmark datasets. No code announced.
Review: https://t.ly/lew9a
Paper: arxiv.org/pdf/2405.19646
Project: davidcferman.github.io/FaceLift
MultiPly: In-the-Wild Multi-People
MultiPly: a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. It's the new SOTA on publicly available datasets and in-the-wild videos. Source code announced, coming.
Review: https://t.ly/_xjk_
Paper: arxiv.org/pdf/2406.01595
Project: eth-ait.github.io/MultiPly
Repo: github.com/eth-ait/MultiPly