YOLOv10 is out!
YOLOv10: novel real-time, end-to-end object detection. Code released under GNU AGPL v3.0.
Review: https://shorturl.at/ZIHBh
Paper: arxiv.org/pdf/2405.14458
Code: https://github.com/THU-MIG/yolov10/
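For reference, a minimal inference sketch, assuming the repo keeps its Ultralytics-style API (the class name, checkpoint filename and result fields below are assumptions taken from that convention; verify against the README):

```python
# Hedged sketch: YOLOv10 inference via the repo's Ultralytics-style API.
# "yolov10n.pt" (nano variant) and the YOLOv10 class are assumptions --
# check the repo's README for the exact entry points.
from ultralytics import YOLOv10

model = YOLOv10("yolov10n.pt")                  # load a pretrained checkpoint
results = model.predict("street.jpg", conf=0.25)
for r in results:
    # end-to-end (NMS-free) detections: boxes, classes, confidences
    print(r.boxes.xyxy, r.boxes.cls, r.boxes.conf)
```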
Unsupervised Neuromorphic Motion
Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, employing the #Prophesee Gen4 HD event camera.
Review: https://t.ly/UZzIZ
Paper: arxiv.org/pdf/2405.15209
Project: samiarja.github.io/evairborne
Repo (empty): github.com/samiarja/ev_deep_motion_segmentation
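Not the paper's algorithm, but a toy sketch of the label-free flavor: grouping raw events by spatio-temporal density so independently moving objects fall into separate clusters (all numbers and the DBSCAN choice are illustrative assumptions):

```python
# Toy illustration only -- clustering raw events by spatio-temporal density
# with DBSCAN, the flavor of unsupervised event-based motion segmentation.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# synthetic (x, y, t) events from two objects at different image locations
obj1 = np.column_stack([rng.normal(100, 5, 500), rng.normal(100, 5, 500),
                        rng.uniform(0, 1, 500)])
obj2 = np.column_stack([rng.normal(400, 5, 500), rng.normal(300, 5, 500),
                        rng.uniform(0, 1, 500)])
events = np.vstack([obj1, obj2])

# scale time so temporal distance is commensurate with pixel distance
feats = events * np.array([1.0, 1.0, 100.0])
labels = DBSCAN(eps=15, min_samples=10).fit_predict(feats)
print(np.unique(labels))   # two clusters; -1 would mark noise events
```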
Zero-Shot Diffusive Segmentation
KAUST (+MPI) announced the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source code released under MIT.
Review: https://t.ly/v_64K
Paper: arxiv.org/pdf/2405.16947
Project: https://lnkd.in/dcSt4dQx
Code: https://lnkd.in/dcZfM8F3
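As a flavor of the underlying idea, here is a sketch that clusters per-pixel features from a pre-trained Stable Diffusion UNet into pseudo-segments; the model id, hooked layer and timestep are all assumptions, and this is not the paper's VSS pipeline:

```python
# Sketch of the general idea only -- clustering per-pixel features from a
# pre-trained diffusion UNet into pseudo-segments -- NOT the paper's method.
import torch
from diffusers import StableDiffusionPipeline
from sklearn.cluster import KMeans

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet, vae = pipe.unet, pipe.vae

feats = {}
unet.mid_block.register_forward_hook(
    lambda mod, args, out: feats.update(f=out))   # capture mid-block features

@torch.no_grad()
def pseudo_segment(image, n_segments=5, t=100):
    """image: (1, 3, 512, 512) tensor in [-1, 1] -> (h, w) cluster map."""
    latents = vae.encode(image).latent_dist.mean * vae.config.scaling_factor
    t = torch.tensor([t])
    noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), t)
    cond = torch.zeros(1, 77, unet.config.cross_attention_dim)  # "empty" prompt
    unet(noisy, t, encoder_hidden_states=cond)
    f = feats["f"]                                    # (1, C, h, w)
    c, h, w = f.shape[1:]
    X = f.permute(0, 2, 3, 1).reshape(-1, c).numpy()
    labels = KMeans(n_clusters=n_segments, n_init="auto").fit_predict(X)
    return labels.reshape(h, w)                       # coarse, low-res map

seg = pseudo_segment(torch.rand(1, 3, 512, 512) * 2 - 1)
```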
Dynamic Gaussian Fusion via 4D Motion Scaffolds
MoSca: novel 4D Motion Scaffolds to reconstruct and synthesize novel views of dynamic scenes from monocular videos in the wild!
Review: https://t.ly/nSdEL
Paper: arxiv.org/pdf/2405.17421
Code: github.com/JiahuiLei/MoSca
Project: https://lnkd.in/dkjMVcqZ
Transformer-based 4D Hands
4DHands: a novel, robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced.
Review: https://t.ly/wvG-l
Paper: arxiv.org/pdf/2405.20330
Project: 4dhands.github.io/
New 2D Landmarks SOTA
Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks, ensuring better definition alignment with no need for 3D landmark datasets. No code announced.
Review: https://t.ly/lew9a
Paper: arxiv.org/pdf/2405.19646
Project: davidcferman.github.io/FaceLift
MultiPly: in-the-wild Multi-People
MultiPly: a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. The new SOTA on publicly available datasets and in-the-wild footage. Source code announced, coming soon.
Review: https://t.ly/_xjk_
Paper: arxiv.org/pdf/2406.01595
Project: eth-ait.github.io/MultiPly
Repo: github.com/eth-ait/MultiPly
AI and the Everything in the Whole Wide World Benchmark
Last week Yann LeCun said, in essence, that LLMs will not reach human intelligence: the ongoing #deeplearning paradigm is not ready for "general AI", and a "radical alternative" is necessary to create a "superintelligence".
Review: https://t.ly/isdxM
News: https://lnkd.in/dFraieZS
Paper: https://lnkd.in/da-7PnVT
FacET: Video Calls Change Your Expression
Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).
Review: https://t.ly/qsQmt
Paper: arxiv.org/pdf/2406.00955
Project: facet.cs.columbia.edu/
Repo (empty): github.com/stellargo/facet
UA-Track: Uncertainty-Aware MOT
UA-Track: a novel Uncertainty-Aware 3D MOT framework that tackles the uncertainty problem from multiple aspects. Code announced, not released yet.
Review: https://t.ly/RmVSV
Paper: https://arxiv.org/pdf/2406.02147
Project: https://liautoad.github.io/ua-track-website
Universal 6D Pose/Tracking
Omni6DPose: a novel dataset for 6D object pose with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.
Review: https://t.ly/Ywgl1
Paper: arxiv.org/pdf/2406.04316
Project: https://lnkd.in/dHBvenhX
Lib: https://lnkd.in/d8Yc-KFh
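For context, the standard bookkeeping behind 6D-pose benchmarks: a minimal sketch of the ADD metric (mean distance between model points under predicted vs. ground-truth poses), not Omni6DPose's own toolkit API:

```python
# Illustrative sketch: evaluating a 6D object pose with the ADD metric.
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, pts):
    """pts: (N, 3) object model points; R: (3, 3) rotation; t: (3,)."""
    p_pred = pts @ R_pred.T + t_pred
    p_gt = pts @ R_gt.T + t_gt
    return np.linalg.norm(p_pred - p_gt, axis=1).mean()

# toy check: identical poses give zero error
pts = np.random.rand(1000, 3)
R, t = np.eye(3), np.zeros(3)
assert np.isclose(add_metric(R, t, R, t, pts), 0.0)
```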
SOTA Multi-Garment VTOn Editing
#Google (+UWA) unveils M&M VTO, a novel mix-and-match virtual try-on that takes multiple garment images, a text description of the garment layout, and an image of a person as input. The new SOTA both qualitatively and quantitatively. Impressive results!
Review: https://t.ly/66mLN
Paper: arxiv.org/pdf/2406.04542
Project: https://mmvto.github.io
Kling AI vs. OpenAI Sora
Kling: the ultimate Chinese text-to-video model, a rival to #OpenAI's Sora. No paper or technical info to check, but stunning results on the official site.
Review: https://t.ly/870DQ
Paper: ???
Project: https://kling.kuaishou.com/
MASA: MOT Anything by SAM
MASA: a Matching Anything by Segmenting Anything pipeline to learn object-level associations from unlabeled images of any domain. A universal instance appearance model for matching any object in any domain. Source code in June.
Review: https://t.ly/pKdEV
Paper: https://lnkd.in/dnjuT7xm
Project: https://lnkd.in/dYbWzG4E
Code: https://lnkd.in/dr5BJCXm
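The core mechanism, associating instances across frames by appearance similarity, looks roughly like this generic sketch (the embeddings would come from MASA's appearance model; here they are stand-ins):

```python
# Minimal sketch of appearance-based instance association (the general
# mechanism MASA builds on), not the authors' implementation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(emb_prev, emb_curr, sim_thresh=0.5):
    """Match instances across frames by cosine similarity.
    emb_*: (N, D) L2-normalized appearance embeddings."""
    sim = emb_prev @ emb_curr.T                  # pairwise cosine similarity
    rows, cols = linear_sum_assignment(-sim)     # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= sim_thresh]

# toy usage: 3 tracks vs. 3 detections with shuffled identities
prev = np.eye(3)                                 # stand-in embeddings
curr = np.eye(3)[[2, 0, 1]]
print(associate(prev, curr))                     # [(0, 1), (1, 2), (2, 0)]
```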
PianoMotion10M for gen-hands
PianoMotion10M: 116 hours of piano-playing videos from a bird's-eye view, with 10M+ annotated hand poses. A big contribution to hand-motion generation. Code & dataset released.
Review: https://t.ly/_pKKz
Paper: arxiv.org/pdf/2406.09326
Code: https://lnkd.in/dcBP6nvm
Project: https://lnkd.in/d_YqZk8x
Dataset: https://lnkd.in/dUPyfNDA
MeshPose: DensePose + HMR
MeshPose: a novel approach to tackle DensePose and Human Mesh Reconstruction jointly. A natural fit for #AR applications requiring real-time mobile inference.
Review: https://t.ly/a-5uN
Paper: arxiv.org/pdf/2406.10180
Project: https://meshpose.github.io/
RobustSAM for Degraded Images
RobustSAM: the evolution of SAM for degraded images, enhancing SAM's performance on low-quality pictures while preserving promptability & zero-shot generalization. Dataset & code released.
Review: https://t.ly/mnyyG
Paper: arxiv.org/pdf/2406.09627
Project: robustsam.github.io
Code: github.com/robustsam/RobustSAM
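A hedged sketch of SAM-style promptable inference, using the original segment_anything package as a stand-in; RobustSAM ships its own model class and weights, so the registry key and checkpoint name below are assumptions -- check the repo for the real entry points:

```python
# Hedged sketch: prompt-based segmentation via the original SAM predictor
# API; the "vit_l" key and checkpoint filename are assumptions.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_l"](checkpoint="robustsam_vit_l.pth")  # assumed
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)    # a degraded RGB frame here
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),           # one foreground click
    point_labels=np.array([1]),
)
best_mask = masks[scores.argmax()]                 # (H, W) boolean mask
```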
HOT3D Hand/Object Tracking
#Meta opens a novel egocentric dataset for 3D hand & object tracking: a new benchmark for vision-based understanding of 3D hand-object interactions. Dataset available.
Review: https://t.ly/cD76F
Paper: https://lnkd.in/e6_7UNny
Data: https://lnkd.in/e6P-sQFK
Self-driving in wet conditions
BMW SemanticSpray: a novel dataset of scenes in wet surface conditions captured by camera, LiDAR and radar. Camera: 2D boxes | LiDAR: 3D boxes, semantic labels | Radar: semantic labels.
Review: https://t.ly/8S93j
Paper: https://lnkd.in/dnN5MCZC
Project: https://lnkd.in/dkUaxyEF
Data: https://lnkd.in/ddhkyXv8
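A hedged sketch of consuming LiDAR scans with per-point semantic labels, assuming a KITTI-style binary layout; the dataset's actual file format and class ids may differ:

```python
# Assumed layout: float32 (x, y, z, intensity) per point in the .bin file,
# one uint32 label id per point in the .label file -- verify against the
# SemanticSpray documentation before relying on this.
import numpy as np

def load_scan(bin_path, label_path):
    pts = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)  # (N, 4)
    labels = np.fromfile(label_path, dtype=np.uint32)             # (N,)
    assert len(pts) == len(labels)
    return pts, labels

pts, labels = load_scan("scan.bin", "scan.label")
spray_mask = labels == 2          # hypothetical "spray" class id
print(f"{spray_mask.mean():.1%} of points labeled as spray")
```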
TokenHMR: new 3D human pose SOTA
TokenHMR: the new SOTA HPS method, mixing 2D keypoints and 3D pose accuracy to leverage Internet data without known camera parameters. New SOTA by a large margin.
Review: https://t.ly/K9_8n
Paper: arxiv.org/pdf/2404.16752
Project: tokenhmr.is.tue.mpg.de/
Code: github.com/saidwivedi/TokenHMR
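The general trick that makes unknown-camera Internet data usable, supervising 3D joints through a weak-perspective reprojection onto 2D keypoints, in a minimal sketch (not TokenHMR's actual losses):

```python
# Illustrative only: weak-perspective reprojection of 3D joints onto 2D
# keypoints, the generic way to supervise 3D pose without camera intrinsics.
import numpy as np

def weak_perspective(joints3d, scale, trans):
    """joints3d: (J, 3); scale: scalar; trans: (2,). Returns (J, 2)."""
    return scale * joints3d[:, :2] + trans

def reprojection_loss(joints3d, keypoints2d, scale, trans):
    proj = weak_perspective(joints3d, scale, trans)
    return float(np.mean(np.sum((proj - keypoints2d) ** 2, axis=-1)))

# toy check: a perfectly reprojected skeleton has zero loss
j3d = np.random.rand(24, 3)
kp2d = weak_perspective(j3d, scale=200.0, trans=np.array([128.0, 128.0]))
assert reprojection_loss(j3d, kp2d, 200.0, np.array([128.0, 128.0])) == 0.0
```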