Transformer-based 4D Hands
4DHands is a novel, robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced.
Review https://t.ly/wvG-l
Paper arxiv.org/pdf/2405.20330
Project 4dhands.github.io/
New 2D Landmarks SOTA
Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks, ensuring better definition alignment with no need for 3D landmark datasets. No code announced. A toy sketch of the lifting loss follows the links.
Review https://t.ly/lew9a
Paper arxiv.org/pdf/2405.19646
Project davidcferman.github.io/FaceLift
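The lifting idea can be sketched as a weak-perspective reprojection loss: predicted 3D landmarks are supervised only through their visible, hand-labeled 2D projections. A minimal PyTorch illustration; the camera model, tensor shapes, and function name are assumptions, not the paper's exact formulation:

```python
import torch

def reprojection_loss(pred_3d, gt_2d, visibility, cam_scale, cam_trans):
    # pred_3d:    (B, N, 3) predicted 3D landmarks
    # gt_2d:      (B, N, 2) hand-labeled 2D landmarks
    # visibility: (B, N) float mask, 1.0 where the 2D label is visible
    # cam_scale:  (B,), cam_trans: (B, 2)  assumed weak-perspective camera
    proj_2d = cam_scale[:, None, None] * pred_3d[..., :2] + cam_trans[:, None, :]
    err = ((proj_2d - gt_2d) ** 2).sum(dim=-1)        # per-landmark squared error
    return (visibility * err).sum() / visibility.sum().clamp(min=1.0)
```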
MultiPly: in-the-wild Multi-People
MultiPly: a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. It is the new SOTA on publicly available datasets and on in-the-wild videos. Source code announced and coming.
Review https://t.ly/_xjk_
Paper arxiv.org/pdf/2406.01595
Project eth-ait.github.io/MultiPly
Repo github.com/eth-ait/MultiPly
AI and the Everything in the Whole Wide World Benchmark
Last week Yann LeCun said something like "LLMs will not reach human intelligence". It is clear that current #deeplearning is not ready for "general AI"; a "radical alternative" is necessary to create a "superintelligence".
Review https://t.ly/isdxM
News https://lnkd.in/dFraieZS
Paper https://lnkd.in/da-7PnVT
FacET: Video Calls Change Your Expression
Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).
Review https://t.ly/qsQmt
Paper arxiv.org/pdf/2406.00955
Project facet.cs.columbia.edu/
Repo (empty) github.com/stellargo/facet
UA-Track: Uncertainty-Aware MOT
UA-Track: a novel uncertainty-aware 3D MOT framework that tackles the uncertainty problem from multiple aspects. Code announced, not released yet.
Review https://t.ly/RmVSV
Paper https://arxiv.org/pdf/2406.02147
Project https://liautoad.github.io/ua-track-website
Universal 6D Pose/Tracking
Omni6DPose is a novel dataset for 6D object pose with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements. A metric sketch follows the links.
Review https://t.ly/Ywgl1
Paper arxiv.org/pdf/2406.04316
Project https://lnkd.in/dHBvenhX
Lib https://lnkd.in/d8Yc-KFh
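For context on what such a benchmark measures, category-level 6D pose is usually scored with a geodesic rotation error plus a Euclidean translation error. A minimal sketch of these standard metrics; not necessarily Omni6DPose's exact evaluation protocol:

```python
import numpy as np

def pose_errors(R_pred, t_pred, R_gt, t_gt):
    # Geodesic rotation error (degrees) between two rotation matrices,
    # plus Euclidean translation error; the usual category-level metrics.
    cos = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    trans_err = np.linalg.norm(t_pred - t_gt)
    return rot_err_deg, trans_err
```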
SOTA Multi-Garment VTON Editing
#Google (+UWA) unveils M&M VTO, a novel mix-and-match virtual try-on method that takes as input multiple garment images, a text description of the garment layout, and an image of a person. It is the new SOTA both qualitatively and quantitatively. Impressive results!
Review https://t.ly/66mLN
Paper arxiv.org/pdf/2406.04542
Project https://mmvto.github.io
Kling AI vs. OpenAI Sora
Kling: the ultimate Chinese text-to-video model, a rival to #OpenAI's Sora. No paper or tech info to check, but stunning results on the official site.
Review https://t.ly/870DQ
Paper ???
Project https://kling.kuaishou.com/
MASA: MOT Anything By SAM
MASA: Matching Anything by Segmenting Anything, a pipeline to learn object-level associations from unlabeled images of any domain; a universal instance appearance model for matching any object in any domain. Source code in June. A generic matching sketch follows the links.
Review https://t.ly/pKdEV
Paper https://lnkd.in/dnjuT7xm
Project https://lnkd.in/dYbWzG4E
Code https://lnkd.in/dr5BJCXm
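The object-level association MASA learns amounts to matching detections across frames by appearance-embedding similarity. A generic, hypothetical sketch of that matching step, not MASA's actual code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_emb, curr_emb, sim_thresh=0.5):
    # Normalize embeddings, score pairs by cosine similarity, then solve
    # the assignment with Hungarian matching; keep confident pairs only.
    a = prev_emb / np.linalg.norm(prev_emb, axis=1, keepdims=True)
    b = curr_emb / np.linalg.norm(curr_emb, axis=1, keepdims=True)
    sim = a @ b.T                                    # (n_prev, n_curr)
    rows, cols = linear_sum_assignment(-sim)         # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= sim_thresh]
```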
PianoMotion10M for hand-motion generation
PianoMotion10M: 116 hours of piano-playing videos from a bird's-eye view with 10M+ annotated hand poses. A big contribution to hand motion generation. Code & dataset released. A hypothetical sample layout follows the links.
Review https://t.ly/_pKKz
Paper arxiv.org/pdf/2406.09326
Code https://lnkd.in/dcBP6nvm
Project https://lnkd.in/d_YqZk8x
Dataset https://lnkd.in/dUPyfNDA
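To make the data concrete, a single training pair for audio-conditioned hand-motion generation might look like the following; all shapes and names are assumptions, so check the official repo for the real schema:

```python
import numpy as np

# Hypothetical training pair; shapes and names are assumptions,
# not PianoMotion10M's documented schema.
T = 240                                                 # frames in one clip
audio_feat = np.zeros((T, 128), dtype=np.float32)       # e.g. per-frame mel features
hand_pose = np.zeros((T, 2, 21, 3), dtype=np.float32)   # 2 hands x 21 joints x xyz
# A generative model would then learn p(hand_pose | audio_feat).
```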
MeshPose: DensePose + HMR
MeshPose: a novel approach to jointly tackle DensePose and Human Mesh Reconstruction. A natural fit for #AR applications requiring real-time mobile inference.
Review https://t.ly/a-5uN
Paper arxiv.org/pdf/2406.10180
Project https://meshpose.github.io/
RobustSAM for Degraded Images
RobustSAM, the evolution of SAM for degraded images: it enhances SAM's performance on low-quality pictures while preserving promptability & zero-shot generalization. Dataset & code released. A hedged usage sketch follows the links.
Review https://t.ly/mnyyG
Paper arxiv.org/pdf/2406.09627
Project robustsam.github.io
Code github.com/robustsam/RobustSAM
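Assuming RobustSAM keeps the original segment-anything predictor interface (an unverified assumption, check the repo), promptable segmentation of a degraded image would look roughly like this; the checkpoint name is a placeholder:

```python
import numpy as np
# Assumption: RobustSAM mirrors the segment-anything API; verify in the repo.
from segment_anything import SamPredictor, sam_model_registry

degraded_rgb = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a low-light photo
sam = sam_model_registry["vit_l"](checkpoint="robustsam_vit_l.pth")  # placeholder ckpt
predictor = SamPredictor(sam)
predictor.set_image(degraded_rgb)                       # HxWx3 uint8 image
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),                # one foreground click
    point_labels=np.array([1]),
)
```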
HOT3D Hand/Object Tracking
#Meta opens a novel egocentric dataset for 3D hand & object tracking: a new benchmark for vision-based understanding of 3D hand-object interactions. Dataset available.
Review https://t.ly/cD76F
Paper https://lnkd.in/e6_7UNny
Data https://lnkd.in/e6P-sQFK
Self-driving in wet conditions
BMW SemanticSpray: a novel dataset containing scenes in wet surface conditions captured by camera, LiDAR and radar. Camera: 2D boxes | LiDAR: 3D boxes, semantic labels | Radar: semantic labels. A sample-layout sketch follows the links.
Review https://t.ly/8S93j
Paper https://lnkd.in/dnN5MCZC
Project https://lnkd.in/dkUaxyEF
Data https://lnkd.in/ddhkyXv8
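Following the per-sensor annotations listed above, one frame might be laid out like this; field names and shapes are illustrative, not the official schema:

```python
import numpy as np

# Hypothetical layout of one SemanticSpray frame, following the per-sensor
# annotations the dataset lists; names are illustrative only.
frame = {
    "camera": {
        "image": "000123.png",
        "boxes_2d": np.array([[100, 150, 220, 300]]),    # x1, y1, x2, y2
    },
    "lidar": {
        "points": np.zeros((50000, 4), dtype=np.float32),   # x, y, z, intensity
        "boxes_3d": np.zeros((3, 7), dtype=np.float32),     # x, y, z, l, w, h, yaw
        "point_labels": np.zeros(50000, dtype=np.int64),    # e.g. background/vehicle/spray
    },
    "radar": {
        "points": np.zeros((512, 5), dtype=np.float32),
        "point_labels": np.zeros(512, dtype=np.int64),
    },
}
```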
TokenHMR: new 3D human pose SOTA
TokenHMR is a new human pose and shape (HPS) method that balances 2D keypoint accuracy with 3D pose accuracy, leveraging Internet data without known camera parameters. New SOTA by a large margin.
Review https://t.ly/K9_8n
Paper arxiv.org/pdf/2404.16752
Project tokenhmr.is.tue.mpg.de/
Code github.com/saidwivedi/TokenHMR
Glasses Removal in Videos
Lightricks unveils a novel method that takes an input video of a person wearing glasses and removes the glasses while preserving the person's identity. It works even with reflections, heavy makeup, and blinks. Code announced, not yet released.
Review https://t.ly/Hgs2d
Paper arxiv.org/pdf/2406.14510
Project https://v-lasik.github.io/
Code github.com/v-lasik/v-lasik-code
Event-driven Super-Resolution
USTC unveils EvTexture, the first video super-resolution (VSR) method that utilizes event signals for texture enhancement, leveraging the high-frequency details of events to better recover texture. Code available. A generic event-representation sketch follows the links.
Review https://t.ly/zlb4c
Paper arxiv.org/pdf/2406.13457
Code github.com/DachunKai/EvTexture
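Event streams are usually converted into a tensor before entering a network; a common choice is a time-binned voxel grid, sketched below. This is a generic representation, not EvTexture's exact pipeline:

```python
import numpy as np

def event_voxel_grid(events, H, W, bins=5):
    # events: (N, 4) array of (x, y, t, polarity). Accumulate signed event
    # counts into time bins, a common CNN-friendly event representation.
    x, y, t, p = events.T
    grid = np.zeros((bins, H, W), dtype=np.float32)
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # normalize to [0, 1]
    b = np.clip((t_norm * bins).astype(int), 0, bins - 1)
    np.add.at(grid, (b, y.astype(int), x.astype(int)), np.where(p > 0, 1.0, -1.0))
    return grid
```

The super-resolution network can then consume such a grid alongside the low-resolution frames.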
StableNormal: Stable & Sharp Normals
Alibaba unveils StableNormal, a novel method that tailors diffusion priors for monocular normal estimation. A Hugging Face demo is available.
Review https://t.ly/FPJlG
Paper https://arxiv.org/pdf/2406.16864
Demo https://huggingface.co/Stable-X
Geometry-Guided Depth
Depth and #3D reconstruction that can take as input, where available, previously made estimates of the scene's geometry. A minimal conditioning sketch follows the links.
Review https://lnkd.in/dMgakzWm
Paper https://arxiv.org/pdf/2406.18387
Repo (empty) https://github.com/nianticlabs/DoubleTake
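One simple way to realize such conditioning is to feed the prior depth estimate as an extra input channel, with a zero map when no prior is available. A minimal sketch of the general idea, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class HintedDepthNet(nn.Module):
    # Toy depth network conditioned on an optional geometry hint by
    # concatenating it as a fourth input channel (illustration only).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb, depth_hint=None):
        if depth_hint is None:                      # no prior geometry available
            depth_hint = torch.zeros_like(rgb[:, :1])
        return self.net(torch.cat([rgb, depth_hint], dim=1))
```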