AI and the Everything in the Whole Wide World Benchmark
Last week Yann LeCun said something like "LLMs will not reach human intelligence". It's clear that ongoing #deeplearning is not ready for "general AI"; a "radical alternative" is necessary to create a "superintelligence".
Review https://t.ly/isdxM
News https://lnkd.in/dFraieZS
Paper https://lnkd.in/da-7PnVT
FacET: Video Calls Change Your Expression
Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).
Review https://t.ly/qsQmt
Paper arxiv.org/pdf/2406.00955
Project facet.cs.columbia.edu/
Repo (empty) github.com/stellargo/facet
UA-Track: Uncertainty-Aware MOT
UA-Track: a novel uncertainty-aware 3D MOT framework that tackles the uncertainty problem from multiple aspects. Code announced, not yet released.
Review https://t.ly/RmVSV
Paper https://arxiv.org/pdf/2406.02147
Project https://liautoad.github.io/ua-track-website
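The core intuition behind uncertainty-aware association can be illustrated with a toy sketch (not the paper's actual method): weight track-to-detection distances by each track's predicted covariance, so more uncertain tracks gate more leniently. All names and values below are illustrative.

```python
import numpy as np

def mahalanobis(mu, cov, det):
    """Distance from a detection to a track prediction, scaled by the
    track's covariance: uncertain tracks yield smaller distances."""
    d = det - mu
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def associate(tracks_mu, tracks_cov, dets, gate=3.0):
    """Greedy uncertainty-aware association: each track claims its closest
    remaining detection in Mahalanobis distance, inside a gating radius."""
    pairs, used = [], set()
    for i, (m, p) in enumerate(zip(tracks_mu, tracks_cov)):
        costs = [(mahalanobis(m, p, d), j)
                 for j, d in enumerate(dets) if j not in used]
        if costs:
            best, j = min(costs)
            if best <= gate:
                pairs.append((i, j))
                used.add(j)
    return pairs

# toy scene: two 2D tracks, the second with larger predicted covariance
mu = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
cov = [np.eye(2), 4.0 * np.eye(2)]
dets = [np.array([4.9, 5.1]), np.array([0.2, -0.1])]
print(associate(mu, cov, dets))  # [(0, 1), (1, 0)]
```

Swapping the hard Euclidean gate for a covariance-scaled one is what keeps tracks with noisy predictions from being dropped prematurely.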
Universal 6D Pose/Tracking
Omni6DPose: a novel dataset for 6D object pose with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.
Review https://t.ly/Ywgl1
Paper arxiv.org/pdf/2406.04316
Project https://lnkd.in/dHBvenhX
Lib https://lnkd.in/d8Yc-KFh
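For context, category-level 6D pose benchmarks typically score predictions with a geodesic rotation angle plus a translation distance (e.g., the common 5°5cm criterion). A minimal sketch of those two errors, not tied to this paper's exact protocol:

```python
import numpy as np

def pose_error(R_pred, t_pred, R_gt, t_gt):
    """Standard 6D-pose errors: geodesic rotation angle (degrees)
    and Euclidean translation distance."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    rot_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    trans = np.linalg.norm(t_pred - t_gt)
    return rot_deg, trans

# toy check: 10-degree rotation about z, 5 cm translation offset
a = np.radians(10.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
err_r, err_t = pose_error(Rz, np.array([0.0, 0.0, 0.05]),
                          np.eye(3), np.zeros(3))
print(round(err_r, 3), round(err_t, 3))  # 10.0 0.05
```

The `clip` guards `arccos` against floating-point drift just outside [-1, 1], which otherwise returns NaN for near-identity rotations.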
SOTA Multi-Garment VTOn Editing
#Google (+UWA) unveils M&M VTO, a novel mix-and-match virtual try-on that takes as input multiple garment images, a text description of the garment layout, and an image of a person. It's the new SOTA both qualitatively and quantitatively. Impressive results!
Review https://t.ly/66mLN
Paper arxiv.org/pdf/2406.04542
Project https://mmvto.github.io
Kling AI vs. OpenAI Sora
Kling: the ultimate Chinese text-to-video model, a rival to #OpenAI's Sora. No paper or technical info to check, but stunning results on the official site.
Review https://t.ly/870DQ
Paper ???
Project https://kling.kuaishou.com/
MASA: MOT Anything by SAM
MASA: a Matching Anything by Segmenting Anything pipeline to learn object-level associations from unlabeled images of any domain. A universal instance-appearance model for matching any object in any domain. Source code due in June.
Review https://t.ly/pKdEV
Paper https://lnkd.in/dnjuT7xm
Project https://lnkd.in/dYbWzG4E
Code https://lnkd.in/dr5BJCXm
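At inference time, object-level association boils down to comparing instance appearance embeddings across frames. A minimal, self-contained sketch of such matching via cosine similarity (illustrative only, not MASA's actual pipeline):

```python
import numpy as np

def cosine_match(emb_a, emb_b, thresh=0.5):
    """Greedily match instances across two frames by cosine similarity
    of their appearance embeddings (most confident rows first)."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T  # pairwise cosine similarities
    matches, used = [], set()
    for i in np.argsort(-sim.max(axis=1)):       # rows with best candidates first
        for j in np.argsort(-sim[i]):            # candidates, best first
            if j not in used and sim[i, j] >= thresh:
                matches.append((int(i), int(j)))
                used.add(int(j))
                break
    return matches

# toy embeddings: instance 0 in frame A looks like instance 1 in frame B
frame_a = np.array([[1.0, 0.1], [0.1, 1.0]])
frame_b = np.array([[0.1, 1.0], [1.0, 0.1]])
print(cosine_match(frame_a, frame_b))  # [(0, 1), (1, 0)]
```

The similarity threshold plays the role of a birth/death decision: unmatched detections below it would spawn new tracks.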
PianoMotion10M for hand-motion generation
PianoMotion10M: 116 hours of piano-playing videos from a bird's-eye view with 10M+ annotated hand poses. A big contribution to hand motion generation. Code & dataset released.
Review https://t.ly/_pKKz
Paper arxiv.org/pdf/2406.09326
Code https://lnkd.in/dcBP6nvm
Project https://lnkd.in/d_YqZk8x
Dataset https://lnkd.in/dUPyfNDA
MeshPose: DensePose + HMR
MeshPose: a novel approach that tackles DensePose and Human Mesh Reconstruction jointly. A natural fit for #AR applications requiring real-time mobile inference.
Review https://t.ly/a-5uN
Paper arxiv.org/pdf/2406.10180
Project https://meshpose.github.io/
RobustSAM for Degraded Images
RobustSAM, the evolution of SAM for degraded images: enhancing SAM's performance on low-quality pictures while preserving promptability & zero-shot generalization. Dataset & code released.
Review https://t.ly/mnyyG
Paper arxiv.org/pdf/2406.09627
Project robustsam.github.io
Code github.com/robustsam/RobustSAM
HOT3D Hand/Object Tracking
#Meta open-sources a novel egocentric dataset for 3D hand & object tracking. A new benchmark for vision-based understanding of 3D hand-object interactions. Dataset available.
Review https://t.ly/cD76F
Paper https://lnkd.in/e6_7UNny
Data https://lnkd.in/e6P-sQFK
Self-Driving in Wet Conditions
BMW SemanticSpray: a novel dataset containing scenes in wet surface conditions captured by camera, LiDAR, and radar. Camera: 2D boxes | LiDAR: 3D boxes, semantic labels | Radar: semantic labels.
Review https://t.ly/8S93j
Paper https://lnkd.in/dnN5MCZC
Project https://lnkd.in/dkUaxyEF
Data https://lnkd.in/ddhkyXv8
TokenHMR: New 3D Human Pose SOTA
TokenHMR: a novel HPS method mixing 2D keypoints and 3D pose accuracy, thus leveraging Internet data without known camera parameters. It's the new SOTA by a large margin.
Review https://t.ly/K9_8n
Paper arxiv.org/pdf/2404.16752
Project tokenhmr.is.tue.mpg.de/
Code github.com/saidwivedi/TokenHMR
Glasses Removal in Videos
Lightricks unveils a novel method that takes an input video of a person wearing glasses and removes the glasses while preserving identity. It works even with reflections, heavy makeup, and blinks. Code announced, not yet released.
Review https://t.ly/Hgs2d
Paper arxiv.org/pdf/2406.14510
Project https://v-lasik.github.io/
Code github.com/v-lasik/v-lasik-code
Event-Driven Super-Resolution
USTC unveils EvTexture, the first VSR method that utilizes event signals for texture enhancement, leveraging the high-frequency details of events to better recover texture in video super-resolution. Code available.
Review https://t.ly/zlb4c
Paper arxiv.org/pdf/2406.13457
Code github.com/DachunKai/EvTexture
StableNormal: Stable & Sharp Normals
Alibaba unveils StableNormal, a novel method that tailors diffusion priors for monocular normal estimation. Hugging Face demo available.
Review https://t.ly/FPJlG
Paper https://arxiv.org/pdf/2406.16864
Demo https://huggingface.co/Stable-X
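StableNormal predicts normals directly with diffusion priors; for intuition about what a normal map encodes, here is a classical baseline that derives normals from a depth map by finite differences. This is a sketch under simplifying assumptions (orthographic geometry, unit pixel spacing), unrelated to the paper's method:

```python
import numpy as np

def normals_from_depth(depth):
    """Classical baseline: per-pixel surface normals from a depth map
    via finite differences (assumes orthographic, unit-spaced pixels)."""
    dz_dx = np.gradient(depth, axis=1)
    dz_dy = np.gradient(depth, axis=0)
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

# a planar ramp tilted along x: normals should lean opposite the slope
ramp = np.tile(np.arange(5, dtype=float), (5, 1))  # depth grows by 1 per column
n = normals_from_depth(ramp)
print(n[2, 2])  # roughly [-0.707, 0., 0.707]
```

The baseline is noisy on real depth maps, which is exactly why learned normal estimators like StableNormal are preferred.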
Geometry-Guided Depth
Depth and #3D reconstruction that can take as input, where available, previously made estimates of the scene's geometry.
Review https://lnkd.in/dMgakzWm
Paper https://arxiv.org/pdf/2406.18387
Repo (empty) https://github.com/nianticlabs/DoubleTake
MeshAnything with Transformers
MeshAnything converts any 3D representation into Artist-Created Meshes (AMs), i.e., meshes created by human artists. It can be combined with various 3D asset production pipelines, such as 3D reconstruction and generation, to transform their results into AMs that can be seamlessly applied in the 3D industry. Source code available.
Review https://t.ly/HvkD4
Paper arxiv.org/pdf/2406.10163
Code github.com/buaacyw/MeshAnything
LLaNA: NeRF-LLM Assistant
UniBO unveils LLaNA: a novel multimodal LLM that understands and reasons about an input NeRF. It processes the NeRF weights directly and performs tasks such as captioning, Q&A, and zero-shot classification of NeRFs.
Review https://t.ly/JAfhV
Paper arxiv.org/pdf/2406.11840
Project andreamaduzzi.github.io/llana/
Code & data coming
Depth Anything V2 is out!
Depth Anything V2: outperforming V1 in robustness and fine-grained detail. Trained with 595K synthetic labels and 62M+ real unlabeled images, it's the new SOTA in MDE. Code & models available.
Review https://t.ly/QX9Nu
Paper arxiv.org/pdf/2406.09414
Project depth-anything-v2.github.io/
Repo github.com/DepthAnything/Depth-Anything-V2
Data huggingface.co/datasets/depth-anything/DA-2K
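MDE models in this family predict relative (affine-invariant) depth, so comparing against metric ground truth typically starts with a least-squares scale-and-shift alignment. A minimal sketch of that standard evaluation step (common MDE practice, not code from the repo):

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Solve least-squares scale s and shift t so that s*pred + t best
    fits gt; the standard step before scoring relative depth against
    metric ground truth."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return s, t

gt = np.array([2.0, 4.0, 6.0, 8.0])      # metric depths
pred = np.array([0.0, 1.0, 2.0, 3.0])    # relative depths, wrong scale/shift
s, t = align_scale_shift(pred, gt)
print(round(float(s), 6), round(float(t), 6))  # 2.0 2.0
```

After alignment, metrics like AbsRel or delta-accuracy are computed on `s * pred + t`; without it, a perfectly ranked relative prediction would score arbitrarily badly.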