AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

🧀Transformer-based 4D Hands🧀

👉4DHands is a novel and robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced 😢

👉Review https://t.ly/wvG-l
👉Paper arxiv.org/pdf/2405.20330
👉Project 4dhands.github.io/

🎭New 2D Landmarks SOTA🎭

👉Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks and ensures better definition alignment, with no need for 3D landmark datasets. No code announced 🥹

👉Review https://t.ly/lew9a
👉Paper arxiv.org/pdf/2405.19646
👉Project davidcferman.github.io/FaceLift

🐳 MultiPly: in-the-wild Multi-People 🐳

👉MultiPly is a novel framework for reconstructing multiple people in 3D from monocular in-the-wild videos. It sets a new SOTA on publicly available datasets and in-the-wild videos. Source code announced, coming soon 💙

👉Review https://t.ly/_xjk_
👉Paper arxiv.org/pdf/2406.01595
👉Project eth-ait.github.io/MultiPly
👉Repo github.com/eth-ait/MultiPly

👹AI and the Everything in the Whole Wide World Benchmark👹

👉Last week Yann LeCun said something like "LLMs will not reach human intelligence". It's clear that current #deeplearning is not ready for "general AI"; a "radical alternative" is necessary to create a "superintelligence".

👉Review https://t.ly/isdxM
👉News https://lnkd.in/dFraieZS
👉Paper https://lnkd.in/da-7PnVT

📞FacET: Video Calls Change Your Expression📞

👉Columbia University unveils FacET, which discovers behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).

👉Review https://t.ly/qsQmt
👉Paper arxiv.org/pdf/2406.00955
👉Project facet.cs.columbia.edu/
👉Repo (empty) github.com/stellargo/facet

🚙 UA-Track: Uncertainty-Aware MOT 🚙

👉UA-Track: a novel Uncertainty-Aware 3D MOT framework which tackles the uncertainty problem from multiple aspects. Code announced, not released yet.

👉Review https://t.ly/RmVSV
👉Paper https://arxiv.org/pdf/2406.02147
👉Project https://liautoad.github.io/ua-track-website

🧊 Universal 6D Pose/Tracking 🧊

👉Omni6DPose is a novel dataset for 6D Object Pose with 1.5M+ annotations. Extra: GenPose++, the novel SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.

👉Review https://t.ly/Ywgl1
👉Paper arxiv.org/pdf/2406.04316
👉Project https://lnkd.in/dHBvenhX
👉Lib https://lnkd.in/d8Yc-KFh

👗 SOTA Multi-Garment VTOn Editing 👗

👉#Google (+UWA) unveils M&M VTO, a novel mix 'n' match virtual try-on method that takes as input multiple garment images, a text description of the garment layout, and an image of a person. It's the new SOTA both qualitatively and quantitatively. Impressive results!

👉Review https://t.ly/66mLN
👉Paper arxiv.org/pdf/2406.04542
👉Project https://mmvto.github.io

👑 Kling AI vs. OpenAI Sora 👑

👉Kling: the ultimate Chinese text-to-video model - rival to #OpenAI's Sora. No papers or tech info to check, but stunning results from the official site.

👉Review https://t.ly/870DQ
👉Paper ???
👉Project https://kling.kuaishou.com/

MASA: MOT Anything By SAM

👉MASA: a Matching Anything by Segmenting Anything pipeline that learns object-level associations from unlabeled images of any domain. A universal instance appearance model for matching any objects in any domain. Source code in June 💙

👉Review https://t.ly/pKdEV
👉Paper https://lnkd.in/dnjuT7xm
👉Project https://lnkd.in/dYbWzG4E
👉Code https://lnkd.in/dr5BJCXm

🎹 PianoMotion10M for gen-hands 🎹

👉PianoMotion10M: 116 hours of piano playing videos from a bird's-eye view with 10M+ annotated hand poses. A big contribution in hand motion generation. Code & Dataset released 💙

👉Review https://t.ly/_pKKz
👉Paper arxiv.org/pdf/2406.09326
👉Code https://lnkd.in/dcBP6nvm
👉Project https://lnkd.in/d_YqZk8x
👉Dataset https://lnkd.in/dUPyfNDA

📫MeshPose: DensePose+HMR📫

👉MeshPose: a novel approach to jointly tackle DensePose and Human Mesh Reconstruction. A natural fit for #AR applications requiring real-time mobile inference.

👉Review https://t.ly/a-5uN
👉Paper arxiv.org/pdf/2406.10180
👉Project https://meshpose.github.io/

🌵 RobustSAM for Degraded Images 🌵

👉RobustSAM is the evolution of SAM for degraded images, enhancing SAM's performance on low-quality pics while preserving promptability & zero-shot generalization. Dataset & Code released 💙

👉Review https://t.ly/mnyyG
👉Paper arxiv.org/pdf/2406.09627
👉Project robustsam.github.io
👉Code github.com/robustsam/RobustSAM

🧀HOT3D Hand/Object Tracking🧀

👉#Meta opens a novel egocentric dataset for 3D hand & object tracking. A new benchmark for vision-based understanding of 3D hand-object interactions. Dataset available 💙

👉Review https://t.ly/cD76F
👉Paper https://lnkd.in/e6_7UNny
👉Data https://lnkd.in/e6P-sQFK

💦 Self-driving in wet conditions 💦

👉BMW SemanticSpray: a novel dataset containing scenes in wet surface conditions captured by camera, LiDAR and radar. Camera: 2D Boxes | LiDAR: 3D Boxes, Semantic Labels | Radar: Semantic Labels.

👉Review https://t.ly/8S93j
👉Paper https://lnkd.in/dnN5MCZC
👉Project https://lnkd.in/dkUaxyEF
👉Data https://lnkd.in/ddhkyXv8

🌱 TokenHMR: new 3D human pose SOTA 🌱

👉TokenHMR is a new HPS (human pose and shape) method combining 2D keypoint and 3D pose accuracy, thus leveraging Internet data without known camera parameters. It's the new SOTA by a large margin.

👉Review https://t.ly/K9_8n
👉Paper arxiv.org/pdf/2404.16752
👉Project tokenhmr.is.tue.mpg.de/
👉Code github.com/saidwivedi/TokenHMR

🤓Glasses-Removal in Videos🤓

👉Lightricks unveils a novel method that takes an input video of a person wearing glasses and removes the glasses while preserving the identity. It works even with reflections, heavy makeup, and blinks. Code announced, not yet released.

👉Review https://t.ly/Hgs2d
👉Paper arxiv.org/pdf/2406.14510
👉Project https://v-lasik.github.io/
👉Code github.com/v-lasik/v-lasik-code

🧬Event-driven SuperResolution🧬

👉USTC unveils EvTexture, the first video super-resolution (VSR) method that utilizes event signals for texture enhancement. It leverages the high-frequency details of events to better recover texture in VSR. Code available 💙

👉Review https://t.ly/zlb4c
👉Paper arxiv.org/pdf/2406.13457
👉Code github.com/DachunKai/EvTexture

🐻StableNormal: Stable/Sharp Normal🐻

👉Alibaba unveils StableNormal, a novel method which tailors the diffusion priors for monocular normal estimation. Hugging Face demo is available 💙

👉Review https://t.ly/FPJlG
👉Paper https://arxiv.org/pdf/2406.16864
👉Demo https://huggingface.co/Stable-X

🍦Geometry Guided Depth🍦

👉Depth estimation and #3D reconstruction which can take as input, where available, previously made estimates of the scene's geometry.

👉Review https://lnkd.in/dMgakzWm
👉Paper https://arxiv.org/pdf/2406.18387
👉Repo (empty) https://github.com/nianticlabs/DoubleTake