AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
96 photos
239 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ–ΌοΈ Diffusive Images that Sound πŸ–ΌοΈ

πŸ‘‰The University of Michigan unveils a diffusion model able to generate spectrograms that look like images but can also be played as sound.

πŸ‘‰Review https://t.ly/ADtYM
πŸ‘‰Paper arxiv.org/pdf/2405.12221
πŸ‘‰Project ificl.github.io/images-that-sound
πŸ‘‰Code github.com/IFICL/images-that-sound
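The core trick is fun to toy with: any grayscale image can be crudely treated as an STFT magnitude and sonified by pairing each column with a random phase and overlap-adding inverse FFTs. A minimal numpy sketch of that sonification step only (illustrative; the paper's actual contribution is generating such spectrograms with a diffusion model):

```python
import numpy as np

def spectrogram_image_to_audio(mag, hop=128):
    """Sonify a grayscale 'spectrogram image' (freq x time, values in [0, 1])
    by giving each column a random phase and overlap-adding inverse FFTs."""
    n_freq, n_frames = mag.shape
    n_fft = 2 * (n_freq - 1)                  # real-FFT bins -> frame length
    audio = np.zeros(n_fft + hop * (n_frames - 1))
    window = np.hanning(n_fft)
    for t in range(n_frames):
        phase = np.exp(2j * np.pi * np.random.rand(n_freq))
        frame = np.fft.irfft(mag[:, t] * phase, n=n_fft)
        audio[t * hop : t * hop + n_fft] += frame * window
    return audio

# toy 'image': 129 frequency bins x 40 time frames
img = np.random.rand(129, 40)
wave = spectrogram_image_to_audio(img)        # 1-D waveform, ready to write as WAV
```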
🀯11❀5😍5πŸ”₯4πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘šViViD: Diffusion VTONπŸ‘š

πŸ‘‰ViViD is a novel framework employing powerful diffusion models to tackle the task of video virtual try-on. Code announced, not released yet😒

πŸ‘‰Review https://t.ly/h_SyP
πŸ‘‰Paper arxiv.org/pdf/2405.11794
πŸ‘‰Repo https://lnkd.in/dT4_bzPw
πŸ‘‰Project https://lnkd.in/dCK5ug4v
πŸ”₯13🀩3❀1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ€OmniGlue: Foundation MatcherπŸ€

πŸ‘‰#Google OmniGlue from #CVPR24: the first learnable image matcher powered by foundation models. Impressive OOD results!

πŸ‘‰Review https://t.ly/ezaIc
πŸ‘‰Paper https://arxiv.org/pdf/2405.12979
πŸ‘‰Project hwjiang1510.github.io/OmniGlue/
πŸ‘‰Code https://github.com/google-research/omniglue/
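For context, the classical baseline that learned matchers like OmniGlue improve on is mutual-nearest-neighbour matching of local descriptors. A minimal numpy sketch of that baseline (illustrative only, not OmniGlue's matcher):

```python
import numpy as np

def mutual_nearest_matches(desc_a, desc_b):
    """Mutual-nearest-neighbour matching between two sets of L2-normalised
    descriptors: keep pair (i, j) only if each is the other's best match."""
    sim = desc_a @ desc_b.T                   # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)                # best b for each a
    nn_ba = sim.argmax(axis=0)                # best a for each b
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

a = np.eye(4)                                 # 4 toy unit descriptors
b = a[[2, 0, 3, 1]]                           # same descriptors, shuffled
print(mutual_nearest_matches(a, b))           # -> [(0, 1), (1, 3), (2, 0), (3, 2)]
```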
🀯10❀6πŸ‘2πŸ‘1
πŸ”₯ YOLOv10 is out πŸ”₯

πŸ‘‰YOLOv10: novel real-time end-to-end object detection. Code released under GNU AGPL v3.0πŸ’™

πŸ‘‰Review https://shorturl.at/ZIHBh
πŸ‘‰Paper arxiv.org/pdf/2405.14458
πŸ‘‰Code https://github.com/THU-MIG/yolov10/
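YOLOv10's headline change is end-to-end detection via consistent dual (one-to-many + one-to-one) label assignments, which removes the NMS post-processing step. For reference, here is the classic IoU-based NMS it eliminates, as a generic numpy sketch (not code from the repo):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy IoU-based non-maximum suppression over [x1, y1, x2, y2] boxes:
    keep the highest-scoring box, drop overlapping ones, repeat."""
    order = scores.argsort()[::-1]            # indices by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_thr]           # survivors for the next round
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))                     # duplicate suppressed -> [0, 2]
```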
πŸ”₯25❀3πŸ‘2⚑1
This media is not supported in your browser
VIEW IN TELEGRAM
β›ˆοΈUnsupervised Neuromorphic Motionβ›ˆοΈ

πŸ‘‰Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, employing the #Prophesee Gen4 HD event camera.

πŸ‘‰Review https://t.ly/UZzIZ
πŸ‘‰Paper arxiv.org/pdf/2405.15209
πŸ‘‰Project samiarja.github.io/evairborne
πŸ‘‰Repo (empty) github.com/samiarja/ev_deep_motion_segmentation
πŸ‘5πŸ”₯1πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ¦“ Z.S. Diffusive Segmentation πŸ¦“

πŸ‘‰KAUST (+MPI) announced the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source Code released under MITπŸ’™

πŸ‘‰Review https://t.ly/v_64K
πŸ‘‰Paper arxiv.org/pdf/2405.16947
πŸ‘‰Project https://lnkd.in/dcSt4dQx
πŸ‘‰Code https://lnkd.in/dcZfM8F3
🀯4πŸ”₯2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸͺ° Dynamic Gaussian Fusion via 4D Motion Scaffolds πŸͺ°

πŸ‘‰MoSca: a novel 4D Motion Scaffold framework to reconstruct and synthesize novel views of dynamic scenes from monocular videos in the wild!

πŸ‘‰Review https://t.ly/nSdEL
πŸ‘‰Paper arxiv.org/pdf/2405.17421
πŸ‘‰Code github.com/JiahuiLei/MoSca
πŸ‘‰Project https://lnkd.in/dkjMVcqZ
πŸ”₯6πŸ‘1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🧀Transformer-based 4D Hands🧀

πŸ‘‰4DHands is a novel and robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced 😒

πŸ‘‰Review https://t.ly/wvG-l
πŸ‘‰Paper arxiv.org/pdf/2405.20330
πŸ‘‰Project 4dhands.github.io/
πŸ”₯4🀯3❀1πŸ‘1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🎭New 2D Landmarks SOTA🎭

πŸ‘‰Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks and ensures better definition alignment, with no need for 3D landmark datasets. No code announcedπŸ₯Ή

πŸ‘‰Review https://t.ly/lew9a
πŸ‘‰Paper arxiv.org/pdf/2405.19646
πŸ‘‰Project davidcferman.github.io/FaceLift
πŸ”₯16❀5😒5πŸ‘2πŸ’©2⚑1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🐳 MultiPly: in-the-wild Multi-People 🐳

πŸ‘‰MultiPly: novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. It's the new SOTA on publicly available datasets and in-the-wild videos. Source Code announced, comingπŸ’™

πŸ‘‰Review https://t.ly/_xjk_
πŸ‘‰Paper arxiv.org/pdf/2406.01595
πŸ‘‰Project eth-ait.github.io/MultiPly
πŸ‘‰Repo github.com/eth-ait/MultiPly
πŸ”₯14πŸ‘4πŸ‘2❀1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘ΉAI and the Everything in the Whole Wide World BenchmarkπŸ‘Ή

πŸ‘‰Last week Yann LeCun said something like "LLMs will not reach human intelligence". It's clear the current #deeplearning paradigm is not ready for "general AI"; a "radical alternative" will be necessary to create a β€œsuperintelligence”.

πŸ‘‰Review https://t.ly/isdxM
πŸ‘‰News https://lnkd.in/dFraieZS
πŸ‘‰Paper https://lnkd.in/da-7PnVT
❀5πŸ‘2πŸ‘1πŸ’©1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ“žFacET: VideoCall Change Your ExpressionπŸ“ž

πŸ‘‰Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video-calls (VCs).

πŸ‘‰Review https://t.ly/qsQmt
πŸ‘‰Paper arxiv.org/pdf/2406.00955
πŸ‘‰Project facet.cs.columbia.edu/
πŸ‘‰Repo (empty) github.com/stellargo/facet
πŸ”₯8❀1πŸ‘1πŸ‘1😍1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸš™ UA-Track: Uncertainty-Aware MOTπŸš™

πŸ‘‰UA-Track: novel Uncertainty-Aware 3D MOT framework which tackles the uncertainty problem from multiple aspects. Code announced, not released yet.

πŸ‘‰Review https://t.ly/RmVSV
πŸ‘‰Paper https://arxiv.org/pdf/2406.02147
πŸ‘‰Project https://liautoad.github.io/ua-track-website
πŸ‘8❀1πŸ”₯1πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧊 Universal 6D Pose/Tracking 🧊

πŸ‘‰Omni6DPose is a novel dataset for 6D Object Pose with 1.5M+ annotations. Extra: GenPose++, the novel SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.

πŸ‘‰Review https://t.ly/Ywgl1
πŸ‘‰Paper arxiv.org/pdf/2406.04316
πŸ‘‰Project https://lnkd.in/dHBvenhX
πŸ‘‰Lib https://lnkd.in/d8Yc-KFh
❀12πŸ‘4🀩2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘— SOTA Multi-Garment VTOn Editing πŸ‘—

πŸ‘‰#Google (+UWA) unveils M&M VTO, novel mix 'n' match virtual try-on that takes as input multiple garment images, text description for garment layout and an image of a person. It's the new SOTA both qualitatively and quantitatively. Impressive results!

πŸ‘‰Review https://t.ly/66mLN
πŸ‘‰Paper arxiv.org/pdf/2406.04542
πŸ‘‰Project https://mmvto.github.io
πŸ‘4❀3πŸ₯°3πŸ”₯1🀯1😱1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘‘ Kling AI vs. OpenAI Sora πŸ‘‘

πŸ‘‰Kling: the ultimate Chinese text-to-video model - rival to #OpenAI’s Sora. No papers or tech info to check, but stunning results from the official site.

πŸ‘‰Review https://t.ly/870DQ
πŸ‘‰Paper ???
πŸ‘‰Project https://kling.kuaishou.com/
πŸ”₯6πŸ‘3❀1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‰ MASA: MOT Anything By SAM πŸ‰

πŸ‘‰MASA: Matching Anything by Segmenting Anything, a pipeline to learn object-level associations from unlabeled images of any domain. A universal instance appearance model for matching any object in any domain. Source code in June πŸ’™

πŸ‘‰Review https://t.ly/pKdEV
πŸ‘‰Paper https://lnkd.in/dnjuT7xm
πŸ‘‰Project https://lnkd.in/dYbWzG4E
πŸ‘‰Code https://lnkd.in/dr5BJCXm
πŸ”₯16❀4πŸ‘3πŸ‘2🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
🎹 PianoMotion10M for gen-hands 🎹

πŸ‘‰PianoMotion10M: 116 hours of piano-playing videos from a bird’s-eye view with 10M+ annotated hand poses. A major contribution to hand motion generation. Code & Dataset releasedπŸ’™

πŸ‘‰Review https://t.ly/_pKKz
πŸ‘‰Paper arxiv.org/pdf/2406.09326
πŸ‘‰Code https://lnkd.in/dcBP6nvm
πŸ‘‰Project https://lnkd.in/d_YqZk8x
πŸ‘‰Dataset https://lnkd.in/dUPyfNDA
❀8πŸ”₯4⚑1πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ“«MeshPose: DensePose+HMRπŸ“«

πŸ‘‰MeshPose: novel approach to tackle DensePose and Human Mesh Reconstruction jointly. A natural fit for #AR applications requiring real-time mobile inference.

πŸ‘‰Review https://t.ly/a-5uN
πŸ‘‰Paper arxiv.org/pdf/2406.10180
πŸ‘‰Project https://meshpose.github.io/
πŸ”₯6❀1πŸ‘1
[Attached GIF: lowlight_back_n_forth.gif, 1.4 MB]
🌡 RobustSAM for Degraded Images 🌡

πŸ‘‰RobustSAM, the evolution of SAM for degraded images: enhancing SAM’s performance on low-quality pics while preserving promptability & zero-shot generalization. Dataset & Code releasedπŸ’™

πŸ‘‰Review https://t.ly/mnyyG
πŸ‘‰Paper arxiv.org/pdf/2406.09627
πŸ‘‰Project robustsam.github.io
πŸ‘‰Code github.com/robustsam/RobustSAM
❀5πŸ‘1πŸ”₯1πŸ‘1