AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🪬 UHM: Authentic Hand by Phone 🪬

👉 META unveils UHM, a novel high-fidelity 3D avatarization of your (yes, your own) hand. An adaptation pipeline fits the pre-trained UHM from a phone scan. Source Code released 💙

👉Review https://t.ly/fU5rA
👉Paper https://lnkd.in/dyGaiAnq
👉Code https://lnkd.in/d9B_XFAA
πŸ‘4❀1πŸ”₯1🀯1
🔥 EfficientTrain++: Efficient Foundation Visual Backbone Training 🔥

👉Tsinghua unveils EfficientTrain++, a simple, general, and surprisingly effective off-the-shelf approach that cuts the training time of many popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer). Up to 3.0× faster on ImageNet-1K/22K without sacrificing accuracy. Source Code released 💙 (toy sketch of the core idea below)

👉Review https://t.ly/D8ttv
👉Paper https://arxiv.org/pdf/2405.08768
👉Code https://github.com/LeapLabTHU/EfficientTrain
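
The speed-up comes from a "low-frequency first" curriculum: early epochs train on the low-frequency content of images (in practice, lower-resolution inputs, so each iteration is cheaper), and full resolution only returns near the end. Below is a minimal PyTorch sketch of that idea with our own toy names and a dummy loader; it is not the repo's actual API.

```python
# Toy sketch of the EfficientTrain idea (ours, NOT the repo's API):
# early epochs see only low-frequency (low-resolution) inputs, so each
# iteration is cheaper; full resolution returns at the end of training.
import torch
import torch.nn.functional as F
import torchvision.models as models

def curriculum_resolution(epoch, total_epochs, min_res=96, max_res=224):
    """Linearly grow the input resolution over training, in steps of 32 px."""
    t = epoch / max(total_epochs - 1, 1)
    return int((min_res + t * (max_res - min_res)) // 32 * 32)

model = models.resnet50(num_classes=1000)  # any backbone that accepts variable input sizes
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

# Stand-in for an ImageNet loader: two random 224x224 batches.
loader = [(torch.randn(4, 3, 224, 224), torch.randint(0, 1000, (4,)))
          for _ in range(2)]

total_epochs = 3  # demo value; the paper trains for full schedules (e.g., 300 epochs)
for epoch in range(total_epochs):
    res = curriculum_resolution(epoch, total_epochs)
    for images, labels in loader:
        x = F.interpolate(images, size=(res, res), mode="bilinear",
                          align_corners=False)  # low-frequency view of the batch
        loss = loss_fn(model(x), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"epoch {epoch}: trained at {res}x{res}")
```

The real method also ramps augmentation strength alongside resolution; see the paper and repo for the full schedule.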
πŸ‘9πŸ”₯3🀯3❀2πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🫀 EchoTracker: Tracking Echocardiography 🫀

👉EchoTracker: a two-fold, coarse-to-fine model that tracks queried points on a tissue surface across ultrasound sequences. Source Code released 💙 (I/O sketch below)

👉Review https://t.ly/NyBe0
👉Paper https://arxiv.org/pdf/2405.08587
👉Code https://github.com/riponazad/echotracker/
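
For context, this is the usual I/O convention of TAP-style point trackers: a video tensor plus (t, x, y) query points in, per-frame trajectories and visibility out. The `track_points` function below is a hypothetical placeholder that only demonstrates the shapes; EchoTracker's real interface lives in the repo above.

```python
# Hypothetical TAP-style tracking interface (NOT EchoTracker's actual API).
import torch

def track_points(video: torch.Tensor, queries: torch.Tensor):
    """video: (T, C, H, W) ultrasound clip; queries: (N, 3) rows of (t, x, y).
    Returns tracks (T, N, 2) and visibility (T, N). Placeholder body only."""
    T, N = video.shape[0], queries.shape[0]
    tracks = queries[:, 1:].expand(T, N, 2).clone()  # dummy: points stay put
    visibility = torch.ones(T, N, dtype=torch.bool)
    return tracks, visibility

clip = torch.rand(32, 1, 256, 256)                        # 32 grayscale frames
pts = torch.tensor([[0, 128.0, 96.0], [0, 60.0, 200.0]])  # two query points on frame 0
tracks, vis = track_points(clip, pts)
print(tracks.shape, vis.shape)  # torch.Size([32, 2, 2]) torch.Size([32, 2])
```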
❀15πŸ‘1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🦕 Grounding DINO 1.5 Pro/Edge 🦕

👉Grounding DINO 1.5: a suite of advanced open-set object detection models that advances the "Edge" of open-set detection (Pro and Edge variants). Source Code released under Apache 2.0 💙 (usage sketch below)

👉Review https://t.ly/kS-og
👉Paper https://lnkd.in/dNakMge2
👉Code https://lnkd.in/djhnQmrm
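
Grounding DINO 1.5 ships with its own repo and API; as an illustration of the same text-prompted, open-set workflow, here is the earlier Grounding DINO (1.0) checkpoint exposed through Hugging Face transformers. Model id and post-processing call follow the transformers documentation; swap in the 1.5 API from the repo above for the new models.

```python
# Open-set, text-prompted detection with Grounding DINO 1.0 via transformers
# (illustration only; 1.5 Pro/Edge use the repo's own API).
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

image = Image.open("street.jpg")          # any local test image
text = "a person. a bicycle. a car."      # open-set classes: lower-case, dot-separated

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.35, text_threshold=0.25,
    target_sizes=[image.size[::-1]],      # (height, width)
)[0]
for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(label, round(score.item(), 3), [round(v, 1) for v in box.tolist()])
```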
⚽ 3D Shot Posture in Broadcast ⚽

👉Nagoya University unveils 3DSP, a dataset of soccer broadcast videos that is the most extensive sports image dataset with 2D pose annotations to date.

👉Review https://t.ly/IIMeZ
👉Paper https://arxiv.org/pdf/2405.12070
👉Code https://github.com/calvinyeungck/3D-Shot-Posture-Dataset/tree/master
πŸ–ΌοΈ Diffusive Images that Sound πŸ–ΌοΈ

👉The University of Michigan unveils a diffusion model able to generate spectrograms that look like images but can also be played as sound (playback sketch below).

👉Review https://t.ly/ADtYM
👉Paper arxiv.org/pdf/2405.12221
👉Project ificl.github.io/images-that-sound
👉Code github.com/IFICL/images-that-sound
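
To make "played as sound" concrete, here is only the read-out step, as our own illustration (not the paper's method; the contribution there is generating spectrograms that are simultaneously good images and good audio): treat a grayscale image as an STFT magnitude spectrogram and invert it with Griffin-Lim via librosa.

```python
# Play any grayscale image as audio by reading it as an STFT magnitude
# and inverting with Griffin-Lim (illustration of the playback step only).
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

img = Image.open("spectrogram_image.png").convert("L")  # any grayscale image
S = np.asarray(img, dtype=np.float32) / 255.0
S = np.ascontiguousarray(np.flipud(S))  # image row 0 = top; spectrogram row 0 = lowest bin

# Rows become frequency bins of an assumed STFT (n_fft inferred as 2*(rows-1)).
audio = librosa.griffinlim(S, n_iter=64, hop_length=256)
sf.write("image_as_sound.wav", audio, samplerate=16000)  # sample rate is arbitrary here
```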
👚 ViViD: Diffusion VTON 👚

👉ViViD: a novel framework that employs powerful diffusion models to tackle video virtual try-on. Code announced, not released yet 😢

👉Review https://t.ly/h_SyP
👉Paper arxiv.org/pdf/2405.11794
👉Repo https://lnkd.in/dT4_bzPw
👉Project https://lnkd.in/dCK5ug4v
πŸ€OmniGlue: Foundation MatcherπŸ€

👉#Google OmniGlue from #CVPR24: the first learnable image matcher powered by foundation models. Impressive OOD results! (usage sketch below)

👉Review https://t.ly/ezaIc
👉Paper https://arxiv.org/pdf/2405.12979
👉Project hwjiang1510.github.io/OmniGlue/
👉Code https://github.com/google-research/omniglue/
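
A usage sketch following the repo's README at the time of writing; the three model-export paths are downloads described there. Treat this as a sketch and double-check the repo for the exact interface.

```python
# Match two images with OmniGlue (per the google-research/omniglue README).
import numpy as np
from PIL import Image
import omniglue  # installed from github.com/google-research/omniglue

og = omniglue.OmniGlue(
    og_export="./models/og_export",
    sp_export="./models/sp_v6",                         # SuperPoint export
    dino_export="./models/dinov2_vitb14_pretrain.pth",  # DINOv2 weights
)

image0 = np.asarray(Image.open("view_a.jpg").convert("RGB"))
image1 = np.asarray(Image.open("view_b.jpg").convert("RGB"))

match_kp0, match_kp1, match_confidences = og.FindMatches(image0, image1)

keep = match_confidences > 0.02  # keep confident correspondences
print(f"{int(keep.sum())} matches kept of {len(match_confidences)}")
```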
🔥 YOLOv10 is out 🔥

👉YOLOv10: novel real-time, end-to-end object detection (NMS-free). Code released under GNU AGPL v3.0 💙 (inference sketch below)

👉Review https://shorturl.at/ZIHBh
👉Paper arxiv.org/pdf/2405.14458
👉Code https://github.com/THU-MIG/yolov10/
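
A minimal inference sketch, assuming the repo's Ultralytics-style Python interface as shown in its README; the weight file name is a placeholder from the repo's releases, so verify the exact entry points there.

```python
# Assumes the THU-MIG repo is installed and exposes YOLOv10 through the
# bundled `ultralytics` package, as its README shows.
from ultralytics import YOLOv10

model = YOLOv10("yolov10n.pt")                    # smallest variant (placeholder path)
results = model.predict("street.jpg", conf=0.25)  # any local test image

for r in results:
    for box in r.boxes:
        print(r.names[int(box.cls)], float(box.conf), box.xyxy[0].tolist())
```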
β›ˆοΈUnsupervised Neuromorphic Motionβ›ˆοΈ

👉Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, built around the #Prophesee Gen4 HD event camera.

👉Review https://t.ly/UZzIZ
👉Paper arxiv.org/pdf/2405.15209
👉Project samiarja.github.io/evairborne
👉Repo (empty) github.com/samiarja/ev_deep_motion_segmentation
πŸ‘5πŸ”₯1πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🦓 Z.S. Diffusive Segmentation 🦓

👉KAUST (+MPI) announced the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source Code released under MIT 💙

👉Review https://t.ly/v_64K
👉Paper arxiv.org/pdf/2405.16947
👉Project https://lnkd.in/dcSt4dQx
👉Code https://lnkd.in/dcZfM8F3
🪰 Dynamic Gaussian Fusion via 4D Motion Scaffolds 🪰

👉MoSca: a novel 4D Motion Scaffold to reconstruct and synthesize novel views of dynamic scenes from monocular videos in the wild!

👉Review https://t.ly/nSdEL
👉Paper arxiv.org/pdf/2405.17421
👉Code github.com/JiahuiLei/MoSca
👉Project https://lnkd.in/dkjMVcqZ
πŸ”₯6πŸ‘1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🧤 Transformer-based 4D Hands 🧤

👉4DHands: a novel, robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced 😢

👉Review https://t.ly/wvG-l
👉Paper arxiv.org/pdf/2405.20330
👉Project 4dhands.github.io/
🎭 New 2D Landmarks SOTA 🎭

👉Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks, ensuring better definition alignment without any need for 3D landmark datasets. No code announced 🥹

👉Review https://t.ly/lew9a
👉Paper arxiv.org/pdf/2405.19646
👉Project davidcferman.github.io/FaceLift
🐳 MultiPly: in-the-wild Multi-People 🐳

👉MultiPly: a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. It's the new SOTA on publicly available datasets and on in-the-wild footage. Source Code announced, coming 💙

👉Review https://t.ly/_xjk_
👉Paper arxiv.org/pdf/2406.01595
👉Project eth-ait.github.io/MultiPly
👉Repo github.com/eth-ait/MultiPly
πŸ”₯14πŸ‘4πŸ‘2❀1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
👹 AI and the Everything in the Whole Wide World Benchmark 👹

👉Last week Yann LeCun said something like "LLMs will not reach human intelligence". It's clear that current #deeplearning is not ready for "general AI"; in his view, a "radical alternative" is necessary to create a "superintelligence".

👉Review https://t.ly/isdxM
👉News https://lnkd.in/dFraieZS
👉Paper https://lnkd.in/da-7PnVT
❀5πŸ‘2πŸ‘1πŸ’©1
This media is not supported in your browser
VIEW IN TELEGRAM
📞 FacET: Video Calls Change Your Expression 📞

👉Columbia University unveils FacET, which discovers behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).

👉Review https://t.ly/qsQmt
👉Paper arxiv.org/pdf/2406.00955
👉Project facet.cs.columbia.edu/
👉Repo (empty) github.com/stellargo/facet
🚙 UA-Track: Uncertainty-Aware MOT 🚙

👉UA-Track: a novel uncertainty-aware 3D MOT framework that tackles the uncertainty problem from multiple aspects. Code announced, not released yet.

👉Review https://t.ly/RmVSV
👉Paper https://arxiv.org/pdf/2406.02147
👉Project https://liautoad.github.io/ua-track-website
πŸ‘8❀1πŸ”₯1πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧊 Universal 6D Pose/Tracking 🧊

👉Omni6DPose: a novel dataset for 6D object pose with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.

👉Review https://t.ly/Ywgl1
👉Paper arxiv.org/pdf/2406.04316
👉Project https://lnkd.in/dHBvenhX
👉Lib https://lnkd.in/d8Yc-KFh
❀12πŸ‘4🀩2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
👗 SOTA Multi-Garment VTON Editing 👗

👉#Google (+UWA) unveils M&M VTO, a novel mix-and-match virtual try-on that takes as input multiple garment images, a text description of the garment layout, and an image of a person. It's the new SOTA both qualitatively and quantitatively. Impressive results!

👉Review https://t.ly/66mLN
👉Paper arxiv.org/pdf/2406.04542
👉Project https://mmvto.github.io
πŸ‘4❀3πŸ₯°3πŸ”₯1🀯1😱1