AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🦑 Hyper-Detailed Image Descriptions 🦑

👉#Google unveils ImageInWords (IIW), a carefully designed human-in-the-loop (HIL) annotation framework for curating hyper-detailed image descriptions, and the new dataset resulting from this process.

👉Review https://t.ly/engkl
👉Paper arxiv.org/pdf/2405.02793
👉Repo github.com/google/imageinwords
👉Project google.github.io/imageinwords
👉Data huggingface.co/datasets/google/imageinwords
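For a quick look at the data, a minimal loading sketch with the Hugging Face datasets library; the config name "IIW-400" and the field names are assumptions taken from the dataset card:

```python
from datasets import load_dataset

# Assumption: "IIW-400" is one of the configs listed on the dataset card.
ds = load_dataset("google/imageinwords", name="IIW-400", trust_remote_code=True)

split = next(iter(ds))   # take whatever split this config exposes
sample = ds[split][0]

# Field names are assumptions -- inspect sample.keys() for the real schema.
print(sample.get("image/key"))  # image identifier
print(sample.get("IIW"))        # hyper-detailed, human-curated description
```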
โค11๐Ÿ”ฅ3๐Ÿ‘2๐Ÿคฏ2๐Ÿพ1
🔫 Free-Moving Reconstruction 🔫

👉EPFL (+#MagicLeap) unveils a novel approach for reconstructing a free-moving object from a monocular RGB clip. It handles free interaction with objects in front of a moving camera, relies on no priors, and optimizes the sequence globally without splitting it into segments. Great, but no code announced 🥺

👉Review https://t.ly/2xhtj
👉Paper arxiv.org/pdf/2405.05858
👉Project haixinshi.github.io/fmov/
๐Ÿ‘6๐Ÿคฏ4โšก1โค1๐Ÿฅฐ1
💥 FeatUp: Any Model at Any Resolution 💥

👉FeatUp is a task- and model-agnostic framework to restore lost spatial information in deep features. It outperforms other methods in class activation map generation, in transfer learning for segmentation and depth estimation, and in end-to-end training for semantic segmentation. Source Code released 💙

👉Review https://t.ly/Evq_g
👉Paper https://lnkd.in/gweaN4s6
👉Project https://lnkd.in/gWcGXdxt
👉Code https://lnkd.in/gweq5NY4
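A minimal usage sketch following the repo README; the torch.hub entry point and the "dino16" backbone name come from there and may change:

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Load a FeatUp upsampler wrapping a DINO ViT-S/16 backbone (per the README).
upsampler = torch.hub.load("mhamilton723/FeatUp", "dino16", use_norm=True).eval()

transform = T.Compose([T.Resize(224), T.CenterCrop(224), T.ToTensor()])
image = transform(Image.open("input.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    hr_feats = upsampler(image)        # features restored to image resolution
    lr_feats = upsampler.model(image)  # the backbone's original low-res features

print(lr_feats.shape, "->", hr_feats.shape)
```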
๐ŸAniTalker: Universal Talking Humans๐Ÿ

👉SJTU (+AISpeech) unveils AniTalker, a framework that transforms a single static portrait and input audio into animated talking videos with naturally flowing movements.

👉Review https://t.ly/MD4yX
👉Paper https://arxiv.org/pdf/2405.03121
👉Project https://x-lance.github.io/AniTalker/
👉Repo https://github.com/X-LANCE/AniTalker
👻 3D Human Motion from Text 👻

👉Zhejiang University (+ANT) unveils a novel method to generate human motions containing accurate human-object interactions in 3D scenes, based on textual descriptions. Code announced, coming 💙

👉Review https://t.ly/eOZnU
👉Paper https://arxiv.org/pdf/2405.07784
👉Project https://zju3dv.github.io/text_scene_motion/
๐Ÿ‘3๐Ÿ”ฅ2โค1
🪬 UHM: Authentic Hand by Phone 🪬

👉META unveils UHM, a novel 3D high-fidelity avatarization of your own hand (yes, yours). An adaptation pipeline fits the pre-trained UHM via a phone scan. Source Code released 💙

👉Review https://t.ly/fU5rA
👉Paper https://lnkd.in/dyGaiAnq
👉Code https://lnkd.in/d9B_XFAA
๐Ÿ‘4โค1๐Ÿ”ฅ1๐Ÿคฏ1
🔥 EfficientTrain++: Efficient Foundation Visual Backbone Training 🔥

👉Tsinghua unveils EfficientTrain++, a simple, general, surprisingly effective, off-the-shelf approach to reduce the training time of various popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer). Up to 3.0× faster training on ImageNet-1K/22K without sacrificing accuracy. Source Code released 💙

👉Review https://t.ly/D8ttv
👉Paper https://arxiv.org/pdf/2405.08768
👉Code https://github.com/LeapLabTHU/EfficientTrain
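The core trick is a curriculum that shows the model low-resolution (low-frequency) inputs early in training and the full resolution only later. A toy PyTorch sketch of that idea; the schedule below is illustrative, not the paper's exact one:

```python
import torch
import torch.nn.functional as F

def input_resolution(epoch: int, total_epochs: int,
                     min_res: int = 160, max_res: int = 224) -> int:
    """Linearly grow the input resolution over training (illustrative schedule)."""
    frac = epoch / max(total_epochs - 1, 1)
    res = int(min_res + frac * (max_res - min_res))
    return (res // 32) * 32  # keep it divisible by the backbone stride

def curriculum_batch(images: torch.Tensor, epoch: int, total_epochs: int) -> torch.Tensor:
    """Down-sample a batch to the resolution scheduled for this epoch."""
    res = input_resolution(epoch, total_epochs)
    if images.shape[-1] != res:
        images = F.interpolate(images, size=(res, res),
                               mode="bilinear", align_corners=False)
    return images

# Inside a standard training loop:
# for epoch in range(total_epochs):
#     for images, labels in loader:
#         loss = criterion(model(curriculum_batch(images, epoch, total_epochs)), labels)
```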
๐Ÿ‘9๐Ÿ”ฅ3๐Ÿคฏ3โค2๐Ÿฅฐ1
🫀 EchoTracker: Tracking Echocardiography 🫀

👉EchoTracker: a two-fold, coarse-to-fine model that facilitates tracking of queried points on a tissue surface across ultrasound image sequences. Source Code released 💙

👉Review https://t.ly/NyBe0
👉Paper https://arxiv.org/pdf/2405.08587
👉Code https://github.com/riponazad/echotracker/
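To make "queried points" concrete, a hypothetical I/O sketch: queries are (frame, x, y) coordinates and the tracker returns per-frame positions. The class and method names here are assumptions, not the repo's actual API:

```python
import torch

# Dummy echocardiography clip: (batch, frames, channels, height, width).
video = torch.randn(1, 32, 1, 256, 256)

# Query points on the tissue surface, each as (frame_idx, x, y).
queries = torch.tensor([[[0, 120.0, 130.0],
                         [0, 140.0, 150.0]]])

# Hypothetical call -- see github.com/riponazad/echotracker for the real API:
# tracker = EchoTracker.from_pretrained("weights.pt")
# tracks = tracker(video, queries)  # (batch, frames, num_points, 2) as (x, y)
```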
โค15๐Ÿ‘1๐Ÿฅฐ1
🦕 Grounding DINO 1.5 Pro/Edge 🦕

👉Grounding DINO 1.5: a suite of advanced open-set object detection models that pushes the "Edge" of open-set detection. Source Code released under Apache 2.0 💙

👉Review https://t.ly/kS-og
👉Paper https://lnkd.in/dNakMge2
👉Code https://lnkd.in/djhnQmrm
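The 1.5 models ship through the linked repo; as a stand-in to illustrate text-prompted open-set detection, here is the earlier Grounding DINO checkpoint via Hugging Face transformers (model ID and thresholds are assumptions):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

# Stand-in: Grounding DINO 1.0 from transformers, not the 1.5 release itself.
model_id = "IDEA-Research/grounding-dino-tiny"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

image = Image.open("street.jpg").convert("RGB")
text = "a person. a bicycle."  # open-set classes as a lower-case, dot-separated prompt

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.35, text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]
for box, label in zip(results["boxes"], results["labels"]):
    print(label, [round(v, 1) for v in box.tolist()])
```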
⚽ 3D Shot Posture in Broadcast ⚽

👉Nagoya University unveils 3DSP, built from soccer broadcast videos: the most extensive sports image dataset with 2D pose annotations to date.

👉Review https://t.ly/IIMeZ
👉Paper https://arxiv.org/pdf/2405.12070
👉Code https://github.com/calvinyeungck/3D-Shot-Posture-Dataset/tree/master
๐Ÿ–ผ๏ธ Diffusive Images that Sound ๐Ÿ–ผ๏ธ

👉The University of Michigan unveils a diffusion model able to generate spectrograms that look like images but can also be played as sound.

👉Review https://t.ly/ADtYM
👉Paper arxiv.org/pdf/2405.12221
👉Project ificl.github.io/images-that-sound
👉Code github.com/IFICL/images-that-sound
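To "play" such a spectrogram image you invert the magnitude spectrogram back to a waveform. A generic sketch with Griffin-Lim from torchaudio; the authors use a pretrained vocoder, so this is only a stand-in for the inversion step:

```python
import torch
import torchaudio

# A grayscale "image" interpreted as a magnitude spectrogram:
# (channels, freq_bins, time_frames), with freq_bins = n_fft // 2 + 1.
n_fft = 1024
spec_image = torch.rand(1, n_fft // 2 + 1, 512)  # dummy stand-in for a generated sample

# power=1.0 tells Griffin-Lim the input is magnitude, not power.
griffin_lim = torchaudio.transforms.GriffinLim(n_fft=n_fft, hop_length=256, power=1.0)
waveform = griffin_lim(spec_image)               # (channels, samples)
torchaudio.save("played_image.wav", waveform, sample_rate=16000)
```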
👚 ViViD: Diffusion VTON 👚

👉ViViD is a novel framework that employs powerful diffusion models to tackle video virtual try-on (VTON). Code announced, not released yet 😢

👉Review https://t.ly/h_SyP
👉Paper arxiv.org/pdf/2405.11794
👉Repo https://lnkd.in/dT4_bzPw
👉Project https://lnkd.in/dCK5ug4v
๐Ÿ€OmniGlue: Foundation Matcher๐Ÿ€

👉#Google presents OmniGlue at #CVPR24: the first learnable image matcher powered by foundation models. Impressive out-of-distribution (OOD) results!

👉Review https://t.ly/ezaIc
👉Paper https://arxiv.org/pdf/2405.12979
👉Project hwjiang1510.github.io/OmniGlue/
👉Code https://github.com/google-research/omniglue/
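A usage sketch following the repo README; the checkpoint paths are the ones the README has you download, so treat them as placeholders:

```python
import numpy as np
from PIL import Image
import omniglue  # installed from github.com/google-research/omniglue

og = omniglue.OmniGlue(
    og_export="./models/og_export",                    # OmniGlue matcher weights
    sp_export="./models/sp_v6",                        # SuperPoint keypoint extractor
    dino_export="./models/dinov2_vitb14_pretrain.pth", # DINOv2 foundation features
)

image0 = np.array(Image.open("view0.jpg").convert("RGB"))
image1 = np.array(Image.open("view1.jpg").convert("RGB"))

match_kp0, match_kp1, confidences = og.FindMatches(image0, image1)
print(f"{len(match_kp0)} matches found")
```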
🔥 YOLOv10 is out 🔥

👉YOLOv10: novel real-time, end-to-end object detection. Code released under GNU AGPL v3.0 💙

👉Review https://shorturl.at/ZIHBh
👉Paper arxiv.org/pdf/2405.14458
👉Code https://github.com/THU-MIG/yolov10/
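The repo is an Ultralytics fork, so inference follows the familiar Ultralytics pattern; the class name and weight file below are taken from its README and should be double-checked:

```python
from ultralytics import YOLOv10  # exposed by the THU-MIG fork, not stock ultralytics

model = YOLOv10("yolov10n.pt")            # nano variant; s/m/b/l/x were also released
results = model.predict("street.jpg", conf=0.25)
results[0].show()  # the end-to-end head needs no NMS at inference
```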
โ›ˆ๏ธUnsupervised Neuromorphic Motionโ›ˆ๏ธ

👉Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, employing the #Prophesee Gen4 HD event camera.

👉Review https://t.ly/UZzIZ
👉Paper arxiv.org/pdf/2405.15209
👉Project samiarja.github.io/evairborne
👉Repo (empty) github.com/samiarja/ev_deep_motion_segmentation
๐Ÿ‘5๐Ÿ”ฅ1๐Ÿฅฐ1๐Ÿ‘1
🦓 Zero-Shot Diffusive Segmentation 🦓

👉KAUST (+MPI) announced the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source Code released under MIT 💙

👉Review https://t.ly/v_64K
👉Paper arxiv.org/pdf/2405.16947
👉Project https://lnkd.in/dcSt4dQx
👉Code https://lnkd.in/dcZfM8F3
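The gist: per-pixel features from a frozen pre-trained model are grouped into segments with no segmentation-specific training. A generic illustration of that clustering step on dummy features; the paper's actual pipeline uses diffusion features plus temporal correspondences:

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

# Dummy per-pixel features, e.g. from a frozen diffusion UNet: (B, C, h, w).
feats = F.normalize(torch.randn(1, 256, 32, 32), dim=1)
flat = feats[0].permute(1, 2, 0).reshape(-1, 256).numpy()  # (h*w, C)

# Cluster pixels into k segments -- zero-shot, nothing is trained.
labels = KMeans(n_clusters=5, n_init="auto").fit_predict(flat)
segmentation = labels.reshape(32, 32)  # coarse per-pixel segment ids
```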
🪰 Dynamic Gaussian Fusion via 4D Motion Scaffolds 🪰

👉MoSca: a novel 4D Motion Scaffold representation to reconstruct and synthesize novel views of dynamic scenes from monocular videos in the wild!

👉Review https://t.ly/nSdEL
👉Paper arxiv.org/pdf/2405.17421
👉Code github.com/JiahuiLei/MoSca
👉Project https://lnkd.in/dkjMVcqZ
🧤 Transformer-based 4D Hands 🧤

👉4DHands is a novel, robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced 😢

👉Review https://t.ly/wvG-l
👉Paper arxiv.org/pdf/2405.20330
👉Project 4dhands.github.io/
🎭 New 2D Landmarks SOTA 🎭

👉Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks, ensuring better alignment with the landmark definitions, with no need for 3D landmark datasets. No code announced 🥹

👉Review https://t.ly/lew9a
👉Paper arxiv.org/pdf/2405.19646
👉Project davidcferman.github.io/FaceLift
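With no code announced, here is only a toy illustration of the "lifting" idea: a small MLP regressing 3D landmark positions from 2D ones. This is not FaceLift's architecture or training setup:

```python
import torch
import torch.nn as nn

class Lifter(nn.Module):
    """Toy 2D-to-3D landmark lifter (illustrative only)."""
    def __init__(self, n_landmarks: int = 68):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 2, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_landmarks * 3),
        )

    def forward(self, lm2d: torch.Tensor) -> torch.Tensor:
        # (B, N, 2) hand-labeled 2D landmarks -> (B, N, 3) lifted 3D landmarks
        b, n, _ = lm2d.shape
        return self.net(lm2d.reshape(b, -1)).reshape(b, n, 3)

lm3d = Lifter()(torch.rand(4, 68, 2))
print(lm3d.shape)  # torch.Size([4, 68, 3])
```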
๐Ÿณ MultiPly: in-the-wild Multi-People ๐Ÿณ

👉MultiPly: a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. It's the new SOTA on publicly available datasets and on in-the-wild videos. Source Code announced, coming 💙

👉Review https://t.ly/_xjk_
👉Paper arxiv.org/pdf/2406.01595
👉Project eth-ait.github.io/MultiPly
👉Repo github.com/eth-ait/MultiPly