AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
96 photos
238 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🧿 Shape of Motion for 4D 🧿

👉 Google (+Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source Code released 💙

👉Review https://t.ly/d9RsA
👉Project https://shape-of-motion.github.io/
👉Paper arxiv.org/pdf/2407.13764
👉Code github.com/vye16/shape-of-motion/
🎭 TRG: new SOTA 6DoF Head 🎭

👉ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source Code & Models to be released💙

👉Review https://t.ly/lOIRA
👉Paper https://lnkd.in/dCWEwNyF
👉Code https://lnkd.in/dzRrwKBD
🏆Who's the REAL SOTA tracker in the world?🏆

👉BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available💙

👉Review https://t.ly/WB9AR
👉Paper https://arxiv.org/pdf/2407.15707
👉Code github.com/BasitAlawode/Best_of_N_Trackers
🐢 TAPTRv2: new SOTA for TAP 🐢

👉TAPTRv2 is a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available, V2 coming💙

👉Review https://t.ly/H84ae
👉Paper v1 https://lnkd.in/d4vD_6xx
👉Paper v2 https://lnkd.in/dE_TUzar
👉Project https://taptr.github.io/
👉Code https://lnkd.in/dgfs9Qdy
🧱EAFormer: Scene Text-Segm.🧱

👉EAFormer is a novel Edge-Aware Transformer that segments text more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming, data available on 🤗

👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
👽 Keypoint Promptable Re-ID 👽

👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon💙

👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
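A minimal sketch of what a "BBox plus keypoints" prompt could look like as a data structure. Everything here (the `KeypointPrompt` class, its field names, the keypoint labels and coordinates) is hypothetical and for illustration only; it is not taken from the KPR code or dataset format, which were not yet released at the time of the post.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass
class KeypointPrompt:
    """Hypothetical container: a bounding box plus keypoints marking the intended target."""

    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout
    keypoints: Dict[str, Point] = field(default_factory=dict)

    def keypoints_in_bbox(self) -> Dict[str, Point]:
        """Return only the keypoints that fall inside the bounding box."""
        x0, y0, x1, y1 = self.bbox
        return {
            name: (x, y)
            for name, (x, y) in self.keypoints.items()
            if x0 <= x <= x1 and y0 <= y <= y1
        }


# Illustrative values only: two keypoints on the target, one outside the box.
prompt = KeypointPrompt(
    bbox=(10.0, 20.0, 110.0, 220.0),
    keypoints={
        "head": (60.0, 40.0),
        "left_foot": (45.0, 210.0),
        "stray": (300.0, 50.0),
    },
)
inside = prompt.keypoints_in_bbox()
```

The point of the sketch is only that the keypoints travel together with the box, so a model receiving the prompt can tell which person in a crowded crop is the intended one.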
🎁 A guide for modern CV 🎁

👉In the last 18 months I received 1,100+ applications for research roles. Most of the applicants don't have a deep knowledge of a few milestones in CV. Here is a short collection of mostly free resources to make good use of some time this summer.

𝐁𝐨𝐨𝐤𝐬:
DL with Python https://t.ly/VjaVx
Python OOP https://t.ly/pTQRm

𝐕𝐢𝐝𝐞𝐨 𝐂𝐨𝐮𝐫𝐬𝐞𝐬:
Berkeley | Modern CV (2023) https://t.ly/AU7S3

𝐋𝐢𝐛𝐫𝐚𝐫𝐢𝐞𝐬:
PyTorch https://lnkd.in/dTvJbjAx
PyTorch Lightning https://lnkd.in/dAruPA6T
Albumentations https://albumentations.ai/

𝐏𝐚𝐩𝐞𝐫𝐬:
EfficientNet https://lnkd.in/dTsT44ae
ViT https://lnkd.in/dB5yKdaW
UNet https://lnkd.in/dnpKVa6T
DeepLabV3+ https://lnkd.in/dVvqkmPk
YOLOv1 https://lnkd.in/dQ9rs53B
YOLOv2 arxiv.org/abs/1612.08242
YOLOX https://lnkd.in/d9ZtsF7g
SAM https://arxiv.org/abs/2304.02643

👉More papers and the full list: https://t.ly/WAwAk
🪄 Diffusion Models for Transparency 🪄

👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects like roughness, metallic, albedo & transparency in real images. Amazing work but code not announced🥺

👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/
🔥🔥 SAM v2 is out! 🔥🔥

👉#Meta announced SAM 2, the novel unified model for real-time promptable segmentation in images and videos. 6x faster, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses💙

👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2
👋 Real-time Expressive Hands 👋

👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real-time. Source Code released (Apache 2.0) on Jul. 31st, 2024💙

👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug
🧪 Click-Attention Segmentation 🧪

👉An interesting work that pairs an image patch-based click attention algorithm with an affinity loss inspired by SASFormer. This novel approach aims to decouple positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache💙

👉Review https://t.ly/tG05L
👉Paper https://arxiv.org/pdf/2408.06021
👉Code https://github.com/hahamyt/ClickAttention
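A toy sketch of the "decoupling" idea only: positive and negative clicks are separated so each group can be handled on its own. The `split_clicks` helper and the `(x, y, is_positive)` click layout are assumptions made for illustration; this is not the paper's attention mechanism or affinity loss, and it does not reflect the released code.

```python
from typing import List, Tuple

Click = Tuple[int, int, bool]  # (x, y, is_positive), an assumed layout


def split_clicks(clicks: List[Click]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Separate user clicks into positive (target) and negative (background) points."""
    positive = [(x, y) for x, y, is_pos in clicks if is_pos]
    negative = [(x, y) for x, y, is_pos in clicks if not is_pos]
    return positive, negative


# Illustrative clicks: two on the object, one on the background.
clicks = [(120, 80, True), (30, 200, False), (135, 95, True)]
positive, negative = split_clicks(clicks)
```

In an interactive segmentation loop, the two lists would then feed separate branches, one steering toward the object and one toward the background.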
🏗️ #Adobe Instant TurboEdit 🏗️

👉Adobe unveils a novel real-time, text-based, disentangled real-image editing method built upon 4-step SDXL Turbo. SOTA HQ image editing using ultra-fast few-step diffusion. No code announced, but it's easy to guess it will ship in commercial tools.

👉Review https://t.ly/Na7-y
👉Paper https://lnkd.in/dVs9RcCK
👉Project https://lnkd.in/dGCqwh9Z
👉Code 😢
🦓 Zebra Detection & Pose 🦓

👉The first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any bridging strategies. Code, results, models, and the synthetic, training/validation data, including 104K manually labeled images open-sourced💙

👉Review https://t.ly/HTEZZ
👉Paper https://lnkd.in/dQYT-fyq
👉Project https://lnkd.in/dAnNXgG3
👉Code https://lnkd.in/dhvU97xD
🦧Sapiens: SOTA ViTs for human🦧

👉META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source Code announced, coming💙

👉Review https://t.ly/GKQI0
👉Paper arxiv.org/pdf/2408.12569
👉Project rawalkhirodkar.github.io/sapiens
👉Code github.com/facebookresearch/sapiens
🐺 Diffusion Game Engine 🐺

👉#Google unveils GameNGen: the first game engine powered entirely by a neural #AI that enables real-time interaction with a complex environment over long trajectories at HQ. No code announced but I love it 💙

👉Review https://t.ly/_WR5z
👉Paper https://lnkd.in/dZqgiqb9
👉Project https://lnkd.in/dJUd2Fr6
🫒 Omni Urban Scene Reconstruction 🫒

👉OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios, actors included, in real-time (~60 Hz). Code released💙

👉Review https://t.ly/SXVPa
👉Paper arxiv.org/pdf/2408.16760
👉Project ziyc.github.io/omnire/
👉Code github.com/ziyc/drivestudio
💄Interactive Drag-based Editing💄

👉CSE unveils InstantDrag, a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source Code announced, coming💙

👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag
🌭Hand-Object interaction Pretraining🌭

👉Berkeley unveils HOP, a novel approach to learn general robot manipulation priors from 3D hand-object interaction trajectories.

👉Review https://t.ly/FLqvJ
👉Paper https://arxiv.org/pdf/2409.08273
👉Project https://hgaurav2k.github.io/hop/