AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🎭 TRG: new SOTA 6DoF Head 🎭

👉ECE (Korea) unveils TRG, a novel landmark-based method for estimating a 6DoF head pose which stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source Code & Models to be released💙

👉Review https://t.ly/lOIRA
👉Paper https://lnkd.in/dCWEwNyF
👉Code https://lnkd.in/dzRrwKBD
🏆Who's the REAL SOTA tracker in the world?🏆

👉BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available💙

👉Review https://t.ly/WB9AR
👉Paper https://arxiv.org/pdf/2407.15707
👉Code github.com/BasitAlawode/Best_of_N_Trackers
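
A minimal sketch of why choosing among N trackers can pay off, assuming "best of N" means picking one tracker per video. It computes only the oracle upper bound (always picking the per-video winner). It is not the BofN method itself, and the tracker names and scores below are made up for illustration.

```python
from statistics import mean


def best_of_n(scores):
    """Oracle per-video selection.

    scores: {tracker_name: [score_video_0, score_video_1, ...]}
    Returns (winner name per video, mean score when always picking the winner).
    """
    names = list(scores)
    n_videos = len(scores[names[0]])
    if any(len(scores[name]) != n_videos for name in names):
        raise ValueError("every tracker needs exactly one score per video")

    winners, best_scores = [], []
    for i in range(n_videos):
        # Pick the tracker with the highest score on video i.
        top = max(names, key=lambda name: scores[name][i])
        winners.append(top)
        best_scores.append(scores[top][i])
    return winners, mean(best_scores)


# Hypothetical per-video scores for two hypothetical trackers.
toy = {
    "tracker_a": [0.70, 0.40, 0.65],
    "tracker_b": [0.55, 0.60, 0.50],
}
winners, oracle = best_of_n(toy)
print(winners, round(oracle, 3))
```

With these toy numbers the oracle beats the mean score of either single tracker, because each tracker wins on different videos. A real meta-tracker has to predict the winner without seeing the scores, so it will land somewhere below this bound.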
🐢 TAPTRv2: new SOTA for TAP 🐢

👉TAPTRv2 is a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available, V2 coming💙

👉Review https://t.ly/H84ae
👉Paper v1 https://lnkd.in/d4vD_6xx
👉Paper v2 https://lnkd.in/dE_TUzar
👉Project https://taptr.github.io/
👉Code https://lnkd.in/dgfs9Qdy
🧱EAFormer: Scene Text-Segm.🧱

👉A novel Edge-Aware Transformer that segments text more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming, data available on 🤗

👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
👽 Keypoint Promptable Re-ID 👽

👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon💙

👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
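
A rough illustration of what "a BBox complemented by keypoints" could look like as an input, assuming boxes and keypoints are given in image pixel coordinates. `PromptedCrop` and its fields are hypothetical names chosen for this sketch, not the API of the KPR repo.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PromptedCrop:
    """Hypothetical ReID input: a bounding box plus keypoints on the target."""
    bbox: Tuple[float, float, float, float]  # (x, y, width, height) in pixels
    keypoints: List[Tuple[float, float]]     # (x, y) in pixels

    def normalized_keypoints(self) -> List[Tuple[float, float]]:
        """Keep keypoints inside the box and rescale them to [0, 1] box coordinates."""
        x0, y0, w, h = self.bbox
        inside = []
        for kx, ky in self.keypoints:
            if x0 <= kx <= x0 + w and y0 <= ky <= y0 + h:
                inside.append(((kx - x0) / w, (ky - y0) / h))
        return inside


# One keypoint falls on the target inside the box, one lies outside it.
crop = PromptedCrop(bbox=(100, 50, 40, 80), keypoints=[(120, 90), (300, 10)])
prompt = crop.normalized_keypoints()
print(prompt)
```

The point of the extra keypoints is disambiguation: when several people overlap inside one box, the keypoints say which of them is the intended target.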
A guide for modern CV

👉In the last 18 months I received 1,100+ applications for research roles. Most applicants lack deep knowledge of a few milestones in CV. Here is a short collection of mostly free resources for some quality time this summer.

Books:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm

Video Courses:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3

Libraries:
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/

Papers:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1 https://lnkd.in/dQ9rs53B
✅YOLOv2 arxiv.org/abs/1612.08242
✅YOLOX https://lnkd.in/d9ZtsF7g
✅SAM https://arxiv.org/abs/2304.02643

👉More papers and the full list: https://t.ly/WAwAk
🪄 Diffusion Models for Transparency 🪄

👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects like roughness, metallic, albedo & transparency in real images. Amazing work but code not announced🥺

👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/
🔥🔥 SAM v2 is out! 🔥🔥

👉#Meta announced SAM 2, the novel unified model for real-time promptable segmentation in images and videos. 6x faster, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses💙

👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2
👋 Real-time Expressive Hands 👋

👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real-time. Source Code released (Apache 2.0) on Jul. 31st, 2024💙

👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug
🧪 Click-Attention Segmentation 🧪

👉An interesting image patch-based click attention algorithm and an affinity loss inspired by SASFormer. This novel approach aims to decouple positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache💙

👉Review https://t.ly/tG05L
👉Paper https://arxiv.org/pdf/2408.06021
👉Code https://github.com/hahamyt/ClickAttention
#Adobe Instant TurboEdit

👉Adobe unveils a novel real-time, text-based, disentangled editing method for real images, built upon 4-step SDXL Turbo. SOTA HQ image editing using ultra-fast few-step diffusion. No code announced, but it will likely be released in commercial tools.

👉Review https://t.ly/Na7-y
👉Paper https://lnkd.in/dVs9RcCK
👉Project https://lnkd.in/dGCqwh9Z
👉Code 😢
🦓 Zebra Detection & Pose 🦓

👉The first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any bridging strategies. Code, results, models, and the synthetic, training/validation data, including 104K manually labeled images, are open-sourced💙

👉Review https://t.ly/HTEZZ
👉Paper https://lnkd.in/dQYT-fyq
👉Project https://lnkd.in/dAnNXgG3
👉Code https://lnkd.in/dhvU97xD
🦧Sapiens: SOTA ViTs for human🦧

👉META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source Code announced, coming💙

👉Review https://t.ly/GKQI0
👉Paper arxiv.org/pdf/2408.12569
👉Project rawalkhirodkar.github.io/sapiens
👉Code github.com/facebookresearch/sapiens
🐺 Diffusion Game Engine 🐺

👉#Google unveils GameNGen: the first game engine powered entirely by a neural #AI that enables real-time interaction with a complex environment over long trajectories at HQ. No code announced but I love it 💙

👉Review https://t.ly/_WR5z
👉Paper https://lnkd.in/dZqgiqb9
👉Project https://lnkd.in/dJUd2Fr6
🫒 Omni Urban Scene Reconstruction 🫒

👉OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios, with actors, in real-time (~60 Hz). Code released💙

👉Review https://t.ly/SXVPa
👉Paper arxiv.org/pdf/2408.16760
👉Project ziyc.github.io/omnire/
👉Code github.com/ziyc/drivestudio
💥Interactive Drag-based Editing💥

👉CSE unveils InstantDrag, a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source Code announced, coming💙

👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag
🌭Hand-Object interaction Pretraining🌭

👉Berkeley unveils HOP, a novel approach to learn general robot manipulation priors from 3D hand-object interaction trajectories.

👉Review https://t.ly/FLqvJ
👉Paper https://arxiv.org/pdf/2409.08273
👉Project https://hgaurav2k.github.io/hop/
🧸Motion Instruction Fine-Tuning🧸

👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source Code announced, coming💙

👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming
⚽ SoccerNet 2024 Results ⚽

👉SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!

👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/