AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🏆Who's the REAL SOTA tracker in the world?🏆

👉BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available💙

👉Review https://t.ly/WB9AR
👉Paper https://arxiv.org/pdf/2407.15707
👉Code github.com/BasitAlawode/Best_of_N_Trackers
TAPTRv2: new SOTA for TAP

👉TAPTRv2: Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available, V2 coming💙

👉Review https://t.ly/H84ae
👉Paper v1 https://lnkd.in/d4vD_6xx
👉Paper v2 https://lnkd.in/dE_TUzar
👉Project https://taptr.github.io/
👉Code https://lnkd.in/dgfs9Qdy
🧱EAFormer: Scene Text Segmentation🧱

👉A novel Edge-Aware Transformer that segments text more accurately, especially at the edges. Full re-annotation of COCO_TS and MLT_S! Code coming, data available on 🤗

👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
👽 Keypoint Promptable Re-ID 👽

👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon💙

👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
A guide for modern CV

👉In the last 18 months I received 1,100+ applications for research roles. Most of the applicants don't have a deep grasp of a few milestones in CV. Here is a short collection of mostly free resources to put some summer time to good use.

Books:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm

Video Courses:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3

Libraries:
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/

Papers:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1 https://lnkd.in/dQ9rs53B
✅YOLOv2 arxiv.org/abs/1612.08242
✅YOLOX https://lnkd.in/d9ZtsF7g
✅SAM https://arxiv.org/abs/2304.02643

👉More papers and the full list: https://t.ly/WAwAk
🪄 Diffusion Models for Transparency 🪄

👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects like roughness, metallic, albedo & transparency in real images. Amazing work but code not announced🥺

👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/
🔥🔥 SAM v2 is out! 🔥🔥

👉#Meta announced SAM 2, the novel unified model for real-time promptable segmentation in images and videos. 6x faster, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses💙

👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2
👋 Real-time Expressive Hands 👋

👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real-time. Source Code released (Apache 2.0) on Jul. 31st, 2024💙

👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug
🧪 Click-Attention Segmentation 🧪

👉An interesting image patch-based click attention algorithm and an affinity loss inspired by SASFormer. This novel approach aims to decouple positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache💙

👉Review https://t.ly/tG05L
👉Paper https://arxiv.org/pdf/2408.06021
👉Code https://github.com/hahamyt/ClickAttention
🏗️ #Adobe Instant TurboEdit 🏗️

👉Adobe unveils a novel real-time text-based disentangled real image editing method built upon 4-step SDXL Turbo. SOTA HQ image editing using ultra-fast few-step diffusion. No code announced, but it's easy to guess it will be released in commercial tools.

👉Review https://t.ly/Na7-y
👉Paper https://lnkd.in/dVs9RcCK
👉Project https://lnkd.in/dGCqwh9Z
👉Code 😢
🦓 Zebra Detection & Pose 🦓

👉The first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any bridging strategies. Code, results, models, and the synthetic training/validation data, including 104K manually labeled images, open-sourced💙

👉Review https://t.ly/HTEZZ
👉Paper https://lnkd.in/dQYT-fyq
👉Project https://lnkd.in/dAnNXgG3
👉Code https://lnkd.in/dhvU97xD
🦧Sapiens: SOTA ViTs for humans🦧

👉META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source Code announced, coming💙

👉Review https://t.ly/GKQI0
👉Paper arxiv.org/pdf/2408.12569
👉Project rawalkhirodkar.github.io/sapiens
👉Code github.com/facebookresearch/sapiens
Diffusion Game Engine

👉#Google unveils GameNGen: the first game engine powered entirely by a neural #AI that enables real-time interaction with a complex environment over long trajectories at HQ. No code announced but I love it 💙

👉Review https://t.ly/_WR5z
👉Paper https://lnkd.in/dZqgiqb9
👉Project https://lnkd.in/dJUd2Fr6
🫒 Omni Urban Scene Reconstruction 🫒

👉OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios, including actors, in real-time (~60 Hz). Code released💙

👉Review https://t.ly/SXVPa
👉Paper arxiv.org/pdf/2408.16760
👉Project ziyc.github.io/omnire/
👉Code github.com/ziyc/drivestudio
💄Interactive Drag-based Editing💄

👉CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source Code announced, coming💙

👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag
🌭Hand-Object Interaction Pretraining🌭

👉Berkeley unveils HOP, a novel approach to learn general robot manipulation priors from 3D hand-object interaction trajectories.

👉Review https://t.ly/FLqvJ
👉Paper https://arxiv.org/pdf/2409.08273
👉Project https://hgaurav2k.github.io/hop/
🧸Motion Instruction Fine-Tuning🧸

👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source Code announced, coming💙

👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming
⚽ SoccerNet 2024 Results ⚽

👉SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!

👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/
JoyHallo: Mandarin Digital Human

👉JD Health tackles the challenges of audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of HQ datasets. Impressive results (-> audio ON). Code and Models available💙

👉Review https://t.ly/5NGDh
👉Paper arxiv.org/pdf/2409.13268
👉Project jdh-algo.github.io/JoyHallo/
👉Code github.com/jdh-algo/JoyHallo