AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

👽 Keypoint Promptable Re-ID 👽

👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon💙

👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
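
To make the "BBox plus keypoints" idea concrete, here is a minimal sketch of how such a prompt could be represented as data. It is not taken from the KPR repo: the names `BBox`, `Keypoint`, and `keypoint_prompt`, the pixel-coordinate layout, and the choice to drop keypoints outside the box are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass(frozen=True)
class Keypoint:
    """A named keypoint in image pixel coordinates (hypothetical)."""
    name: str
    x: float
    y: float


def keypoint_prompt(box: BBox, keypoints: List[Keypoint]) -> List[Tuple[str, float, float]]:
    """Express keypoints relative to the box, in [0, 1] on both axes.

    Assumption: keypoints that fall outside the box are dropped, since
    they cannot indicate a target inside this crop.
    """
    prompt = []
    for kp in keypoints:
        u = (kp.x - box.x) / box.w
        v = (kp.y - box.y) / box.h
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            prompt.append((kp.name, u, v))
    return prompt


box = BBox(x=100, y=50, w=80, h=200)
kps = [
    Keypoint("head", 140, 70),
    Keypoint("left_knee", 120, 200),
    Keypoint("stray", 300, 60),  # outside the box, so it is dropped
]
print(keypoint_prompt(box, kps))
```

With several people inside the same box, a prompt like this is what would tell a model which of them is the intended target.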

A guide for modern CV

👉In the last 18 months I received 1,100+ applications for research roles. Most applicants don't have a deep grasp of a few milestones in CV. Here is a short collection of mostly free resources to spend a bit of good time on over the summer.

Books:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm

Video Courses:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3

Libraries:
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/

Papers:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1: https://lnkd.in/dQ9rs53B
✅YOLOv2: arxiv.org/abs/1612.08242
✅YOLOX: https://lnkd.in/d9ZtsF7g
✅SAM: https://arxiv.org/abs/2304.02643

👉More papers and the full list: https://t.ly/WAwAk

🪄 Diffusion Models for Transparency 🪄

👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects, such as roughness, metallic, albedo & transparency, in real images. Amazing work, but no code announced🥺

👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/

🔥🔥 SAM v2 is out! 🔥🔥

👉#Meta announced SAM 2, the novel unified model for real-time promptable segmentation in images and videos. 6x faster than SAM on images, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses💙

👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2

👋 Real-time Expressive Hands 👋

👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real-time. Source Code released (Apache 2.0) on Jul. 31st, 2024💙

👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug

🧪 Click-Attention Segmentation 🧪

👉An interesting image patch-based click attention algorithm and an affinity loss inspired by SASFormer. This novel approach aims to decouple positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache💙

👉Review https://t.ly/tG05L
👉Paper https://arxiv.org/pdf/2408.06021
👉Code https://github.com/hahamyt/ClickAttention
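
As a rough illustration of what "decoupling" the two click types can mean at the input level, the sketch below maps user clicks to image patches and keeps positive and negative ones in separate sets. It only covers this bookkeeping step, not the attention mechanism or the affinity loss from the paper, and the function name `clicks_to_patches`, the `(x, y, is_positive)` click format, and the 16-pixel patch size are assumptions.

```python
from typing import Iterable, Set, Tuple

# Hypothetical click format: (x, y, is_positive) in pixel coordinates.
Click = Tuple[int, int, bool]


def clicks_to_patches(
    clicks: Iterable[Click], image_w: int, image_h: int, patch: int = 16
) -> Tuple[Set[int], Set[int]]:
    """Map clicks to flat patch indices, kept in two separate sets.

    Returns (positive_patches, negative_patches). Indices are row-major
    over a grid of (image_h // patch) rows by (image_w // patch) columns.
    """
    cols = image_w // patch
    rows = image_h // patch
    positive: Set[int] = set()
    negative: Set[int] = set()
    for x, y, is_positive in clicks:
        # Clamp so clicks on the last partial row/column still land in the grid.
        col = min(x // patch, cols - 1)
        row = min(y // patch, rows - 1)
        index = row * cols + col
        (positive if is_positive else negative).add(index)
    return positive, negative


clicks = [(5, 5, True), (40, 20, True), (63, 63, False)]
pos, neg = clicks_to_patches(clicks, image_w=64, image_h=64)
print(sorted(pos), sorted(neg))
```

Keeping the two sets apart is what would let a model treat them differently downstream, for example by attending to the target from positive patches and to the background from negative ones.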
๐Ÿ—๏ธ #Adobe Instant TurboEdit ๐Ÿ—๏ธ

๐Ÿ‘‰Adobe unveils a novel real-time text-based disentangled real image editing method built upon 4-step SDXL Turbo. SOTA HQ image editing using ultra fast few-step diffusion. No code announced but easy to guess it will be released in commercial tools.

๐Ÿ‘‰Review https://t.ly/Na7-y
๐Ÿ‘‰Paper https://lnkd.in/dVs9RcCK
๐Ÿ‘‰Project https://lnkd.in/dGCqwh9Z
๐Ÿ‘‰Code ๐Ÿ˜ข

🦓 Zebra Detection & Pose 🦓

👉The first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any bridging strategies. Code, results, models, and the synthetic training/validation data, including 104K manually labeled images, are open-sourced💙

👉Review https://t.ly/HTEZZ
👉Paper https://lnkd.in/dQYT-fyq
👉Project https://lnkd.in/dAnNXgG3
👉Code https://lnkd.in/dhvU97xD

🦧Sapiens: SOTA ViTs for human🦧

👉META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source Code announced, coming💙

👉Review https://t.ly/GKQI0
👉Paper arxiv.org/pdf/2408.12569
👉Project rawalkhirodkar.github.io/sapiens
👉Code github.com/facebookresearch/sapiens

Diffusion Game Engine

👉#Google unveils GameNGen: the first game engine powered entirely by a neural #AI that enables real-time interaction with a complex environment over long trajectories at HQ. No code announced but I love it 💙

👉Review https://t.ly/_WR5z
👉Paper https://lnkd.in/dZqgiqb9
👉Project https://lnkd.in/dJUd2Fr6

🫒 Omni Urban Scene Reconstruction 🫒

👉OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios, actors included, in real time (~60 Hz). Code released💙

👉Review https://t.ly/SXVPa
👉Paper arxiv.org/pdf/2408.16760
👉Project ziyc.github.io/omnire/
👉Code github.com/ziyc/drivestudio

💄Interactive Drag-based Editing💄

👉CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source Code announced, coming💙

👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag

🌭Hand-Object interaction Pretraining🌭

👉Berkeley unveils HOP, a novel approach to learn general robot manipulation priors from 3D hand-object interaction trajectories.

👉Review https://t.ly/FLqvJ
👉Paper https://arxiv.org/pdf/2409.08273
👉Project https://hgaurav2k.github.io/hop/

🧸Motion Instruction Fine-Tuning🧸

👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source Code announced, coming💙

👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming

⚽ SoccerNet 2024 Results ⚽

👉SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!

👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/

JoyHallo: Mandarin Digital Human

👉JD Health tackles the challenges of audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of HQ datasets. Impressive results (-> audio ON). Code & Models available💙

👉Review https://t.ly/5NGDh
👉Paper arxiv.org/pdf/2409.13268
👉Project jdh-algo.github.io/JoyHallo/
👉Code github.com/jdh-algo/JoyHallo

🎢 Robo-quadruped Parkour 🎢

👉LAAS-CNRS unveils a novel RL approach to perform agile skills that are reminiscent of parkour, such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and Code available💙

👉Review https://t.ly/-6VRm
👉Paper arxiv.org/pdf/2409.13678
👉Project gepetto.github.io/SoloParkour/
👉Code github.com/Gepetto/SoloParkour

🩰 Dressed Humans in the wild 🩰

👉ETH (+ #Microsoft) unveils ReLoo: a novel 3D-HQ reconstruction of humans dressed in loose garments from monocular in-the-wild clips. No prior assumptions about the garments. Source Code announced, coming 💙

👉Review https://t.ly/evgmN
👉Paper arxiv.org/pdf/2409.15269
👉Project moygcc.github.io/ReLoo/
👉Code github.com/eth-ait/ReLoo

🌾 New SOTA Edge Detection 🌾

👉CUP (+ ESPOCH) unveils the new SOTA for Edge Detection (NBED): consistently superior performance across multiple benchmarks, even compared with models that have a huge computational cost and complex training. Source Code released💙

👉Review https://t.ly/zUMcS
👉Paper arxiv.org/pdf/2409.14976
👉Code github.com/Li-yachuan/NBED