AI with Papers - Artificial Intelligence & Deep Learning
All the AI, with papers. Fresh daily updates on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🪅Anomaly Object Detection🪅

👉The University of Edinburgh introduces a novel anomaly detection problem focused on identifying 'odd-looking' objects relative to the other instances in a multi-view scene. Code announced 💙

👉Review https://t.ly/3dGHp
👉Paper arxiv.org/pdf/2406.20099
👉Repo https://lnkd.in/d9x6FpUq
🪩 MimicMotion: HQ Motion Generation 🪩

👉#Tencent open-sources MimicMotion, a novel controllable video generation framework that can generate HQ videos of arbitrary length mimicking specific motion guidance. Source Code available 💙

👉Review https://t.ly/XFoin
👉Paper arxiv.org/pdf/2406.19680
👉Project https://lnkd.in/eW-CMg_C
👉Code https://lnkd.in/eZ6SC2bc
🪴 CAVIS: SOTA Context-Aware Segmentation 🪴

👉DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It's the new SOTA on several benchmarks. Source Code announced 💙

👉Review https://t.ly/G5obN
👉Paper arxiv.org/pdf/2407.03010
👉Repo github.com/Seung-Hun-Lee/CAVIS
👉Project seung-hun-lee.github.io/projects/CAVIS
🔥 Segment Any 4D Gaussians 🔥

👉SA4D is a novel framework to segment anything in the #4D Gaussian world: HQ segmentation within seconds in 4D Gaussians, plus the ability to remove, recolor, compose, and render HQ masks of anything. Source Code available by August 2024 💙

👉Review https://t.ly/uw3FS
👉Paper https://arxiv.org/pdf/2407.04504
👉Project https://jsxzs.github.io/sa4d/
👉Repo https://github.com/hustvl/SA4D
🤖 CODERS: Stereo Detection, 6D & Shape 🤖

👉CODERS: a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source Code announced 💙

👉Review https://t.ly/Xpizz
👉Paper https://lnkd.in/dr5ZxC46
👉Project xingyoujun.github.io/coders/
👉Repo (TBA)
🐸 Tracking Everything via Decomposition 🐸

👉Hefei unveils a novel decoupled representation that separates static scenes and dynamic objects in terms of motion and appearance, yielding more robust tracking through occlusions and deformations. Source Code announced under MIT License 💙

👉Review https://t.ly/OsFTO
👉Paper https://arxiv.org/pdf/2407.06531
👉Repo github.com/qianduoduolr/DecoMotion
🐾TAPVid-3D: benchmark for TAP-3D🐾

👉#DeepMind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos drawn from three different data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & Code available, Apache 2.0 💙

👉Review https://t.ly/SsptD
👉Paper arxiv.org/pdf/2407.05921
👉Project tapvid3d.github.io/
👉Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
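
👉For intuition only, here is a minimal NumPy sketch of the kind of metric such a benchmark reports (3D position accuracy over visible points, averaged across distance thresholds). The thresholds and averaging are assumptions, not the official TAPVid-3D evaluation code.

```python
# Illustrative only: a generic 3D point-tracking accuracy metric in NumPy.
# The thresholds and simple averaging below are assumptions, NOT the official
# TAPVid-3D evaluation protocol (see the repo above for that).
import numpy as np

def average_3d_position_accuracy(pred_xyz, gt_xyz, visible,
                                 thresholds=(0.05, 0.1, 0.2, 0.4, 0.8)):
    """pred_xyz, gt_xyz: (P, T, 3) trajectories in meters; visible: (P, T) bool mask.
    Returns the fraction of visible points within each threshold, averaged."""
    err = np.linalg.norm(pred_xyz - gt_xyz, axis=-1)   # (P, T) per-point Euclidean error
    err = err[visible]                                 # score only visible ground-truth points
    return float(np.mean([(err < t).mean() for t in thresholds]))

# toy check with noisy copies of random trajectories
gt = np.random.randn(8, 30, 3)
pred = gt + 0.05 * np.random.randn(*gt.shape)
vis = np.ones(gt.shape[:2], dtype=bool)
print(average_3d_position_accuracy(pred, gt, vis))
```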
🔥 940+ FPS Multi-Person Pose Estimation 🔥

👉RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D whole-body pose estimation. Over 940 FPS on #GPU! Code & models available 💙

👉Review https://t.ly/XkBmg
👉Paper arxiv.org/pdf/2407.08634
👉Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
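
👉To try the released models, mmpose ships a high-level MMPoseInferencer; a minimal sketch is below. The 'wholebody' alias is an assumption (it may resolve to a different whole-body model depending on your mmpose version), so check the rtmpose project README for the exact RTMW configs/checkpoints.

```python
# Quick-start sketch with mmpose's high-level inferencer (pip install -U mmpose).
# NOTE: the 'wholebody' alias is an assumption; depending on your mmpose version it
# may resolve to a non-RTMW whole-body model. For a specific RTMW checkpoint, pass
# the exact config/weights listed in the rtmpose project README instead.
from mmpose.apis import MMPoseInferencer

inferencer = MMPoseInferencer('wholebody')                 # whole-body 2D pose alias (assumed)
results = next(inferencer('your_image.jpg', show=False))   # replace with a real image path
person = results['predictions'][0][0]                      # first person in the first image
print(len(person['keypoints']), person['keypoints'][:3])
```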
🥥 OmniNOCS: largest 3D NOCS 🥥

👉OmniNOCS, by #Google (+ Georgia Tech), is a unified NOCS (Normalized Object Coordinate Space) dataset containing data across different domains with 90+ object classes. The largest NOCS dataset to date. Data & Code available under Apache 2.0 💙

👉Review https://t.ly/xPgBn
👉Paper arxiv.org/pdf/2407.08711
👉Project https://omninocs.github.io/
👉Data github.com/google-deepmind/omninocs
💌 KineTy: Typography Diffusion 💌

👉GIST introduces a novel method for realistic kinetic typography generation driven by text, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under CC Attribution-NC 4.0 💙

👉Review https://t.ly/2FWo9
👉Paper arxiv.org/pdf/2407.10476
👉Project seonmip.github.io/kinety/
👉Repo github.com/SeonmiP/KineTy/tree/main
📈Gradient Boosting Reinforcement Learning📈

👉#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees (GBT) to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released 💙

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
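
👉For intuition only: a minimal fitted-Q-iteration sketch with scikit-learn gradient-boosted trees on CartPole. This is not the NVlabs GBRL API; it just illustrates the idea of trees as RL function approximators under simplifying assumptions (offline random data, discrete actions).

```python
# Conceptual sketch only: gradient-boosted trees as an RL value-function approximator,
# via a few rounds of fitted Q iteration on random CartPole transitions.
# This is NOT the NVlabs/gbrl API. Requires: numpy, gymnasium, scikit-learn.
import numpy as np
import gymnasium as gym
from sklearn.ensemble import GradientBoostingRegressor

env = gym.make("CartPole-v1")
gamma, n_actions = 0.99, env.action_space.n

# collect a batch of random transitions (s, a, r, s', done)
S, A, R, S2, D = [], [], [], [], []
s, _ = env.reset(seed=0)
for _ in range(3000):
    a = env.action_space.sample()
    s2, r, term, trunc, _ = env.step(a)
    S.append(s); A.append(a); R.append(r); S2.append(s2); D.append(term or trunc)
    s = env.reset()[0] if (term or trunc) else s2
S, A, R, S2, D = map(np.asarray, (S, A, R, S2, D))

X = np.column_stack([S, A])                  # Q(s, a) features: state dims + action id
model = None
for _ in range(10):                          # fitted Q iterations
    if model is None:
        y = R.astype(float)                  # first round: regress immediate reward
    else:
        q_next = np.column_stack(
            [model.predict(np.column_stack([S2, np.full(len(S2), a)])) for a in range(n_actions)]
        )
        y = R + gamma * (1 - D) * q_next.max(axis=1)   # bootstrapped Bellman targets
    model = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)
print("final train MSE:", float(np.mean((model.predict(X) - y) ** 2)))
```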
🧿 Shape of Motion for 4D 🧿

👉 Google (+Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source Code released 💙

👉Review https://t.ly/d9RsA
👉Project https://shape-of-motion.github.io/
👉Paper arxiv.org/pdf/2407.13764
👉Code github.com/vye16/shape-of-motion/
🎭 TRG: new SOTA 6DoF Head 🎭

👉ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source Code & Models to be released 💙

👉Review https://t.ly/lOIRA
👉Paper https://lnkd.in/dCWEwNyF
👉Code https://lnkd.in/dzRrwKBD
🏆Who's the REAL SOTA tracker in the world?🏆

👉The BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available 💙

👉Review https://t.ly/WB9AR
👉Paper https://arxiv.org/pdf/2407.15707
👉Code github.com/BasitAlawode/Best_of_N_Trackers
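
👉The underlying "best of N" idea, route each video to the most promising of N off-the-shelf trackers, can be pictured with the tiny routing sketch below; the Tracker interface and scoring function are hypothetical placeholders, not the authors' BofN code.

```python
# Illustrative routing sketch of the "best of N" idea: score each candidate tracker
# on a short probe segment, then run only the winner on the full video.
# `Tracker` and `score_fn` are hypothetical placeholders, not the authors' BofN code.
from typing import Callable, Protocol, Sequence
import numpy as np

class Tracker(Protocol):
    name: str
    def track(self, frames: np.ndarray, init_box: np.ndarray) -> np.ndarray:
        """frames: (T, H, W, 3); init_box: (4,) xyxy. Returns (T, 4) boxes."""
        ...

def run_best_of_n(trackers: Sequence[Tracker],
                  score_fn: Callable[[np.ndarray], float],
                  frames: np.ndarray,
                  init_box: np.ndarray) -> np.ndarray:
    probe = frames[:10]                                       # short probe segment
    scores = [score_fn(t.track(probe, init_box)) for t in trackers]
    best = trackers[int(np.argmax(scores))]                   # pick the most reliable tracker
    return best.track(frames, init_box)
```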
🐢 TAPTRv2: new SOTA for TAP 🐢

👉TAPTRv2: a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available; V2 coming 💙

👉Review https://t.ly/H84ae
👉Paper v1 https://lnkd.in/d4vD_6xx
👉Paper v2 https://lnkd.in/dE_TUzar
👉Project https://taptr.github.io/
👉Code https://lnkd.in/dgfs9Qdy
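
👉To make the "point as a query" formulation concrete, here is a toy DETR-style decoder layer in PyTorch in which each tracked point is a query that cross-attends to frame features and regresses its location update. Dimensions and structure are illustrative assumptions, not the actual TAPTR/TAPTRv2 architecture.

```python
# Toy PyTorch illustration of the point-query idea: each tracked point is a query that
# cross-attends to frame features and regresses its location update. Dimensions and
# structure are illustrative assumptions, not the actual TAPTR/TAPTRv2 architecture.
import torch
import torch.nn as nn

class PointQueryDecoderLayer(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.point_head = nn.Linear(dim, 2)                    # per-query (dx, dy) update

    def forward(self, queries, frame_feats, points):
        # queries: (B, N, C) point queries; frame_feats: (B, HW, C); points: (B, N, 2) in [0, 1]
        attn_out, _ = self.cross_attn(queries, frame_feats, frame_feats)
        queries = self.norm1(queries + attn_out)
        queries = self.norm2(queries + self.ffn(queries))
        return queries, points + self.point_head(queries)      # refined queries and locations

B, N, HW, C = 2, 5, 32 * 32, 256
layer = PointQueryDecoderLayer(C)
q, feats, pts = torch.randn(B, N, C), torch.randn(B, HW, C), torch.rand(B, N, 2)
q, pts = layer(q, feats, pts)
print(pts.shape)                                               # torch.Size([2, 5, 2])
```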
🧱EAFormer: Scene Text Segmentation🧱

👉A novel Edge-Aware Transformer (EAFormer) to segment scene text more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming, data available on 🤗

👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
👽 Keypoint Promptable Re-ID 👽

👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon 💙

👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
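
👉One way to picture the keypoint prompt is as extra heatmap channels stacked onto the person crop; the channel layout and Gaussian encoding below are assumptions for illustration, not the released KPR pipeline.

```python
# Sketch of a keypoint-promptable input: the person crop is augmented with one Gaussian
# heatmap per prompted keypoint marking the intended target. The channel layout and
# sigma are assumptions for illustration, not the released KPR data pipeline.
import numpy as np

def keypoint_prompt_heatmaps(hw, keypoints_xy, sigma=6.0):
    """Return (K, H, W) Gaussian heatmaps, one per prompted (x, y) keypoint in pixels."""
    h, w = hw
    ys, xs = np.mgrid[0:h, 0:w]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2)) for x, y in keypoints_xy]
    return np.stack(maps).astype(np.float32)

crop = np.random.rand(3, 256, 128).astype(np.float32)                  # RGB person crop (C, H, W)
prompts = keypoint_prompt_heatmaps((256, 128), [(64, 40), (64, 200)])  # e.g. head and feet
model_input = np.concatenate([crop, prompts], axis=0)                  # (3 + K, H, W) network input
print(model_input.shape)                                                # (5, 256, 128)
```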
🎁 A guide for modern CV 🎁

👉Over the last 18 months I have received 1,100+ applications for research roles. Most applicants lack deep knowledge of a few CV milestones. Here is a short collection of mostly free resources for spending a bit of good time over the summer.

Books:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm

Video Courses:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3

Libraries (a minimal example combining them is sketched at the end of this post):
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/

Papers:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1: https://lnkd.in/dQ9rs53B
✅YOLOv2: arxiv.org/abs/1612.08242
✅YOLOX: https://lnkd.in/d9ZtsF7g
✅SAM: https://arxiv.org/abs/2304.02643

👉More papers and the full list: https://t.ly/WAwAk
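
👉A minimal sketch tying the listed libraries together: an Albumentations pipeline feeding a PyTorch Dataset and a plain training loop (PyTorch Lightning would wrap this loop). The images, labels and model are random toy placeholders; swap in your own data.

```python
# Minimal sketch combining the libraries listed above. The dataset, labels and model
# are random toy placeholders for illustration only.
import albumentations as A
from albumentations.pytorch import ToTensorV2
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

class ToyImageDataset(Dataset):
    def __init__(self, n=64):
        self.images = [np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8) for _ in range(n)]
        self.labels = np.random.randint(0, 2, size=n)
    def __len__(self):
        return len(self.images)
    def __getitem__(self, i):
        return transform(image=self.images[i])["image"], int(self.labels[i])

loader = DataLoader(ToyImageDataset(), batch_size=16, shuffle=True)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for x, y in loader:                                   # one epoch over the toy data
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
print("done, last loss:", float(loss))
```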
🪄 Diffusion Models for Transparency 🪄

👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects (roughness, metallic, albedo & transparency) in real images. Amazing work, but code not announced 🥺

👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/