AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI, with papers. Fresh daily updates on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🪴 CAVIS: SOTA Context-Aware Segmentation 🪴

👉DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating the contextual information adjacent to each object. It's the new SOTA on several benchmarks. Source Code announced💙

👉Review https://t.ly/G5obN
👉Paper arxiv.org/pdf/2407.03010
👉Repo github.com/Seung-Hun-Lee/CAVIS
👉Project seung-hun-lee.github.io/projects/CAVIS
âĪ6👍5ðŸ”Ĩ4👏2
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 Segment Any 4D Gaussians 🔥

👉SA4D is a novel framework for segmenting anything in the #4D Gaussian world. It delivers HQ segmentation within seconds in 4D Gaussians and can remove, recolor, compose, and render HQ masks of anything. Source Code available within August 2024💙

👉Review https://t.ly/uw3FS
👉Paper https://arxiv.org/pdf/2407.04504
👉Project https://jsxzs.github.io/sa4d/
👉Repo https://github.com/hustvl/SA4D
🤖 CODERS: Stereo Detection, 6D & Shape 🤖

👉CODERS: a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source Code announced💙

👉Review https://t.ly/Xpizz
👉Paper https://lnkd.in/dr5ZxC46
👉Project xingyoujun.github.io/coders/
👉Repo (TBA)
ðŸ”Ĩ12âĪ1👍1ðŸĨ°1
This media is not supported in your browser
VIEW IN TELEGRAM
ðŸļ Tracking Everything via Decomposition ðŸļ

👉Hefei unveils a novel decoupled representation that divides static scenes and dynamic objects in terms of motion and appearance, enabling more robust tracking through occlusions and deformations. Source Code announced under MIT License💙

👉Review https://t.ly/OsFTO
👉Paper https://arxiv.org/pdf/2407.06531
👉Repo github.com/qianduoduolr/DecoMotion
ðŸūTAPVid-3D: benchmark for TAP-3DðŸū

👉#Deepmind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos drawn from three different data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & Code available, Apache 2.0💙

👉Review https://t.ly/SsptD
👉Paper arxiv.org/pdf/2407.05921
👉Project tapvid3d.github.io/
👉Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
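Benchmarks of this kind typically score a tracker by how many predicted 3D points fall within a set of distance thresholds of the ground truth, averaged over thresholds. A minimal sketch of such a metric (illustrative only; the threshold values and exact formulation are assumptions, not TAPVid-3D's official metric):

```python
import numpy as np

def fraction_within(pred, gt, thresholds=(0.05, 0.1, 0.2, 0.4, 0.8)):
    """Fraction of predicted 3D track points whose Euclidean error is
    below each distance threshold (metres), averaged over thresholds.
    pred, gt: (T, N, 3) arrays of N tracked points over T frames."""
    errors = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=-1)
    return float(np.mean([(errors < t).mean() for t in thresholds]))

# Toy check: a constant 0.15 m error clears only the 3 largest thresholds.
gt = np.zeros((4, 2, 3))
pred = gt + np.array([0.15, 0.0, 0.0])
score = fraction_within(pred, gt)  # 3 of 5 thresholds → 0.6
```

A real evaluation would also handle occlusion flags and per-video averaging; this shows only the thresholded-accuracy core.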
🔥 940+ FPS Multi-Person Pose Estimation 🔥

👉RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D body pose estimation. Over 940 FPS on #GPU! Code & models released💙

👉Review https://t.ly/XkBmg
👉Paper arxiv.org/pdf/2407.08634
👉Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
âĪ8ðŸ”Ĩ4👏1ðŸū1
This media is not supported in your browser
VIEW IN TELEGRAM
🥥 OmniNOCS: largest 3D NOCS 🥥

👉OmniNOCS by #Google (+ Georgia Tech) is a unified NOCS (Normalized Object Coordinate Space) dataset that contains data across different domains with 90+ object classes. The largest NOCS dataset to date. Data & Code available under Apache 2.0💙

👉Review https://t.ly/xPgBn
👉Paper arxiv.org/pdf/2407.08711
👉Project https://omninocs.github.io/
👉Data github.com/google-deepmind/omninocs
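A NOCS representation maps every object's geometry into a canonical unit cube, so shapes from different domains become directly comparable. A minimal sketch of that normalization (an illustration of the idea, not OmniNOCS code; the diagonal-based scaling is one common convention):

```python
import numpy as np

def to_nocs(points):
    """Map an object's 3D points into a Normalized Object Coordinate
    Space: centre on the bounding box and scale by its diagonal so
    every coordinate lands inside the unit cube around 0.5."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2.0
    scale = np.linalg.norm(hi - lo)  # bounding-box diagonal length
    return (points - center) / scale + 0.5

corners = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
nocs = to_nocs(corners)  # symmetric around 0.5, inside [0, 1]
```

Because the mapping is canonical, predicting NOCS coordinates per pixel lets a network recover object pose and size by aligning them back to the observed depth.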
ðŸ”Ĩ4âĪ3👏2👍1ðŸĨ°1ðŸĪŊ1
This media is not supported in your browser
VIEW IN TELEGRAM
💌 KineTy: Typography Diffusion 💌

👉GIST introduces a novel method for realistic text-driven kinetic typography generation, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under Attribution-NC 4.0💙

👉Review https://t.ly/2FWo9
👉Paper arxiv.org/pdf/2407.10476
👉Project seonmip.github.io/kinety/
👉Repo github.com/SeonmiP/KineTy/tree/main
âĪ4👍1ðŸ”Ĩ1ðŸĨ°1
📈Gradient Boosting Reinforcement Learning📈

👉#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain. GBRL adapts the power of Gradient Boosting Trees to the unique challenges of RL environments, including non-stationarity and absence of predefined targets. Code released💙

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
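The core idea, a gradient-boosted tree ensemble as the value-function approximator, can be sketched from scratch on a toy bandit. This is an illustration only: GBRL itself is a GPU-accelerated library with actor-critic interfaces, and none of the function names below come from it.

```python
import numpy as np

def boost_fit(X, y, n_rounds=25, lr=0.5):
    """Tiny gradient boosting on squared loss: each round fits a
    depth-1 tree (stump) on one feature to the current residuals."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        resid = y - pred
        best = None
        for t in np.unique(X)[1:]:  # candidate split thresholds
            left, right = resid[X < t], resid[X >= t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, t, left.mean(), right.mean())
        _, t, lv, rv = best
        pred = pred + lr * np.where(X < t, lv, rv)
        stumps.append((t, lv, rv))
    return y.mean(), stumps, lr

def boost_predict(model, X):
    base, stumps, lr = model
    out = np.full(len(X), base)
    for t, lv, rv in stumps:
        out = out + lr * np.where(X < t, lv, rv)
    return out

# Toy bandit: action 1 always pays 1, action 0 pays 0. The ensemble
# learns Q(a) from logged (action, reward) pairs, then acts greedily.
actions = np.array([0.0, 1.0] * 50)
rewards = np.where(actions == 1.0, 1.0, 0.0)
q = boost_fit(actions, rewards)
q0, q1 = boost_predict(q, np.array([0.0, 1.0]))
greedy_action = int(q1 > q0)  # → 1
```

Real RL adds the complications the paper targets (non-stationary targets, bootstrapped returns); the sketch only shows why trees are a plausible drop-in for a neural value head.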
âĪ7ðŸĪŊ4👍3ðŸ”Ĩ1ðŸĨ°1
This media is not supported in your browser
VIEW IN TELEGRAM
🧿 Shape of Motion for 4D 🧿

👉 Google (+Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source Code released 💙

👉Review https://t.ly/d9RsA
👉Project https://shape-of-motion.github.io/
👉Paper arxiv.org/pdf/2407.13764
👉Code github.com/vye16/shape-of-motion/
âĪ5ðŸĪŊ4ðŸ”Ĩ2👍1ðŸ˜ą1
This media is not supported in your browser
VIEW IN TELEGRAM
🎭 TRG: new SOTA 6DoF Head 🎭

👉ECE (Korea) unveils TRG, a novel landmark-based method for estimating a 6DoF head pose which stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source Code & Models to be released💙

👉Review https://t.ly/lOIRA
👉Paper https://lnkd.in/dCWEwNyF
👉Code https://lnkd.in/dzRrwKBD
🏆Who's the REAL SOTA tracker in the world?🏆

👉BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available💙

👉Review https://t.ly/WB9AR
👉Paper https://arxiv.org/pdf/2407.15707
👉Code github.com/BasitAlawode/Best_of_N_Trackers
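The "best of N" idea is to treat existing SOTA trackers as a pool and route each sequence to the one likely to do best. A toy sketch of the selection step (the actual method learns to predict the best tracker from the video itself rather than scoring against annotated frames; all names and data here are illustrative):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def best_of_n(candidates, reference):
    """Pick the tracker whose predicted boxes best overlap a short
    reference clip (mean IoU over frames)."""
    def mean_iou(boxes):
        return sum(iou(p, g) for p, g in zip(boxes, reference)) / len(reference)
    return max(candidates, key=lambda name: mean_iou(candidates[name]))

reference = [(0, 0, 10, 10), (1, 1, 10, 10)]
candidates = {
    "tracker_a": [(0, 0, 10, 10), (1, 1, 10, 10)],  # stays on target
    "tracker_b": [(5, 5, 10, 10), (9, 9, 10, 10)],  # drifts away
}
winner = best_of_n(candidates, reference)  # → "tracker_a"
```

The meta-tracker's contribution is doing this routing without ground truth at test time; the sketch only shows the per-sequence selection that motivates it.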
ðŸĒ TAPTRv2: new SOTA for TAP ðŸĒ

👉TAPTRv2: a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available, V2 coming💙

👉Review https://t.ly/H84ae
👉Paper v1 https://lnkd.in/d4vD_6xx
👉Paper v2 https://lnkd.in/dE_TUzar
👉Project https://taptr.github.io/
👉Code https://lnkd.in/dgfs9Qdy
🧱 EAFormer: Scene Text Segmentation 🧱

👉A novel Edge-Aware Transformer to segment text more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming, data available on 🤗

👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
âĪ14ðŸ”Ĩ6👍1ðŸĨ°1
This media is not supported in your browser
VIEW IN TELEGRAM
👽 Keypoint Promptable Re-ID 👽

👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon💙

👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
🎁 A guide for modern CV 🎁

👉In the last 18 months I received 1,100+ applications for research roles. Most applicants don't deeply know a few milestones in CV. Here is a short collection of mostly-free resources to spend a bit of good time on this summer.

𝐁ðĻðĻðĪ𝐎:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm

VðĒ𝐝𝐞ðĻ 𝐂ðĻðŪðŦ𝐎𝐞𝐎:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3

𝐋ðĒ𝐛ðŦ𝐚ðŦðĒ𝐞𝐎:
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/

𝐏𝐚ðĐ𝐞ðŦ𝐎:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1: https://lnkd.in/dQ9rs53B
✅YOLOv2: arxiv.org/abs/1612.08242
✅YOLOX: https://lnkd.in/d9ZtsF7g
✅SAM: https://arxiv.org/abs/2304.02643

👉More papers and the full list: https://t.ly/WAwAk
âĪ34👍19
This media is not supported in your browser
VIEW IN TELEGRAM
🪄 Diffusion Models for Transparency 🪄

👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects (roughness, metallic, albedo & transparency) in real images. Amazing work, but code not announced🥺

👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/
🔥🔥 SAM v2 is out! 🔥🔥

👉#Meta announced SAM 2, the novel unified model for real-time promptable segmentation in images and videos. 6x faster, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses💙

👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2
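"Promptable" means the user supplies a point (or box) and the model returns the matching mask. A schematic of the point-prompt selection step, not the actual `sam2` API (mask generation itself is the model's job; every name and array here is illustrative):

```python
import numpy as np

def point_prompt(masks, scores, point):
    """Schematic promptable-segmentation step: among K candidate
    masks, return the index of the highest-scoring mask containing
    the clicked point. masks: (K, H, W) bool; scores: (K,)."""
    r, c = point
    hits = [k for k in range(len(masks)) if masks[k, r, c]]
    return max(hits, key=lambda k: scores[k]) if hits else None

masks = np.zeros((2, 8, 8), dtype=bool)
masks[0, :4, :4] = True  # top-left object
masks[1, 4:, 4:] = True  # bottom-right object
choice = point_prompt(masks, np.array([0.9, 0.8]), (6, 6))  # → 1
```

In SAM 2 the same prompting idea extends across video: a prompt on one frame seeds a masklet that the model propagates to every other frame in real time.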
ðŸ”Ĩ27âĪ10ðŸĪŊ4👍2ðŸū1
This media is not supported in your browser
VIEW IN TELEGRAM
👋 Real-time Expressive Hands 👋

👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real-time. Source Code released (Apache 2.0) on Jul 31st, 2024💙

👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug