Who's the REAL SOTA tracker in the world?
The BofN meta-tracker outperforms existing SOTA trackers by a large margin on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source code available.
Review: https://t.ly/WB9AR
Paper: https://arxiv.org/pdf/2407.15707
Code: github.com/BasitAlawode/Best_of_N_Trackers
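The post doesn't spell out the mechanism, but the "best of N" idea can be pictured as routing each sequence to whichever of the N trackers is predicted to perform best on it. A minimal illustrative sketch — tracker names and scores below are made up, not from the paper:

```python
# Illustrative "best of N" routing (not the paper's code): given a
# predicted score for each of N trackers on each sequence, run only
# the tracker expected to do best on that sequence.

def best_of_n(per_sequence_scores):
    """per_sequence_scores: {seq_id: {tracker_name: predicted_score}}.
    Returns {seq_id: name of the highest-scoring tracker}."""
    return {
        seq: max(trackers, key=trackers.get)
        for seq, trackers in per_sequence_scores.items()
    }

scores = {
    "seq_01": {"trackerA": 0.62, "trackerB": 0.71},
    "seq_02": {"trackerA": 0.80, "trackerB": 0.55},
}
print(best_of_n(scores))  # {'seq_01': 'trackerB', 'seq_02': 'trackerA'}
```

The hard part, of course, is predicting those per-sequence scores before running the trackers — that prediction is what the meta-tracker learns.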
TAPTRv2: new SOTA for TAP
TAPTRv2 is a Transformer-based approach built upon TAPTR for the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations from DETR-like algorithms. Source code for v1 is available; v2 is coming.
Review: https://t.ly/H84ae
Paper v1: https://lnkd.in/d4vD_6xx
Paper v2: https://lnkd.in/dE_TUzar
Project: https://taptr.github.io/
Code: https://lnkd.in/dgfs9Qdy
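The "point as query" formulation can be pictured with a toy sketch: each query carries a 2D location that successive decoder layers refine with predicted offsets, DETR-style. Purely illustrative — the offsets below are made up, not network outputs:

```python
# Toy illustration of DETR-style iterative point refinement: a point
# query's location is updated layer by layer with a predicted offset.
def refine_point(xy, offsets):
    """xy: initial (x, y); offsets: one (dx, dy) per decoder layer.
    Returns the list of intermediate estimates, final one last."""
    x, y = xy
    trajectory = [(x, y)]
    for dx, dy in offsets:
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory

print(refine_point((10.0, 20.0), [(1.5, -0.5), (0.25, 0.0)]))
```

In the real model each query also carries a content feature that attends to image features to produce those offsets; the sketch only shows the location-update loop.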
EAFormer: Scene Text Segmentation
A novel Edge-Aware Transformer that segments text more accurately, especially at the edges. Full re-annotation of COCO_TS and MLT_S! Code coming; data available on Hugging Face.
Review: https://t.ly/0G2uX
Paper: arxiv.org/pdf/2407.17020
Project: hyangyu.github.io/EAFormer/
Data: huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
Keypoint Promptable Re-ID
KPR is a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Code, dataset, and annotations coming soon.
Review: https://t.ly/vCXV_
Paper: https://arxiv.org/pdf/2407.18112
Repo: github.com/VlSomers/keypoint_promptable_reidentification
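The prompt structure can be sketched like this — field names and coordinates are illustrative, not KPR's actual API: the query bounding box is paired with prompt keypoints, each tagged as lying on the intended target or on a distractor (e.g. an occluding person).

```python
# Hypothetical input layout for keypoint-promptable ReID (illustrative
# field names, not the repo's API): the bbox alone is ambiguous under
# occlusion, so keypoints disambiguate which person is the target.
query = {
    "bbox": (120, 40, 64, 128),  # x, y, w, h in image coordinates
    "keypoints": [
        {"xy": (150, 60), "label": "target"},      # on the person to re-identify
        {"xy": (170, 90), "label": "distractor"},  # on an occluding person
    ],
}

target_pts = [k["xy"] for k in query["keypoints"] if k["label"] == "target"]
print(target_pts)  # [(150, 60)]
```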
A guide for modern CV
In the last 18 months I received 1,100+ applications for research roles. Most applicants don't deeply know a few milestones in CV. Here is a short collection of mostly-free resources for spending some quality time this summer.
Books:
- DL with Python: https://t.ly/VjaVx
- Python OOP: https://t.ly/pTQRm
Video Courses:
- Berkeley | Modern CV (2023): https://t.ly/AU7S3
Libraries:
- PyTorch: https://lnkd.in/dTvJbjAx
- PyTorch Lightning: https://lnkd.in/dAruPA6T
- Albumentations: https://albumentations.ai/
Papers:
- EfficientNet: https://lnkd.in/dTsT44ae
- ViT: https://lnkd.in/dB5yKdaW
- UNet: https://lnkd.in/dnpKVa6T
- DeepLabV3+: https://lnkd.in/dVvqkmPk
- YOLOv1: https://lnkd.in/dQ9rs53B
- YOLOv2: arxiv.org/abs/1612.08242
- YOLOX: https://lnkd.in/d9ZtsF7g
- SAM: https://arxiv.org/abs/2304.02643
More papers and the full list: https://t.ly/WAwAk
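To get a feel for what augmentation libraries like Albumentations provide, here is the core idea in plain Python — a transform applied with some probability. Real libraries operate on numpy arrays and compose many such transforms; this toy version just flips a nested-list "image":

```python
# Plain-Python flavor of a probabilistic augmentation: horizontal flip
# with probability p. Libraries like Albumentations wrap dozens of such
# transforms behind a uniform Compose(...) interface.
import random

def horizontal_flip(image, p=0.5, rng=random.random):
    """image: H x W nested lists; returns a (possibly) flipped copy."""
    if rng() < p:
        return [row[::-1] for row in image]
    return image

img = [[1, 2, 3],
       [4, 5, 6]]
print(horizontal_flip(img, p=1.0))  # [[3, 2, 1], [6, 5, 4]]
```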
Diffusion Models for Transparency
MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects — roughness, metallic, albedo & transparency — in real images. Amazing work, but no code announced.
Review: https://t.ly/U98_G
Paper: arxiv.org/pdf/2312.02970
Project: www.prafullsharma.net/alchemist/
SAM v2 is out!
#Meta announced SAM 2, a novel unified model for real-time promptable segmentation in images and videos. 6x faster, it is the new SOTA by a large margin. Source code, dataset, models & demo released under permissive licenses.
Review: https://t.ly/oovJZ
Paper: https://t.ly/sCxMY
Demo: https://sam2.metademolab.com
Project: ai.meta.com/blog/segment-anything-2/
Models: github.com/facebookresearch/segment-anything-2
Real-time Expressive Hands
Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real time. Source code released (Apache 2.0) on Jul 31st, 2024.
Review: https://t.ly/8obbB
Project: https://lnkd.in/dRtVGe6i
Paper: https://lnkd.in/daCx2iB7
Code: https://lnkd.in/dZ9pgzug
Click-Attention Segmentation
An interesting image-patch-based click attention algorithm with an affinity loss inspired by SASFormer. This novel approach decouples positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache.
Review: https://t.ly/tG05L
Paper: https://arxiv.org/pdf/2408.06021
Code: https://github.com/hahamyt/ClickAttention
#Adobe Instant TurboEdit
Adobe unveils a novel real-time, text-based, disentangled real-image editing method built upon 4-step SDXL Turbo: SOTA HQ image editing using ultra-fast few-step diffusion. No code announced, but it will likely ship in commercial tools.
Review: https://t.ly/Na7-y
Paper: https://lnkd.in/dVs9RcCK
Project: https://lnkd.in/dGCqwh9Z
Code: not announced
Zebra Detection & Pose
The first synthetic dataset usable for both detection and 2D pose estimation of zebras without applying any bridging strategies. Code, results, models, and the synthetic training/validation data, including 104K manually labeled images, are open-sourced.
Review: https://t.ly/HTEZZ
Paper: https://lnkd.in/dQYT-fyq
Project: https://lnkd.in/dAnNXgG3
Code: https://lnkd.in/dhvU97xD
Sapiens: SOTA ViTs for humans
META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface-normal prediction. Source code announced, coming.
Review: https://t.ly/GKQI0
Paper: arxiv.org/pdf/2408.12569
Project: rawalkhirodkar.github.io/sapiens
Code: github.com/facebookresearch/sapiens
AI with Papers - Artificial Intelligence & Deep Learning
Sapiens update: SOURCE CODE IS OUT!
Thanks Danny for the info.
Diffusion Game Engine
#Google unveils GameNGen: the first game engine powered entirely by a neural #AI model, enabling real-time interaction with a complex environment over long trajectories at high quality. No code announced, but I love it.
Review: https://t.ly/_WR5z
Paper: https://lnkd.in/dZqgiqb9
Project: https://lnkd.in/dJUd2Fr6
Omni Urban Scene Reconstruction
OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios with actors in real time (~60 Hz). Code released.
Review: https://t.ly/SXVPa
Paper: arxiv.org/pdf/2408.16760
Project: ziyc.github.io/omnire/
Code: github.com/ziyc/drivestudio
Interactive Drag-based Editing
CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source code announced, coming.
Review: https://t.ly/hy6SL
Paper: arxiv.org/pdf/2409.08857
Project: joonghyuk.com/instantdrag-web/
Code: github.com/alex4727/InstantDrag
Hand-Object Interaction Pretraining
Berkeley unveils HOP, a novel approach to learning general robot-manipulation priors from 3D hand-object interaction trajectories.
Review: https://t.ly/FLqvJ
Paper: https://arxiv.org/pdf/2409.08273
Project: https://hgaurav2k.github.io/hop/
Motion Instruction Fine-Tuning
MotIF is a novel method that fine-tunes pre-trained VLMs to give them the capability to distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source code announced, coming.
Review: https://t.ly/iJ2UY
Paper: https://arxiv.org/pdf/2409.10683
Project: https://motif-1k.github.io/
Code: coming
SoccerNet 2024 Results
SoccerNet is the annual video-understanding challenge for football, aiming to advance research across multiple themes in the sport. The 2024 results are out!
Review: https://t.ly/DUPgx
Paper: arxiv.org/pdf/2409.10587
Repo: github.com/SoccerNet
Project: www.soccer-net.org/
JoyHallo: Mandarin Digital Human
JD Health tackles audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of HQ datasets. Impressive results (audio ON). Code & models available.
Review: https://t.ly/5NGDh
Paper: arxiv.org/pdf/2409.13268
Project: jdh-algo.github.io/JoyHallo/
Code: github.com/jdh-algo/JoyHallo