TRG: new SOTA 6DoF Head
ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source Code & Models to be released.
Review https://t.ly/lOIRA
Paper https://lnkd.in/dCWEwNyF
Code https://lnkd.in/dzRrwKBD
Who's the REAL SOTA tracker in the world?
The BofN meta-tracker outperforms existing SOTA trackers by a large margin on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available.
Review https://t.ly/WB9AR
Paper https://arxiv.org/pdf/2407.15707
Code github.com/BasitAlawode/Best_of_N_Trackers
TAPTRv2: new SOTA for TAP
TAPTRv2 is a Transformer-based approach built upon TAPTR for the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations from DETR-like algorithms. The Source Code of V1 is available; V2 is coming.
Review https://t.ly/H84ae
Paper v1 https://lnkd.in/d4vD_6xx
Paper v2 https://lnkd.in/dE_TUzar
Project https://taptr.github.io/
Code https://lnkd.in/dgfs9Qdy
EAFormer: Scene Text Segmentation
A novel Edge-Aware Transformer to segment text more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming, data available on Hugging Face.
Review https://t.ly/0G2uX
Paper arxiv.org/pdf/2407.17020
Project hyangyu.github.io/EAFormer/
Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
Keypoint Promptable Re-ID
KPR is a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target (see the illustrative sketch below). Code, dataset and annotations coming soon.
Review https://t.ly/vCXV_
Paper https://arxiv.org/pdf/2407.18112
Repo github.com/VlSomers/keypoint_promptable_reidentification
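A purely hypothetical sketch of what such a keypoint-augmented query could look like (field names are invented for illustration and are not the repo's actual interface): the usual detection box is paired with a few labeled keypoints that mark which person inside the box is the intended target.

```python
# Hypothetical prompt structure for keypoint-promptable ReID (illustration only,
# NOT the official KPR API): a bounding box plus labeled semantic keypoints.
query_prompt = {
    "bbox_xyxy": [412, 96, 588, 472],           # person crop in a crowded scene
    "keypoints_xy": [[455, 180], [540, 210]],   # keypoint locations inside the box
    "keypoint_labels": [1, 0],                  # 1 = intended target, 0 = other person
}
print(query_prompt)
```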
A guide for modern CV
In the last 18 months I received 1,100+ applications for research roles. Most applicants lack deep knowledge of even a few CV milestones. Here is a short collection of mostly free resources to spend some quality time on over the summer.
Books:
- DL with Python https://t.ly/VjaVx
- Python OOP https://t.ly/pTQRm
Video Courses:
- Berkeley | Modern CV (2023) https://t.ly/AU7S3
Libraries (a minimal usage sketch follows the list):
- PyTorch https://lnkd.in/dTvJbjAx
- PyTorch Lightning https://lnkd.in/dAruPA6T
- Albumentations https://albumentations.ai/
Papers:
- EfficientNet https://lnkd.in/dTsT44ae
- ViT https://lnkd.in/dB5yKdaW
- UNet https://lnkd.in/dnpKVa6T
- DeepLabV3+ https://lnkd.in/dVvqkmPk
- YOLOv1: https://lnkd.in/dQ9rs53B
- YOLOv2: arxiv.org/abs/1612.08242
- YOLOX: https://lnkd.in/d9ZtsF7g
- SAM: https://arxiv.org/abs/2304.02643
More papers and the full list: https://t.ly/WAwAk
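To go with the Libraries entries, here is a minimal warm-up sketch combining OpenCV and Albumentations (the image path is a placeholder; the pipeline mirrors the basic example from the Albumentations docs):

```python
import albumentations as A
import cv2

# A small augmentation pipeline: random crop, horizontal flip, color jitter.
transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = cv2.imread("example.jpg")               # placeholder image path
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Albumentations expects RGB
augmented = transform(image=image)["image"]     # augmented NumPy array, ready for a model
```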
Diffusion Models for Transparency
MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects such as roughness, metallic, albedo & transparency in real images. Amazing work, but no code announced.
Review https://t.ly/U98_G
Paper arxiv.org/pdf/2312.02970
Project www.prafullsharma.net/alchemist/
SAM v2 is out!
#Meta announced SAM 2, the novel unified model for real-time promptable segmentation in images and videos. 6x faster, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses (a minimal usage sketch follows the links).
Review https://t.ly/oovJZ
Paper https://t.ly/sCxMY
Demo https://sam2.metademolab.com
Project ai.meta.com/blog/segment-anything-2/
Models github.com/facebookresearch/segment-anything-2
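A minimal image-prompting sketch, roughly following the usage shown in the repo's README at release; the checkpoint/config names are assumptions and may differ in later versions, and a CUDA GPU is assumed:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Assumed checkpoint/config names from the release-time README.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder RGB image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # Prompt with a single foreground point; returns candidate masks with scores.
    masks, scores, logits = predictor.predict(
        point_coords=np.array([[320, 240]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
```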
Real-time Expressive Hands
Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real time. Source Code released (Apache 2.0) on Jul. 31st, 2024.
Review https://t.ly/8obbB
Project https://lnkd.in/dRtVGe6i
Paper https://lnkd.in/daCx2iB7
Code https://lnkd.in/dZ9pgzug
Click-Attention Segmentation
An interesting image patch-based click-attention algorithm with an affinity loss inspired by SASFormer. This novel approach decouples positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache.
Review https://t.ly/tG05L
Paper https://arxiv.org/pdf/2408.06021
Code https://github.com/hahamyt/ClickAttention
#Adobe Instant TurboEdit
Adobe unveils a novel real-time, text-based, disentangled real-image editing method built upon 4-step SDXL Turbo. SOTA HQ image editing using ultra-fast few-step diffusion. No code announced, but it will likely surface in commercial tools.
Review https://t.ly/Na7-y
Paper https://lnkd.in/dVs9RcCK
Project https://lnkd.in/dGCqwh9Z
Code not announced
Zebra Detection & Pose
The first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any bridging strategies. Code, results, models, and the synthetic training/validation data, including 104K manually labeled images, are open-sourced.
Review https://t.ly/HTEZZ
Paper https://lnkd.in/dQYT-fyq
Project https://lnkd.in/dAnNXgG3
Code https://lnkd.in/dhvU97xD
Sapiens: SOTA ViTs for humans
META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source Code announced, coming.
Review https://t.ly/GKQI0
Paper arxiv.org/pdf/2408.12569
Project rawalkhirodkar.github.io/sapiens
Code github.com/facebookresearch/sapiens
UPDATE: the Sapiens SOURCE CODE IS OUT! Thanks Danny for the info.
Diffusion Game Engine
#Google unveils GameNGen: the first game engine powered entirely by a neural #AI model, enabling real-time interaction with a complex environment over long trajectories at high quality. No code announced, but I love it.
Review https://t.ly/_WR5z
Paper https://lnkd.in/dZqgiqb9
Project https://lnkd.in/dJUd2Fr6
Omni Urban Scene Reconstruction
OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios with actors in real time (~60 Hz). Code released.
Review https://t.ly/SXVPa
Paper arxiv.org/pdf/2408.16760
Project ziyc.github.io/omnire/
Code github.com/ziyc/drivestudio
Interactive Drag-based Editing
CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source Code announced, coming.
Review https://t.ly/hy6SL
Paper arxiv.org/pdf/2409.08857
Project joonghyuk.com/instantdrag-web/
Code github.com/alex4727/InstantDrag
Hand-Object Interaction Pretraining
Berkeley unveils HOP, a novel approach to learning general robot manipulation priors from 3D hand-object interaction trajectories.
Review https://t.ly/FLqvJ
Paper https://arxiv.org/pdf/2409.08273
Project https://hgaurav2k.github.io/hop/
Motion Instruction Fine-Tuning
MotIF is a novel method that fine-tunes pre-trained VLMs, equipping them to distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source Code announced, coming.
Review https://t.ly/iJ2UY
Paper https://arxiv.org/pdf/2409.10683
Project https://motif-1k.github.io/
Code coming
SoccerNet 2024 Results
SoccerNet is the annual video-understanding challenge for football, aiming to advance research across multiple themes in the sport. The 2024 results are out!
Review https://t.ly/DUPgx
Paper arxiv.org/pdf/2409.10587
Repo github.com/SoccerNet
Project www.soccer-net.org/