CAVIS: SOTA Context-Aware Segmentation
DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating the contextual information adjacent to each object. It is the new SOTA on several benchmarks. Source code announced. A minimal association sketch follows the links below.
Review: https://t.ly/G5obN
Paper: arxiv.org/pdf/2407.03010
Repo: github.com/Seung-Hun-Lee/CAVIS
Project: seung-hun-lee.github.io/projects/CAVIS
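As a rough illustration of the association idea (not the official CAVIS implementation), the sketch below builds, for each instance, an embedding from the mask region plus a dilated ring of surrounding context, and matches embeddings across frames with Hungarian assignment. Feature maps, masks, and the ring width are illustrative assumptions.

```python
# Minimal sketch of context-aware instance association (NOT the official CAVIS code).
# Assumption: per-frame instance masks and a dense feature map are already available;
# CAVIS's actual context modelling and losses are described in the paper.
import numpy as np
from scipy.ndimage import binary_dilation
from scipy.optimize import linear_sum_assignment

def instance_embedding(feats, mask, ring=5):
    """feats: (C, H, W) feature map; mask: (H, W) bool instance mask."""
    inst = feats[:, mask].mean(axis=1)                          # object appearance
    ctx_region = binary_dilation(mask, iterations=ring) & ~mask  # ring of surrounding context
    ctx = feats[:, ctx_region].mean(axis=1) if ctx_region.any() else np.zeros_like(inst)
    emb = np.concatenate([inst, ctx])                            # object + adjacent context
    return emb / (np.linalg.norm(emb) + 1e-8)

def associate(prev_embs, curr_embs):
    """Match instances across frames by cosine similarity (Hungarian assignment)."""
    sim = np.stack(prev_embs) @ np.stack(curr_embs).T
    row, col = linear_sum_assignment(-sim)                       # maximize total similarity
    return list(zip(row, col))
```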
Segment Any 4D Gaussians
SA4D is a novel framework to segment anything in the #4D Gaussian world: high-quality segmentation within seconds in 4D Gaussians, with the ability to remove, recolor, compose, and render HQ anything-masks. Source code available within August 2024.
Review: https://t.ly/uw3FS
Paper: https://arxiv.org/pdf/2407.04504
Project: https://jsxzs.github.io/sa4d/
Repo: https://github.com/hustvl/SA4D
CODERS: Stereo Detection, 6D & Shape
CODERS: a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source code announced.
Review: https://t.ly/Xpizz
Paper: https://lnkd.in/dr5ZxC46
Project: xingyoujun.github.io/coders/
Repo: TBA
Tracking Everything via Decomposition
Hefei unveils a novel decoupled representation that separates static scenes and dynamic objects in terms of motion and appearance, enabling more robust tracking through occlusions and deformations. Source code announced under the MIT License.
Review: https://t.ly/OsFTO
Paper: https://arxiv.org/pdf/2407.06531
Repo: github.com/qianduoduolr/DecoMotion
TAPVid-3D: benchmark for TAP in 3D
#DeepMind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos from three different data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & code available under Apache 2.0. A simplified metric sketch follows the links below.
Review: https://t.ly/SsptD
Paper: arxiv.org/pdf/2407.05921
Project: tapvid3d.github.io/
Code: github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
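To give a rough idea of how such a benchmark can score 3D tracks, the sketch below computes a simplified position-accuracy metric: the fraction of visible points within a set of distance thresholds, averaged over thresholds. The official TAPVid-3D metrics (e.g. depth-scaled thresholds and a Jaccard-style score) are defined in the paper; the thresholds and shapes here are assumptions.

```python
# Simplified 3D point-tracking accuracy in the spirit of TAPVid-3D (not the official metric).
import numpy as np

def position_accuracy_3d(pred, gt, visible, thresholds=(0.05, 0.1, 0.2, 0.4, 0.8)):
    """pred, gt: (T, N, 3) tracks in metres; visible: (T, N) bool ground-truth visibility."""
    err = np.linalg.norm(pred - gt, axis=-1)   # per-point Euclidean error
    err = err[visible]                         # evaluate only where the point is visible
    return float(np.mean([(err < t).mean() for t in thresholds]))
```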
940+ FPS Multi-Person Pose Estimation
RTMW (Real-Time Multi-person Whole-body pose estimation models) is a series of high-performance models for 2D/3D body pose estimation, reaching over 940 FPS on GPU. Code & models available.
Review: https://t.ly/XkBmg
Paper: arxiv.org/pdf/2407.08634
Repo: github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
OmniNOCS: largest 3D NOCS dataset
OmniNOCS by #Google (+ Georgia) is a unified NOCS (Normalized Object Coordinate Space) dataset that spans multiple domains with 90+ object classes, the largest NOCS dataset to date. Data & code available under Apache 2.0. A sketch of the NOCS convention follows the links below.
Review: https://t.ly/xPgBn
Paper: arxiv.org/pdf/2407.08711
Project: https://omninocs.github.io/
Data: github.com/google-deepmind/omninocs
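For readers new to NOCS, the sketch below shows the usual convention for mapping an object's points into the Normalized Object Coordinate Space: centre the object and scale it by its bounding-box diagonal so coordinates fall roughly in [0, 1]^3. OmniNOCS ships these annotations ready-made; this function is only illustrative.

```python
# Minimal sketch of the common NOCS convention (illustrative, not the OmniNOCS toolchain).
import numpy as np

def to_nocs(points_obj):
    """points_obj: (N, 3) points in the object's canonical frame."""
    lo, hi = points_obj.min(axis=0), points_obj.max(axis=0)
    center = (lo + hi) / 2.0
    diag = np.linalg.norm(hi - lo) + 1e-8     # tight bounding-box diagonal
    return (points_obj - center) / diag + 0.5  # NOCS coords, roughly in [0, 1]
```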
KineTy: Typography Diffusion
GIST introduces a novel method for realistic kinetic typography generation driven by text, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under the Attribution-NC 4.0 license.
Review: https://t.ly/2FWo9
Paper: arxiv.org/pdf/2407.10476
Project: seonmip.github.io/kinety/
Repo: github.com/SeonmiP/KineTy/tree/main
Gradient Boosting Reinforcement Learning
#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released. A toy illustration of the core idea follows the links below.
Review: https://t.ly/zv9pl
Paper: https://arxiv.org/pdf/2407.08250
Code: https://github.com/NVlabs/gbrl
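To give a flavour of what "gradient boosting trees as the RL function approximator" means, here is a toy fitted-Q-iteration step where the Q-function is a scikit-learn GradientBoostingRegressor. This is not NVIDIA's GBRL library (which targets actor-critic style training with its own tree backend); the data format, hyperparameters, and the fitted-Q framing are assumptions for illustration only.

```python
# Toy illustration of boosted trees as an RL value approximator (NOT NVIDIA's GBRL).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_step(transitions, n_actions, gamma=0.99, q_prev=None):
    """transitions: list of (state, action, reward, next_state), states as 1-D arrays."""
    s = np.array([np.append(t[0], t[1]) for t in transitions])  # (state, action) features
    r = np.array([t[2] for t in transitions])
    if q_prev is None:
        y = r                                                    # first iteration: regress rewards
    else:
        next_q = np.array([
            max(q_prev.predict(np.append(t[3], a).reshape(1, -1))[0] for a in range(n_actions))
            for t in transitions
        ])
        y = r + gamma * next_q                                   # bootstrapped TD target
    return GradientBoostingRegressor(n_estimators=100).fit(s, y)
```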
Hi folks,
I need your help.
Could you help me understand what you think about the duration of the hiring process for #AI roles? Any comment here will be appreciated :)
Vote here: https://t.ly/UMRXH
Thanks <3
(LinkedIn preview) ARGO Vision is going to open new positions for #AI & research in computer vision. I'm doing my best to make the hiring process as smooth as possible. Our current process consists of a quick tech/intro interview with me, followed by a tech/scientific/coding…
Shape of Motion for 4D
Google (+ Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source code released.
Review: https://t.ly/d9RsA
Project: https://shape-of-motion.github.io/
Paper: arxiv.org/pdf/2407.13764
Code: github.com/vye16/shape-of-motion/
TRG: new SOTA for 6DoF Head Pose
ECE (Korea) unveils TRG, a novel landmark-based method for estimating 6DoF head pose that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it is the new SOTA. Source code & models to be released. For context, a classic PnP baseline is sketched below the links.
Review: https://t.ly/lOIRA
Paper: https://lnkd.in/dCWEwNyF
Code: https://lnkd.in/dzRrwKBD
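For context, the classic landmark-based way to get a 6DoF head pose is to solve PnP between 2D landmarks and a 3D face template. The OpenCV sketch below shows that baseline, not TRG's bidirectional-interaction method; the template landmarks, intrinsics, and distortion model are illustrative assumptions.

```python
# Classic landmark-based 6DoF head-pose baseline via PnP (NOT TRG's method).
import cv2
import numpy as np

def head_pose_pnp(landmarks_2d, landmarks_3d, image_size):
    """landmarks_2d: (N, 2) detected facial landmarks; landmarks_3d: (N, 3) template points."""
    h, w = image_size
    K = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)  # rough intrinsics
    dist = np.zeros(5)                                                          # assume no distortion
    ok, rvec, tvec = cv2.solvePnP(landmarks_3d.astype(np.float64),
                                  landmarks_2d.astype(np.float64), K, dist)
    R, _ = cv2.Rodrigues(rvec)                                                  # rotation matrix
    return R, tvec  # 6DoF pose: rotation + translation
```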
Who's the REAL SOTA tracker in the world?
The BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source code available. A minimal best-of-N loop is sketched below the links.
Review: https://t.ly/WB9AR
Paper: https://arxiv.org/pdf/2407.15707
Code: github.com/BasitAlawode/Best_of_N_Trackers
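Conceptually, a best-of-N meta-tracker runs several base trackers and decides, per frame or per sequence, whose output to keep. The sketch below shows that loop with a naive confidence-based selector as a placeholder; the actual BofN selection model is described in the paper.

```python
# Minimal best-of-N tracking loop (placeholder selector, NOT the BofN model).
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)

def best_of_n(frames, trackers: List[Callable], selector: Callable) -> List[Box]:
    """trackers: callables frame -> (box, confidence); selector: picks the index to trust."""
    out = []
    for frame in frames:
        candidates = [t(frame) for t in trackers]   # (box, confidence) per base tracker
        idx = selector(candidates)                  # e.g. argmax over confidence
        out.append(candidates[idx][0])
    return out

# Example selector: trust the most confident tracker on each frame.
argmax_conf = lambda cands: max(range(len(cands)), key=lambda i: cands[i][1])
```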
TAPTRv2: new SOTA for TAP
TAPTRv2: a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations of DETR-like algorithms. The source code of v1 is available; v2 is coming. A toy point-query decoder is sketched below the links.
Review: https://t.ly/H84ae
Paper v1: https://lnkd.in/d4vD_6xx
Paper v2: https://lnkd.in/dE_TUzar
Project: https://taptr.github.io/
Code: https://lnkd.in/dgfs9Qdy
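To make the "point as query" formulation concrete, the toy PyTorch layer below lets per-point queries cross-attend to flattened frame features and regress normalized 2D locations, DETR-style. It sketches the general idea only, not the TAPTR/TAPTRv2 architecture; dimensions and head counts are assumptions.

```python
# Toy point-query decoder layer in the spirit of TAPTR (NOT the official model).
import torch
import torch.nn as nn

class PointQueryDecoderLayer(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
        self.loc_head = nn.Linear(dim, 2)           # regress (x, y) per point query

    def forward(self, point_queries, frame_feats):
        """point_queries: (B, N, dim); frame_feats: (B, HW, dim) flattened image features."""
        attended, _ = self.cross_attn(point_queries, frame_feats, frame_feats)
        q = point_queries + attended                 # residual update of the queries
        q = q + self.ffn(q)
        return q, self.loc_head(q).sigmoid()         # updated queries, normalized locations
```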
EAFormer: Scene Text Segmentation
A novel Edge-Aware Transformer to segment text more accurately, especially at the edges. Includes a FULL re-annotation of COCO_TS and MLT_S! Code coming; data available on Hugging Face.
Review: https://t.ly/0G2uX
Paper: arxiv.org/pdf/2407.17020
Project: hyangyu.github.io/EAFormer/
Data: huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
Keypoint Promptable Re-ID
KPR is a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon. One possible prompt encoding is sketched below the links.
Review: https://t.ly/vCXV_
Paper: https://arxiv.org/pdf/2407.18112
Repo: github.com/VlSomers/keypoint_promptable_reidentification
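One common way to feed keypoint prompts into a CNN/Transformer backbone is to render them as Gaussian heatmaps and stack them with the RGB crop as extra input channels; the sketch below does exactly that. It illustrates the general idea only; KPR's actual prompt encoding and training are defined in the paper and repo.

```python
# Sketch of encoding keypoint prompts as heatmap channels (illustrative, not KPR's code).
import numpy as np

def keypoint_prompt_channels(keypoints, hw, sigma=4.0):
    """keypoints: list of (x, y) pixel coords in the person crop; hw: (H, W) crop size."""
    H, W = hw
    ys, xs = np.mgrid[0:H, 0:W]
    heatmaps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)) for x, y in keypoints]
    return np.stack(heatmaps)  # (K, H, W), to concatenate with the (3, H, W) RGB crop
```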
A guide for modern CV
In the last 18 months I received 1,100+ applications for research roles. Most applicants don't know a few CV milestones in depth. Here is a short collection of mostly-free resources for spending some quality time over the summer.
Books:
- DL with Python https://t.ly/VjaVx
- Python OOP https://t.ly/pTQRm
Video courses:
- Berkeley | Modern CV (2023) https://t.ly/AU7S3
Libraries:
- PyTorch https://lnkd.in/dTvJbjAx
- PyTorch Lightning https://lnkd.in/dAruPA6T
- Albumentations https://albumentations.ai/
Papers:
- EfficientNet https://lnkd.in/dTsT44ae
- ViT https://lnkd.in/dB5yKdaW
- UNet https://lnkd.in/dnpKVa6T
- DeepLabV3+ https://lnkd.in/dVvqkmPk
- YOLOv1 https://lnkd.in/dQ9rs53B
- YOLOv2 arxiv.org/abs/1612.08242
- YOLOX https://lnkd.in/d9ZtsF7g
- SAM https://arxiv.org/abs/2304.02643
More papers and the full list: https://t.ly/WAwAk
Diffusion Models for Transparency
MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects, like roughness, metallic, albedo & transparency, in real images. Amazing work, but code not announced.
Review: https://t.ly/U98_G
Paper: arxiv.org/pdf/2312.02970
Project: www.prafullsharma.net/alchemist/
SAM 2 is out!
#Meta announced SAM 2, a novel unified model for real-time promptable segmentation in images and videos. Six times faster, it is the new SOTA by a large margin. Source code, dataset, models & demo released under permissive licenses.
Review: https://t.ly/oovJZ
Paper: https://t.ly/sCxMY
Demo: https://sam2.metademolab.com
Project: ai.meta.com/blog/segment-anything-2/
Models: github.com/facebookresearch/segment-anything-2
Real-time Expressive Hands
Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real time. Source code released (Apache 2.0) on July 31st, 2024.
Review: https://t.ly/8obbB
Project: https://lnkd.in/dRtVGe6i
Paper: https://lnkd.in/daCx2iB7
Code: https://lnkd.in/dZ9pgzug