Segment Any 4D Gaussians

SA4G is a novel framework to segment anything in the 4D Gaussian world: high-quality segmentation within seconds in 4D Gaussians, with removal, recoloring, composition, and rendering of HQ object masks. Source code available within August 2024.

Review: https://t.ly/uw3FS
Paper: https://arxiv.org/pdf/2407.04504
Project: https://jsxzs.github.io/sa4d/
Repo: https://github.com/hustvl/SA4D
CODERS: Stereo Detection, 6D Pose & Shape

CODERS is a one-stage approach for category-level object detection, pose estimation, and reconstruction from stereo images. Source code announced.

Review: https://t.ly/Xpizz
Paper: https://lnkd.in/dr5ZxC46
Project: https://xingyoujun.github.io/coders/
Repo: (TBA)
Tracking Everything via Decomposition

Hefei unveils a novel decoupled representation that separates static scenes from dynamic objects in terms of motion and appearance, yielding more robust tracking through occlusions and deformations. Source code announced under MIT License.

Review: https://t.ly/OsFTO
Paper: https://arxiv.org/pdf/2407.06531
Repo: https://github.com/qianduoduolr/DecoMotion
TAPVid-3D: a benchmark for TAP in 3D

#DeepMind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos drawn from three data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & code available under Apache 2.0.

Review: https://t.ly/SsptD
Paper: https://arxiv.org/pdf/2407.05921
Project: https://tapvid3d.github.io/
Code: https://github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
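The core quantity behind a point-tracking benchmark is position accuracy under distance thresholds. A minimal sketch of that idea in 3D (illustrative only; the official TAPVid-3D metrics are defined in the benchmark code linked above):

```python
import numpy as np

def pct_within(pred, gt, thresholds=(0.05, 0.1, 0.2)):
    """Fraction of predicted 3D points within each distance threshold of
    ground truth, averaged over thresholds. A simplified stand-in for the
    benchmark's official accuracy metrics."""
    d = np.linalg.norm(pred - gt, axis=-1)  # Euclidean error per point
    return float(np.mean([(d < t).mean() for t in thresholds]))

# Toy example: four tracked points with growing error along x.
gt = np.zeros((4, 3))
pred = np.array([[0.0, 0, 0], [0.08, 0, 0], [0.15, 0, 0], [0.5, 0, 0]])
score = pct_within(pred, gt)
print(score)
```

Errors of 0, 0.08, 0.15, and 0.5 pass 1, 2, and 3 of the thresholds respectively, so the averaged accuracy is 0.5.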
940+ FPS Multi-Person Pose Estimation

RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D whole-body pose estimation, running at over 940 FPS on GPU. Code & models available.

Review: https://t.ly/XkBmg
Paper: https://arxiv.org/pdf/2407.08634
Repo: https://github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
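A throughput figure like 940 FPS can be checked locally with a simple timing harness. A generic sketch (the callable below is a toy stand-in; a real RTMW model would be loaded through MMPose):

```python
import time

def measure_fps(model, frame, warmup=10, iters=200):
    """Average throughput in frames/second over `iters` timed calls,
    after a short warmup (relevant for models with lazy initialization)."""
    for _ in range(warmup):
        model(frame)
    t0 = time.perf_counter()
    for _ in range(iters):
        model(frame)
    elapsed = time.perf_counter() - t0
    return iters / elapsed

# Toy stand-in for a pose model: any callable that takes a frame.
fps = measure_fps(lambda frame: [v * 2 for v in frame], list(range(64)))
print(f"{fps:.0f} FPS")
```

For GPU models, remember to synchronize the device before reading the clock, or the measured time only covers kernel launches.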
OmniNOCS: the largest 3D NOCS dataset

OmniNOCS, by #Google (+ Georgia Tech), is a unified NOCS (Normalized Object Coordinate Space) dataset spanning multiple domains with 90+ object classes, the largest NOCS dataset to date. Data & code available under Apache 2.0.

Review: https://t.ly/xPgBn
Paper: https://arxiv.org/pdf/2407.08711
Project: https://omninocs.github.io/
Data: https://github.com/google-deepmind/omninocs
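For readers new to NOCS: the idea is to express an object's points in a canonical, normalized frame. A minimal sketch of one common convention (center the points and scale by the bounding-box diagonal so everything lands in the unit cube; the OmniNOCS release defines its own exact convention):

```python
import numpy as np

def to_nocs(points):
    """Map an (N, 3) object point cloud into normalized coordinates:
    centered on the bounding-box center, scaled by the box diagonal,
    then shifted into [0, 1]^3."""
    center = (points.max(axis=0) + points.min(axis=0)) / 2
    extent = points.max(axis=0) - points.min(axis=0)
    scale = np.linalg.norm(extent)  # bounding-box diagonal length
    return (points - center) / scale + 0.5

pts = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.5]])
nocs = to_nocs(pts)
print(nocs.min(), nocs.max())
```

Because the diagonal bounds every axis extent, all normalized coordinates stay inside the unit cube regardless of the object's original size.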
KineTy: Typography Diffusion

GIST introduces a novel method for realistic kinetic typography generation driven by text, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under CC Attribution-NC 4.0.

Review: https://t.ly/2FWo9
Paper: https://arxiv.org/pdf/2407.10476
Project: https://seonmip.github.io/kinety/
Repo: https://github.com/SeonmiP/KineTy/tree/main
Gradient Boosting Reinforcement Learning

#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees (GBT) to the RL domain, adapting GBT to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released.

Review: https://t.ly/zv9pl
Paper: https://arxiv.org/pdf/2407.08250
Code: https://github.com/NVlabs/gbrl
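The gradient-boosting mechanics that GBRL adapts to RL can be sketched with decision stumps fit to squared-loss residuals: each new tree fits what the ensemble so far gets wrong. A toy illustration (not the GBRL library API; `sin(3x)` stands in for TD targets):

```python
import numpy as np

def fit_stump(x, residual):
    """Find the 1D threshold split minimizing squared error of the residual."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lm, rm = left.mean(), right.mean()
        err = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda q: np.where(q <= t, lm, rm)

def boost(x, y, n_rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residual."""
    pred = np.zeros_like(y)
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(x, y - pred)
        stumps.append(s)
        pred = pred + lr * s(x)
    return lambda q: lr * sum(s(q) for s in stumps)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x)          # pretend these are TD targets for a value function
model = boost(x, y)
mse = float(np.mean((model(x) - y) ** 2))
print(round(mse, 3))
```

In actual RL the targets move every iteration (bootstrapped values, policy gradients), which is exactly the non-stationarity the post mentions.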
Hi folks,
I need your help. Could you tell me what you think about the duration of the hiring process for #AI roles? Any comment will be appreciated :)
Vote here: https://t.ly/UMRXH
Thanks <3
Shape of Motion for 4D

Google (+ Berkeley) unveils a novel method for reconstructing generic dynamic scenes, with explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source code released.

Review: https://t.ly/d9RsA
Project: https://shape-of-motion.github.io/
Paper: https://arxiv.org/pdf/2407.13764
Code: https://github.com/vye16/shape-of-motion/
TRG: new SOTA for 6DoF head pose

ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it as the new SOTA. Source code & models to be released.

Review: https://t.ly/lOIRA
Paper: https://lnkd.in/dCWEwNyF
Code: https://lnkd.in/dzRrwKBD
Who's the REAL SOTA tracker in the world?

The BofN meta-tracker outperforms existing SOTA trackers by a large margin on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source code available.

Review: https://t.ly/WB9AR
Paper: https://arxiv.org/pdf/2407.15707
Code: https://github.com/BasitAlawode/Best_of_N_Trackers
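The "best of N" idea is simple to state: a router scores each candidate tracker for a given sequence and only the top-scoring one is run. A toy sketch of that routing skeleton (the tracker names and the hand-written routing rule are hypothetical; the paper learns the selection):

```python
def best_of_n(trackers, route_score, sequence):
    """Run only the tracker the router predicts will perform best."""
    name = max(trackers, key=lambda n: route_score(n, sequence))
    return name, trackers[name](sequence)

# Hypothetical trackers: plain callables keyed by name.
trackers = {
    "short_specialist": lambda seq: f"tracked {len(seq)} frames (short)",
    "long_specialist": lambda seq: f"tracked {len(seq)} frames (long)",
}

def route_score(name, seq):
    # Hand-written router for illustration: prefer the long-sequence
    # specialist once a sequence exceeds 10 frames.
    wants_long = len(seq) > 10
    return 1.0 if (name == "long_specialist") == wants_long else 0.0

name, out = best_of_n(trackers, route_score, list(range(30)))
print(name)
```

The interesting part in practice is the router itself: with a perfect router, a meta-tracker inherits the per-sequence best score of its ensemble, which is why it can beat every individual tracker.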
TAPTRv2: new SOTA for TAP

TAPTRv2 is a Transformer-based approach built upon TAPTR for the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations of DETR-like algorithms. The source code of V1 is available; V2 is coming.

Review: https://t.ly/H84ae
Paper v1: https://lnkd.in/d4vD_6xx
Paper v2: https://lnkd.in/dE_TUzar
Project: https://taptr.github.io/
Code: https://lnkd.in/dgfs9Qdy
EAFormer: Scene Text Segmentation

A novel edge-aware Transformer that segments text more accurately, especially at the edges, with a full re-annotation of COCO_TS and MLT_S. Code coming; data available on Hugging Face.

Review: https://t.ly/0G2uX
Paper: https://arxiv.org/pdf/2407.17020
Project: https://hyangyu.github.io/EAFormer/
Data: https://huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
Keypoint Promptable Re-ID

KPR is a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon.

Review: https://t.ly/vCXV_
Paper: https://arxiv.org/pdf/2407.18112
Repo: https://github.com/VlSomers/keypoint_promptable_reidentification
A guide for modern CV

In the last 18 months I received 1,100+ applications for research roles. Most applicants don't know a few milestones of CV in depth. Here is a short collection of mostly-free resources to spend some quality time on this summer.

Books:
- DL with Python: https://t.ly/VjaVx
- Python OOP: https://t.ly/pTQRm

Video Courses:
- Berkeley | Modern CV (2023): https://t.ly/AU7S3

Libraries:
- PyTorch: https://lnkd.in/dTvJbjAx
- PyTorch Lightning: https://lnkd.in/dAruPA6T
- Albumentations: https://albumentations.ai/

Papers:
- EfficientNet: https://lnkd.in/dTsT44ae
- ViT: https://lnkd.in/dB5yKdaW
- UNet: https://lnkd.in/dnpKVa6T
- DeepLabV3+: https://lnkd.in/dVvqkmPk
- YOLOv1: https://lnkd.in/dQ9rs53B
- YOLOv2: https://arxiv.org/abs/1612.08242
- YOLOX: https://lnkd.in/d9ZtsF7g
- SAM: https://arxiv.org/abs/2304.02643

More papers and the full list: https://t.ly/WAwAk
Diffusion Models for Transparency

MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects, such as roughness, metallic, albedo & transparency, in real images. Amazing work, but code not announced.

Review: https://t.ly/U98_G
Paper: https://arxiv.org/pdf/2312.02970
Project: https://www.prafullsharma.net/alchemist/
SAM 2 is out!

#Meta announced SAM 2, a novel unified model for real-time promptable segmentation in images and videos. 6x faster than its predecessor, it's the new SOTA by a large margin. Source code, dataset, models & demo released under permissive licenses.

Review: https://t.ly/oovJZ
Paper: https://t.ly/sCxMY
Demo: https://sam2.metademolab.com
Project: https://ai.meta.com/blog/segment-anything-2/
Models: https://github.com/facebookresearch/segment-anything-2
Real-time Expressive Hands

Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real time. Source code released (Apache 2.0) on July 31st, 2024.

Review: https://t.ly/8obbB
Project: https://lnkd.in/dRtVGe6i
Paper: https://lnkd.in/daCx2iB7
Code: https://lnkd.in/dZ9pgzug
Click-Attention Segmentation

An interesting image patch-based click-attention algorithm with an affinity loss inspired by SASFormer. The approach decouples positive and negative clicks, guiding positive clicks to focus on the target object and negative ones on the background. Code released under Apache.

Review: https://t.ly/tG05L
Paper: https://arxiv.org/pdf/2408.06021
Code: https://github.com/hahamyt/ClickAttention
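Decoupling clicks can be pictured as encoding positive and negative clicks into two separate guidance maps rather than one signed map, so each branch of the network sees only its own kind of guidance. An illustrative sketch (not the paper's exact encoding):

```python
import numpy as np

def click_maps(shape, pos_clicks, neg_clicks, radius=3):
    """Rasterize clicks as two separate binary disk maps, keeping
    positive and negative guidance decoupled."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]

    def disks(clicks):
        m = np.zeros(shape, dtype=np.float32)
        for cy, cx in clicks:
            m[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 1.0
        return m

    return disks(pos_clicks), disks(neg_clicks)

pos, neg = click_maps((32, 32), pos_clicks=[(8, 8)], neg_clicks=[(24, 24)])
print(pos.sum() > 0, float((pos * neg).sum()) == 0.0)
```

In a single signed map, a later negative click can erase nearby positive evidence; two maps keep both signals intact for the model to weigh.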