Anomaly Object-Detection
The University of Edinburgh introduces a novel anomaly detection problem: identifying "odd-looking" objects relative to the other instances in a multi-view scene. Code announced.
Review https://t.ly/3dGHp
Paper arxiv.org/pdf/2406.20099
Repo https://lnkd.in/d9x6FpUq
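The setting — spotting the one instance that looks different from its peers in the same scene — can be illustrated with a toy outlier score over per-instance feature vectors (illustrative only; the function name and scoring rule are my own, not the paper's method):

```python
import math

def oddness_scores(features):
    """Score each instance by its distance from the mean of all instances.

    features: list of equal-length feature vectors, one per object instance.
    Returns one score per instance; the largest score marks the most
    'odd-looking' candidate.
    """
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    return [math.dist(f, mean) for f in features]

# Four similar instances and one outlier:
feats = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.0], [1.0, 1.1], [4.0, 4.0]]
scores = oddness_scores(feats)
odd_idx = scores.index(max(scores))  # index 4 stands out
```

In practice the features would come from a learned encoder over multiple views; the mean-distance rule above is only the simplest possible stand-in for "relative to the other instances."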
MimicMotion: HQ Motion Generation
#Tencent open-sources MimicMotion, a novel controllable video generation framework that can generate HQ videos of arbitrary length mimicking specific motion guidance. Source code available.
Review https://t.ly/XFoin
Paper arxiv.org/pdf/2406.19680
Project https://lnkd.in/eW-CMg_C
Code https://lnkd.in/eZ6SC2bc
CAVIS: SOTA Context-Aware Segmentation
DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It's the new SOTA on several benchmarks. Source code announced.
Review https://t.ly/G5obN
Paper arxiv.org/pdf/2407.03010
Repo github.com/Seung-Hun-Lee/CAVIS
Project seung-hun-lee.github.io/projects/CAVIS
Segment Any 4D Gaussians
SA4D is a novel framework to segment anything in the #4D Gaussian world: HQ segmentation within seconds in 4D Gaussians, plus the ability to remove, recolor, compose, and render HQ anything-masks. Source code available by August 2024.
Review https://t.ly/uw3FS
Paper https://arxiv.org/pdf/2407.04504
Project https://jsxzs.github.io/sa4d/
Repo https://github.com/hustvl/SA4D
CODERS: Stereo Detection, 6D & Shape
CODERS: a one-stage approach for category-level object detection, pose estimation, and reconstruction from stereo images. Source code announced.
Review https://t.ly/Xpizz
Paper https://lnkd.in/dr5ZxC46
Project xingyoujun.github.io/coders/
Repo (TBA)
Tracking Everything via Decomposition
Hefei unveils a novel decoupled representation that divides static scenes and dynamic objects in terms of motion and appearance, enabling more robust tracking through occlusions and deformations. Source code announced under the MIT License.
Review https://t.ly/OsFTO
Paper https://arxiv.org/pdf/2407.06531
Repo github.com/qianduoduolr/DecoMotion
TAPVid-3D: benchmark for TAP-3D
#DeepMind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos from three different data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & code available, Apache 2.0.
Review https://t.ly/SsptD
Paper arxiv.org/pdf/2407.05921
Project tapvid3d.github.io/
Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
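Point-tracking benchmarks of this kind typically report, among other metrics, the fraction of predicted 3D points that land within a distance threshold of the ground truth. A minimal sketch of such a percent-within-threshold metric (my own simplification, not the official TAPVid-3D evaluation code):

```python
import math

def pct_within_threshold(pred, gt, thresh):
    """Fraction of predicted 3D points within `thresh` of the ground truth.

    pred, gt: lists of (x, y, z) tuples, one entry per tracked frame.
    """
    hits = sum(math.dist(p, g) <= thresh for p, g in zip(pred, gt))
    return hits / len(gt)

gt   = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.0), (0.3, 0.0, 1.0)]
pred = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.3), (0.2, 0.0, 1.0), (0.9, 0.0, 1.0)]
score = pct_within_threshold(pred, gt, thresh=0.1)  # 2 of 4 frames are close enough
```

The real benchmark averages over multiple thresholds and handles occlusion flags; this sketch only shows the core per-point comparison.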
940+ FPS Multi-Person Pose Estimation
RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D whole-body pose estimation, reaching over 940 FPS on #GPU! Code & models available.
Review https://t.ly/XkBmg
Paper arxiv.org/pdf/2407.08634
Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
OmniNOCS: largest 3D NOCS
OmniNOCS by #Google (+Georgia) is a unified NOCS (Normalized Object Coordinate Space) dataset that spans multiple domains with 90+ object classes, making it the largest NOCS dataset to date. Data & code available under Apache 2.0.
Review https://t.ly/xPgBn
Paper arxiv.org/pdf/2407.08711
Project https://omninocs.github.io/
Data github.com/google-deepmind/omninocs
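NOCS represents each object's surface points in a canonical normalized cube, so shapes of different sizes become directly comparable. A toy sketch of mapping raw object points into such a normalized space (a simplified axis-aligned version for illustration, not the OmniNOCS pipeline):

```python
def to_nocs(points):
    """Map 3D points into a normalized cube centered at the origin.

    points: list of (x, y, z) tuples in object coordinates.
    Rescales so the largest bounding-box side spans exactly 1.0,
    a common NOCS convention (simplified here: no rotation alignment).
    """
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    scale = max(hi - lo for lo, hi in zip(mins, maxs))
    return [tuple((p[d] - center[d]) / scale for d in range(3)) for p in points]

# A 2 x 4 x 1 box: after normalization its longest side spans exactly 1.0.
box = [(0, 0, 0), (2, 0, 0), (0, 4, 0), (2, 4, 1)]
nocs = to_nocs(box)
```

Real NOCS annotations also fix a canonical orientation per category; the sketch only shows the centering and scale normalization.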
KineTy: Typography Diffusion
GIST introduces realistic kinetic typography generation driven by text, using guided video diffusion models to achieve visually pleasing text appearances. Repo to be released under Attribution-NC 4.0.
Review https://t.ly/2FWo9
Paper arxiv.org/pdf/2407.10476
Project seonmip.github.io/kinety/
Repo github.com/SeonmiP/KineTy/tree/main
Gradient Boosting Reinforcement Learning
#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released.
Review https://t.ly/zv9pl
Paper https://arxiv.org/pdf/2407.08250
Code https://github.com/NVlabs/gbrl
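The core idea of gradient boosting — each new tree is fit to the residuals (the negative gradient of the squared loss) of the current ensemble — can be shown with a tiny from-scratch boosting loop on a toy value-regression target (a didactic sketch with made-up helper names, not the actual GBRL library):

```python
def fit_stump(xs, residuals):
    """Find the 1-D threshold split that best fits the residuals."""
    best = None
    for t in sorted(set(xs)):
        left  = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left_value, right_value)

def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    """Boosting loop: each stump fits the current residuals."""
    pred, stumps = [0.0] * len(xs), []
    for _ in range(n_rounds):
        resid = [y - p for y, p in zip(ys, pred)]  # negative gradient of MSE
        t, lm, rm = fit_stump(xs, resid)
        stumps.append((t, lm, rm))
        pred = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, pred)]
    return stumps

def predict(stumps, x, lr=0.1):
    return sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)

# Toy "value function" target: state -> expected return.
states, returns = [0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0]
model = gradient_boost(states, returns)
```

GBRL's contribution is making this kind of ensemble work inside actor-critic RL training (shared tree structure for policy and value, GPU acceleration); the sketch only illustrates the boosted-tree approximator itself.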
Hi folks,
I need your help!
Could you tell me what you think about the duration of the hiring process for #AI roles? Any comment here will be appreciated :)
Vote here: https://t.ly/UMRXH
Thanks <3
Shape of Motion for 4D
Google (+ Berkeley) unveils a novel method for reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source code released.
Review https://t.ly/d9RsA
Project https://shape-of-motion.github.io/
Paper arxiv.org/pdf/2407.13764
Code github.com/vye16/shape-of-motion/
TRG: new SOTA 6DoF Head
ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it as the new SOTA. Source code & models to be released.
Review https://t.ly/lOIRA
Paper https://lnkd.in/dCWEwNyF
Code https://lnkd.in/dzRrwKBD
Who's the REAL SOTA tracker in the world?
The BofN meta-tracker outperforms existing SOTA trackers by a large margin on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source code available.
Review https://t.ly/WB9AR
Paper https://arxiv.org/pdf/2407.15707
Code github.com/BasitAlawode/Best_of_N_Trackers
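The "Best of N" idea is simple at its core: run N trackers and, per sequence, keep the output of whichever one is predicted to perform best. A toy sketch of the selection step, assuming a scoring function is already available (names are illustrative; the real BofN selector is learned, not an oracle):

```python
def best_of_n(tracker_outputs, score_fn):
    """Pick, per sequence, the tracker whose output scores highest.

    tracker_outputs: {tracker_name: {sequence_name: output}}
    score_fn(output) -> float, higher is better.
    Returns {sequence_name: (best_tracker_name, output)}.
    """
    sequences = next(iter(tracker_outputs.values())).keys()
    selected = {}
    for seq in sequences:
        best = max(tracker_outputs,
                   key=lambda t: score_fn(tracker_outputs[t][seq]))
        selected[seq] = (best, tracker_outputs[best][seq])
    return selected

# Toy example: the 'outputs' are just IoU-like quality numbers.
outputs = {
    "trackerA": {"seq1": 0.70, "seq2": 0.40},
    "trackerB": {"seq1": 0.55, "seq2": 0.80},
}
picked = best_of_n(outputs, score_fn=lambda o: o)
```

The interesting part of the paper is predicting which tracker will win without ground truth; the sketch only shows the per-sequence selection mechanics.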
TAPTRv2: new SOTA for TAP
TAPTRv2 is a Transformer-based approach built upon TAPTR for the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations from DETR-like algorithms. The source code of v1 is available; v2 is coming.
Review https://t.ly/H84ae
Paper v1 https://lnkd.in/d4vD_6xx
Paper v2 https://lnkd.in/dE_TUzar
Project https://taptr.github.io/
Code https://lnkd.in/dgfs9Qdy
EAFormer: Scene Text Segmentation
A novel Edge-Aware Transformer to segment text more accurately, especially at the edges, with a FULL re-annotation of COCO_TS and MLT_S! Code coming; data available on Hugging Face.
Review https://t.ly/0G2uX
Paper arxiv.org/pdf/2407.17020
Project hyangyu.github.io/EAFormer/
Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
Keypoint Promptable Re-ID
KPR is a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Code, dataset, and annotations coming soon.
Review https://t.ly/vCXV_
Paper https://arxiv.org/pdf/2407.18112
Repo github.com/VlSomers/keypoint_promptable_reidentification
A guide for modern CV
In the last 18 months I have received 1,100+ applications for research roles, and most applicants don't deeply know a few milestones in CV. Here is a short collection of mostly free resources to spend a bit of good time on this summer.
Books:
- DL with Python https://t.ly/VjaVx
- Python OOP https://t.ly/pTQRm
Video Courses:
- Berkeley | Modern CV (2023) https://t.ly/AU7S3
Libraries:
- PyTorch https://lnkd.in/dTvJbjAx
- PyTorch Lightning https://lnkd.in/dAruPA6T
- Albumentations https://albumentations.ai/
Papers:
- EfficientNet https://lnkd.in/dTsT44ae
- ViT https://lnkd.in/dB5yKdaW
- UNet https://lnkd.in/dnpKVa6T
- DeepLabV3+ https://lnkd.in/dVvqkmPk
- YOLOv1 https://lnkd.in/dQ9rs53B
- YOLOv2 arxiv.org/abs/1612.08242
- YOLOX https://lnkd.in/d9ZtsF7g
- SAM https://arxiv.org/abs/2304.02643
More papers and the full list: https://t.ly/WAwAk
Diffusion Models for Transparency
MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects, such as roughness, metallic, albedo & transparency, in real images. Amazing work, but no code announced.
Review https://t.ly/U98_G
Paper arxiv.org/pdf/2312.02970
Project www.prafullsharma.net/alchemist/