MeshAnything with Transformers
MeshAnything converts any 3D representation into Artist-Created Meshes (AMs), i.e., meshes created by human artists. It can be combined with various 3D asset production pipelines, such as 3D reconstruction and generation, to transform their results into AMs that can be seamlessly applied in the 3D industry. Source code available.
Review: https://t.ly/HvkD4
Paper: arxiv.org/pdf/2406.10163
Code: github.com/buaacyw/MeshAnything
LLaNA: NeRF-LLM Assistant
UniBO unveils LLaNA, a novel multimodal LLM that understands and reasons about an input NeRF. It processes the NeRF weights directly and performs tasks such as captioning, Q&A, and zero-shot classification of NeRFs.
Review: https://t.ly/JAfhV
Paper: arxiv.org/pdf/2406.11840
Project: andreamaduzzi.github.io/llana/
Code & Data: coming
Depth Anything V2 is out!
Depth Anything V2 outperforms V1 in robustness and fine-grained detail. Trained with 595K synthetically labeled images and 62M+ real unlabeled images, it is the new SOTA in monocular depth estimation (MDE). Code & models available; a minimal usage sketch follows the links below.
Review: https://t.ly/QX9Nu
Paper: arxiv.org/pdf/2406.09414
Project: depth-anything-v2.github.io/
Repo: github.com/DepthAnything/Depth-Anything-V2
Data: huggingface.co/datasets/depth-anything/DA-2K
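A minimal inference sketch along the lines of the repo's README; the checkpoint path, the ViT-L config values, and the `DepthAnythingV2` / `infer_image` entry points are assumptions to verify against the repo:

```python
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2  # assumes the repo is on PYTHONPATH

# ViT-L configuration as listed in the README (assumption: check the model zoo for exact values)
model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth', map_location='cpu'))
model = model.to('cuda' if torch.cuda.is_available() else 'cpu').eval()

img = cv2.imread('example.jpg')   # BGR image, HxWx3
depth = model.infer_image(img)    # HxW relative depth map (float32)
```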
Anomaly Object Detection
The University of Edinburgh introduces a novel anomaly detection problem: identifying "odd-looking" objects relative to the other instances within a multi-view scene. Code announced; a toy sketch of the idea follows the links below.
Review: https://t.ly/3dGHp
Paper: arxiv.org/pdf/2406.20099
Repo: https://lnkd.in/d9x6FpUq
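Purely for intuition, a toy sketch of the "odd one out relative to the other instances" setting (not the paper's method): embed every instance in the scene and flag the one farthest from the group's robust center. All names below are made up for illustration.

```python
import numpy as np

def odd_one_out(instance_feats: np.ndarray) -> int:
    """Return the index of the instance whose feature vector lies farthest
    from the robust (median) center of all instances in the scene."""
    center = np.median(instance_feats, axis=0)
    dists = np.linalg.norm(instance_feats - center, axis=1)
    return int(np.argmax(dists))

# 7 ordinary-looking objects plus 1 odd-looking one (toy features)
feats = np.vstack([np.random.normal(0.0, 0.1, (7, 128)),
                   np.random.normal(1.0, 0.1, (1, 128))])
print(odd_one_out(feats))  # -> 7
```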
MimicMotion: HQ Motion Generation
#Tencent open-sources MimicMotion, a novel controllable video generation framework that can generate high-quality videos of arbitrary length following specific motion guidance. Source code available.
Review: https://t.ly/XFoin
Paper: arxiv.org/pdf/2406.19680
Project: https://lnkd.in/eW-CMg_C
Code: https://lnkd.in/eZ6SC2bc
CAVIS: SOTA Context-Aware Segmentation
DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It is the new SOTA on several benchmarks. Source code announced.
Review: https://t.ly/G5obN
Paper: arxiv.org/pdf/2407.03010
Repo: github.com/Seung-Hun-Lee/CAVIS
Project: seung-hun-lee.github.io/projects/CAVIS
Segment Any 4D Gaussians
SA4D is a novel framework for segmenting anything in the world of #4D Gaussians. It delivers high-quality segmentation within seconds and can remove, recolor, compose, and render HQ masks of anything in 4D Gaussians. Source code available by August 2024.
Review: https://t.ly/uw3FS
Paper: https://arxiv.org/pdf/2407.04504
Project: https://jsxzs.github.io/sa4d/
Repo: https://github.com/hustvl/SA4D
CODERS: Stereo Detection, 6D & Shape
CODERS: a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source code announced.
Review: https://t.ly/Xpizz
Paper: https://lnkd.in/dr5ZxC46
Project: xingyoujun.github.io/coders/
Repo: TBA
Tracking Everything via Decomposition
Hefei unveils a novel decoupled representation that separates static scenes and dynamic objects in terms of motion and appearance, enabling more robust tracking through occlusions and deformations. Source code announced under the MIT license.
Review: https://t.ly/OsFTO
Paper: https://arxiv.org/pdf/2407.06531
Repo: github.com/qianduoduolr/DecoMotion
TAPVid-3D: Benchmark for TAP-3D
#Deepmind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos from three different data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & code available under Apache 2.0.
Review: https://t.ly/SsptD
Paper: arxiv.org/pdf/2407.05921
Project: tapvid3d.github.io/
Code: github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
940+ FPS Multi-Person Pose Estimation
RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D whole-body pose estimation, running at over 940 FPS on #GPU! Code & models available; a hedged usage sketch follows the links below.
Review: https://t.ly/XkBmg
Paper: arxiv.org/pdf/2407.08634
Repo: github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
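A hedged usage sketch via MMPose's high-level inferencer; the `'wholebody'` alias and the output layout are assumptions, so check the MMPose docs and RTMW model zoo for the exact config names:

```python
from mmpose.apis import MMPoseInferencer

# 'wholebody' is an assumed alias; RTMW configs live under projects/rtmpose in the repo.
inferencer = MMPoseInferencer(pose2d='wholebody')

# The inferencer yields one result per input; each holds per-person keypoints and scores.
result = next(inferencer('demo.jpg', show=False))
for person in result['predictions'][0]:
    print(len(person['keypoints']), 'whole-body keypoints detected')
```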
OmniNOCS: Largest 3D NOCS Dataset
OmniNOCS, by #Google (+ Georgia Tech), is a unified NOCS (Normalized Object Coordinate Space) dataset that spans multiple domains with 90+ object classes, the largest NOCS dataset to date. Data & code available under Apache 2.0; a toy NOCS normalization sketch follows the links below.
Review: https://t.ly/xPgBn
Paper: arxiv.org/pdf/2407.08711
Project: https://omninocs.github.io/
Data: github.com/google-deepmind/omninocs
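As background, NOCS maps each object's points into a canonical unit cube so shapes are comparable across instances; a toy NumPy illustration of that normalization (not OmniNOCS's actual tooling) looks like this:

```python
import numpy as np

def to_nocs(points: np.ndarray) -> np.ndarray:
    """Map an Nx3 object point cloud into Normalized Object Coordinate Space:
    center the points and scale by the bounding-box diagonal so every
    coordinate lands in [-0.5, 0.5]."""
    center = (points.max(axis=0) + points.min(axis=0)) / 2.0
    scale = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    return (points - center) / scale

pts = np.random.rand(1000, 3) * np.array([2.0, 0.5, 1.0])  # arbitrary object points
nocs = to_nocs(pts)
assert np.abs(nocs).max() <= 0.5 + 1e-6
```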
KineTy: Typography Diffusion
GIST introduces a novel method for realistic, text-driven kinetic typography generation, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under CC Attribution-NC 4.0.
Review: https://t.ly/2FWo9
Paper: arxiv.org/pdf/2407.10476
Project: seonmip.github.io/kinety/
Repo: github.com/SeonmiP/KineTy/tree/main
Gradient Boosting Reinforcement Learning
#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees (GBT) to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released; a toy illustration of the underlying idea follows the links below.
Review: https://t.ly/zv9pl
Paper: https://arxiv.org/pdf/2407.08250
Code: https://github.com/NVlabs/gbrl
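GBRL ships its own API; purely to illustrate the underlying idea of boosted trees as an RL function approximator, here is a toy fitted-Q-iteration sketch with scikit-learn on a synthetic one-dimensional MDP (not NVIDIA's implementation; every name below is illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy batch of transitions (s, a, r, s') from a made-up 1D environment:
# action 1 nudges the state right, action 0 nudges it left; reward favors the origin.
n, n_actions, gamma = 2000, 2, 0.95
s = rng.uniform(-1, 1, size=(n, 1))
a = rng.integers(0, n_actions, size=n)
s_next = np.clip(s + np.where(a[:, None] == 1, 0.1, -0.1) + rng.normal(0, 0.02, (n, 1)), -1, 1)
r = -np.abs(s_next[:, 0])

# One boosted-tree Q-model per action, refit against bootstrapped targets.
q_models = [GradientBoostingRegressor(n_estimators=50) for _ in range(n_actions)]
targets = r.copy()
for _ in range(10):
    for act in range(n_actions):
        mask = a == act
        q_models[act].fit(s[mask], targets[mask])
    q_next = np.column_stack([m.predict(s_next) for m in q_models])
    targets = r + gamma * q_next.max(axis=1)

greedy = np.argmax(np.column_stack([m.predict(s) for m in q_models]), axis=1)
print('fraction of greedy "move right" actions for s < 0:', greedy[s[:, 0] < 0].mean())
```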
Hi folks,
I need your help!
Could you help me understand what you think about the duration of the hiring process for #AI roles? Any comment will be appreciated :)
Vote here: https://t.ly/UMRXH
Thanks <3
Shape of Motion for 4D
Google (+ Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source code released.
Review: https://t.ly/d9RsA
Project: https://shape-of-motion.github.io/
Paper: arxiv.org/pdf/2407.13764
Code: github.com/vye16/shape-of-motion/
TRG: New SOTA for 6DoF Head Pose
ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it is the new SOTA. Source code & models to be released.
Review: https://t.ly/lOIRA
Paper: https://lnkd.in/dCWEwNyF
Code: https://lnkd.in/dzRrwKBD
Who's the REAL SOTA tracker in the world?
The BofN (Best-of-N) meta-tracker outperforms existing SOTA trackers by a large margin on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source code available.
Review: https://t.ly/WB9AR
Paper: https://arxiv.org/pdf/2407.15707
Code: github.com/BasitAlawode/Best_of_N_Trackers
TAPTRv2: New SOTA for TAP
TAPTRv2: a Transformer-based approach built upon TAPTR for the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracked point as a point query, making it possible to leverage well-studied operations from DETR-like algorithms. The source code of V1 is available; V2 is coming. A toy point-query sketch follows the links below.
Review: https://t.ly/H84ae
Paper v1: https://lnkd.in/d4vD_6xx
Paper v2: https://lnkd.in/dE_TUzar
Project: https://taptr.github.io/
Code: https://lnkd.in/dgfs9Qdy
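To make "each tracked point as a point query" concrete, here is a toy DETR-style decoder step in PyTorch; this is a conceptual sketch, not the TAPTR/TAPTRv2 code, and the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class PointQueryDecoderLayer(nn.Module):
    """Toy DETR-style layer: each tracked point is a query that cross-attends
    to the current frame's features and predicts an offset to its location."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.to_delta = nn.Linear(dim, 2)  # predicted (dx, dy) update

    def forward(self, queries, frame_feats, points):
        # queries: (B, N, C) one embedding per tracked point
        # frame_feats: (B, H*W, C) flattened image features of the current frame
        # points: (B, N, 2) normalized point locations to refine
        attn_out, _ = self.cross_attn(queries, frame_feats, frame_feats)
        queries = queries + attn_out
        queries = queries + self.ffn(queries)
        return queries, points + self.to_delta(queries)

layer = PointQueryDecoderLayer()
q = torch.randn(1, 5, 256)            # 5 point queries
feats = torch.randn(1, 64 * 64, 256)  # one frame's features
pts = torch.rand(1, 5, 2)
q, pts = layer(q, feats, pts)
```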
EAFormer: Scene Text Segmentation
EAFormer: a novel Edge-Aware Transformer that segments scene text more accurately, especially at the edges. Full re-annotation of COCO_TS and MLT_S! Code coming; data available on Hugging Face.
Review: https://t.ly/0G2uX
Paper: arxiv.org/pdf/2407.17020
Project: hyangyu.github.io/EAFormer/
Data: huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main