AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
96 photos
238 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🧤HOT3D Hand/Object Tracking🧤

👉#Meta opens a novel egocentric dataset for 3D hand & object tracking. A new benchmark for vision-based understanding of 3D hand-object interactions. Dataset available 💙

👉Review https://t.ly/cD76F
👉Paper https://lnkd.in/e6_7UNny
👉Data https://lnkd.in/e6P-sQFK
💦 Self-driving in wet conditions 💦

👉BMW SemanticSpray: a novel dataset of scenes in wet surface conditions captured by camera, LiDAR, and radar. Camera: 2D boxes | LiDAR: 3D boxes, semantic labels | Radar: semantic labels.

👉Review https://t.ly/8S93j
👉Paper https://lnkd.in/dnN5MCZC
👉Project https://lnkd.in/dkUaxyEF
👉Data https://lnkd.in/ddhkyXv8
🌱 TokenHMR : new 3D human pose SOTA 🌱

👉TokenHMR is a novel human pose and shape (HPS) method that balances 2D keypoint supervision with 3D pose accuracy, thus leveraging Internet data without known camera parameters. It's the new SOTA by a large margin.

👉Review https://t.ly/K9_8n
👉Paper arxiv.org/pdf/2404.16752
👉Project tokenhmr.is.tue.mpg.de/
👉Code github.com/saidwivedi/TokenHMR
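Mixing 2D keypoint evidence with 3D pose is typically bridged by a weak-perspective camera that projects predicted 3D joints into the image for a reprojection loss. A minimal NumPy sketch of that standard step — illustrative only, not TokenHMR's actual training code; all names are hypothetical:

```python
import numpy as np

def weak_perspective_project(joints3d, scale, trans):
    """Project 3D joints (N,3) to 2D with a weak-perspective camera.

    x2d = scale * x3d[:, :2] + trans  -- depth only sets the global scale.
    """
    return scale * joints3d[:, :2] + trans

def keypoint_loss(pred2d, gt2d, conf):
    """Confidence-weighted mean L2 error between predicted and GT 2D keypoints."""
    err = np.linalg.norm(pred2d - gt2d, axis=-1)
    return float((conf * err).sum() / (conf.sum() + 1e-8))

joints = np.array([[0.0, 0.0, 2.0], [0.1, -0.2, 2.0]])
kp2d = weak_perspective_project(joints, scale=100.0, trans=np.array([128.0, 128.0]))
# kp2d -> [[128, 128], [138, 108]]
```

Because only scale and translation are fitted, this loss can supervise pose from in-the-wild images where the true camera intrinsics are unknown.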
🤓Glasses-Removal in Videos🤓

👉Lightricks unveils a novel method that takes an input video of a person wearing glasses and removes the glasses while preserving identity. It works even with reflections, heavy makeup, and blinks. Code announced, not yet released.

👉Review https://t.ly/Hgs2d
👉Paper arxiv.org/pdf/2406.14510
👉Project https://v-lasik.github.io/
👉Code github.com/v-lasik/v-lasik-code
🧬Event-driven SuperResolution🧬

👉USTC unveils EvTexture, the first VSR method that utilizes event signals for texture enhancement, leveraging the high-frequency detail of events to better recover textures in video super-resolution. Code available💙

👉Review https://t.ly/zlb4c
👉Paper arxiv.org/pdf/2406.13457
👉Code github.com/DachunKai/EvTexture
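For intuition on how event signals carry high-frequency texture cues: a common preprocessing step is to accumulate the raw event stream into a spatio-temporal voxel grid before feeding it to a network. A minimal sketch of that widely used representation — not EvTexture's actual pipeline; the function name is illustrative:

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events (t, x, y, polarity) into a (num_bins, H, W) voxel grid.

    Events are binned by normalized timestamp; polarities are summed per cell.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)  # -> [0, 1]
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    # np.add.at handles repeated (bin, y, x) indices correctly.
    np.add.at(grid, (bins, y, x), events[:, 3])
    return grid

events = np.array([
    [0.00, 3, 2, +1.0],
    [0.05, 3, 2, +1.0],
    [0.90, 1, 1, -1.0],
])
grid = events_to_voxel_grid(events, num_bins=2, height=4, width=4)
```

Each temporal bin preserves sub-frame timing, which is exactly the high-frequency information a frame-based VSR model lacks.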
🐻StableNormal: Stable/Sharp Normal🐻

👉Alibaba unveils StableNormal, a novel method that tailors diffusion priors for monocular surface-normal estimation. Hugging Face demo is available💙

👉Review https://t.ly/FPJlG
👉Paper https://arxiv.org/pdf/2406.16864
👉Demo https://huggingface.co/Stable-X
🍦Geometry Guided Depth🍦

👉A depth estimation and #3D reconstruction method that can take as input, where available, previously made estimates of the scene's geometry

👉Review https://lnkd.in/dMgakzWm
👉Paper https://arxiv.org/pdf/2406.18387
👉Repo (empty) https://github.com/nianticlabs/DoubleTake
🌮MeshAnything with Transformers🌮

👉MeshAnything converts any 3D representation into Artist-Created Meshes (AMs), i.e., meshes created by human artists. It can be combined with various 3D asset production pipelines, such as 3D reconstruction and generation, to transform their results into AMs that can be seamlessly applied in the 3D industry. Source Code available💙

👉Review https://t.ly/HvkD4
👉Paper arxiv.org/pdf/2406.10163
👉Code github.com/buaacyw/MeshAnything
🌾LLaNA: NeRF-LLM assistant🌾

👉UniBO unveils LLaNA, a novel Multimodal LLM that understands and reasons about an input NeRF. It directly processes the NeRF weights and performs tasks such as captioning, Q&A, and zero-shot classification of NeRFs.

👉Review https://t.ly/JAfhV
👉Paper arxiv.org/pdf/2406.11840
👉Project andreamaduzzi.github.io/llana/
👉Code & Data coming
🔥 Depth Anything v2 is out! 🔥

👉 Depth Anything V2: outperforms V1 in robustness and fine-grained details. Trained with 595K synthetically labeled and 62M+ real unlabeled images, it's the new SOTA in monocular depth estimation (MDE). Code & Models available💙

👉Review https://t.ly/QX9Nu
👉Paper arxiv.org/pdf/2406.09414
👉Project depth-anything-v2.github.io/
👉Repo github.com/DepthAnything/Depth-Anything-V2
👉Data huggingface.co/datasets/depth-anything/DA-2K
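Affine-invariant depth models like Depth Anything are commonly evaluated after fitting a per-image scale and shift to the ground truth. A minimal sketch of that standard least-squares alignment — illustrative, not the repo's evaluation code:

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Least-squares scale s and shift b so that s*pred + b best matches gt.

    Standard protocol when a model predicts depth only up to an affine transform.
    """
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s, b

gt = np.array([2.0, 4.0, 6.0])
pred = np.array([0.5, 1.5, 2.5])  # gt = 2 * pred + 1
s, b = align_scale_shift(pred, gt)
```

Metrics such as AbsRel are then computed on `s * pred + b`, so models trained on mixed datasets with inconsistent depth scales can still be compared fairly.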
🪅Anomaly Object-Detection🪅

👉The University of Edinburgh introduces a novel anomaly detection problem that focuses on identifying 'odd-looking' objects relative to the other instances within a multi-view scene. Code announced💙

👉Review https://t.ly/3dGHp
👉Paper arxiv.org/pdf/2406.20099
👉Repo https://lnkd.in/d9x6FpUq
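As a toy illustration of the "odd-looking relative to the other instances" idea: flag the instance whose feature embedding lies farthest from the group mean. This is only a minimal proxy under assumed per-instance embeddings, not the paper's actual method:

```python
import numpy as np

def odd_one_out(embeddings):
    """Return the index of the instance farthest (L2) from the mean embedding.

    Relative anomaly: 'odd' is defined against the other instances in the
    scene, not against a fixed training distribution.
    """
    mean = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - mean, axis=1)
    return int(dists.argmax())

# Three similar instances and one outlier (hypothetical 2-D features).
feats = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [-3.0, 2.0]])
idx = odd_one_out(feats)  # -> 3
```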
🪩 MimicMotion: HQ Motion Generation 🪩

👉#Tencent opens a novel controllable video generation framework, dubbed MimicMotion, which can generate HQ videos of arbitrary length mimicking specific motion guidance. Source Code available💙

👉Review https://t.ly/XFoin
👉Paper arxiv.org/pdf/2406.19680
👉Project https://lnkd.in/eW-CMg_C
👉Code https://lnkd.in/eZ6SC2bc
🪴 CAVIS: SOTA Context-Aware Segmentation🪴

👉DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It's the new SOTA on several benchmarks. Source Code announced💙

👉Review https://t.ly/G5obN
👉Paper arxiv.org/pdf/2407.03010
👉Repo github.com/Seung-Hun-Lee/CAVIS
👉Project seung-hun-lee.github.io/projects/CAVIS
🔥 Segment Any 4D Gaussians 🔥

👉SA4G is a novel framework to segment anything in the #4D Gaussian world: HQ segmentation within seconds in 4D Gaussians, plus removal, recoloring, composition, and rendering of HQ masks. Source Code available by August 2024💙

👉Review https://t.ly/uw3FS
👉Paper https://arxiv.org/pdf/2407.04504
👉Project https://jsxzs.github.io/sa4d/
👉Repo https://github.com/hustvl/SA4D
🤖 CODERS: Stereo Detection, 6D & Shape 🤖

👉CODERS: one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source Code announced💙

👉Review https://t.ly/Xpizz
👉Paper https://lnkd.in/dr5ZxC46
👉Project xingyoujun.github.io/coders/
👉Repo (TBA)
🐸 Tracking Everything via Decomposition 🐸

👉Hefei unveils a novel decoupled representation that separates static scenes from dynamic objects in terms of motion and appearance, for more robust tracking through occlusions and deformations. Source Code announced under MIT License💙

👉Review https://t.ly/OsFTO
👉Paper https://arxiv.org/pdf/2407.06531
👉Repo github.com/qianduoduolr/DecoMotion
🍾TAPVid-3D: benchmark for TAP-3D🍾

👉#Deepmind (+University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos, composed of three different data sources spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & Code available, Apache 2.0💙

👉Review https://t.ly/SsptD
👉Paper arxiv.org/pdf/2407.05921
👉Project tapvid3d.github.io/
👉Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
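Point-tracking benchmarks like TAPVid-3D typically score trackers by the fraction of predicted 3D points that land within fixed distance thresholds of ground truth. A minimal sketch of such a metric — illustrative only; see the repo for the official evaluation code:

```python
import numpy as np

def pct_within(pred, gt, thresholds=(0.05, 0.1, 0.2)):
    """Fraction of predicted 3D track points within each distance (in metres,
    hypothetical units) of the ground truth, plus the average over thresholds."""
    d = np.linalg.norm(pred - gt, axis=-1)
    fracs = [float((d < t).mean()) for t in thresholds]
    return fracs, float(np.mean(fracs))

gt = np.zeros((4, 3))
pred = np.array([[0.01, 0, 0], [0.07, 0, 0], [0.15, 0, 0], [0.5, 0, 0]])
fracs, avg = pct_within(pred, gt)
# fracs -> [0.25, 0.5, 0.75], avg -> 0.5
```

Averaging over several thresholds rewards trackers that are both coarsely and precisely correct, rather than privileging a single tolerance.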
🔥 940+ FPS Multi-Person Pose Estimation 🔥

👉RTMW (Real-Time Multi-person Whole-body pose estimation models) is a series of high-performance models for 2D/3D whole-body pose estimation. Over 940 FPS on #GPU! Code & models 💙

👉Review https://t.ly/XkBmg
👉Paper arxiv.org/pdf/2407.08634
👉Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
🥥 OmniNOCS: largest 3D NOCS 🥥

👉OmniNOCS by #Google (+Georgia) is a unified NOCS (Normalized Object Coordinate Space) dataset that contains data across different domains with 90+ object classes. The largest NOCS dataset to date. Data & Code available under Apache 2.0💙

👉Review https://t.ly/xPgBn
👉Paper arxiv.org/pdf/2407.08711
👉Project https://omninocs.github.io/
👉Data github.com/google-deepmind/omninocs
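For readers new to NOCS: a Normalized Object Coordinate Space maps every object into a canonical unit cube, so shapes and poses become comparable across instances and categories. A minimal sketch of that normalization — illustrative, not necessarily OmniNOCS's exact convention:

```python
import numpy as np

def to_nocs(points):
    """Map object points (N,3) into a normalized object coordinate space.

    Center the object, scale by the bounding-box diagonal so it fits inside
    a unit cube, then shift into [0, 1]^3.
    """
    mins, maxs = points.min(axis=0), points.max(axis=0)
    center = (mins + maxs) / 2.0
    diag = np.linalg.norm(maxs - mins)
    return (points - center) / diag + 0.5

pts = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 1.0]])
nocs = to_nocs(pts)  # all coordinates land in [0, 1]
```

A network that predicts per-pixel NOCS coordinates can then recover the 6D pose and scale by aligning those canonical points to observed depth.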
💌 KineTy: Typography Diffusion 💌

👉GIST introduces a novel method for realistic text-driven kinetic typography generation, guiding video diffusion models to achieve visually pleasing text animations. Repo to be released under Attribution-NC 4.0💙

👉Review https://t.ly/2FWo9
👉Paper arxiv.org/pdf/2407.10476
👉Project seonmip.github.io/kinety/
👉Repo github.com/SeonmiP/KineTy/tree/main