TrackVLA++: Visual Tracking
TrackVLA++ is a novel Vision-Language-Action model that incorporates spatial reasoning and target-identification memory, enabling SOTA performance in both long-horizon and highly crowded tracking scenarios. Model announced.
Review: https://t.ly/ruYzc
Paper: https://arxiv.org/pdf/2510.07134
Project: pku-epic.github.io/TrackVLA-plus-plus-Web/
Repo: TBA
Pixel-Perfect Depth (SOTA)
Pixel-Perfect Depth is a monocular depth estimation model built on pixel-space diffusion transformers. New SOTA. Repo under Apache 2.0.
Review: https://t.ly/75PGo
Paper: https://lnkd.in/d8wxFpyY
Project: https://lnkd.in/dV5HhsqH
Repo: https://lnkd.in/d9JKFBJq
Demo: https://lnkd.in/d3wBkKJ9
Universal Image Restoration
LucidFlux, from HKUST(GZ), is a universal image restoration framework built on a large-scale diffusion transformer that delivers photorealistic restorations of real-world low-quality (LQ) images, outperforming SOTA diffusion-based models across diverse degradations. Repo under a custom non-commercial license.
Review: https://t.ly/Z5cA3
Paper: https://arxiv.org/pdf/2509.22414
Project: https://w2genai-lab.github.io/LucidFlux/
Repo: https://github.com/W2GenAI-Lab/LucidFlux
Detect Anything via MLLM
Rex-Omni is a 3B-parameter multimodal model that unifies visual perception tasks, including object detection, OCR, pointing, keypointing & visual prompting, into a single next-point prediction framework. Impressive results. Repo under IDEA License 1.0.
Review: https://t.ly/DCTk_
Paper: https://lnkd.in/d4VDD-9j
Project: https://lnkd.in/d6unEyvq
Repo: https://lnkd.in/dkYJFe-x
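The "next-point prediction" framing generally works by quantizing pixel coordinates into discrete tokens a language model can emit like any other vocabulary item. A minimal sketch of that general idea (bin count and image sizes here are illustrative assumptions, not Rex-Omni's actual values):

```python
# Coordinate-as-token quantization: map continuous pixel coordinates to
# discrete bins so boxes/points become short token sequences.
# bins=1000 is an illustrative choice, not Rex-Omni's.

def quantize(coord, size, bins=1000):
    """Map a pixel coordinate in [0, size) to a discrete bin token."""
    b = int(coord / size * bins)
    return min(max(b, 0), bins - 1)

def dequantize(token, size, bins=1000):
    """Map a bin token back to the centre of its pixel range."""
    return (token + 0.5) / bins * size

# A box (x0, y0, x1, y1) on a 640x480 image becomes 4 tokens:
box = (64.0, 48.0, 320.0, 240.0)
tokens = [quantize(box[0], 640), quantize(box[1], 480),
          quantize(box[2], 640), quantize(box[3], 480)]
print(tokens)  # [100, 100, 500, 500]
```

Decoding reverses the mapping, at the cost of a quantization error of at most half a bin width.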
Universal Feature Up-Sampling
AnyUp is a novel method for feature up-sampling that can be applied to ANY vision feature at ANY resolution, without encoder-specific training: an inference-time, feature-agnostic up-sampling architecture that improves up-sampling quality. Repo under CC 4.0.
Review: https://t.ly/HvEw9
Paper: https://arxiv.org/pdf/2510.12764
Project: https://wimmerth.github.io/anyup/
Repo: https://github.com/wimmerth/anyup
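For context, the usual baseline that learned feature up-samplers are compared against is plain bilinear interpolation of the low-resolution feature map. A pure-Python sketch of that baseline (single-channel map, illustrative shapes — this is not AnyUp's architecture):

```python
import math

def bilinear_upsample(feat, out_h, out_w):
    """Bilinearly up-sample a 2D feature map (list of lists of floats)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            # Map the output pixel centre to continuous input coordinates.
            sy = (oy + 0.5) * h / out_h - 0.5
            sx = (ox + 0.5) * w / out_w - 0.5
            y0 = min(max(int(math.floor(sy)), 0), h - 1)
            x0 = min(max(int(math.floor(sx)), 0), w - 1)
            y1 = min(y0 + 1, h - 1)
            x1 = min(x0 + 1, w - 1)
            wy = min(max(sy - y0, 0.0), 1.0)
            wx = min(max(sx - x0, 0.0), 1.0)
            # Weighted blend of the four nearest feature vectors.
            out[oy][ox] = ((1 - wy) * (1 - wx) * feat[y0][x0]
                           + (1 - wy) * wx * feat[y0][x1]
                           + wy * (1 - wx) * feat[y1][x0]
                           + wy * wx * feat[y1][x1])
    return out

# 2x2 feature map up-sampled to 4x4:
up = bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 4, 4)
```

Bilinear output is blurry and ignores image content; learned up-samplers aim to beat exactly this.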
City-Tour → Simulation
UrbanVerse is a novel system that converts real-world urban scenes from city-tour videos into physics-aware, interactive simulation environments, enabling scalable robot learning in urban spaces with real-world generalization. Repo & data announced.
Review: https://t.ly/UvXNS
Paper: https://arxiv.org/pdf/2510.15018
Project: https://urbanverseproject.github.io/
Repo: TBA
All-in-One Dense Keypoints
DeepDetect is a novel all-in-one dense keypoint detector that unifies the strengths of SIFT, ORB, BRISK, FAST, AGAST, Harris, Shi-Tomasi, Canny & Sobel into a single neural net. DAMN ROMANTIC. Repo under MIT.
Review: https://t.ly/VKGct
Paper: https://arxiv.org/pdf/2510.17422
Repo: https://github.com/saktx/DeepDetect
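Of the classical detectors listed, Harris is the easiest to sketch: its corner score comes from the local structure tensor of image gradients. A textbook pure-Python toy for intuition (this is not the paper's network):

```python
# Harris corner response on a tiny grayscale image (pure Python).
# Corners score high because the structure tensor has two large
# eigenvalues; edges and flat regions score low or negative.

def harris_response(img, k=0.04):
    h, w = len(img), len(img[0])
    # Central-difference gradients (zero at borders for simplicity).
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    # Structure tensor summed over a 3x3 window, then Harris score.
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            R[y][x] = det - k * trace * trace
    return R

# A bright square on a dark background: its corners should score highest.
img = [[0.0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        img[y][x] = 1.0
R = harris_response(img)
best = max((R[y][x], (y, x)) for y in range(8) for x in range(8))
print(best[1])  # one of the square's four corners
```

DeepDetect's pitch is replacing this hand-crafted pipeline (and its eight siblings) with one learned dense detector.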
Repo (pretty empty) now online: https://github.com/OatmealLiu/UrbanVerse
Omni Driving Navigation Models
OmniNWM is a unified panoramic navigation world model that advances autonomous driving by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control & facilitating closed-loop evaluation through occupancy-based dense rewards. Repo under Apache 2.0.
Review: https://t.ly/ktXvz
Paper: https://lnkd.in/eFKSZnrc
Project: https://lnkd.in/eSDfccv8
Repo: https://lnkd.in/efCSvjtp
Character Mixing Generation
MBZUAI unveils the first-ever video-generation system able to preserve character ID, behavior & original style while generating plausible interactions between characters that have never coexisted, from cartoons (We Bare Bears, Tom & Jerry) to realistic humans (Mr. Bean, Young Sheldon).
Review: https://t.ly/tN84a
Paper: https://lnkd.in/dhKMwukv
Project: https://lnkd.in/dBkJs48h
Repo: https://lnkd.in/dw_uzgAk
Unified Region-Level MLLM
PixelRefer is a unified multimodal LLM framework that supports precise, region-specific understanding in both static images and dynamic videos, overcoming the holistic, scene-level bias of prior MLLMs. SOTA results. Demo, repo & dataset available.
Review: https://t.ly/WH4dQ
Paper: arxiv.org/pdf/2510.23603
Project: circleradon.github.io/PixelRefer
Repo: https://github.com/alibaba-damo-academy/PixelRefer
Generative View Stitching
GVS is a novel approach that enables collision-free, camera-guided video generation along predefined trajectories; it is a non-autoregressive alternative to video-length extrapolation. Full repo under MIT.
Review: https://t.ly/TiN_5
Paper: https://arxiv.org/pdf/2510.24718
Project: https://andrewsonga.github.io/gvs/
Repo: github.com/andrewsonga/generative_view_stitching
Tracking Object Transformations
"Track Any State": tracking objects through transformations while detecting and describing state changes. Repo & dataset available under MIT.
Review: https://t.ly/NPyW4
Paper: https://lnkd.in/d4pA3bXJ
Project: https://lnkd.in/dgbNfCuj
Repo: https://lnkd.in/dtVWq2z7
π"Track Any State": tracking objects through transformations while detecting/describing state changes. Repo & Dataset available under MITπ
πReview https://t.ly/NPyW4
πPaper https://lnkd.in/d4pA3bXJ
πProject https://lnkd.in/dgbNfCuj
πRepo https://lnkd.in/dtVWq2z7