TrackVLA++: Visual Tracking
TrackVLA++ is a novel Vision-Language-Action model that incorporates spatial reasoning and target-identification memory, enabling SOTA performance in both long-horizon and highly crowded tracking scenarios. Model announced.
Review: https://t.ly/ruYzc
Paper: https://arxiv.org/pdf/2510.07134
Project: pku-epic.github.io/TrackVLA-plus-plus-Web/
Repo: TBA
Pixel-Perfect Depth (SOTA)
Pixel-Perfect Depth is a monocular depth estimation model built on pixel-space diffusion transformers. New SOTA. Repo under Apache 2.0.
Review: https://t.ly/75PGo
Paper: https://lnkd.in/d8wxFpyY
Project: https://lnkd.in/dV5HhsqH
Repo: https://lnkd.in/d9JKFBJq
Demo: https://lnkd.in/d3wBkKJ9
Universal Image Restoration
LucidFlux, from HKUST(GZ), is a universal image restoration framework built on a large-scale diffusion transformer that delivers photorealistic restorations of real-world low-quality (LQ) images, outperforming SOTA diffusion-based models across diverse degradations. Repo under a custom non-commercial license.
Review: https://t.ly/Z5cA3
Paper: https://arxiv.org/pdf/2509.22414
Project: https://w2genai-lab.github.io/LucidFlux/
Repo: https://github.com/W2GenAI-Lab/LucidFlux
Detect Anything via MLLM
Rex-Omni is a 3B-parameter multimodal model that unifies visual perception tasks, including object detection, OCR, pointing, keypointing & visual prompting, into a single next-point prediction framework. Impressive results. Repo under IDEA License 1.0.
Review: https://t.ly/DCTk_
Paper: https://lnkd.in/d4VDD-9j
Project: https://lnkd.in/d6unEyvq
Repo: https://lnkd.in/dkYJFe-x
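The "next-point prediction" framing generally works by quantizing pixel coordinates into discrete tokens a language model can emit like any other vocabulary item. A minimal sketch of that general idea (bin count and image sizes here are illustrative assumptions, not Rex-Omni's actual values):

```python
# Coordinate-as-token quantization: map continuous pixel coordinates to
# discrete bins so boxes/points become short token sequences.
# bins=1000 is an illustrative choice, not Rex-Omni's.

def quantize(coord, size, bins=1000):
    """Map a pixel coordinate in [0, size) to a discrete bin token."""
    b = int(coord / size * bins)
    return min(max(b, 0), bins - 1)

def dequantize(token, size, bins=1000):
    """Map a bin token back to the centre of its pixel range."""
    return (token + 0.5) / bins * size

# A box (x0, y0, x1, y1) on a 640x480 image becomes 4 tokens:
box = (64.0, 48.0, 320.0, 240.0)
tokens = [quantize(box[0], 640), quantize(box[1], 480),
          quantize(box[2], 640), quantize(box[3], 480)]
print(tokens)  # [100, 100, 500, 500]
```

Decoding reverses the mapping, at the cost of a quantization error of at most half a bin width.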
Universal Feature Up-Sampling
AnyUp is a novel method for feature up-sampling that can be applied to ANY vision feature at ANY resolution, without encoder-specific training: an inference-time, feature-agnostic up-sampling architecture that improves up-sampling quality. Repo under CC 4.0.
Review: https://t.ly/HvEw9
Paper: https://arxiv.org/pdf/2510.12764
Project: https://wimmerth.github.io/anyup/
Repo: https://github.com/wimmerth/anyup
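For context, the usual baseline that learned feature up-samplers are compared against is plain bilinear interpolation of the low-resolution feature map. A pure-Python sketch of that baseline (single-channel map, illustrative shapes — this is not AnyUp's architecture):

```python
import math

def bilinear_upsample(feat, out_h, out_w):
    """Bilinearly up-sample a 2D feature map (list of lists of floats)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            # Map the output pixel centre to continuous input coordinates.
            sy = (oy + 0.5) * h / out_h - 0.5
            sx = (ox + 0.5) * w / out_w - 0.5
            y0 = min(max(int(math.floor(sy)), 0), h - 1)
            x0 = min(max(int(math.floor(sx)), 0), w - 1)
            y1 = min(y0 + 1, h - 1)
            x1 = min(x0 + 1, w - 1)
            wy = min(max(sy - y0, 0.0), 1.0)
            wx = min(max(sx - x0, 0.0), 1.0)
            # Weighted blend of the four nearest feature vectors.
            out[oy][ox] = ((1 - wy) * (1 - wx) * feat[y0][x0]
                           + (1 - wy) * wx * feat[y0][x1]
                           + wy * (1 - wx) * feat[y1][x0]
                           + wy * wx * feat[y1][x1])
    return out

# 2x2 feature map up-sampled to 4x4:
up = bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 4, 4)
```

Bilinear output is blurry and ignores image content; learned up-samplers aim to beat exactly this.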
City-Tour → Simulation
UrbanVerse is a novel system that converts real-world urban scenes from city-tour videos into physics-aware, interactive simulation environments, enabling scalable robot learning in urban spaces with real-world generalization. Repo & data announced.
Review: https://t.ly/UvXNS
Paper: https://arxiv.org/pdf/2510.15018
Project: https://urbanverseproject.github.io/
Repo: TBA
All-in-One Dense Keypoints
DeepDetect is a novel all-in-one dense keypoint detector that unifies the strengths of SIFT, ORB, BRISK, FAST, AGAST, Harris, Shi-Tomasi, Canny & Sobel into a single neural net. DAMN ROMANTIC. Repo under MIT.
Review: https://t.ly/VKGct
Paper: https://arxiv.org/pdf/2510.17422
Repo: https://github.com/saktx/DeepDetect
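Of the classical detectors listed, Harris is the easiest to sketch: its corner score comes from the local structure tensor of image gradients. A textbook pure-Python toy for intuition (this is not the paper's network):

```python
# Harris corner response on a tiny grayscale image (pure Python).
# Corners score high because the structure tensor has two large
# eigenvalues; edges and flat regions score low or negative.

def harris_response(img, k=0.04):
    h, w = len(img), len(img[0])
    # Central-difference gradients (zero at borders for simplicity).
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    # Structure tensor summed over a 3x3 window, then Harris score.
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            R[y][x] = det - k * trace * trace
    return R

# A bright square on a dark background: its corners should score highest.
img = [[0.0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        img[y][x] = 1.0
R = harris_response(img)
best = max((R[y][x], (y, x)) for y in range(8) for x in range(8))
print(best[1])  # one of the square's four corners
```

DeepDetect's pitch is replacing this hand-crafted pipeline (and its eight siblings) with one learned dense detector.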
Repo (pretty empty) now online: https://github.com/OatmealLiu/UrbanVerse
Omni Driving Navigation Models
OmniNWM is a unified panoramic navigation world model that advances autonomous driving by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control & facilitating closed-loop evaluation through occupancy-based dense rewards. Repo under Apache 2.0.
Review: https://t.ly/ktXvz
Paper: https://lnkd.in/eFKSZnrc
Project: https://lnkd.in/eSDfccv8
Repo: https://lnkd.in/efCSvjtp
Character Mixing Generation
MBZUAI unveils the first-ever video-generation system able to preserve character ID, behavior & original style while generating plausible interactions between characters that have never coexisted, from cartoons (We Bare Bears, Tom & Jerry) to realistic humans (Mr. Bean, Young Sheldon).
Review: https://t.ly/tN84a
Paper: https://lnkd.in/dhKMwukv
Project: https://lnkd.in/dBkJs48h
Repo: https://lnkd.in/dw_uzgAk
Unified Region-Level MLLM
PixelRefer is a unified multimodal LLM framework that supports precise, region-specific understanding in both static images and dynamic videos, overcoming the holistic, scene-level bias of prior MLLMs. SOTA results. Demo, repo & dataset available.
Review: https://t.ly/WH4dQ
Paper: arxiv.org/pdf/2510.23603
Project: circleradon.github.io/PixelRefer
Repo: https://github.com/alibaba-damo-academy/PixelRefer
Generative View Stitching
GVS is a novel approach that enables collision-free, camera-guided video generation along predefined trajectories; it is a non-autoregressive alternative to video-length extrapolation. Full repo under MIT.
Review: https://t.ly/TiN_5
Paper: https://arxiv.org/pdf/2510.24718
Project: https://andrewsonga.github.io/gvs/
Repo: github.com/andrewsonga/generative_view_stitching
Tracking Object Transformations
"Track Any State": tracking objects through transformations while detecting and describing state changes. Repo & dataset available under MIT.
Review: https://t.ly/NPyW4
Paper: https://lnkd.in/d4pA3bXJ
Project: https://lnkd.in/dgbNfCuj
Repo: https://lnkd.in/dtVWq2z7
π"Track Any State": tracking objects through transformations while detecting/describing state changes. Repo & Dataset available under MITπ
πReview https://t.ly/NPyW4
πPaper https://lnkd.in/d4pA3bXJ
πProject https://lnkd.in/dgbNfCuj
πRepo https://lnkd.in/dtVWq2z7