🤬Music vs. Face Recognition🤬
👉At their recent concerts, Massive Attack turned surveillance into art, crafting a haunting audiovisual show that exposes the dangers of facial-recognition tech. Audiences were scanned with mock surveillance, confronting them with the intrusive power of #AI in real time
👉Review https://t.ly/VMrPC
👉News https://t.ly/aj1an
🔥SOTA Detection w/ DINOv3🔥
👉DEIMv2 is the evolution of the DEIM framework, now leveraging DINOv3. It comes in variants from an ultra-light version up to S, M, L & X to cover a wide range of scenarios, and achieves SOTA across all of them. Repo under Apache 2.0💙
👉Review https://t.ly/P7jEH
👉Paper arxiv.org/pdf/2509.20787
👉Repo github.com/Intellindust-AI-Lab/DEIMv2
👉Project intellindust-ai-lab.github.io/projects/DEIMv2
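👉For intuition, here's a toy sketch of the general recipe: frozen patch tokens from a DINOv3-style ViT feeding a light DETR-style detection decoder. Every name below is a stand-in of mine, not the DEIMv2 API — the real code is in the repo above:
```python
# Toy sketch: frozen ViT features -> lightweight detection head (illustrative only).
import torch
import torch.nn as nn

class TinyDetHead(nn.Module):
    """Predicts class logits + boxes from patch tokens, DETR-style (not DEIMv2's head)."""
    def __init__(self, dim=384, num_queries=20, num_classes=80):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, 8, batch_first=True), num_layers=2)
        self.cls = nn.Linear(dim, num_classes)
        self.box = nn.Linear(dim, 4)  # (cx, cy, w, h), normalized

    def forward(self, tokens):  # tokens: (B, N, dim) backbone features
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        h = self.dec(q, tokens)  # queries attend to the frozen backbone tokens
        return self.cls(h), self.box(h).sigmoid()

tokens = torch.randn(1, 196, 384)      # stand-in for frozen DINOv3 patch tokens
logits, boxes = TinyDetHead()(tokens)
print(logits.shape, boxes.shape)       # (1, 20, 80) (1, 20, 4)
```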
🤖Real-time Interactive Video🤖
👉LONGLIVE by #Nvidia is a frame-level autoregressive framework for interactive long-video generation: it accepts sequential user prompts and generates the corresponding video in real time. Repo under a non-commercial license💙
👉Review https://t.ly/jJkdY
👉Paper arxiv.org/pdf/2509.22622
👉Project nvlabs.github.io/LongLive/
👉Repo github.com/NVlabs/LongLive
🤗huggingface.co/Efficient-Large-Model/LongLive-1.3B
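👉A minimal toy loop showing what "frame-level autoregressive + interactive prompts" means: frames are generated one at a time from the history, and the prompt can be swapped mid-stream. Hypothetical stand-ins only, not the LongLive code:
```python
# Toy frame-level autoregressive loop with prompt switching (illustration only).
import torch

class DummyFrameAR:
    """Stand-in generator: next frame = f(last frame, current prompt embedding)."""
    def step(self, history, prompt_emb):
        ctx = history[-1] if history else torch.zeros(3, 64, 64)
        return 0.9 * ctx + 0.1 * prompt_emb.view(3, 1, 1)  # toy "dynamics"

model, frames = DummyFrameAR(), []
script = [("a calm ocean at dawn", 8), ("a storm rolls in", 8)]  # (prompt, n_frames)
for prompt, n in script:            # the user can change the prompt mid-stream
    prompt_emb = torch.randn(3)     # stand-in for a text encoder
    for _ in range(n):
        frames.append(model.step(frames, prompt_emb))
print(len(frames), frames[0].shape)  # 16 frames, each (3, 64, 64)
```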
👔 Universal Image Restoration 👔
👉LucidFlux by HKUSTGZ is a universal image-restoration framework built on a large-scale diffusion transformer; it delivers photorealistic restorations of real-world low-quality (LQ) images, outperforming SOTA diffusion-based models across diverse degradations. Repo under a custom non-commercial license💙
👉Review https://t.ly/Z5cA3
👉Paper https://arxiv.org/pdf/2509.22414
👉Project https://w2genai-lab.github.io/LucidFlux/
👉Repo https://github.com/W2GenAI-Lab/LucidFlux
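👉To make "diffusion-based restoration" concrete, a toy sketch of the general pattern: every denoising step is conditioned on the LQ image. This is my illustration, not the LucidFlux architecture:
```python
# Toy LQ-conditioned denoising loop (pattern only; real samplers/models differ).
import torch
import torch.nn as nn

denoiser = nn.Conv2d(6, 3, 3, padding=1)       # stand-in for a diffusion transformer
lq = torch.rand(1, 3, 64, 64)                  # degraded input to restore
x = torch.randn_like(lq)                       # start from pure noise
for t in range(10, 0, -1):                     # toy reverse-diffusion loop
    eps = denoiser(torch.cat([x, lq], dim=1))  # condition every step on the LQ image
    x = x - 0.1 * eps                          # simplistic update rule
print(x.shape)                                 # (1, 3, 64, 64) restored estimate
```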
👩🦱Physical-Hair Diffusion👩🦱
👉CONTROLHAIR is a novel hybrid framework that integrates a physics simulator with conditional video diffusion to enable controllable dynamic hair rendering. Repo announced💙
👉Review https://t.ly/78LHr
👉Paper https://lnkd.in/epm-A9Fq
👉Project https://lnkd.in/evsjz298
👉Repo TBA
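👉Since the repo is still TBA, here's a toy sketch of the handoff the paper describes: a physics step produces per-frame hair states, which become the control signal for a conditional video diffusion model. All code below is my illustration, not theirs:
```python
# Toy physics -> conditioning handoff (illustration of the pipeline shape only).
import torch

def simulate_strand(n_frames=16, n_pts=8, gravity=-0.05):
    """Toy damped point-chain 'physics' producing per-frame strand positions."""
    pos = torch.linspace(0, 1, n_pts).unsqueeze(1) * torch.tensor([[0.0, -1.0]])
    vel = torch.zeros_like(pos)
    out = []
    for _ in range(n_frames):
        vel = 0.95 * vel + torch.tensor([0.0, gravity])  # damping + gravity
        pos = pos + vel
        pos[0] = torch.tensor([0.0, 0.0])  # root stays pinned to the scalp
        out.append(pos.clone())
    return torch.stack(out)                # (T, n_pts, 2) per-frame hair state

cond = simulate_strand()
# a conditional video diffusion model would consume `cond` as its control signal
print(cond.shape)
```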
🔩Code-Agentic Education🔩
👉Show Lab unveils Code2Video: an agentic, code-centric framework that generates HQ educational videos from knowledge points, targeting clarity, coherence & reproducibility. Repo under MIT💙
👉Review https://t.ly/Fv4LJ
👉Paper https://arxiv.org/pdf/2510.01174
👉Repo https://github.com/showlab/Code2Video/
👉Project https://showlab.github.io/Code2Video/
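👉The "agentic, code-centric" part boils down to a generate→render→critique loop: an agent writes rendering code, checks the result, and repairs it. A minimal sketch where `llm` and `render` are hypothetical stand-ins, not the Code2Video agents:
```python
# Toy agentic code-to-video loop: draft code, render, refine on failure.
def llm(prompt: str) -> str:
    # stand-in for a code-writing model call
    return "scene = draw_axes(); animate(parabola)"

def render(code: str) -> tuple[bool, str]:
    # stand-in renderer returning (success, log)
    return True, "ok"

knowledge_point = "Why a parabola is symmetric about its vertex"
code = llm(f"Write animation code teaching: {knowledge_point}")
for _ in range(3):                         # refine until the render succeeds
    ok, log = render(code)
    if ok:
        break
    code = llm(f"Fix this animation code.\nError: {log}\nCode: {code}")
print(code)
```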
🎷🎷 Clink! Chop! Thud! 🎷🎷
👉Sounding Object Detection: while an environment may contain many objects, only a few are directly involved in producing sound during an interaction. This model detects the sounding object in a video. Code/Data announced💙
👉Review https://t.ly/VK_1h
👉Paper https://lnkd.in/depNjVXm
👉Project https://lnkd.in/dF63EZFG
👉Repo TBA
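👉The core idea in a few lines: embed the audio event and each candidate object, then pick the object whose embedding best matches the sound. Toy tensors, not the paper's model:
```python
# Toy audio-visual matching: highest cosine similarity = the sounding object.
import torch
import torch.nn.functional as F

audio_emb = F.normalize(torch.randn(128), dim=0)       # stand-in audio encoder output
object_embs = F.normalize(torch.randn(5, 128), dim=1)  # 5 detected objects in the frame
scores = object_embs @ audio_emb                       # cosine similarity per object
print("sounding object:", scores.argmax().item())     # e.g. the knife for a "chop!"
```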
👉A proof I'm not a bot...
My (short) interview with one of the biggest Italian media outlets: AI in 2016, HPC/Quantum, and how I created my startup: https://www.linkedin.com/posts/visionarynet_ai-itw25-ai-activity-7381215486115643392-t7an
Thanks for the support (and, of course, a new paper coming in a few hours)
🎺Visual Grounding RVOS🎺
👉ReferDINO is a strong RVOS model that inherits region-level vision-language alignment from foundational visual grounding models, and is further endowed with pixel-level dense perception & cross-modal spatio-temporal reasoning. Code, Demo & checkpoints💙
👉Review https://t.ly/rOdkP
👉Paper https://lnkd.in/efuAFQdE
👉Project https://lnkd.in/dK3wMZqv
👉Repo https://lnkd.in/d3i2PsNF
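👉A toy sketch of the RVOS flow ReferDINO builds on: score each per-frame region against the referring text and track the best match over time (the real model adds dense pixel-level decoding and spatio-temporal reasoning on top). Illustration only:
```python
# Toy region-text grounding per frame (not ReferDINO's actual pipeline).
import torch

T, N, D = 8, 10, 256                 # frames, region proposals, feature dim
text = torch.randn(D)                # e.g. "the dog shaking off water" (encoded)
regions = torch.randn(T, N, D)       # per-frame region features
scores = regions @ text              # region-text alignment scores, (T, N)
best = scores.argmax(dim=1)          # referred region in each frame
print(best)                          # index of the tracked object per frame
```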
💄Pixel-Perfect Depth (SOTA)💄
👉Pixel-Perfect Depth is a mono-depth estimation model with pixel-space diffusion transformers. New SOTA. Repo under Apache 2.0💙
👉Review https://t.ly/75PGo
👉Paper https://lnkd.in/d8wxFpyY
👉Project https://lnkd.in/dV5HhsqH
👉Repo https://lnkd.in/d9JKFBJq
👉Demo https://lnkd.in/d3wBkKJ9
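👉"Pixel-space diffusion" means the denoiser works directly on the depth map (no VAE latent), conditioned on the RGB image at every step. A toy sketch of that pattern, not the actual model:
```python
# Toy pixel-space diffusion for depth: denoise a depth map conditioned on RGB.
import torch
import torch.nn as nn

denoiser = nn.Conv2d(4, 1, 3, padding=1)         # stand-in for a pixel-space DiT
rgb = torch.rand(1, 3, 64, 64)
depth = torch.randn(1, 1, 64, 64)                # start from pure noise
for t in range(10):
    eps = denoiser(torch.cat([depth, rgb], 1))   # condition on the image each step
    depth = depth - 0.1 * eps                    # simplistic update rule
print(depth.shape)                               # (1, 1, 64, 64) depth estimate
```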
↗️ TrackVLA++ Visual Tracking↘️
👉TrackVLA++ is a novel Vision-Language-Action model that incorporates spatial reasoning and target identification memory, enabling SOTA performance in both long-horizon and highly crowded tracking scenarios. Model announced💙
👉Review https://t.ly/ruYzc
👉Paper https://arxiv.org/pdf/2510.07134
👉Project pku-epic.github.io/TrackVLA-plus-plus-Web/
👉Repo TBA
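👉A toy sketch of the "target-identification memory" idea: keep a running embedding of the target so the agent can re-acquire it after occlusion or in a crowd. Pure illustration, not the TrackVLA++ model:
```python
# Toy tracking loop with an explicit target-appearance memory.
import torch

memory = None                                      # running embedding of the target
for t in range(50):                                # one tracking episode
    obs = torch.randn(16)                          # stand-in per-step visual embedding
    if memory is None:
        memory = obs                               # lock onto the target at first sight
    sim = torch.cosine_similarity(obs, memory, dim=0)
    action = "follow" if sim > 0.0 else "search"   # crude spatial-reasoning stand-in
    memory = 0.9 * memory + 0.1 * obs              # slowly update target appearance
print("last action:", action)
```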