🔥Large Language DIFFUSION Model🔥
👉Renmin University introduces LLaDA, a DIFFUSION model trained entirely from scratch, rivaling LLaMA3 8B in performance. Pre-trained from scratch on 2.3T tokens using 0.13M H800 GPU hours, followed by SFT on 4.5M pairs. A new paradigm is born? Repo by the end of Feb.25 💙
👉Review https://t.ly/7Cnrh
👉Paper https://lnkd.in/dCWi3byk
👉Project https://lnkd.in/dB7JRYeA
👉Repo https://lnkd.in/dAqzeCHJ
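For a sense of scale, the two pre-training figures quoted above imply an average per-GPU throughput. This is a back-of-the-envelope sketch, assuming the 0.13M H800 GPU hours cover only the 2.3T-token pre-training run; the variable names are illustrative and the result is derived here, not reported by the authors.

```python
# Figures taken from the post; everything derived below is an estimate.
PRETRAIN_TOKENS = 2.3e12   # 2.3T tokens
GPU_HOURS = 0.13e6         # 0.13M H800 GPU hours (assumed to be pre-training only)

tokens_per_gpu_hour = PRETRAIN_TOKENS / GPU_HOURS
tokens_per_gpu_second = tokens_per_gpu_hour / 3600

# About 17.7M tokens per GPU hour, i.e. roughly 4.9k tokens per GPU second.
print(f"{tokens_per_gpu_hour / 1e6:.1f}M tokens per GPU hour")
print(f"{tokens_per_gpu_second:.0f} tokens per GPU second")
```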
🤯12❤3🔥3😍1
🌈Unified Low-Level 4D Vision🌈
👉#Nvidia L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training. One backbone - many SOTAs. Code announced 💙
👉Review https://t.ly/04DGj
👉Paper arxiv.org/pdf/2502.13078
👉Project research.nvidia.com/labs/lpr/l4p/
👉Repo TBA
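The "one backbone, many heads" pattern described above can be sketched in a few lines. This is a toy illustration only, not the L4P code (which is not yet released): the backbone here is a stand-in that averages pixel values, and the task names `task_a` and `task_b` are hypothetical.

```python
from typing import Callable, Dict, List

Features = List[float]

def shared_backbone(video_frames: List[List[float]]) -> Features:
    """Stand-in for the ViT-based backbone: one feature per frame (mean pixel value)."""
    return [sum(frame) / len(frame) for frame in video_frames]

def make_linear_head(scale: float, bias: float) -> Callable[[Features], Features]:
    """A deliberately tiny per-task head: an affine map over the shared features."""
    def head(features: Features) -> Features:
        return [scale * f + bias for f in features]
    return head

# Hypothetical task names; the post does not list the individual tasks.
heads: Dict[str, Callable[[Features], Features]] = {
    "task_a": make_linear_head(2.0, 0.0),
    "task_b": make_linear_head(-1.0, 1.0),
}

def run_all_tasks(video_frames: List[List[float]]) -> Dict[str, Features]:
    features = shared_backbone(video_frames)  # computed once, reused by every head
    return {name: head(features) for name, head in heads.items()}

frames = [[0.0, 1.0], [1.0, 3.0]]
outputs = run_all_tasks(frames)
print(outputs)
```

The point of the design is that the expensive feature extraction runs once per clip, while adding a task only means adding another small head.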
🔥5👍2🤯1🤩1
🔥 YOLOv12 is out (new SOTA) 🔥
👉YOLOv12 is a novel attention-centric YOLO framework that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms. Source Code & Demo released💙
👉Review https://t.ly/jj1oR
👉Paper arxiv.org/pdf/2502.12524
👉Repo github.com/sunsmarterjie/yolov12
🤗Demo https://t.ly/w5rno
🔥22👍9🤯8❤4💩3😍1🤣1
👽Neural-Free Sparse Voxels Rasterization👽
👉#Nvidia unveils a novel efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels without neural networks or 3D Gaussians. Code released (custom license)💙
👉Review https://t.ly/Nh_ic
👉Paper https://lnkd.in/g8k8Zs6R
👉Project https://lnkd.in/gR-bD4Wx
👉Repo https://lnkd.in/gNHX-w4t
🔥14👍4🤩1
🏉MITracker: Multi-View Tracking🏉
👉MITracker is a novel Multi-View Integration Tracker to efficiently integrate multi-view object features and provide stable tracking. Code & Dataset announced💙
👉Review https://t.ly/RTNUo
👉Paper arxiv.org/pdf/2502.20111
👉Repo github.com/XuM007/MITracker
👉Project xum007.github.io/MITracker.github.io
👍11🔥8😍2👏1
🧠 Distractor-Aware SAM2 🧠
👉A novel distractor-aware memory for SAM2 and an introspection-based update strategy for VOT. Code & Dataset released💙
👉Review https://t.ly/RBRpQ
👉Paper arxiv.org/pdf/2411.17576
👉Project jovanavidenovic.github.io/dam-4-sam
👉Repo github.com/jovanavidenovic/DAM4SAM/
❤8🔥5👍2😍1🤣1
🔥Distill-Any-Depth: SOTA MDE🔥
👉Distill-Any-Depth is the new SOTA monocular depth estimation model, trained with a novel knowledge distillation approach. Authors: ZJUT, Westlake University, LZU & NTU. Source Code, pre-trained models & HF-demo released💙
👉Review https://t.ly/GBJgi
👉Paper arxiv.org/pdf/2502.19204
👉Repo https://lnkd.in/dPtxNrQh
🤗Demo https://lnkd.in/d2TMPf4b
❤12🔥5👍3👏1😍1
🍎FindTrack: text-driven VOS 🍎
👉Yonsei University introduces FindTrack, a novel decoupled framework that separates text-driven target ID from mask propagation. Impressive results (even under severe occlusions), new SOTA. Source Code & models to be released💙
👉Review https://t.ly/2smaF
👉Paper arxiv.org/pdf/2503.03492
👉Repo github.com/suhwan-cho/FindTrack
🔥10🤯4👍3❤2😍1
📒 Moving-Camera Diffusion 📒
👉Tencent unveils TrajectoryCrafter, a novel approach to redirect camera trajectories for monocular videos. Impressive results, the future of commercial #adv. Code & Demo released💙
👉Review https://t.ly/L-IoR
👉Paper https://arxiv.org/pdf/2503.05638
👉Project https://trajectorycrafter.github.io/
👉Repo github.com/TrajectoryCrafter/TrajectoryCrafter
🤗Demo https://huggingface.co/spaces/Doubiiu/TrajectoryCrafter
🔥12🤩4❤2👍1👏1
💙 Announcing #Py4AI 2025 💙
👉 The second edition of the Py4AI conference is official! An all-day, fully free event for #AI & #Python lovers.
𝐓𝐡𝐞 𝐟𝐢𝐫𝐬𝐭 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🚀Dana Aubakirova | Hugging Face🤗
🚀Yunhao Liu & Ruoya Sheng | ByteDance🔥
🚀Alice Casiraghi | 🌏🌎🌍
🚀Luca Arrotta, PhD | Datapizza🍕
🚀Valeria Zuccoli | Bettini Srl
🚀Mirco Planamente | ARGO Vision
🚀Daniele Zonca | Red Hat
👉 Info & registration: https://t.ly/37wWj
❤7⚡1👍1🔥1🤩1
🎯RexSeek: Referring Any Object🎯
👉A novel referring detection model, based on a multimodal LLM, that precisely locates objects from user-input natural language. The model is specialized on humans. Code released 💙
👉Review https://shorturl.at/CGsT2
👉Paper arxiv.org/pdf/2503.08507
👉Code github.com/IDEA-Research/RexSeek
👍17❤6👏4🔥2
🐶OVTR: E2E Transformer MOT🐶
👉HUST proposes OVTR (End-to-End Open-Vocabulary Multiple Object Tracking with TRansformer), the first end-to-end open-vocabulary tracker that models motion, appearance, and category simultaneously. Source Code released under MIT💙
👉Review https://t.ly/K3ASX
👉Paper arxiv.org/pdf/2503.10616
👉Code https://github.com/jinyanglii/OVTR
🔥11❤2👍1😍1
🫀HyperFast Myocardium Tracking🫀
👉Norwegian institutes unveil MyoTracker, a low-complexity architecture (0.3M params) for point tracking in echocardiography. Built on CoTracker2, it provides point predictions for the entire sequence in a single step. Code released under a non-commercial license💙
👉Review https://t.ly/6wo8q
👉Paper https://arxiv.org/pdf/2503.10431
👉Code https://github.com/artemcher/myotracker
👍11❤7🔥1
🍾 6D Tracking & Pose SOTA 🍾
👉ČVUT unveils the new SOTA in RGB 6D pose estimation and tracking. Suitable for ego-clips & 7-axis robo-manipulation. Code under MIT💙
👉Review https://t.ly/pSqFR
👉Paper arxiv.org/pdf/2503.10307
👉Code github.com/ponimatkin/freepose
👏6❤3
🖲️ VGG Transformer 🖲️
👉VGGT by VGG & #META (#CVPR2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene within seconds. Code released💙
👉Review https://t.ly/WoWXL
👉Paper https://arxiv.org/pdf/2503.11651
👉Project https://vgg-t.github.io/
👉Code github.com/facebookresearch/vggt
🤯25👍11🔥6❤2🤩1
🧸 Occluded 3D Reconstruction 🧸
👉Oxford unveils a novel 3D generative model to reconstruct 3D objects from partial observations. Code (TBR), demo, model on HF💙
👉Review https://t.ly/Lr5D7
👉Paper arxiv.org/pdf/2503.13439
👉Project sm0kywu.github.io/Amodal3R/
🤗Demo huggingface.co/spaces/Sm0kyWu/Amodal3R
👍6🔥4❤2🤯2👏1
🌱 #Py4AI: line-up is official 🌱
👉Last week we announced the first part of our incredible line-up for Py4AI 2025. It's time to disclose the second part and drive you crazy👇
𝐓𝐡𝐞 𝐬𝐞𝐜𝐨𝐧𝐝 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🔥Alfredo Canziani | New York University
🔥Fanny Bouton | OVHcloud
🔥Full list: https://t.ly/JJP8B
🔥3❤1🤯1
🧞 IMPOSSIBLE Videos 🧞
👉IPV-Bench: counterfactual and anti-reality scenes that are impossible in the real world. A novel challenge designed to evaluate and foster progress in video understanding and generation. Code & 🤗-Data 💙
👉Review https://t.ly/D7jhm
👉Paper arxiv.org/pdf/2503.14378
👉Project showlab.github.io/Impossible-Videos/
👉Repo github.com/showlab/Impossible-Videos
🔥6❤2👍2🤩1
🥎LLM Spatial Understanding🥎
👉SpatialLM by Manycore: novel LLM designed to process 3D point cloud data and generate structured 3D scene understanding outputs. Code, model & data 💙
👉Review https://t.ly/ejr1s
👉Project manycore-research.github.io/SpatialLM/
👉Code github.com/manycore-research/SpatialLM
🤗Models https://huggingface.co/manycore-research
🔥30❤4⚡2🤯2😍2