🫀HyperFast Myocardium Tracking🫀
👉Norwegian institutes unveil MyoTracker, a low-complexity architecture (0.3M params) for point tracking in echocardiography. Built on CoTracker2, it predicts point positions for the entire sequence in a single step (a hedged usage sketch follows the links below). Code released under a non-commercial license💙
👉Review https://t.ly/6wo8q
👉Paper https://arxiv.org/pdf/2503.10431
👉Code https://github.com/artemcher/myotracker
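As a rough illustration of the single-step design (not MyoTracker's actual API — the class name and tensor shapes below are assumptions), a tracker of this kind takes a whole clip plus query points and returns per-frame tracks in one forward pass:

```python
# Minimal sketch of single-pass point tracking, assuming a hypothetical
# MyoTracker-style interface (real class names/shapes may differ).
import torch
import torch.nn as nn

class TrackerStub(nn.Module):
    """Stand-in for the ~0.3M-param model: one forward pass maps a full
    clip + query points to per-frame point positions (no sliding window)."""
    def forward(self, video: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # video:   (B, T, 1, H, W) grayscale echo frames
        # queries: (B, N, 2) (x, y) points annotated on the first frame
        B, T = video.shape[:2]
        N = queries.shape[1]
        # Real model: correlation features + attention over the whole sequence;
        # here we just repeat the queries to show the output contract.
        return queries[:, None, :, :].expand(B, T, N, 2)

video = torch.rand(1, 64, 1, 256, 256)                     # 64-frame echo clip
queries = torch.tensor([[[128.0, 96.0], [140.0, 180.0]]])  # myocardial points
tracks = TrackerStub()(video, queries)                     # one step, whole clip
print(tracks.shape)  # torch.Size([1, 64, 2, 2]) -> (B, T, N, xy)
```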
🍾 6D Tracking & Pose SOTA 🍾
👉ČVUT unveils the new SOTA in RGB 6D pose estimation and tracking. Suitable for egocentric clips & 7-axis robotic manipulation. Code under MIT💙
👉Review https://t.ly/pSqFR
👉Paper arxiv.org/pdf/2503.10307
👉Code github.com/ponimatkin/freepose
🖲️ VGG Transformer 🖲️
👉VGGT by VGG & #META (#CVPR2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene within seconds (usage sketch after the links). Code released💙
👉Review https://t.ly/WoWXL
👉Paper https://arxiv.org/pdf/2503.11651
👉Project https://vgg-t.github.io/
👉Code github.com/facebookresearch/vggt
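For flavor, a hedged usage sketch of a feed-forward 3D model of this kind; the identifiers below (import paths, model id, helper) are assumptions from memory of the repo and may not match it exactly:

```python
# Hedged sketch: one forward pass over a handful of images yields the
# scene's 3D attributes. Module paths/names are assumptions, not verified.
import torch
from vggt.models.vggt import VGGT                           # assumed import path
from vggt.utils.load_fn import load_and_preprocess_images   # assumed helper

model = VGGT.from_pretrained("facebook/VGGT-1B").eval()     # assumed HF model id
images = load_and_preprocess_images(["view1.png", "view2.png", "view3.png"])

with torch.no_grad():
    preds = model(images)   # feed-forward: no per-scene optimization loop
# preds is expected to bundle cameras, depth, and point maps for all views
```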
🧸 Occluded 3D Reconstruction 🧸
👉Oxford unveils a novel 3D generative model to reconstruct 3D objects from partial observations. Code (to be released), demo & model on HF💙
👉Review https://t.ly/Lr5D7
👉Paper arxiv.org/pdf/2503.13439
👉Project sm0kywu.github.io/Amodal3R/
🤗huggingface.co/spaces/Sm0kyWu/Amodal3R
🌱 #Py4AI: line-up is official 🌱
👉Last week we announced the first part of our incredible line-up for PY4AI 2025. It's time to disclose the second one and drive you crazy👇
𝐓𝐡𝐞 𝐬𝐞𝐜𝐨𝐧𝐝 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🔥Alfredo Canziani | New York University
🔥Fanny Bouton | OVHcloud
🔥Full list: https://t.ly/JJP8B
🧞 IMPOSSIBLE Videos 🧞
👉IPV-Bench: counterfactual and anti-reality scenes impossible in the real world. A novel benchmark designed to evaluate and foster progress in video understanding and generation. Code & 🤗-Data 💙
👉Review https://t.ly/D7jhm
👉Paper arxiv.org/pdf/2503.14378
👉Project showlab.github.io/Impossible-Videos/
👉Repo github.com/showlab/Impossible-Videos
🥎LLM Spatial Understanding🥎
👉SpatialLM by Manycore: a novel LLM designed to process 3D point-cloud data and generate structured 3D scene-understanding outputs (toy sketch after the links). Code, model & data 💙
👉Review https://t.ly/ejr1s
👉Project manycore-research.github.io/SpatialLM/
👉Code github.com/manycore-research/SpatialLM
🤗Models https://huggingface.co/manycore-research
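A toy sketch of the general recipe the post describes (point cloud → tokens → LLM); everything here is illustrative, not the SpatialLM API:

```python
# Toy sketch: a point-cloud encoder yields tokens an LLM can consume,
# which then decodes a structured scene description (walls, object boxes).
# All names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class PointCloudTokenizer(nn.Module):
    """Stand-in encoder: subsamples points and projects xyz to embeddings."""
    def __init__(self, d_model: int = 512, num_tokens: int = 256):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(3, d_model)

    def forward(self, points: torch.Tensor) -> torch.Tensor:  # (N, 3) xyz
        idx = torch.randint(0, points.shape[0], (self.num_tokens,))
        return self.proj(points[idx])                          # (tokens, d)

points = torch.rand(100_000, 3)                # raw indoor scan
scene_tokens = PointCloudTokenizer()(points)   # prepended to the LLM prompt
# The LLM would then emit structured text along the lines of:
#   wall(x1, y1, x2, y2, height); bbox("sofa", cx, cy, cz, w, d, h, yaw)
print(scene_tokens.shape)  # torch.Size([256, 512])
```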
🙀3D MultiModal Memory🙀
👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings. Rich spatial & semantic understanding via a novel memory system designed to retain multimodal information across videos
👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET
🔥 Dereflection Any Image 🔥
👉SJTU & #Huawei unveil DAI, a novel diffusion-based framework able to recover clean images from a wide range of reflection types. One-step diffusion with deterministic outputs & fast inference (sketch of the one-step idea below). Inference, pretrained models & training code released💙
👉Review https://t.ly/PDA9K
👉Paper https://arxiv.org/pdf/2503.17347
👉Project abuuu122.github.io/DAI.github.io/
👉Repo github.com/Abuuu122/Dereflection-Any-Image
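A minimal sketch of why one-step deterministic diffusion is fast and repeatable: a single denoiser call, no stochastic sampling loop. The stand-in network and residual formulation are assumptions, not the paper's pipeline:

```python
# One denoiser call, fixed computation, no sampling loop:
# the same input always yields the same dereflected output.
import torch
import torch.nn as nn

denoiser = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in for the UNet

def dereflect_one_step(image: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        residual = denoiser(image)    # assumed: predicts the reflection layer
    return image - residual           # deterministic output

x = torch.rand(1, 3, 256, 256)
out1, out2 = dereflect_one_step(x), dereflect_one_step(x)
assert torch.equal(out1, out2)        # no randomness anywhere in the path
```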
🦎 Scaling Vision to 4K🦎
👉PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K resolution at *near-constant* cost: it encodes a low-res global image and selectively processes only the informative high-res regions (selection sketch below). Impressive work. Code/weights & 🤗 announced💙
👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv
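A hedged sketch of the low-res-global + top-k high-res-region idea; the per-patch variance scorer and the value of k below are stand-ins for the learned components:

```python
# Encode a cheap low-res view, score high-res patches, and run the heavy
# encoder only on the top-k informative regions. Scorer/k are assumptions.
import torch
import torch.nn.functional as F

def select_hr_patches(image_4k: torch.Tensor, k: int = 16, patch: int = 256):
    # Cheap global pass at low resolution
    lowres = F.interpolate(image_4k, size=(256, 256), mode="bilinear")
    # Carve the 4K image into non-overlapping high-res patches
    B, C, H, W = image_4k.shape
    patches = image_4k.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.reshape(B, C, -1, patch, patch).permute(0, 2, 1, 3, 4)
    # Toy saliency: per-patch variance stands in for a learned scorer
    scores = patches.var(dim=(2, 3, 4))                 # (B, num_patches)
    topk = scores.topk(k, dim=1).indices
    selected = torch.gather(
        patches, 1,
        topk[:, :, None, None, None].expand(-1, -1, C, patch, patch))
    return lowres, selected          # global context + k high-res regions

img = torch.rand(1, 3, 4096, 4096)
lowres, hr = select_hr_patches(img)
print(lowres.shape, hr.shape)        # (1,3,256,256) (1,16,3,256,256)
```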
🏓LATTE-MV: #3D Table Tennis🏓
👉UC Berkeley unveils at #CVPR2025 a novel system for reconstructing monocular table-tennis video in 3D, with an uncertainty-aware controller that anticipates opponent actions. Code & dataset announced, to be released💙
👉Review https://t.ly/qPMOU
👉Paper arxiv.org/pdf/2503.20936
👉Project sastry-group.github.io/LATTE-MV/
👉Repo github.com/sastry-group/LATTE-MV
🌳MVSA Zero-Shot Multi-View🌳
👉Niantic unveils MVSA, a novel Multi-View Stereo Architecture that works anywhere by generalizing across diverse domains & depth ranges. Highly accurate & 3D-consistent depths. Code & models announced💙
👉Review https://t.ly/LvuTh
👉Paper https://arxiv.org/pdf/2503.22430
👉Project https://nianticlabs.github.io/mvsanywhere/
👉Repo https://lnkd.in/ddQz9eps
🐟Segment Any Motion in Video🐟
👉From #CVPR2025, a novel approach for moving-object segmentation that combines DINO-based semantic features and SAM2 (pipeline sketch after the links). Code under MIT license💙
👉Review https://t.ly/4aYjJ
👉Paper arxiv.org/pdf/2503.22268
👉Project motion-seg.github.io/
👉Repo github.com/nnanhuang/SegAnyMo
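A hedged pipeline sketch of the combination the post names: long-range motion cues pick out candidate moving points, semantic features filter them, and the survivors prompt SAM2 for masks. Function names and the threshold are illustrative assumptions:

```python
# Toy version of the motion-segmentation pipeline; not the paper's code.
import torch

def segment_any_motion(frames: torch.Tensor, tracks: torch.Tensor):
    """frames: (T, 3, H, W); tracks: (N, T, 2) long-range point trajectories."""
    # 1) Motion cue: keep trajectories whose net displacement is large
    motion = (tracks[:, -1] - tracks[:, 0]).norm(dim=-1)   # (N,)
    moving = tracks[motion > 5.0]                          # assumed threshold
    # 2) Semantic cue: in the real method, DINO features reject
    #    camera-motion-induced "movers" (stand-in: keep all here)
    prompts = moving[:, 0]                                 # (M, 2) seed points
    # 3) Prompt SAM2 with the selected points (pseudocall, not the real API):
    # masks = sam2_video_predictor.add_points(frames, prompts)
    return prompts

frames = torch.rand(8, 3, 480, 640)
tracks = torch.cumsum(torch.randn(100, 8, 2), dim=1) + 320
print(segment_any_motion(frames, tracks).shape)            # (M, 2)
```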
💃 Video Motion Graphs 💃
👉#Adobe unveils a novel system designed to generate realistic human motion videos. Using a reference video and conditional signals such as music or motion tags, the system synthesizes amazing new videos. Code & Models to be released💙
👉Review https://t.ly/r4EGF
👉Paper https://lnkd.in/dK_tHyzh
👉Project https://lnkd.in/dE6c_KYZ
👉Repo TBA
🌳 Compose Anything is out 🌳
👉Skywork AI unveils SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts. Code, models, & evaluation benchmark released💙
👉Review https://t.ly/MEjzL
👉Paper https://arxiv.org/pdf/2504.02436
👉Project skyworkai.github.io/skyreels-a2.github.io/
👉Repo github.com/SkyworkAI/SkyReels-A2
🤗Models https://huggingface.co/Skywork/SkyReels-A2
⛽ VoRA: Vision as LoRA ⛽
👉#ByteDance unveils Vision as LoRA (VoRA), a novel paradigm that converts LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers (minimal sketch after the links). All training data, code, and model weights available💙
👉Review https://t.ly/guNVN
👉Paper arxiv.org/pdf/2503.20680
👉Repo github.com/Hon-Wong/VoRA
👉Project georgeluimmortal.github.io/vora-homepage.github.io/
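A minimal sketch of the "vision as LoRA" idea: rather than bolting on a separate vision encoder, low-rank adapters inside the LLM's linear layers learn the visual pathway. This is a toy illustration under that assumption, not VoRA's actual wiring:

```python
# Low-rank adapter wrapped around a frozen LLM projection; vision patch
# tokens flow through the same layer, and only A/B train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base                                      # frozen LLM weight
        self.base.weight.requires_grad_(False)
        self.A = nn.Linear(base.in_features, r, bias=False)   # trainable
        self.B = nn.Linear(r, base.out_features, bias=False)  # trainable
        nn.init.zeros_(self.B.weight)    # start as a no-op on the base path
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.B(self.A(x))

llm_proj = nn.Linear(512, 512)                 # stand-in for an LLM projection
vision_adapted = LoRALinear(llm_proj)
patch_embeddings = torch.rand(1, 196, 512)     # vision patches as tokens
print(vision_adapted(patch_embeddings).shape)  # adapters carry the vision path
```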
🐈 TTT Long Video Generation🐈
👉A novel architecture for video generation that adapts the CogVideoX 5B model by incorporating Test-Time Training (TTT) layers: adding TTT layers to a pre-trained Transformer yields one-minute clips from text storyboards (toy TTT layer below). Videos, code & annotations released💙
👉Review https://t.ly/mhlTN
👉Paper arxiv.org/pdf/2504.05298
👉Project test-time-training.github.io/video-dit/
👉Repo github.com/test-time-training/ttt-video-dit
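A toy sketch of what a test-time-training layer does: its "hidden state" is the weights of a tiny inner model, updated by a gradient step on a self-supervised loss as each chunk of the sequence streams through. Minimal illustration only, not the paper's exact layer:

```python
# TTT layer stub: one inner gradient step per chunk at inference time.
import torch
import torch.nn as nn

class TTTLayer(nn.Module):
    def __init__(self, d: int = 64, inner_lr: float = 0.1):
        super().__init__()
        self.inner = nn.Linear(d, d, bias=False)  # inner model = hidden state
        self.inner_lr = inner_lr

    def forward(self, chunks):                    # chunks: list of (B, L, d)
        outs = []
        for x in chunks:
            # Self-supervised inner loss (toy): reconstruct the token itself
            loss = (self.inner(x) - x).pow(2).mean()
            (g,) = torch.autograd.grad(loss, self.inner.weight)
            with torch.no_grad():                 # one inner gradient step
                self.inner.weight -= self.inner_lr * g
            outs.append(self.inner(x))
        return torch.cat(outs, dim=1)

layer = TTTLayer()
video_tokens = [torch.rand(1, 128, 64) for _ in range(4)]  # 4 temporal chunks
print(layer(video_tokens).shape)  # torch.Size([1, 512, 64])
```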
💛 Unified Scalable SVG Generator 💛
👉OmniSVG is the first family of e2e multimodal generators that leverages pre-trained VLMs to create detailed SVGs. Code, models & dataset to be released under MIT💙
👉Review https://t.ly/JcR3I
👉Paper https://arxiv.org/pdf/2504.06263
👉Project https://omnisvg.github.io/
👉Repo github.com/OmniSVG/OmniSVG
👉Dataset https://huggingface.co/OmniSVG
🧊BoxDreamer Object Pose🧊
👉BoxDreamer is a generalizable RGB-based approach for #3D object pose estimation in the wild, specifically designed to address challenges in sparse-view settings. Code coming, demo released💙
👉Review https://t.ly/e-vX9
👉Paper arxiv.org/pdf/2504.07955
👉Project https://lnkd.in/djz8jqn9
👉Repo https://lnkd.in/dfuEawSA
🤗Demo https://lnkd.in/dVYaWGcS