AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI, with papers. Fresh updates every day on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🫀HyperFast Myocardium Tracking🫀

👉Norwegian institutes unveil MyoTracker, a low-complexity architecture (0.3M params) for point tracking in echocardiography. Built on CoTracker2, it predicts points for the entire sequence in a single step. Code released under a non-commercial license💙

👉Review https://t.ly/6wo8q
👉Paper https://arxiv.org/pdf/2503.10431
👉Code https://github.com/artemcher/myotracker
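For intuition, here is a toy numpy sketch of the single-pass interface MyoTracker exposes (a whole video plus query points in, full-sequence tracks out). The template-matching logic and function name are illustrative stand-ins, not the paper's learned architecture:

```python
import numpy as np

def track_points(video, queries, patch=5, search=8):
    """Toy single-pass point tracker via SSD template matching.

    video:   (T, H, W) float array
    queries: (N, 2) integer (row, col) points in frame 0
    returns: (T, N, 2) tracked positions for the whole sequence
    """
    T, H, W = video.shape
    r = patch // 2
    tracks = np.zeros((T, len(queries), 2), dtype=int)
    tracks[0] = queries
    for n, (y, x) in enumerate(queries):
        tmpl = video[0, y - r:y + r + 1, x - r:x + r + 1]  # reference patch
        cy, cx = y, x
        for t in range(1, T):
            best, best_pos = np.inf, (cy, cx)
            # exhaustive search in a small window around the last position
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = cy + dy, cx + dx
                    if r <= yy < H - r and r <= xx < W - r:
                        cand = video[t, yy - r:yy + r + 1, xx - r:xx + r + 1]
                        ssd = np.sum((cand - tmpl) ** 2)
                        if ssd < best:
                            best, best_pos = ssd, (yy, xx)
            cy, cx = best_pos
            tracks[t, n] = (cy, cx)
    return tracks
```

The point of the sketch is the API shape: one call returns tracks for every frame, instead of a per-frame loop driven by the caller.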
🍾 6D Tracking & Pose SOTA 🍾

👉ČVUT unveils the new SOTA in RGB 6D pose estimation and tracking. Suitable for ego-clips & 7-axis robotic manipulation. Code under MIT💙

👉Review https://t.ly/pSqFR
👉Paper arxiv.org/pdf/2503.10307
👉Code github.com/ponimatkin/freepose
🖲️ VGG Transformer 🖲️

👉VGGT by VGG & #META (#CVPR2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene within seconds. Code released💙

👉Review https://t.ly/WoWXL
👉Paper https://arxiv.org/pdf/2503.11651
👉Project https://vgg-t.github.io/
👉Code github.com/facebookresearch/vggt
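Among the 3D attributes VGGT predicts are per-view depth maps and camera parameters; those two already give you a point map. A generic pinhole unprojection in numpy (standard geometry, not VGGT code):

```python
import numpy as np

def depth_to_pointmap(depth, K):
    """Unproject a depth map into a per-pixel 3D point map.

    depth: (H, W) depth values
    K:     (3, 3) camera intrinsics
    returns (H, W, 3) 3D points in camera coordinates
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix          # (3, H*W) camera rays
    pts = rays * depth.reshape(1, -1)      # scale each ray by its depth
    return pts.T.reshape(H, W, 3)
```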
🧸 Occluded 3D Reconstruction 🧸

👉Oxford unveils a novel 3D generative model to reconstruct 3D objects from partial observations. Code (to be released), demo & model on HF💙

👉Review https://t.ly/Lr5D7
👉Paper arxiv.org/pdf/2503.13439
👉Project sm0kywu.github.io/Amodal3R/
🤗huggingface.co/spaces/Sm0kyWu/Amodal3R
🌱 #Py4AI: line-up is official 🌱

👉Last week we announced the first part of our incredible line-up for PY4AI 2025. It's time to disclose the second one and drive you crazy👇

𝐓𝐡𝐞 𝐬𝐞𝐜𝐨𝐧𝐝 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🔥Alfredo Canziani | New York University
🔥Fanny Bouton | OVHcloud
🔥Full list: https://t.ly/JJP8B
🧞 IMPOSSIBLE Videos 🧞

👉IPV-Bench: counterfactual and anti-reality scenes impossible in the real world. A novel challenge designed to evaluate and foster progress in video understanding and generation. Code & 🤗-Data 💙

👉Review https://t.ly/D7jhm
👉Paper arxiv.org/pdf/2503.14378
👉Project showlab.github.io/Impossible-Videos/
👉Repo github.com/showlab/Impossible-Videos
🥎LLM Spatial Understanding🥎

👉SpatialLM by Manycore: a novel LLM designed to process 3D point-cloud data and generate structured 3D scene-understanding outputs. Code, model & data 💙

👉Review https://t.ly/ejr1s
👉Project manycore-research.github.io/SpatialLM/
👉Code github.com/manycore-research/SpatialLM
🤗Models https://huggingface.co/manycore-research
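Before a point cloud can be fed to any encoder + LLM pipeline, it is typically thinned out. A common preprocessing step is voxel downsampling (averaging points per voxel); this is a generic sketch, not SpatialLM's actual tokenization:

```python
import numpy as np

def voxel_downsample(points, voxel=0.05):
    """Average all points falling into the same voxel.

    points: (N, 3) array of xyz coordinates
    voxel:  voxel edge length in the same units as the points
    returns (M, 3) voxel centroids, M <= N
    """
    keys = np.floor(points / voxel).astype(np.int64)   # voxel index per point
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inv).astype(float)
    out = np.zeros((inv.max() + 1, 3))
    for d in range(3):                                  # centroid per voxel
        out[:, d] = np.bincount(inv, weights=points[:, d]) / counts
    return out
```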
🙀3D MultiModal Memory🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings. Rich spatial & semantic understanding via a novel memory system designed to retain multimodal info across videos.

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET
🔥 Dereflection Any Image 🔥

👉SJTU & #Huawei unveil DAI, a novel diffusion-based framework able to remove a wide range of reflection types. One-step diffusion with deterministic outputs & fast inference. Inference code, pretrained models & training released💙

👉Review https://t.ly/PDA9K
👉Paper https://arxiv.org/pdf/2503.17347
👉Project abuuu122.github.io/DAI.github.io/
👉Repo github.com/Abuuu122/Dereflection-Any-Image
🦎 Scaling Vision to 4K🦎

👉PS3 by #Nvidia (+ UC Berkeley) scales up CLIP-style vision pre-training to 4K at *near-constant* cost: it encodes a low-res global image and selectively processes only the informative high-res regions. Impressive work. Code/weights & 🤗 announced💙

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv
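The "selective high-res processing" idea can be sketched in a few lines: score each high-res tile with a cheap proxy and only keep the top-k for expensive encoding. Here the proxy is local variance; the real model scores regions with learned, prompt-aware selection, so treat this as an assumption-laden illustration:

```python
import numpy as np

def select_hr_patches(image_hr, k=4, grid=4):
    """Keep only the k most 'informative' tiles of a high-res image.

    image_hr: (H, W) array, H and W divisible by grid
    returns (list of k tiles, indices of the kept tiles)
    """
    H, W = image_hr.shape
    th, tw = H // grid, W // grid
    tiles, scores = [], []
    for i in range(grid):
        for j in range(grid):
            tile = image_hr[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            tiles.append(tile)
            scores.append(tile.var())   # cheap saliency proxy
    keep = np.argsort(scores)[::-1][:k]  # top-k tiles by score
    return [tiles[i] for i in keep], keep
```

Only the kept tiles would then go through the heavy encoder, which is what keeps the cost near-constant as resolution grows.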
🏓LATTE-MV: #3D Table Tennis🏓

👉UC Berkeley unveils at #CVPR2025 a novel system for reconstructing monocular table-tennis video in 3D, with an uncertainty-aware controller that anticipates opponent actions. Code & Dataset announced, to be released💙

👉Review https://t.ly/qPMOU
👉Paper arxiv.org/pdf/2503.20936
👉Project sastry-group.github.io/LATTE-MV/
👉Repo github.com/sastry-group/LATTE-MV
🌳MVSA Zero-Shot Multi-View🌳

👉Niantic unveils MVSA, a novel multi-view stereo architecture that works anywhere, generalizing across diverse domains & depth ranges. Highly accurate & 3D-consistent depths. Code & models announced💙

👉Review https://t.ly/LvuTh
👉Paper https://arxiv.org/pdf/2503.22430
👉Project https://nianticlabs.github.io/mvsanywhere/
👉Repo https://lnkd.in/ddQz9eps
🐟Segment Any Motion in Video🐟

👉From #CVPR2025, a novel approach for moving-object segmentation that combines DINO-based semantic features with SAM2. Code under MIT license💙

👉Review https://t.ly/4aYjJ
👉Paper arxiv.org/pdf/2503.22268
👉Project motion-seg.github.io/
👉Repo github.com/nnanhuang/SegAnyMo
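The combination can be mimicked in miniature: a motion cue proposes candidate pixels, and a semantic feature map groups them into one object mask. The paper uses tracked trajectories, DINO features, and SAM2; this numpy toy only mirrors how the two cues gate each other:

```python
import numpy as np

def moving_mask(f0, f1, feats, thresh=0.1):
    """Toy moving-object mask from a motion cue + semantic features.

    f0, f1: (H, W) consecutive grayscale frames
    feats:  (H, W, C) per-pixel feature vectors (stand-in for DINO)
    returns (H, W) boolean mask
    """
    motion = np.abs(f1 - f0)
    cand = motion > thresh                     # motion-based candidates
    seed = np.unravel_index(np.argmax(motion), motion.shape)
    seed_feat = feats[seed]                    # feature of strongest mover
    sim = feats @ seed_feat / (
        np.linalg.norm(feats, axis=-1) * np.linalg.norm(seed_feat) + 1e-8)
    return cand & (sim > 0.9)                  # semantically coherent movers
```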
💃 Video Motion Graphs 💃

👉#Adobe unveils a novel system designed to generate realistic human motion videos. Using a reference video and conditional signals such as music or motion tags, the system synthesizes amazing new videos. Code & Models to be released💙

👉Review https://t.ly/r4EGF
👉Paper https://lnkd.in/dK_tHyzh
👉Project https://lnkd.in/dE6c_KYZ
👉Repo TBA
🌳 Compose Anything is out 🌳

👉Skywork AI unveils SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts. Code, models, & evaluation benchmark released💙

👉Review https://t.ly/MEjzL
👉Paper https://arxiv.org/pdf/2504.02436
👉Project skyworkai.github.io/skyreels-a2.github.io/
👉Repo github.com/SkyworkAI/SkyReels-A2
🤗Models https://huggingface.co/Skywork/SkyReels-A2
VoRA: Vision as LoRA

👉#ByteDance unveils Vision as LoRA (VoRA), a novel paradigm that converts LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers. All training data, code, and model weights available💙

👉Review https://t.ly/guNVN
👉Paper arxiv.org/pdf/2503.20680
👉Repo github.com/Hon-Wong/VoRA
👉Project georgeluimmortal.github.io/vora-homepage.github.io/
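The LoRA building block itself is standard: a frozen base weight plus a trainable low-rank update, y = xW + s·xAB. A minimal numpy sketch of that block (the surrounding LLM wiring and VoRA's vision-specific placement are omitted):

```python
import numpy as np

class LoRALinear:
    """Linear layer with a low-rank additive adapter: y = xW + s * xAB."""

    def __init__(self, w, rank=4, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = w.shape
        self.w = w                                   # frozen base weight
        self.a = rng.normal(0, 0.02, (d_in, rank))   # trainable A
        self.b = np.zeros((rank, d_out))             # trainable B, zero init
        self.scale = scale

    def __call__(self, x):
        # B starts at zero, so the adapter is a no-op until trained
        return x @ self.w + (x @ self.a) @ self.b * self.scale
```

The zero init of B is the standard trick: at the start of training the adapted model is exactly the base model.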
🐈 TTT Long Video Generation🐈

👉A novel architecture for video generation adapting the CogVideoX 5B model by incorporating Test-Time Training (TTT) layers. Adding TTT layers to a pre-trained Transformer yields one-minute clips from text storyboards. Videos, code & annotations released💙

👉Review https://t.ly/mhlTN
👉Paper arxiv.org/pdf/2504.05298
👉Project test-time-training.github.io/video-dit/
👉Repo github.com/test-time-training/ttt-video-dit
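The core mechanism of a TTT layer: its hidden state is itself a small model whose weights are updated by a self-supervised loss at every step of the sequence. A scalar-scale numpy sketch of that inner loop (the paper's layers are much richer):

```python
import numpy as np

def ttt_forward(xs, lr=0.1):
    """Minimal test-time-training layer over a sequence.

    xs: (T, d) sequence of feature vectors
    A linear fast weight W is the layer's state; each step takes one
    gradient step on a self-supervised reconstruction loss 0.5*||Wx - x||^2.
    returns (T, d) outputs under the evolving fast weight.
    """
    d = xs.shape[1]
    W = np.zeros((d, d))
    outs = []
    for x in xs:
        outs.append(W @ x)                 # output with current fast weight
        err = W @ x - x                    # self-supervised target: identity
        W -= lr * np.outer(err, x)         # inner-loop gradient step on W
    return np.array(outs)
```

Because the state is updated by learning rather than by a fixed recurrence, the layer keeps adapting over arbitrarily long sequences, which is what makes minute-long generation tractable.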
💛 Unified Scalable SVG Generator 💛

👉OmniSVG is the first family of e2e multimodal generators that leverages pre-trained VLMs to create detailed SVGs. Code, models & dataset to be released under MIT💙

👉Review https://t.ly/JcR3I
👉Paper https://arxiv.org/pdf/2504.06263
👉Project https://omnisvg.github.io/
👉Repo github.com/OmniSVG/OmniSVG
👉Dataset https://huggingface.co/OmniSVG
🧊BoxDreamer Object Pose🧊

👉BoxDreamer is a generalizable RGB-based approach for #3D object pose estimation in the wild, specifically designed to address challenges in sparse-view settings. Code coming, demo released💙

👉Review https://t.ly/e-vX9
👉Paper arxiv.org/pdf/2504.07955
👉Project https://lnkd.in/djz8jqn9
👉Repo https://lnkd.in/dfuEawSA
🤗Demo https://lnkd.in/dVYaWGcS