AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
💙 Announcing #Py4AI 2025 💙

👉 The second edition of the Py4AI conference is official! An all-day, fully free event for #AI & #Python lovers.

𝐓𝐡𝐞 𝐟𝐢𝐫𝐬𝐭 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🚀Dana Aubakirova | Hugging Face🤗
🚀Yunhao Liu & Ruoya Sheng | ByteDance🔥
🚀Alice Casiraghi | 🌏🌎🌍
🚀Luca Arrotta, PhD | Datapizza🍕
🚀Valeria Zuccoli | Bettini Srl
🚀Mirco Planamente | ARGO Vision
🚀Daniele Zonca | Red Hat

👉 Info & registration: https://t.ly/37wWj
🎯RexSeek: Referring Any Object🎯

👉A novel referring detection model based on a multimodal LLM that precisely locates objects from user-input natural language. The model specializes in humans. Code released 💙

👉Review https://shorturl.at/CGsT2
👉Paper arxiv.org/pdf/2503.08507
👉Code github.com/IDEA-Research/RexSeek
🐶OVTR: E2E Transformer MOT🐶

👉HUST proposes OVTR (End-to-End Open-Vocabulary Multiple Object Tracking with TRansformer), the first end-to-end open-vocabulary tracker that models motion, appearance, and category simultaneously. Source code released under MIT license💙

👉Review https://t.ly/K3ASX
👉Paper arxiv.org/pdf/2503.10616
👉Code https://github.com/jinyanglii/OVTR
🫀HyperFast Myocardium Tracking🫀

👉Norwegian institutes unveil MyoTracker, a low-complexity architecture (0.3M params) for point tracking in echocardiography. Built on CoTracker2, it provides point predictions for the entire sequence in a single step. Code released under a non-commercial license💙

👉Review https://t.ly/6wo8q
👉Paper https://arxiv.org/pdf/2503.10431
👉Code https://github.com/artemcher/myotracker
🍾 6D Tracking & Pose SOTA 🍾

👉ČVUT unveils the new SOTA in RGB 6D pose estimation and tracking. Suitable for ego-clips & 7-axis robo-manipulation. Code under MIT💙

👉Review https://t.ly/pSqFR
👉Paper arxiv.org/pdf/2503.10307
👉Code github.com/ponimatkin/freepose
🖲️ VGG Transformer 🖲️

👉VGGT by VGG & #META (#CVPR2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene within seconds. Code released💙

👉Review https://t.ly/WoWXL
👉Paper https://arxiv.org/pdf/2503.11651
👉Project https://vgg-t.github.io/
👉Code github.com/facebookresearch/vggt
🧸 Occluded 3D Reconstruction 🧸

👉Oxford unveils a novel 3D generative model to reconstruct 3D objects from partial observations. Code (TBR), demo, model on HF💙

👉Review https://t.ly/Lr5D7
👉Paper arxiv.org/pdf/2503.13439
👉Project sm0kywu.github.io/Amodal3R/
🤗huggingface.co/spaces/Sm0kyWu/Amodal3R
🌱 #Py4AI: line-up is official 🌱

👉Last week we announced the first part of our incredible line-up for PY4AI 2025. It's time to disclose the second one and drive you crazy👇

𝐓𝐡𝐞 𝐬𝐞𝐜𝐨𝐧𝐝 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🔥Alfredo Canziani | New York University
🔥Fanny Bouton | OVHcloud
🔥Full list: https://t.ly/JJP8B
🧞 IMPOSSIBLE Videos 🧞

👉IPV-Bench: counterfactual and anti-reality scenes that are impossible in the real world. A novel challenge designed to evaluate and foster progress in video understanding and generation. Code & 🤗-Data 💙

👉Review https://t.ly/D7jhm
👉Paper arxiv.org/pdf/2503.14378
👉Project showlab.github.io/Impossible-Videos/
👉Repo github.com/showlab/Impossible-Videos
🥎LLM Spatial Understanding🥎

👉SpatialLM by Manycore: novel LLM designed to process 3D point cloud data and generate structured 3D scene understanding outputs. Code, model & data 💙

👉Review https://t.ly/ejr1s
👉Project manycore-research.github.io/SpatialLM/
👉Code github.com/manycore-research/SpatialLM
🤗Models https://huggingface.co/manycore-research
🙀3D MultiModal Memory🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes w/ RGB & foundation model embeddings. Rich spatial & semantic understanding via novel memory system designed to retain multimodal info through videos

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET
🔥 Dereflection Any Image 🔥

👉SJTU & #Huawei unveil DAI, a novel diffusion-based framework able to remove a wide range of reflection types. One-step diffusion with deterministic outputs & fast inference. Inference code, pretrained models & training code released💙

👉Review https://t.ly/PDA9K
👉Paper https://arxiv.org/pdf/2503.17347
👉Project abuuu122.github.io/DAI.github.io/
👉Repo github.com/Abuuu122/Dereflection-Any-Image
🦎 Scaling Vision to 4K🦎

👉PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K resolution at *near-constant* cost. It encodes a low-resolution global image and selectively processes only the informative high-resolution regions. Impressive work. Code/weights & 🤗 announced💙

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv
🏓LATTE-MV: #3D Table Tennis🏓

👉UC Berkeley unveils at #CVPR2025 a novel system for reconstructing table tennis gameplay in 3D from monocular video, with an uncertainty-aware controller that anticipates opponent actions. Code & dataset announced, to be released💙

👉Review https://t.ly/qPMOU
👉Paper arxiv.org/pdf/2503.20936
👉Project sastry-group.github.io/LATTE-MV/
👉Repo github.com/sastry-group/LATTE-MV
🌳MVSA Zero-Shot Multi-View🌳

👉Niantic unveils MVSA, a novel multi-view stereo architecture that works anywhere by generalizing across diverse domains & depth ranges. Highly accurate & 3D-consistent depths. Code & models announced💙

👉Review https://t.ly/LvuTh
👉Paper https://arxiv.org/pdf/2503.22430
👉Project https://nianticlabs.github.io/mvsanywhere/
👉Repo https://lnkd.in/ddQz9eps
🐟Segment Any Motion in Video🐟

👉From #CVPR2025, a novel approach for moving object segmentation that combines DINO-based semantic features and SAM2. Code under MIT license💙

👉Review https://t.ly/4aYjJ
👉Paper arxiv.org/pdf/2503.22268
👉Project motion-seg.github.io/
👉Repo github.com/nnanhuang/SegAnyMo
💃 Video Motion Graphs 💃

👉#Adobe unveils a novel system designed to generate realistic human motion videos. Using a reference video and conditional signals such as music or motion tags, the system synthesizes amazing new videos. Code & Models to be released💙

👉Review https://t.ly/r4EGF
👉Paper https://lnkd.in/dK_tHyzh
👉Project https://lnkd.in/dE6c_KYZ
👉Repo TBA
🌳 Compose Anything is out 🌳

👉Skywork AI unveils SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts. Code, models, & evaluation benchmark released💙

👉Review https://t.ly/MEjzL
👉Paper https://arxiv.org/pdf/2504.02436
👉Project skyworkai.github.io/skyreels-a2.github.io/
👉Repo github.com/SkyworkAI/SkyReels-A2
🤗Models https://huggingface.co/Skywork/SkyReels-A2
VoRA: Vision as LoRA

👉#ByteDance unveils Vision as LoRA (VoRA), a novel paradigm that converts LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers. All training data, code, and model weights available💙

👉Review https://t.ly/guNVN
👉Paper arxiv.org/pdf/2503.20680
👉Repo github.com/Hon-Wong/VoRA
👉Project georgeluimmortal.github.io/vora-homepage.github.io/