AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI with papers. Fresh updates every day on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🧞 IMPOSSIBLE Videos 🧞

👉IPV-Bench: counterfactual and anti-reality scenes that are impossible in the real world. A novel challenge designed to evaluate and foster progress in video understanding and generation. Code & 🤗-Data 💙

👉Review https://t.ly/D7jhm
👉Paper arxiv.org/pdf/2503.14378
👉Project showlab.github.io/Impossible-Videos/
👉Repo github.com/showlab/Impossible-Videos
🥎 LLM Spatial Understanding 🥎

👉SpatialLM by Manycore: a novel LLM designed to process 3D point cloud data and generate structured 3D scene understanding outputs. Code, model & data 💙

👉Review https://t.ly/ejr1s
👉Project manycore-research.github.io/SpatialLM/
👉Code github.com/manycore-research/SpatialLM
🤗Models https://huggingface.co/manycore-research
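The post doesn't detail SpatialLM's output format, so purely as a toy illustration of what a "structured 3D scene understanding output" might look like (the function, data, and labels below are hypothetical stand-ins, not from the SpatialLM codebase), here is a sketch that turns a labeled point cloud into per-object bounding boxes:

```python
import numpy as np

def boxes_from_points(points, labels):
    """Toy 'structured scene output': one axis-aligned bounding box
    (min corner, max corner) per labeled object in a point cloud."""
    boxes = {}
    for lbl in np.unique(labels):
        pts = points[labels == lbl]           # all points of this object
        boxes[int(lbl)] = (pts.min(axis=0), pts.max(axis=0))
    return boxes

# Two toy objects: one near the origin, one in the (5..6)^3 cube.
points = np.array([[0, 0, 0], [1, 1, 1], [5, 5, 5], [6, 6, 6]], dtype=float)
labels = np.array([0, 0, 1, 1])
boxes = boxes_from_points(points, labels)
print(boxes[0])  # (array([0., 0., 0.]), array([1., 1., 1.]))
```

A real system would additionally predict semantic classes and oriented boxes; this only shows the points-in, structure-out shape of the task.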
🙀 3D MultiModal Memory 🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes w/ RGB & foundation-model embeddings. Rich spatial & semantic understanding via a novel memory system designed to retain multimodal info across videos.

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET
🔥 Dereflection Any Image 🔥

👉SJTU & #Huawei unveil DAI, a novel diffusion-based framework able to recover clean images from a wide range of reflection types. One-step diffusion with deterministic outputs & fast inference. Inference, pretrained models & training released 💙

👉Review https://t.ly/PDA9K
👉Paper https://arxiv.org/pdf/2503.17347
👉Project abuuu122.github.io/DAI.github.io/
👉Repo github.com/Abuuu122/Dereflection-Any-Image
🦎 Scaling Vision to 4K 🦎

👉PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K with *near-constant* cost: it encodes the low-res global image and selectively processes only the informative high-res regions. Impressive work. Code/weights & 🤗 announced 💙

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv
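A minimal sketch of the select-then-encode idea: score high-res patches and keep only the top-k for expensive processing. Here patch variance is a crude stand-in for PS3's learned region selection (names and scoring are hypothetical, not the paper's method):

```python
import numpy as np

def select_informative_patches(image, patch=32, k=4):
    """Split a high-res image into patches, score each by variance
    (a crude 'informativeness' proxy), and return the top-k coords.

    `image`: (H, W) float array; H and W are multiples of `patch`.
    """
    H, W = image.shape
    patches, coords = [], []
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            patches.append(image[y:y+patch, x:x+patch])
            coords.append((y, x))
    scores = [p.var() for p in patches]
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest-variance patches
    return [coords[i] for i in top]

# Toy 128x128 "image": flat background with one textured corner.
rng = np.random.default_rng(0)
img = np.zeros((128, 128))
img[:32, :32] = rng.random((32, 32))          # only this corner has detail
print(select_informative_patches(img, k=1))   # -> [(0, 0)]
```

Only the selected coordinates would then be fed through the heavy encoder, which is what keeps the cost near-constant as resolution grows.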
πŸ“LATTE-MV: #3D Table TennisπŸ“

πŸ‘‰UC Berkeley unveils at #CVPR2025 a novel system for reconstructing monocular video of table tennis in 3D with uncertainty-aware controller that anticipates opponent actions. Code & Dataset announced, to be releasedπŸ’™

πŸ‘‰Review https://t.ly/qPMOU
πŸ‘‰Paper arxiv.org/pdf/2503.20936
πŸ‘‰Project sastry-group.github.io/LATTE-MV/
πŸ‘‰Repo github.com/sastry-group/LATTE-MV
πŸ”₯8πŸ‘2πŸ‘1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
🌳 MVSA: Zero-Shot Multi-View 🌳

👉Niantic unveils MVSA, a novel multi-view stereo architecture that works anywhere, generalizing across diverse domains & depth ranges. Highly accurate & 3D-consistent depths. Code & models announced 💙

👉Review https://t.ly/LvuTh
👉Paper https://arxiv.org/pdf/2503.22430
👉Project https://nianticlabs.github.io/mvsanywhere/
👉Repo https://lnkd.in/ddQz9eps
πŸ”₯12πŸ‘2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🐟 Segment Any Motion in Video 🐟

👉From #CVPR2025, a novel approach for moving-object segmentation that combines DINO-based semantic features with SAM2. Code under MIT license 💙

👉Review https://t.ly/4aYjJ
👉Paper arxiv.org/pdf/2503.22268
👉Project motion-seg.github.io/
👉Repo github.com/nnanhuang/SegAnyMo
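A toy sketch of the general idea of pooling motion evidence over semantic segments: the actual pipeline uses DINO features and SAM2, while everything below (labels, threshold, function name) is a hypothetical stand-in to show why semantics help produce whole-object masks instead of noisy per-pixel ones:

```python
import numpy as np

def moving_object_mask(semantic_labels, motion_mag, thresh=0.5):
    """Mark a semantic segment as 'moving' when its mean motion
    magnitude exceeds `thresh`, so motion evidence is pooled over
    whole objects rather than individual pixels."""
    mask = np.zeros_like(motion_mag, dtype=bool)
    for lbl in np.unique(semantic_labels):
        region = semantic_labels == lbl
        if motion_mag[region].mean() > thresh:
            mask |= region          # add the whole segment to the mask
    return mask

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])          # two semantic segments
motion = np.array([[0.0, 0.1, 0.9, 0.8],
                   [0.1, 0.0, 0.7, 1.0]])  # per-pixel motion magnitude
mask = moving_object_mask(labels, motion)
print(mask.astype(int))  # right-half segment is flagged as moving
```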
πŸ”₯5πŸ‘3❀2🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
💃 Video Motion Graphs 💃

👉#Adobe unveils a novel system designed to generate realistic human motion videos. Given a reference video and conditional signals such as music or motion tags, the system synthesizes impressive new videos. Code & models to be released 💙

👉Review https://t.ly/r4EGF
👉Paper https://lnkd.in/dK_tHyzh
👉Project https://lnkd.in/dE6c_KYZ
👉Repo TBA
🌳 Compose Anything is out 🌳

👉Skywork AI unveils SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts. Code, models & evaluation benchmark released 💙

👉Review https://t.ly/MEjzL
👉Paper https://arxiv.org/pdf/2504.02436
👉Project skyworkai.github.io/skyreels-a2.github.io/
👉Repo github.com/SkyworkAI/SkyReels-A2
🤗Models https://huggingface.co/Skywork/SkyReels-A2
❀9πŸ‘3😍2πŸ”₯1🀩1🀣1
This media is not supported in your browser
VIEW IN TELEGRAM
⛽ VoRA: Vision as LoRA ⛽

👉#ByteDance unveils Vision as LoRA (VoRA), a novel paradigm that converts LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers. All training data, code, and model weights available 💙

👉Review https://t.ly/guNVN
👉Paper arxiv.org/pdf/2503.20680
👉Repo github.com/Hon-Wong/VoRA
👉Project georgeluimmortal.github.io/vora-homepage.github.io/
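The LoRA building block itself is standard: a frozen pretrained weight plus a trainable low-rank update, so the base LLM stays intact while the adapter learns the new (here, vision) behavior. A self-contained numpy sketch of that block (illustrative shapes and init, not VoRA's code):

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x).
    Only A and B would be trained; the pretrained W is never touched.
    """
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                               # frozen pretrained weight
        self.A = rng.normal(0, 0.01, (r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))            # trainable up-projection, init 0
        self.scale = alpha / r

    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.eye(3)                       # stand-in for a pretrained weight
layer = LoRALinear(W)
x = np.array([1.0, 2.0, 3.0])
# With B initialised to zero the adapter is a no-op: output == W @ x,
# which is why training can start from the unmodified base model.
print(np.allclose(layer(x), W @ x))  # True
```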
πŸ‘15❀7🀯4πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🐈 TTT Long Video Generation 🐈

👉A novel architecture for video generation that adapts the CogVideoX 5B model by incorporating Test-Time Training (TTT) layers. Adding TTT layers to a pre-trained Transformer -> one-minute clips from text storyboards. Videos, code & annotations released 💙

👉Review https://t.ly/mhlTN
👉Paper arxiv.org/pdf/2504.05298
👉Project test-time-training.github.io/video-dit/
👉Repo github.com/test-time-training/ttt-video-dit
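Test-Time Training layers treat the hidden state as a small model that is updated by a self-supervised loss while the sequence is being processed. A minimal numpy caricature of that idea (the loss, update, and linear "inner model" are deliberately simplified; this is not the paper's architecture):

```python
import numpy as np

def ttt_layer(tokens, lr=0.1):
    """Minimal test-time-training layer: the 'hidden state' is the
    weight matrix W of a tiny linear model, updated by one gradient
    step of a reconstruction loss ||W x - x||^2 per token."""
    d = tokens.shape[1]
    W = np.zeros((d, d))            # hidden state = inner model weights
    outputs = []
    for x in tokens:
        err = W @ x - x             # reconstruction error on this token
        W -= lr * np.outer(err, x)  # gradient step (constant folded into lr)
        outputs.append(W @ x)       # emit prediction with updated weights
    return np.stack(outputs)

# A repeated token is easy to "learn" at test time:
seq = np.tile(np.array([1.0, 0.0]), (5, 1))
out = ttt_layer(seq)
print(out[0], out[-1])  # reconstruction improves as W adapts along the sequence
```

The appeal for long video is that W compresses the whole history seen so far, instead of attending over an ever-growing context.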
💛 Unified Scalable SVG Generator 💛

👉OmniSVG is the first family of end-to-end multimodal generators that leverage pre-trained VLMs to create detailed SVGs. Code, models & dataset to be released under the MIT license 💙

👉Review https://t.ly/JcR3I
👉Paper https://arxiv.org/pdf/2504.06263
👉Project https://omnisvg.github.io/
👉Repo github.com/OmniSVG/OmniSVG
👉Dataset https://huggingface.co/OmniSVG
🧊 BoxDreamer Object Pose 🧊

👉BoxDreamer is a generalizable RGB-based approach for #3D object pose estimation in the wild, specifically designed to address challenges in sparse-view settings. Code coming, demo released 💙

👉Review https://t.ly/e-vX9
👉Paper arxiv.org/pdf/2504.07955
👉Project https://lnkd.in/djz8jqn9
👉Repo https://lnkd.in/dfuEawSA
🤗Demo https://lnkd.in/dVYaWGcS
🥊 Pose in Combat Sports 🥊

👉A novel SOTA framework for accurate physics-based #3D human pose estimation in combat sports w/ a sparse multi-camera setup. Dataset to be released soon 💙

👉Review https://t.ly/EfcGL
👉Paper https://lnkd.in/deMMrKcA
👉Project https://lnkd.in/dkMS_UrH
πŸ‘13πŸ”₯4❀3🀯2
This media is not supported in your browser
VIEW IN TELEGRAM
💥 Geo4D: VideoGen 4D Scene 💥

👉Oxford's VGG unveils Geo4D: video diffusion for monocular 4D reconstruction. Trained on synthetic data only, yet generalizing strongly to the real world: point maps, depth & ray maps for the new SOTA in dynamic reconstruction. Code released 💙

👉Review https://t.ly/X55Uj
👉Paper arxiv.org/pdf/2504.07961
👉Project geo4d.github.io/
👉Code github.com/jzr99/Geo4D
πŸ„ 4D Mocap Human-Object πŸ„

πŸ‘‰#Adobe unveils HUMOTO, HQ dataset of human-object interactions for motion generation, computer vision, and robotics: 700+ sequences (7,875 seconds @ 30FPS), interactions with 63 precisely modeled objects and 72 articulated parts

πŸ‘‰Review https://t.ly/lCof3
πŸ‘‰Paper https://lnkd.in/dVVBDd_c
πŸ‘‰Project https://lnkd.in/dwBcseDf
❀8πŸ‘2πŸ”₯1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🍏 PartField #3D Part Segmentation 🍏

👉#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under the Nvidia License 💙

👉Review https://t.ly/fGb2O
👉Paper https://lnkd.in/dGeyKSzG
👉Code https://lnkd.in/dbe57XGH
👉Project https://lnkd.in/dhEgf7X2
🐯 UniAnimate-DiT: Human Animation 🐯

👉UniAnimate-DiT is a novel and effective framework based on Wan2.1 for consistent human image animation. LoRAs fine-tune the model parameters, reducing memory while maintaining the original model's generative skills. Training and inference code released 💙

👉Review https://t.ly/1I50N
👉Paper https://arxiv.org/pdf/2504.11289
👉Repo https://github.com/ali-vilab/UniAnimate-DiT