AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
πŸ• Upsample Anything πŸ•

πŸ‘‰Upsample Anything is a novel universal, training-free upsampler based on lightweight test-time optimization. No code, but it's a relevant paper πŸ’™

πŸ‘‰Review https://t.ly/7LE6G
πŸ‘‰Paper https://lnkd.in/dsUfdtih
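To make the "training-free, test-time optimization" idea concrete, here is a minimal sketch (not the paper's method): a tiny guidance-aware refinement module is fitted per image at inference time so that downsampling the upsampled features reproduces the low-res input. The module design and the cycle-consistency loss are illustrative assumptions.

```python
# Minimal sketch of a test-time-optimized upsampler (illustrative, not the paper's method).
import torch
import torch.nn.functional as F

def test_time_upsample(feat_lr, guide_hr, steps=100, lr=1e-2):
    """feat_lr: (1, C, h, w) low-res features; guide_hr: (1, 3, H, W) RGB guidance."""
    _, C, h, w = feat_lr.shape
    H, W = guide_hr.shape[-2:]
    # Lightweight per-image module: a single 3x3 conv refining a bilinear upsample.
    refine = torch.nn.Conv2d(C + 3, C, kernel_size=3, padding=1)
    opt = torch.optim.Adam(refine.parameters(), lr=lr)
    for _ in range(steps):
        up = F.interpolate(feat_lr, size=(H, W), mode="bilinear", align_corners=False)
        out = up + refine(torch.cat([up, guide_hr], dim=1))   # guidance-aware residual
        # Cycle consistency: downsampling the prediction should recover the input features.
        down = F.adaptive_avg_pool2d(out, (h, w))
        loss = F.mse_loss(down, feat_lr)
        opt.zero_grad(); loss.backward(); opt.step()
    return out.detach()

feats = torch.randn(1, 64, 32, 32)
rgb = torch.rand(1, 3, 256, 256)
print(test_time_upsample(feats, rgb, steps=10).shape)  # torch.Size([1, 64, 256, 256])
```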
🦞Single Synthetic Image per Class🦞

πŸ‘‰MIT unveils Linear Gradient Matching (H/T Torralba), a novel distillation method that uses a single synthetic image per class to train linear classifiers (and more). Repo available πŸ’™

πŸ‘‰Review https://t.ly/dD3un
πŸ‘‰Paper arxiv.org/pdf/2511.16674
πŸ‘‰Project linear-gradient-matching.github.io/
πŸ‘‰Repo github.com/GeorgeCazenavette/linear-gradient-matching
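A hedged toy sketch of the gradient-matching idea: optimize one synthetic sample per class so that the classifier gradient it induces matches the gradient induced by real data. The feature dimensions, random classifier sampling, and cosine loss below are simplifying assumptions, not the authors' exact recipe (see the repo for that).

```python
# Toy gradient-matching distillation for a linear classifier.
import torch
import torch.nn.functional as F

C, D = 10, 512                                   # classes, feature dimension
real_x = torch.randn(1024, D)                    # stand-in for real (pre-extracted) features
real_y = torch.randint(0, C, (1024,))
syn_x = torch.randn(C, D, requires_grad=True)    # one synthetic sample per class
syn_y = torch.arange(C)
opt = torch.optim.Adam([syn_x], lr=1e-2)

def classifier_grad(w, x, y, create_graph=False):
    """Gradient of the cross-entropy loss of linear classifier w on (x, y)."""
    loss = F.cross_entropy(x @ w.t(), y)
    return torch.autograd.grad(loss, w, create_graph=create_graph)[0]

for step in range(200):
    w = torch.randn(C, D, requires_grad=True)             # randomly sampled linear classifier
    g_real = classifier_grad(w, real_x, real_y)           # target gradient (no graph needed)
    g_syn = classifier_grad(w, syn_x, syn_y, create_graph=True)
    loss = 1 - F.cosine_similarity(g_real.flatten(), g_syn.flatten(), dim=0)
    opt.zero_grad(); loss.backward(); opt.step()          # updates the synthetic samples only
```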
πŸ§ͺ EfficientSAM3 is out πŸ§ͺ

πŸ‘‰Bristol announces EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation that transfers capability from SAM3 to lightweight students. Code coming (in sync with SAM3 release)πŸ’™

πŸ‘‰Review https://t.ly/bfXP2
πŸ‘‰Paper arxiv.org/pdf/2511.15833
πŸ‘‰Project simonzeng7108.github.io/efficientsam3/
πŸ‘‰Repo github.com/SimonZeng7108/efficientsam3
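Pending the code release, here is a generic sketch of multi-stage (hierarchical) feature distillation from a frozen teacher into a lightweight student. Module names, widths, and the MSE matching loss are illustrative assumptions, not the EfficientSAM3/SAM3 API.

```python
# Generic multi-stage feature distillation into a lightweight student.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in student: returns features from several stages."""
    def __init__(self, dims=(32, 64, 128)):
        super().__init__()
        chans = (3,) + dims
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1), nn.GELU())
            for i in range(len(dims))
        )
    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats

def distill_step(student, teacher_feats, images, projs, opt):
    """Match student stage features to (frozen, precomputed) teacher features."""
    s_feats = student(images)
    loss = sum(
        F.mse_loss(p(s), F.adaptive_avg_pool2d(t, s.shape[-2:]))
        for s, t, p in zip(s_feats, teacher_feats, projs)
    )
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

student = TinyEncoder()
t_dims = (256, 512, 1024)                           # assumed teacher stage widths
projs = nn.ModuleList(nn.Conv2d(s, t, 1) for s, t in zip((32, 64, 128), t_dims))
opt = torch.optim.Adam(list(student.parameters()) + list(projs.parameters()), lr=1e-4)
imgs = torch.rand(2, 3, 128, 128)
teacher_feats = [torch.randn(2, t, 64 // 2**i, 64 // 2**i) for i, t in enumerate(t_dims)]
print(distill_step(student, teacher_feats, imgs, projs, opt))
```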
🌩️ Cloud4D in time 🌩️

πŸ‘‰Cloud4D reconstructs physically realistic 3D cloud fields from ground-based cameras at 25 m spatial and 5 s temporal resolution. Repo coming, data released πŸ’™

πŸ‘‰Review https://t.ly/w7Zly
πŸ‘‰Paper arxiv.org/pdf/2511.19431
πŸ‘‰Project cloud4d.jacob-lin.com/
πŸ‘‰Data https://drive.google.com/drive/folders/1QU_0kIUXIVt8h3uqygBeaF3Gvr_L5SdX?usp=drive_link
πŸ‘‰Repo TBA
πŸ“MotionV2V: Editing Motion in VideoπŸ“

πŸ‘‰Google unveils MotionV2V ("motion edits"), a new diffusion-based approach for editing videos by controlling the change in motion between the original and the edited video. Impressive results. Repo released soon πŸ’™

πŸ‘‰Review https://t.ly/s0sIT
πŸ‘‰Paper https://arxiv.org/pdf/2511.20640
πŸ‘‰Project https://ryanndagreat.github.io/MotionV2V/
πŸ‘‰Repo https://github.com/RyannDaGreat/MotionV2V
πŸ”₯ Smell Like Vision Spirit πŸ”₯

πŸ‘‰New York Smells is a novel large-scale dataset of paired vision and olfaction captured in the wild, enabling the new task of cross-modal learning between smell and sight. With the lights out, it's less dangerous. Dataset available πŸ’™

πŸ‘‰Review https://t.ly/Ycn_B
πŸ‘‰Paper arxiv.org/pdf/2511.20544
πŸ‘‰Project smell.cs.columbia.edu/
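A minimal sketch of what "cross-modal learning between smell and sight" could look like: CLIP-style symmetric contrastive alignment between image features and olfaction readings. Encoders, dimensions, and temperature are placeholder assumptions, not the dataset's official baseline.

```python
# CLIP-style contrastive alignment between image features and smell vectors (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairedEncoder(nn.Module):
    def __init__(self, img_dim=768, smell_dim=64, emb=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb)          # on top of any frozen vision backbone
        self.smell_proj = nn.Sequential(nn.Linear(smell_dim, 256), nn.GELU(), nn.Linear(256, emb))
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, img_feat, smell_vec):
        zi = F.normalize(self.img_proj(img_feat), dim=-1)
        zs = F.normalize(self.smell_proj(smell_vec), dim=-1)
        logits = self.logit_scale.exp() * zi @ zs.t()
        targets = torch.arange(len(zi))
        # Symmetric InfoNCE: match each image to its paired smell and vice versa.
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

model = PairedEncoder()
loss = model(torch.randn(8, 768), torch.randn(8, 64))
loss.backward()
```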
πŸ•ΆοΈ Seeing without Pixels πŸ•ΆοΈ

πŸ‘‰Is it possible to perceive a video’s content without seeing its pixels, just from the camera trajectory? DeepMind (+ UTexas) is the first to systematically investigate this seemingly implausible question πŸ’™

πŸ‘‰Review https://t.ly/Ymd1c
πŸ‘‰Paper arxiv.org/pdf/2511.21681
πŸ‘‰Project sites.google.com/view/seeing-without-pixels
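As a toy illustration of the question (assumptions, not the paper's setup): feed only per-frame 6-DoF camera poses to a small sequence model and ask it to predict what the video shows.

```python
# Toy "content from camera trajectory" classifier (illustrative assumptions).
import torch
import torch.nn as nn

class TrajectoryClassifier(nn.Module):
    def __init__(self, num_classes=10, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=6, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, poses):                  # poses: (B, T, 6), one 6-DoF pose per frame
        _, h = self.rnn(poses)
        return self.head(h[-1])                # logits over video-content classes

model = TrajectoryClassifier()
print(model(torch.randn(4, 120, 6)).shape)     # torch.Size([4, 10])
```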
🌡Instance-Level Video Generation🌡

πŸ‘‰InstanceV is the first video generation framework designed for instance-level control at the architectural level. Code & Data announced πŸ’™

πŸ‘‰Review https://t.ly/y_TBT
πŸ‘‰Paper arxiv.org/pdf/2511.23146
πŸ‘‰Project aliothchen.github.io/projects/InstanceV/
πŸ‘‰Repo TBA
πŸ₯­3D Point Motion EditingπŸ₯­

πŸ‘‰Edit-by-Track enables precise video motion editing via 3D point tracks. By specifying desired 3D trajectories, users can seamlessly control joint camera and object motion, remove objects, and transfer motion between videos. No code announced but relevantπŸ’™

πŸ‘‰Review https://t.ly/GJHJ5
πŸ‘‰Paper arxiv.org/pdf/2512.02015
πŸ‘‰Project edit-by-track.github.io/
πŸ¦„ Native Unified Multimodal πŸ¦„

πŸ‘‰META unveils a novel unified multimodal model (UMM) that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This unified representation space enables SOTA end-to-end processing of images and videos for both understanding and generation. Code under legal review πŸ’™

πŸ‘‰Review https://t.ly/7wmKP
πŸ‘‰Paper https://lnkd.in/djT4WGEU
πŸ‘‰Project https://tuna-ai.org/
πŸ‘‰Repo github.com/wren93/tuna
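A rough sketch of the "VAE encoder cascaded with a representation encoder" idea: images are compressed to continuous latents, which a second encoder turns into tokens shared by understanding and generation heads. All shapes and modules below are illustrative stand-ins, not META's (still under legal review) code.

```python
# Cascaded VAE-style encoder + representation encoder (illustrative stand-in).
import torch
import torch.nn as nn

class CascadedVisualEncoder(nn.Module):
    def __init__(self, latent_ch=16, width=512):
        super().__init__()
        # Stage 1: VAE-style encoder to a continuous latent grid (stand-in conv stack).
        self.vae_enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=4), nn.GELU(),
            nn.Conv2d(64, latent_ch, 4, stride=4),
        )
        # Stage 2: representation encoder over flattened latent tokens.
        self.to_tokens = nn.Linear(latent_ch, width)
        self.rep_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=width, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, images):
        z = self.vae_enc(images)                       # (B, C, H/16, W/16) continuous latents
        tokens = z.flatten(2).transpose(1, 2)          # (B, N, C)
        return self.rep_enc(self.to_tokens(tokens))    # unified tokens for und./gen. heads

enc = CascadedVisualEncoder()
print(enc(torch.rand(1, 3, 256, 256)).shape)           # torch.Size([1, 256, 512])
```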
✌️SOTA Generative SLP✌️

πŸ‘‰Stable Signer is a new sign language generative model. It redefines Sign Language Production (SLP) as an end-to-end hierarchical generation task that only involves text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Repo with data πŸ’™

πŸ‘‰Review https://t.ly/yKZhn
πŸ‘‰Paper arxiv.org/pdf/2512.04048
πŸ‘‰Project stablesigner.github.io/
πŸ‘‰Data github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025
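To show the hierarchical decomposition as data flow only, here is a toy pipeline (text β†’ gloss β†’ pose β†’ video); the stage boundaries and interfaces are simplified assumptions and differ from Stable Signer's actual modules.

```python
# Toy hierarchical SLP pipeline: stage interfaces are placeholders for data flow only.
from dataclasses import dataclass
from typing import List
import torch

@dataclass
class SLPPipeline:
    text2gloss: callable     # free-form text / prompt -> gloss sequence
    gloss2pose: callable     # gloss sequence -> pose tensor (T, J, 3)
    pose2vid: callable       # pose tensor -> video tensor (T, 3, H, W)

    def __call__(self, text: str) -> torch.Tensor:
        glosses: List[str] = self.text2gloss(text)
        poses = self.gloss2pose(glosses)
        return self.pose2vid(poses)

# Toy stand-ins so the pipeline runs end to end.
pipe = SLPPipeline(
    text2gloss=lambda t: t.upper().split(),
    gloss2pose=lambda g: torch.zeros(len(g) * 8, 25, 3),          # 8 frames per gloss, 25 joints
    pose2vid=lambda p: torch.zeros(p.shape[0], 3, 256, 256),
)
print(pipe("hello world").shape)   # torch.Size([16, 3, 256, 256])
```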
🐘TTSC for 3D Generative🐘

πŸ‘‰SpaceControl is the new SOTA training-free test-time method for explicit spatial control of 3D generation. Repo announcedπŸ’™

πŸ‘‰Review https://t.ly/1zrah
πŸ‘‰Paper https://lnkd.in/dEWh3vep
πŸ‘‰Project https://lnkd.in/dScftUmm
πŸ‘‰Repo TBA
🎷Layered PSD Diffusion🎷

πŸ‘‰OmniPSD produces layered PSD files with transparent alpha channels, separating text, foreground elements, and background into clean RGBA layers that can be edited directly in design tools. Online Demo πŸ’™

πŸ‘‰Review https://t.ly/YNRAC
πŸ‘‰Paper arxiv.org/pdf/2512.09247
πŸ‘‰Project showlab.github.io/OmniPSD/
πŸ‘‰Demo https://www.lovart.ai/it
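A tiny illustration of why clean RGBA layers matter downstream: with transparent alpha, background, foreground, and text layers can be edited independently and recomposited with plain alpha compositing. File names below are placeholders; this is not OmniPSD's output API.

```python
# Recomposite separately edited RGBA layers (placeholder file names).
from PIL import Image

def composite_layers(paths, size=(1024, 1024)):
    """Alpha-composite layers bottom-up, e.g. background -> foreground -> text."""
    canvas = Image.new("RGBA", size, (0, 0, 0, 0))
    for p in paths:                                   # e.g. ["bg.png", "fg.png", "text.png"]
        layer = Image.open(p).convert("RGBA").resize(size)
        canvas = Image.alpha_composite(canvas, layer)
    return canvas

# composite_layers(["bg.png", "fg.png", "text.png"]).save("recomposed.png")
```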
🧱Pixel Art Volumetric Rendering🧱

πŸ‘‰Voxify3D is a novel differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Repo announcedπŸ’™

πŸ‘‰Review https://t.ly/qPyNl
πŸ‘‰Paper https://lnkd.in/du5ikJGN
πŸ‘‰Project https://lnkd.in/dpiAjj5m
πŸ‘‰Repo TBA
🫎 MoCapAnything is out 🫎

πŸ‘‰MoCapAnything is a novel reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware IK fitting. No code announced πŸ₯²

πŸ‘‰Review https://t.ly/_Tw6t
πŸ‘‰Paper arxiv.org/pdf/2512.10881
πŸ‘‰Project animotionlab.github.io/MoCapAnything
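A toy version of the second stage described above: given predicted 3D joint positions, recover joint rotations by gradient-based IK on a planar two-bone chain. Real rigs and the paper's constraint-aware solver are far richer; this only illustrates fitting rotations to trajectories.

```python
# Gradient-based IK on a planar 2-bone chain (toy illustration).
import torch

bone_len = torch.tensor([0.5, 0.5])

def forward_kinematics(angles):
    """angles: (2,) rotations about the z-axis for a planar 2-bone chain."""
    a1, a2 = angles[0], angles[0] + angles[1]
    j1 = bone_len[0] * torch.stack([torch.cos(a1), torch.sin(a1), torch.zeros(())])
    j2 = j1 + bone_len[1] * torch.stack([torch.cos(a2), torch.sin(a2), torch.zeros(())])
    return torch.stack([j1, j2])                    # (2 joints, xyz)

target = torch.tensor([[0.5, 0.0, 0.0], [0.5, 0.5, 0.0]])   # "predicted" joint positions
angles = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([angles], lr=0.05)
for _ in range(300):
    loss = torch.nn.functional.mse_loss(forward_kinematics(angles), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(angles.detach())   # recovered joint rotations (radians)
```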
πŸ’š MatAnyone 2 is out! πŸ’š

πŸ‘‰MatAnyone 2 is the most advanced human video matting framework: it preserves fine details by avoiding segmentation-like boundaries, while also showing enhanced robustness under challenging real-world conditions. Repo & Dataset announced πŸ’™

πŸ‘‰Review https://t.ly/vxOBO
πŸ‘‰Paper arxiv.org/pdf/2512.11782
πŸ‘‰Project pq-yang.github.io/projects/MatAnyone2
πŸ‘‰Repo github.com/pq-yang/MatAnyone2
πŸ’· SOTA Zero-Shot Stereo MatchingπŸ’·

πŸ‘‰Fast-FoundationStereo by #Nvidia is a novel family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rate via divide-&-conquer acceleration. Code & Data announcedπŸ’™

πŸ‘‰Review https://t.ly/XD6pO
πŸ‘‰Paper https://lnkd.in/d9_YKW2A
πŸ‘‰Project https://lnkd.in/dKDxm7EX
πŸ‘‰Repo https://lnkd.in/dR4-PdsW
πŸ‘€DriverGaze360: Driver SOTAπŸ‘€

πŸ‘‰DriverGaze360 is a large-scale 360° field-of-view driver attention dataset containing ~1M gaze-labeled frames. Code & Dataset announced πŸ’™

πŸ‘‰Review https://t.ly/ZcoUw
πŸ‘‰Paper arxiv.org/pdf/2512.14266
πŸ‘‰Project av.dfki.de/drivergaze360/
πŸ‘‰Repo github.com/dfki-av/drivergaze360
πŸ‘‰Data av.dfki.de/drivergaze360/dataset
🫠FlexAvatar: 3D Heads🫠

πŸ‘‰TUM introduces FlexAvatar, a novel method for creating high-quality, complete 3D head avatars from a single image. Code announced πŸ’™

πŸ‘‰Review https://t.ly/Rkdtd
πŸ‘‰Paper arxiv.org/pdf/2512.15599
πŸ‘‰Project tobias-kirschstein.github.io/flexavatar/
πŸ‘‰Repo TBA
🏜️ Depth Any Panoramas 🏜️

πŸ‘‰DAP is the new SOTA foundation model for panoramic depth estimation, released with a large-scale dataset. Data & Repo under MIT license πŸ’™

πŸ‘‰Review https://t.ly/LaUmd
πŸ‘‰Paper arxiv.org/pdf/2512.16913
πŸ‘‰Project https://lnkd.in/dvqNV9jx
πŸ‘‰Repo https://lnkd.in/dmNzhb-7
πŸ‘‰Demo https://lnkd.in/dDwjMF3u