AI with Papers - Artificial Intelligence & Deep Learning
14.7K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ’ƒ Video Motion Graphs πŸ’ƒ

πŸ‘‰#Adobe unveils a novel system designed to generate realistic human motion videos. Using a reference video and conditional signals such as music or motion tags, the system synthesizes amazing new videos. Code & Models to be releasedπŸ’™

πŸ‘‰Review https://t.ly/r4EGF
πŸ‘‰Paper https://lnkd.in/dK_tHyzh
πŸ‘‰Project https://lnkd.in/dE6c_KYZ
πŸ‘‰Repo TBA
🌳 Compose Anything is out 🌳

πŸ‘‰Skywork AI unveils SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts. Code, models, & evaluation benchmark releasedπŸ’™

πŸ‘‰Review https://t.ly/MEjzL
πŸ‘‰Paper https://arxiv.org/pdf/2504.02436
πŸ‘‰Project skyworkai.github.io/skyreels-a2.github.io/
πŸ‘‰Repo github.com/SkyworkAI/SkyReels-A2
πŸ€—Models https://huggingface.co/Skywork/SkyReels-A2
β›½ VoRA: Vision as LoRA β›½

πŸ‘‰#ByteDance unveils Vision as LoRA (VoRA), a novel paradigm converting LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers. All training data, code, and model weights availableπŸ’™

πŸ‘‰Review https://t.ly/guNVN
πŸ‘‰Paper arxiv.org/pdf/2503.20680
πŸ‘‰Repo github.com/Hon-Wong/VoRA
πŸ‘‰Project georgeluimmortal.github.io/vora-homepage.github.io/
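The low-rank adapter idea VoRA builds on can be sketched in a few lines. Below is a generic LoRA layer in numpy (illustrative shapes and init, not ByteDance's implementation): a frozen weight W is augmented by a trainable low-rank product B·A, with B zero-initialized so the adapter starts as a no-op.

```python
import numpy as np

# Generic LoRA-style adapter (a sketch, not VoRA's actual code): the frozen
# weight W is augmented with a low-rank update B @ A, so only A and B
# (rank r << d) are trained while W stays fixed.
class LoRALinear:
    def __init__(self, d_in, d_out, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
        self.B = np.zeros((d_out, rank))                    # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base path plus scaled low-rank adapter path.
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

layer = LoRALinear(d_in=16, d_out=16)
x = np.ones((2, 16))
y = layer(x)
print(y.shape)  # (2, 16)
```

Because B starts at zero, the adapter path contributes nothing at initialization, so the adapted model begins exactly at the frozen model's behavior.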
🐈 TTT Long Video Generation🐈

πŸ‘‰A novel architecture for video generation that adapts the CogVideoX 5B model by incorporating Test-Time Training (TTT) layers. Adding TTT layers to a pre-trained Transformer yields one-minute clips from text storyboards. Videos, code & annotations releasedπŸ’™

πŸ‘‰Review https://t.ly/mhlTN
πŸ‘‰Paper arxiv.org/pdf/2504.05298
πŸ‘‰Project test-time-training.github.io/video-dit/
πŸ‘‰Repo github.com/test-time-training/ttt-video-dit
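The core TTT idea (a layer whose inner weights are updated by a self-supervised loss while processing the sequence) can be illustrated with a toy numpy loop. This uses a hypothetical linear reconstruction objective, not the paper's actual layer:

```python
import numpy as np

# Toy test-time-training (TTT) layer sketch: the inner weight W is the layer's
# "hidden state" and is updated by gradient descent on a self-supervised
# reconstruction loss as each token arrives, even at inference time.
def ttt_layer(tokens, d=8, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, d)) * 0.1  # inner weights = the layer's state
    outputs = []
    for x in tokens:                       # x: vector of size (d,)
        pred = W @ x                       # inner model's prediction
        grad = np.outer(pred - x, x)       # grad of 0.5*||W x - x||^2 w.r.t. W
        W = W - lr * grad                  # one inner-loop update per token
        outputs.append(W @ x)              # output computed with updated weights
    return np.stack(outputs)

seq = [np.ones(8) * 0.5 for _ in range(4)]
out = ttt_layer(seq)
print(out.shape)  # (4, 8)
```

On a repeated input the reconstruction error shrinks step by step, showing the layer adapting to the sequence it is currently processing; this is the property that makes such layers attractive for long-context video.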
πŸ’› Unified Scalable SVG Generator πŸ’›

πŸ‘‰OmniSVG is the first family of e2e multimodal generators that leverages pre-trained VLMs to create detailed SVGs. Code, models & dataset to be released under MITπŸ’™

πŸ‘‰Review https://t.ly/JcR3I
πŸ‘‰Paper https://arxiv.org/pdf/2504.06263
πŸ‘‰Project https://omnisvg.github.io/
πŸ‘‰Repo github.com/OmniSVG/OmniSVG
πŸ‘‰Dataset https://huggingface.co/OmniSVG
🧊BoxDreamer Object Pose🧊

πŸ‘‰BoxDreamer is a generalizable RGB-based approach for #3D object pose estimation in the wild, specifically designed to address challenges in sparse-view settings. Code coming, demo releasedπŸ’™

πŸ‘‰Review https://t.ly/e-vX9
πŸ‘‰Paper arxiv.org/pdf/2504.07955
πŸ‘‰Project https://lnkd.in/djz8jqn9
πŸ‘‰Repo https://lnkd.in/dfuEawSA
πŸ€—Demo https://lnkd.in/dVYaWGcS
πŸ₯Š Pose in Combat Sports πŸ₯Š

πŸ‘‰A novel SOTA framework for accurate physics-based #3D human pose estimation in combat sports with a sparse multi-camera setup. Dataset to be released soonπŸ’™

πŸ‘‰Review https://t.ly/EfcGL
πŸ‘‰Paper https://lnkd.in/deMMrKcA
πŸ‘‰Project https://lnkd.in/dkMS_UrH
πŸ’₯Geo4D: VideoGen 4D SceneπŸ’₯

πŸ‘‰The Oxford VGG unveils Geo4D: video diffusion for monocular 4D reconstruction. Only synthetic data for training, but strong generalization to real world: point maps, depth & ray maps for the new SOTA in dynamic reconstruction. Code releasedπŸ’™

πŸ‘‰Review https://t.ly/X55Uj
πŸ‘‰Paper arxiv.org/pdf/2504.07961
πŸ‘‰Project geo4d.github.io/
πŸ‘‰Code github.com/jzr99/Geo4D
πŸ„ 4D Mocap Human-Object πŸ„

πŸ‘‰#Adobe unveils HUMOTO, HQ dataset of human-object interactions for motion generation, computer vision, and robotics: 700+ sequences (7,875 seconds @ 30FPS), interactions with 63 precisely modeled objects and 72 articulated parts

πŸ‘‰Review https://t.ly/lCof3
πŸ‘‰Paper https://lnkd.in/dVVBDd_c
πŸ‘‰Project https://lnkd.in/dwBcseDf
🍏PartField #3D Part Segmentation🍏

πŸ‘‰#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features, which captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & Models released under the Nvidia LicenseπŸ’™

πŸ‘‰Review https://t.ly/fGb2O
πŸ‘‰Paper https://lnkd.in/dGeyKSzG
πŸ‘‰Code https://lnkd.in/dbe57XGH
πŸ‘‰Project https://lnkd.in/dhEgf7X2
🐯UniAnimate-DiT: Human Animation🐯

πŸ‘‰UniAnimate-DiT is a novel and effective framework based on Wan2.1 for consistent human image animation. LoRAs finetune the model parameters, reducing memory while maintaining the original model's generative skills. Training and inference code releasedπŸ’™

πŸ‘‰Review https://t.ly/1I50N
πŸ‘‰Paper https://arxiv.org/pdf/2504.11289
πŸ‘‰Repo https://github.com/ali-vilab/UniAnimate-DiT
πŸ”₯General attention-based objectπŸ”₯

πŸ‘‰GATE3D is a novel framework designed specifically for generalized monocular 3D object detection via weak supervision. GATE3D effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.

πŸ‘‰Review https://t.ly/O7wqH
πŸ‘‰Paper https://lnkd.in/dc5VTUj9
πŸ‘‰Project https://lnkd.in/dzrt-qQV
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”Event Blurry Super-ResolutionπŸ”

πŸ‘‰USTC unveils Ev-DeblurVSR: event signals into BVSR for a novel event-enhanced network. Blurry Video Super-Resolution (BVSR) aiming at generating HR videos from low-resolution and blurry inputs. Pretrained models and test released under ApacheπŸ’™

πŸ‘‰Review https://t.ly/x6hRs
πŸ‘‰Paper https://lnkd.in/dzbkCJMh
πŸ‘‰Repo https://lnkd.in/dmvsc-yS
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”₯ #Apple Co-Motion is out! πŸ”₯

πŸ‘‰Apple unveils a novel approach for detecting & tracking detailed 3D poses of multiple people from single monocular stream. Temporally coherent predictions in crowded scenes with hard poses & occlusions. New SOTA, 10x faster! Code & Models released only for researchπŸ’™

πŸ‘‰Review https://t.ly/-86CO
πŸ‘‰Paper https://lnkd.in/dQsVGY7q
πŸ‘‰Repo https://lnkd.in/dh7j7N89
This media is not supported in your browser
VIEW IN TELEGRAM
🧊TAP in Persistent 3D Geometry🧊

πŸ‘‰TAPIP3D is the novel SOTA for long-term 3D point tracking in mono-RGB/RGB-D. Videos as camera-stabilized spatio-temporal feature clouds, leveraging depth & motion to lift 2D video feats into a 3D world space where camera motion is effectively canceled. Code under ApacheπŸ’™

πŸ‘‰Review https://t.ly/oooMy
πŸ‘‰Paper https://lnkd.in/d8uqjdE4
πŸ‘‰Project https://tapip3d.github.io/
πŸ‘‰Repo https://lnkd.in/dsvHP_8u
This media is not supported in your browser
VIEW IN TELEGRAM
🦧 #Nvidia Describe Anything 🦧

πŸ‘‰Nvidia unveils Describe Anything Model (DAM) the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, Dataset available and live demo on πŸ€—

πŸ‘‰Review https://t.ly/la4JD
πŸ‘‰Paper https://lnkd.in/dZh82xtV
πŸ‘‰Project https://lnkd.in/dcv9V2ZF
πŸ‘‰Repo https://lnkd.in/dJB9Ehtb
πŸ€—Demo https://lnkd.in/dXDb2MWU
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ“Moving Points -> DepthπŸ“

πŸ‘‰KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (via off-the-shelf point tracking models). Repo & Demo to be releasedπŸ’™

πŸ‘‰Review https://t.ly/qA2P5
πŸ‘‰Paper https://lnkd.in/dpXDaQtM
πŸ‘‰Project https://lnkd.in/d9qWYsjP
πŸ‘‰Repo https://lnkd.in/dZEMDiJh
This media is not supported in your browser
VIEW IN TELEGRAM
🌼SOTA Textured 3D-Guided VTON🌼

πŸ‘‰#ALIBABA unveils 3DV-TON, a novel diffusion model for HQ and temporally consistent video. Generating animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expanse of motion coherence. Code & benchmark to be releasedπŸ’™

πŸ‘‰Review https://t.ly/0tjdC
πŸ‘‰Paper https://lnkd.in/dFseYSXz
πŸ‘‰Project https://lnkd.in/djtqzrzs
πŸ‘‰Repo TBA
This media is not supported in your browser
VIEW IN TELEGRAM
🍏#Nvidia Dynamic Pose 🍏

πŸ‘‰Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia licenseπŸ’™

πŸ‘‰Review https://t.ly/wrcb0
πŸ‘‰Paper https://lnkd.in/dycGjAyy
πŸ‘‰Project https://lnkd.in/dDZ2Ej_Q
πŸ€—Data https://lnkd.in/d8yUSB7m
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”₯ S3MOT: SOTA 3D MOT πŸ”₯

πŸ‘‰S3MOT: Selective-State-Space model-based MOT that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & Weights to be released under MIT licenseπŸ’™

πŸ‘‰Review https://t.ly/H_JPv
πŸ‘‰Paper https://arxiv.org/pdf/2504.18068
πŸ‘‰Repo https://github.com/bytepioneerX/s3mot