AI with Papers - Artificial Intelligence & Deep Learning

💃 Video Motion Graphs 💃

👉#Adobe unveils a novel system for generating realistic human motion videos. Given a reference video and conditional signals such as music or motion tags, the system synthesizes new videos. Code & Models to be released💙

👉Review https://t.ly/r4EGF
👉Paper https://lnkd.in/dK_tHyzh
👉Project https://lnkd.in/dE6c_KYZ
👉Repo TBA

🌳 Compose Anything is out 🌳

👉Skywork AI unveils SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts. Code, models, & evaluation benchmark released💙

👉Review https://t.ly/MEjzL
👉Paper https://arxiv.org/pdf/2504.02436
👉Project skyworkai.github.io/skyreels-a2.github.io/
👉Repo github.com/SkyworkAI/SkyReels-A2
🤗Models https://huggingface.co/Skywork/SkyReels-A2

⛽ VoRA: Vision as LoRA ⛽

👉#ByteDance unveils Vision as LoRA (VoRA), a novel paradigm that converts LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers. All training data, code, and model weights are available💙

👉Review https://t.ly/guNVN
👉Paper arxiv.org/pdf/2503.20680
👉Repo github.com/Hon-Wong/VoRA
👉Project georgeluimmortal.github.io/vora-homepage.github.io/
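
For readers new to LoRA, here is a minimal sketch of the generic low-rank adapter idea the post refers to: a frozen weight matrix plus a small trainable update B·A. It is not VoRA's code; the class name `LoRALinear`, the `alpha` scaling, and the initialization are illustrative assumptions based on the standard LoRA formulation.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical toy layer: y = W x + (alpha / rank) * B (A x), with W frozen."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, never updated
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero so the adapter is
        # initially a no-op and the layer reproduces the frozen model.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Only A and B would be trained: rank * (in_dim + out_dim) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))  # equals the frozen layer's output at init
print(layer.trainable_parameters(), "trainable vs", 4 * 4, "frozen")
```

The point of the design is that only the two thin matrices need gradients, so the adapter adds few parameters relative to the frozen weight.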

🐈 TTT Long Video Generation🐈

👉A novel video generation architecture that adapts the CogVideoX 5B model by incorporating Test-Time Training (TTT) layers. Adding TTT layers to a pre-trained Transformer enables it to generate one-minute clips from text storyboards. Videos, code & annotations released💙

👉Review https://t.ly/mhlTN
👉Paper arxiv.org/pdf/2504.05298
👉Project test-time-training.github.io/video-dit/
👉Repo github.com/test-time-training/ttt-video-dit
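
As a rough intuition for what a Test-Time Training layer does, the toy sketch below treats the layer's hidden state as a small linear model `W` that takes one gradient step per token on a self-supervised reconstruction loss, then produces the output with the updated weights. This is an assumption-laden simplification, not the released implementation: the function name `ttt_layer`, the plain reconstruction loss, and the fixed learning rate are all illustrative.

```python
def ttt_layer(tokens, lr=0.1):
    """Toy TTT-style layer: the hidden state is a weight matrix W that is
    trained on each incoming token with one gradient step (illustrative only)."""
    dim = len(tokens[0])
    W = [[0.0] * dim for _ in range(dim)]  # hidden state = inner model weights
    outputs, losses = [], []
    for x in tokens:
        # Self-supervised loss: how well does W reconstruct the token?
        pred = [sum(W[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        err = [p - t for p, t in zip(pred, x)]
        losses.append(sum(e * e for e in err))
        # One gradient step on ||W x - x||^2, whose gradient is 2 * err * x^T.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * 2.0 * err[i] * x[j]
        # The layer output uses the freshly updated hidden state.
        outputs.append([sum(W[i][j] * x[j] for j in range(dim)) for i in range(dim)])
    return outputs, losses


outputs, losses = ttt_layer([[1.0, 0.0]] * 4)
print(losses)  # the inner loss shrinks as the hidden state adapts to the sequence
```

Because the state is updated by learning rather than by a fixed recurrence, its capacity to summarize long contexts depends on the inner model, which is the property the post credits for longer clips.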

💛 Unified Scalable SVG Generator 💛

👉OmniSVG is the first family of end-to-end multimodal generators that leverage pre-trained VLMs to create detailed SVGs. Code, models & dataset to be released under MIT💙

👉Review https://t.ly/JcR3I
👉Paper https://arxiv.org/pdf/2504.06263
👉Project https://omnisvg.github.io/
👉Repo github.com/OmniSVG/OmniSVG
👉Dataset https://huggingface.co/OmniSVG

🧊BoxDreamer Object Pose🧊

👉BoxDreamer is a generalizable RGB-based approach for #3D object pose estimation in the wild, specifically designed to address challenges in sparse-view settings. Code coming, demo released💙

👉Review https://t.ly/e-vX9
👉Paper arxiv.org/pdf/2504.07955
👉Project https://lnkd.in/djz8jqn9
👉Repo https://lnkd.in/dfuEawSA
🤗Demo https://lnkd.in/dVYaWGcS

🥊 Pose in Combat Sports 🥊

👉A novel SOTA framework for accurate physics-based #3D human pose estimation in combat sports with a sparse multi-camera setup. Dataset to be released soon💙

👉Review https://t.ly/EfcGL
👉Paper https://lnkd.in/deMMrKcA
👉Project https://lnkd.in/dkMS_UrH

💥Geo4D: VideoGen 4D Scene💥

👉The Oxford VGG unveils Geo4D: video diffusion for monocular 4D reconstruction. Trained only on synthetic data, it generalizes strongly to the real world, predicting point maps, depth & ray maps for a new SOTA in dynamic reconstruction. Code released💙

👉Review https://t.ly/X55Uj
👉Paper arxiv.org/pdf/2504.07961
👉Project geo4d.github.io/
👉Code github.com/jzr99/Geo4D

🍄 4D Mocap Human-Object 🍄

👉#Adobe unveils HUMOTO, a HQ dataset of human-object interactions for motion generation, computer vision, and robotics: 700+ sequences (7,875 seconds @ 30FPS), with interactions involving 63 precisely modeled objects and 72 articulated parts

👉Review https://t.ly/lCof3
👉Paper https://lnkd.in/dVVBDd_c
👉Project https://lnkd.in/dwBcseDf
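
To put the quoted figures in perspective, the short calculation below converts the stated duration and frame rate into a frame count. It uses only the numbers in the post; since the sequence count is given as "700+", the per-sequence figure is an upper bound on the average, not an exact value.

```python
TOTAL_SECONDS = 7_875   # duration quoted in the post
FPS = 30                # frame rate quoted in the post
MIN_SEQUENCES = 700     # "700+" is treated as a lower bound on the count

total_frames = TOTAL_SECONDS * FPS
total_minutes = TOTAL_SECONDS / 60
max_avg_seconds = TOTAL_SECONDS / MIN_SEQUENCES  # average is at most this

print(f"{total_frames:,} frames")                     # 236,250 frames
print(f"{total_minutes:.2f} minutes")                 # 131.25 minutes
print(f"<= {max_avg_seconds:.2f} s per sequence on average")  # <= 11.25 s
```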

🍏PartField #3D Part Segmentation🍏

👉#Nvidia unveils PartField, a feed-forward approach for learning part-based 3D features that capture the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & Models released under Nvidia License💙

👉Review https://t.ly/fGb2O
👉Paper https://lnkd.in/dGeyKSzG
👉Code https://lnkd.in/dbe57XGH
👉Project https://lnkd.in/dhEgf7X2

🐯UniAnimate-DiT: Human Animation🐯

👉UniAnimate-DiT is a novel and effective framework based on Wan2.1 for consistent human image animation. It fine-tunes the model with LoRAs, reducing memory use while maintaining the original model's generative skills. Training and inference code released💙

👉Review https://t.ly/1I50N
👉Paper https://arxiv.org/pdf/2504.11289
👉Repo https://github.com/ali-vilab/UniAnimate-DiT

🔥General attention-based object🔥

👉GATE3D is a novel framework designed specifically for generalized monocular 3D object detection via weak supervision. GATE3D effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.

👉Review https://t.ly/O7wqH
👉Paper https://lnkd.in/dc5VTUj9
👉Project https://lnkd.in/dzrt-qQV
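
One common way to express a 2D/3D consistency loss is to project the predicted 3D box into the image and compare its enclosing rectangle with the predicted 2D box. The sketch below illustrates that generic idea only; it is not GATE3D's actual loss. The function names, the axis-aligned box without yaw, the pinhole intrinsics `(fx, fy, u0, v0)`, and the L1 penalty are all assumptions.

```python
from itertools import product


def box3d_corners(center, size):
    """Eight corners of an axis-aligned 3D box in camera coordinates (no yaw)."""
    cx, cy, cz = center
    w, h, l = size
    return [(cx + sx * w / 2, cy + sy * h / 2, cz + sz * l / 2)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def project(point, fx, fy, u0, v0):
    """Pinhole projection of a camera-space point with positive depth z."""
    x, y, z = point
    return (fx * x / z + u0, fy * y / z + v0)


def enclosing_box(points_2d):
    """Tightest (u_min, v_min, u_max, v_max) rectangle around 2D points."""
    us = [p[0] for p in points_2d]
    vs = [p[1] for p in points_2d]
    return (min(us), min(vs), max(us), max(vs))


def consistency_loss(box2d_pred, center, size, intrinsics):
    """Mean L1 gap between the predicted 2D box and the projected 3D box."""
    corners = [project(c, *intrinsics) for c in box3d_corners(center, size)]
    box_from_3d = enclosing_box(corners)
    return sum(abs(a - b) for a, b in zip(box2d_pred, box_from_3d)) / 4


intrinsics = (100.0, 100.0, 0.0, 0.0)            # hypothetical camera
center, size = (0.0, 0.0, 10.0), (2.0, 2.0, 2.0)  # hypothetical 3D prediction
print(consistency_loss((-10.0, -10.0, 10.0, 10.0), center, size, intrinsics))
```

A loss of this kind needs no 3D labels at all: it is zero whenever the two heads agree, which is why it suits weakly supervised settings.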

🔍Event Blurry Super-Resolution🔍

👉USTC unveils Ev-DeblurVSR, a novel event-enhanced network that brings event signals into Blurry Video Super-Resolution (BVSR), which aims to generate HR videos from low-resolution and blurry inputs. Pretrained models and test released under Apache💙

👉Review https://t.ly/x6hRs
👉Paper https://lnkd.in/dzbkCJMh
👉Repo https://lnkd.in/dmvsc-yS

🔥 #Apple Co-Motion is out! 🔥

👉Apple unveils a novel approach for detecting & tracking detailed 3D poses of multiple people from a single monocular stream. Temporally coherent predictions in crowded scenes with hard poses & occlusions. New SOTA, 10x faster! Code & Models released for research only💙

👉Review https://t.ly/-86CO
👉Paper https://lnkd.in/dQsVGY7q
👉Repo https://lnkd.in/dh7j7N89

🧊TAP in Persistent 3D Geometry🧊

👉TAPIP3D is a novel SOTA approach for long-term 3D point tracking in mono-RGB/RGB-D. It represents videos as camera-stabilized spatio-temporal feature clouds, leveraging depth & motion to lift 2D video features into a 3D world space where camera motion is effectively canceled. Code under Apache💙

👉Review https://t.ly/oooMy
👉Paper https://lnkd.in/d8uqjdE4
👉Project https://tapip3d.github.io/
👉Repo https://lnkd.in/dsvHP_8u
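
The "lifting" step described above can be illustrated with a standard pinhole unprojection: a pixel with known depth is mapped into camera space, then into a shared world frame with the camera-to-world pose. This is a generic geometry sketch, not TAPIP3D's code; the function name `unproject_to_world`, the intrinsics layout `(fx, fy, cx, cy)`, and the 4x4 pose format are assumptions.

```python
def unproject_to_world(u, v, depth, intrinsics, cam_to_world):
    """Lift pixel (u, v) with metric depth into world coordinates.

    intrinsics: (fx, fy, cx, cy) of a pinhole camera (assumed layout).
    cam_to_world: 4x4 row-major rigid transform (assumed format).
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel into the camera frame.
    p_cam = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0)
    # Move the point into the world frame; this is what cancels camera motion.
    return tuple(sum(cam_to_world[r][c] * p_cam[c] for c in range(4))
                 for r in range(3))


intrinsics = (100.0, 100.0, 320.0, 240.0)
pose_a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # camera at origin
pose_b = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # moved 1 unit in x

# The same static point appears at different pixels in the two frames,
# but lands on the same world coordinates once lifted.
print(unproject_to_world(320.0, 240.0, 5.0, intrinsics, pose_a))
print(unproject_to_world(300.0, 240.0, 5.0, intrinsics, pose_b))
```

In world space a static point keeps one location across frames, so any remaining motion belongs to the scene rather than the camera.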

🦧 #Nvidia Describe Anything 🦧

👉Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, dataset available, and live demo on 🤗

👉Review https://t.ly/la4JD
👉Paper https://lnkd.in/dZh82xtV
👉Project https://lnkd.in/dcv9V2ZF
👉Repo https://lnkd.in/dJB9Ehtb
🤗Demo https://lnkd.in/dXDb2MWU

📍Moving Points -> Depth📍

👉KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (via off-the-shelf point tracking models). Repo & Demo to be released💙

👉Review https://t.ly/qA2P5
👉Paper https://lnkd.in/dpXDaQtM
👉Project https://lnkd.in/d9qWYsjP
👉Repo https://lnkd.in/dZEMDiJh

🌼SOTA Textured 3D-Guided VTON🌼

👉#ALIBABA unveils 3DV-TON, a novel diffusion model for HQ and temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released💙

👉Review https://t.ly/0tjdC
👉Paper https://lnkd.in/dFseYSXz
👉Project https://lnkd.in/djtqzrzs
👉Repo TBA

🍏#Nvidia Dynamic Pose 🍏

👉Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license💙

👉Review https://t.ly/wrcb0
👉Paper https://lnkd.in/dycGjAyy
👉Project https://lnkd.in/dDZ2Ej_Q
🤗Data https://lnkd.in/d8yUSB7m

🔥 S3MOT: SOTA 3D MOT 🔥

👉S3MOT: Selective-State-Space model-based MOT that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & Weights to be released under MIT license💙

👉Review https://t.ly/H_JPv
👉Paper https://arxiv.org/pdf/2504.18068
👉Repo https://github.com/bytepioneerX/s3mot
