🌴🌴Direct-a-Video: Driving Video Generation🌴🌴
👉Direct-a-Video is a text-to-video generation framework that lets users control camera movement and object motion, independently or jointly. Authors: City University of Hong Kong, Kuaishou Technology & Tianjin University.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Decoupling camera/object motion in gen-AI
✅Users can control each independently or jointly
✅Novel temporal cross-attention for cam motion
✅Training-free spatial cross-attention for objects
✅Driving object generation via bounding boxes (see the sketch below)
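For intuition, here is a minimal, hypothetical sketch (not the authors' code; the layer name, camera parameterization, and shapes are all assumptions) of the core idea: per-frame camera parameters such as (pan_x, pan_y, zoom) are embedded and injected into the video latents through a dedicated temporal cross-attention layer.
```python
# Hypothetical sketch: injecting per-frame camera motion into frame
# features via a temporal cross-attention layer (residual form).
import torch
import torch.nn as nn

class CameraCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        # Embed the 3 camera parameters into key/value tokens.
        self.cam_embed = nn.Linear(3, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_tokens: torch.Tensor, cam_params: torch.Tensor):
        # frame_tokens: (batch, frames, dim) per-frame latent tokens
        # cam_params:   (batch, frames, 3)  per-frame (pan_x, pan_y, zoom)
        cam_tokens = self.cam_embed(cam_params)           # (B, F, dim)
        out, _ = self.attn(frame_tokens, cam_tokens, cam_tokens)
        return frame_tokens + out                         # residual injection

# Toy usage: a 16-frame clip with a left-to-right pan.
x = torch.randn(1, 16, 320)
cam = torch.zeros(1, 16, 3)
cam[..., 0] = torch.linspace(0, 1, 16)   # pan_x ramps across frames
y = CameraCrossAttention(320)(x, cam)
print(y.shape)  # torch.Size([1, 16, 320])
```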
#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse
👉Channel: @MachineLearning_Programming
👉Paper https://arxiv.org/pdf/2402.03162.pdf
👉Project https://direct-a-video.github.io/
LeGrad: a Layerwise Explainability GRADient method for large Vision Transformer (ViT) architectures
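Rough intuition, as a hedged sketch (function and variable names are mine, not the paper's exact formulation): the layerwise-gradient idea is to take, for each ViT layer, the gradient of a target score with respect to that layer's attention maps, keep the positive part, and average across heads and layers into one explainability map.
```python
# Hypothetical sketch of a layerwise attention-gradient explainability map.
import torch

def legrad_style_map(attn_maps, score):
    # attn_maps: list of per-layer attention tensors (heads, tokens, tokens),
    #            each requiring grad and part of the forward graph
    # score:     scalar model output to explain (e.g. a class logit)
    grads = torch.autograd.grad(score, attn_maps, retain_graph=True)
    layer_maps = []
    for g in grads:
        g = g.clamp(min=0)                 # keep positive influence only
        layer_maps.append(g.mean(dim=0))   # average over heads
    expl = torch.stack(layer_maps).mean(dim=0)   # average over layers
    # Row for the [CLS] query gives one relevance value per patch token.
    return expl[0, 1:]

# Toy check with random "attention" tensors in a fake graph.
attns = [torch.rand(8, 10, 10, requires_grad=True) for _ in range(4)]
score = sum(a.sum() for a in attns)
heat = legrad_style_map(attns, score)
print(heat.shape)  # torch.Size([9])
```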
Explore More:
💻DEMO: try the interactive demo
📖Read the Paper: Access Here
💻Source Code: Explore on GitHub
Relevance: #AI #machinelearning #deeplearning #computervision
Join our community:
👉 @MachineLearning_Programming
🧠 VGGT: Visual Geometry Grounded Transformer
⚡️ Fast, accurate 3D scene understanding from just a few images, with no post-processing needed!
🎯 Outputs: camera pose, depth, point cloud & 3D tracking
🔥 Boosts downstream tasks such as view synthesis & non-rigid tracking (sketch below)
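To make the "single feed-forward pass" idea concrete, here is a toy, hypothetical sketch (this is not the real VGGT architecture or API; all names and shapes are assumptions): one network pass maps a handful of frames to per-frame poses and coarse depth, with no per-scene optimization.
```python
# Hypothetical sketch: a feed-forward model that turns a few RGB frames
# into per-frame camera poses and coarse depth in one pass.
import torch
import torch.nn as nn

class FeedForwardGeometryModel(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify
        self.pose_head = nn.Linear(dim, 7)    # quaternion + translation
        self.depth_head = nn.Linear(dim, 1)   # per-patch depth

    def forward(self, images: torch.Tensor):
        # images: (frames, 3, H, W)
        feats = self.backbone(images)                # (F, dim, H/16, W/16)
        tokens = feats.flatten(2).transpose(1, 2)    # (F, patches, dim)
        pose = self.pose_head(tokens.mean(dim=1))    # one pose per frame
        depth = self.depth_head(tokens)              # coarse depth per patch
        return pose, depth

imgs = torch.randn(4, 3, 224, 224)   # "just a few images"
pose, depth = FeedForwardGeometryModel()(imgs)
print(pose.shape, depth.shape)       # (4, 7) (4, 196, 1)
```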
📄 Read the paper
💻 Code on GitHub
#AI #3D #Transformer #VGGT #ComputerVision
✅ @MachineLearning_Programming