🌴🌴Direct-a-Video: Driving Video Generation🌴🌴
👉Direct-a-Video is a text-to-video generation framework that lets users control camera movement and object motion, independently or jointly. Authors: City University of Hong Kong, Kuaishou Technology & Tianjin University.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Decoupling camera/object motion in gen-AI
✅Users can control each independently or jointly
✅Novel temporal cross-attention for cam motion
✅Training-free spatial cross-attention for objects
✅Driving object generation via bounding boxes (see the sketch below)
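For intuition, here is a minimal, hypothetical sketch (not the authors' code; the layer name, camera parameterization, and shapes are all assumptions) of the core idea: per-frame camera parameters such as (pan_x, pan_y, zoom) are embedded and injected into the video latents through a dedicated temporal cross-attention layer.
```python
# Hypothetical sketch: injecting per-frame camera motion into frame
# features via a temporal cross-attention layer (residual form).
import torch
import torch.nn as nn

class CameraCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        # Embed the 3 camera parameters into key/value tokens.
        self.cam_embed = nn.Linear(3, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_tokens: torch.Tensor, cam_params: torch.Tensor):
        # frame_tokens: (batch, frames, dim) per-frame latent tokens
        # cam_params:   (batch, frames, 3)  per-frame (pan_x, pan_y, zoom)
        cam_tokens = self.cam_embed(cam_params)           # (B, F, dim)
        out, _ = self.attn(frame_tokens, cam_tokens, cam_tokens)
        return frame_tokens + out                         # residual injection

# Toy usage: a 16-frame clip with a left-to-right pan.
x = torch.randn(1, 16, 320)
cam = torch.zeros(1, 16, 3)
cam[..., 0] = torch.linspace(0, 1, 16)   # pan_x ramps across frames
y = CameraCrossAttention(320)(x, cam)
print(y.shape)  # torch.Size([1, 16, 320])
```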
#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse
👉Channel: @MachineLearning_Programming
👉Paper https://arxiv.org/pdf/2402.03162.pdf
👉Project https://direct-a-video.github.io/
LeGrad: a Layerwise Explainability GRADient method for large Vision Transformer (ViT) architectures
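Rough intuition, as a hedged sketch (function and variable names are mine, not the paper's exact formulation): the layerwise-gradient idea is to take, for each ViT layer, the gradient of a target score with respect to that layer's attention maps, keep the positive part, and average across heads and layers into one explainability map.
```python
# Hypothetical sketch of a layerwise attention-gradient explainability map.
import torch

def legrad_style_map(attn_maps, score):
    # attn_maps: list of per-layer attention tensors (heads, tokens, tokens),
    #            each requiring grad and part of the forward graph
    # score:     scalar model output to explain (e.g. a class logit)
    grads = torch.autograd.grad(score, attn_maps, retain_graph=True)
    layer_maps = []
    for g in grads:
        g = g.clamp(min=0)                 # keep positive influence only
        layer_maps.append(g.mean(dim=0))   # average over heads
    expl = torch.stack(layer_maps).mean(dim=0)   # average over layers
    # Row for the [CLS] query gives one relevance value per patch token.
    return expl[0, 1:]

# Toy check with random "attention" tensors in a fake graph.
attns = [torch.rand(8, 10, 10, requires_grad=True) for _ in range(4)]
score = sum(a.sum() for a in attns)
heat = legrad_style_map(attns, score)
print(heat.shape)  # torch.Size([9])
```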
Explore More:
💻DEMO: try the interactive demo
📖Read the Paper: Access Here
💻Source Code: Explore on GitHub
Relevance: #AI #machinelearning #deeplearning #computervision
Join our community:
👉 @MachineLearning_Programming
🧠 VGGT: Visual Geometry Grounded Transformer
⚡️ Fast, accurate 3D scene understanding from just a few images, with no post-processing needed!
🎯 Outputs: camera pose, depth, point cloud & 3D tracking
🔥 Boosts downstream tasks such as view synthesis & non-rigid tracking (sketch below)
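To make the "single feed-forward pass" idea concrete, here is a toy, hypothetical sketch (this is not the real VGGT architecture or API; all names and shapes are assumptions): one network pass maps a handful of frames to per-frame poses and coarse depth, with no per-scene optimization.
```python
# Hypothetical sketch: a feed-forward model that turns a few RGB frames
# into per-frame camera poses and coarse depth in one pass.
import torch
import torch.nn as nn

class FeedForwardGeometryModel(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify
        self.pose_head = nn.Linear(dim, 7)    # quaternion + translation
        self.depth_head = nn.Linear(dim, 1)   # per-patch depth

    def forward(self, images: torch.Tensor):
        # images: (frames, 3, H, W)
        feats = self.backbone(images)                # (F, dim, H/16, W/16)
        tokens = feats.flatten(2).transpose(1, 2)    # (F, patches, dim)
        pose = self.pose_head(tokens.mean(dim=1))    # one pose per frame
        depth = self.depth_head(tokens)              # coarse depth per patch
        return pose, depth

imgs = torch.randn(4, 3, 224, 224)   # "just a few images"
pose, depth = FeedForwardGeometryModel()(imgs)
print(pose.shape, depth.shape)       # (4, 7) (4, 196, 1)
```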
📄 Read the paper
💻 Code on GitHub
#AI #3D #Transformer #VGGT #ComputerVision
✅ @MachineLearning_Programming