Direct-a-Video Generation
Direct-a-Video is a text-to-video generation framework that allows users to individually or jointly control the camera movement and/or object motion
Review https://t.ly/dZSLs
Paper arxiv.org/pdf/2402.03162.pdf
Project https://direct-a-video.github.io/
Graph Neural Network in TF
#Google TensorFlow-GNN: a new library for building Graph Neural Networks on TensorFlow. Source code released under the Apache 2.0 license
Review https://t.ly/TQfg-
Code github.com/tensorflow/gnn
Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
Magic-Me: ID-Specific Video
#ByteDance VCD: with just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt
Review https://t.ly/qjJ2O
Paper arxiv.org/pdf/2402.09368.pdf
Project magic-me-webpage.github.io
Code github.com/Zhen-Dong/Magic-Me
Breaking: GEMINI 1.5 is out
Gemini 1.5 just announced: standard 128,000 token context window, up to 1 MILLION tokens via AI-Studio and #Vertex AI in private preview
Review https://t.ly/Vblvx
More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
AI with Papers - Artificial Intelligence & Deep Learning
Seeing Through Occlusions
Novel NSF to see through occlusions, reflection suppression & shadow removal.
Review https://t.ly/5jcIG
Project https://light.princeton.edu/publication/nsf
Paper https://arxiv.org/pdf/2312.14235.pdf
Repo https://gi…
Seeing Through Occlusions: code is out
Repo: https://github.com/princeton-computational-imaging/NSF
One2Avatar: Pic -> 3D Avatar
#Google presents a new approach to generate animatable photo-realistic avatars from one or a few images. Impressive results.
Review https://t.ly/AS1oc
Paper arxiv.org/pdf/2402.11909.pdf
Project zhixuany.github.io/one2avatar_webpage/
BOG: Fine Geometric Views
#Google (+Tübingen) unveils Binary Opacity Grids, a novel method to reconstruct triangle meshes from multi-view images able to capture fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar).
Review https://t.ly/E6T0W
Paper https://lnkd.in/dQEq3zy6
Project https://lnkd.in/dYYCadx9
Demo https://lnkd.in/d92R6QME
Neuromorphic Video Binarization
University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU only, up to 10,000 FPS!
Review https://t.ly/V-NFa
Paper arxiv.org/pdf/2402.12644.pdf
Project github.com/eleboss/EBR
Pose via Ray Diffusion
Novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited for set-level transformers, it's the new SOTA on camera pose estimation. Source code released
Review https://t.ly/qBsFK
Paper arxiv.org/pdf/2402.14817.pdf
Project jasonyzhang.com/RayDiffusion
Code github.com/jasonyzhang/RayDiffusion
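To make the "camera as a bundle of rays" idea concrete, here is a minimal sketch. It is not the authors' code: it assumes a pinhole camera with world-to-camera extrinsics x_cam = R * x_world + t, and it assumes each ray is stored as a Plücker pair (unit direction, moment = center x direction). The function name camera_to_rays and all numeric values are hypothetical.

```python
import math

def matvec(m, v):
    # 3x3 matrix times 3-vector
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def camera_to_rays(focal, cx, cy, R, t, pixels):
    """Return one (direction, moment) pair per pixel, in world coordinates.

    Assumed convention: x_cam = R * x_world + t, so the camera center
    in world coordinates is c = -R^T t.
    """
    Rt = transpose(R)
    center = [-x for x in matvec(Rt, t)]
    rays = []
    for (u, v) in pixels:
        # Back-project the pixel to a direction in the camera frame.
        d_cam = [(u - cx) / focal, (v - cy) / focal, 1.0]
        # Rotate into the world frame and normalize.
        d = matvec(Rt, d_cam)
        norm = math.sqrt(sum(x * x for x in d))
        d = [x / norm for x in d]
        # Pluecker moment: any point on the ray crossed with its direction.
        m = cross(center, d)
        rays.append((d, m))
    return rays

# Hypothetical camera: identity rotation, placed 2 units along +z.
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
example_rays = camera_to_rays(100.0, 8.0, 8.0, R_identity, [0.0, 0.0, -2.0],
                              [(8.0, 8.0), (0.0, 0.0)])
```

Every ray in the bundle shares the same camera center, so a set of such (direction, moment) pairs over-determines the pose; that redundancy is what makes the representation "distributed".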
MATH-Vision Dataset
MATH-V is a curated dataset of 3,040 HQ math problems with visual contexts sourced from real math competitions. Dataset released
Review https://t.ly/gmIAu
Paper arxiv.org/pdf/2402.14804.pdf
Project mathvision-cuhk.github.io/
Code github.com/mathvision-cuhk/MathVision
FlowMDM: Human Composition
FlowMDM is a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.
Review https://t.ly/pr2g_
Paper https://lnkd.in/daYRftdF
Project https://lnkd.in/dcRkv5Pc
Repo https://lnkd.in/dw-3JJks
EMO: talking/singing Gen-AI
EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions and various head poses. Input: a single frame; video duration = length of the input audio
Review https://t.ly/4IYj5
Paper https://lnkd.in/dGPX2-Yc
Project https://lnkd.in/dyf6p_N3
Repo (empty) github.com/HumanAIGC/EMO
Multi-LoRA Composition
Two novel training-free image composition methods: LoRA Switch and LoRA Composite, for integrating any number of elements in an image through multi-LoRA composition. Source code released
Review https://t.ly/GFy3Z
Paper arxiv.org/pdf/2402.16843.pdf
Code github.com/maszhongming/Multi-LoRA-Composition
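A toy sketch of how the two strategies differ, under stated assumptions rather than taken from the released code: it assumes LoRA Switch keeps exactly one LoRA active per denoising step and cycles through them, and that LoRA Composite combines guidance computed with each LoRA separately by averaging. The function names, the switch_every parameter, and the scalar "guidance" values are hypothetical stand-ins for real model outputs.

```python
def lora_switch_schedule(lora_names, num_steps, switch_every=1):
    """Pick exactly one active LoRA per denoising step, cycling in order.

    switch_every is a hypothetical knob: how many consecutive steps
    a LoRA stays active before the next one takes over.
    """
    return [lora_names[(step // switch_every) % len(lora_names)]
            for step in range(num_steps)]

def lora_composite_guidance(per_lora_guidance):
    """Average guidance values, each computed with one LoRA active on its own.

    Real guidance would be a tensor per LoRA; plain floats stand in here.
    """
    return sum(per_lora_guidance) / len(per_lora_guidance)

# Hypothetical LoRA names for three image elements.
loras = ["character", "clothing", "style"]
schedule = lora_switch_schedule(loras, num_steps=6)
combined = lora_composite_guidance([0.2, 0.4, 0.6])
```

In this reading, Switch spreads the LoRAs over time (one per step), while Composite applies all of them at every step and merges their guidance; neither requires merging weights or retraining.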
MM-AU: Video Accident
MM-AU - Multi-Modal Accident Understanding: 11,727 videos with temporally aligned descriptions. 2.23M+ bounding boxes, 58,650 pairs of video-based accident reasons. Data & Code announced
Review https://t.ly/a-jKI
Paper arxiv.org/pdf/2403.00436.pdf
Dataset http://www.lotvsmmau.net/MMAU/demo
SOTA: Stable Diffusion 3 is out!
Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). New Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding/spelling capabilities. Weights & Source Code to be released
Review https://t.ly/a1koo
Paper https://lnkd.in/d4i-9Bte
Blog https://lnkd.in/d-bEX-ww
E-LoFTR: new Feats-Matching SOTA
A novel LoFTR-inspired algorithm for efficiently producing semidense matches across images: up to 2.5× faster than LoFTR, superior to previous SOTA pipeline (SuperPoint + LightGlue). Code announced.
Review https://t.ly/7SPmC
Paper https://arxiv.org/pdf/2403.04765.pdf
Project https://zju3dv.github.io/efficientloftr/
Repo https://github.com/zju3dv/efficientloftr
StableDrag: Point-based Editing
#Tencent unveils StableDrag, a novel point-based image editing framework built on a discriminative point tracking method and a confidence-based latent enhancement strategy for motion supervision. Source code announced, but no repo yet.
Review https://t.ly/eUI05
Paper https://lnkd.in/dz8-ymck
Project stabledrag.github.io/
PIXART-Σ: 4K Generation
PixArt-Σ is a novel Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced
Review https://t.ly/Cm2Qh
Paper arxiv.org/pdf/2403.04692.pdf
Project pixart-alpha.github.io/PixArt-sigma-project/
Repo (empty) github.com/PixArt-alpha/PixArt-sigma
🤗-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
Can GPT-4 play DOOM?
Apparently yes, GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released
Review https://t.ly/W8-0F
Paper https://lnkd.in/dmsB7bjA
Project https://lnkd.in/ddDPwjQB
RT Humanoid from Head-Mounted Sensors
#META (+CMU) announced SimXR, a method for controlling a simulated avatar from information obtained from AR/VR headsets
Review https://t.ly/Si2Mp
Paper arxiv.org/pdf/2403.06862.pdf
Project www.zhengyiluo.com/SimXR/