G-Splatting Portraits
From monocular/casual video captures, Rig3DGS rigs 3D Gaussian Splatting to enable the creation of re-animatable portrait videos with control over facial expressions, head pose, and viewing direction.
Review: https://t.ly/fq71w
Paper: https://arxiv.org/pdf/2402.03723.pdf
Project: shahrukhathar.github.io/2024/02/05/Rig3DGS.html

Up to 69x Faster SAM
EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia. A usage sketch follows below.
Review: https://t.ly/zGiE9
Paper: arxiv.org/pdf/2402.05008.pdf
Code: github.com/mit-han-lab/efficientvit
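
Because EfficientViT-SAM keeps SAM's prompt encoder and mask decoder, prompting should mirror the original SAM predictor interface. A minimal sketch, assuming loader names along the lines of the repo's model zoo (the import paths, model id, and checkpoint path are assumptions; check the repository for the exact API):

```python
# Hedged sketch: create_sam_model / EfficientViTSamPredictor and the
# checkpoint path are assumptions based on the repo layout -- check
# github.com/mit-han-lab/efficientvit for the exact API. The prompting
# interface mirrors the original SAM predictor, since EfficientViT-SAM
# keeps SAM's prompt encoder and mask decoder.
import numpy as np

from efficientvit.sam_model_zoo import create_sam_model                     # assumed import
from efficientvit.models.efficientvit.sam import EfficientViTSamPredictor   # assumed import

model = create_sam_model(name="xl1", weight_url="xl1.pt").eval()  # placeholder weights
predictor = EfficientViTSamPredictor(model)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB frame
predictor.set_image(image)

# One foreground point prompt, SAM-style (label 1 = foreground).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)
```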

Direct-a-Video Generation
Direct-a-Video is a text-to-video generation framework that allows users to individually or jointly control the camera movement and/or object motion.
Review: https://t.ly/dZSLs
Paper: arxiv.org/pdf/2402.03162.pdf
Project: https://direct-a-video.github.io/

Graph Neural Networks in TF
#Google TensorFlow-GNN: a novel library for building Graph Neural Networks on TensorFlow. Source code released under the Apache 2.0 license. A minimal GraphTensor example follows below.
Review: https://t.ly/TQfg-
Code: github.com/tensorflow/gnn
Blog: blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
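
The library's core abstraction is the GraphTensor. A minimal sketch of building a toy graph and running one hand-rolled message-passing step; the graph values are made up, and while the API names follow the tensorflow_gnn docs, treat exact signatures as assumptions:

```python
# Toy 3-node citation graph plus one message-passing step with
# tensorflow_gnn. Values are illustrative only.
import tensorflow as tf
import tensorflow_gnn as tfgnn

graph = tfgnn.GraphTensor.from_pieces(
    node_sets={
        "papers": tfgnn.NodeSet.from_fields(
            sizes=tf.constant([3]),
            features={"hidden_state": tf.constant([[1.0, 0.0],
                                                   [0.0, 1.0],
                                                   [1.0, 1.0]])},
        )
    },
    edge_sets={
        "cites": tfgnn.EdgeSet.from_fields(
            sizes=tf.constant([2]),
            adjacency=tfgnn.Adjacency.from_indices(
                source=("papers", tf.constant([0, 1])),
                target=("papers", tf.constant([1, 2])),
            ),
        )
    },
)

# Copy each source node's state onto its outgoing edges, then sum the
# incoming messages at each target node.
messages = tfgnn.broadcast_node_to_edges(
    graph, "cites", tfgnn.SOURCE, feature_name="hidden_state")
pooled = tfgnn.pool_edges_to_node(
    graph, "cites", tfgnn.TARGET, reduce_type="sum", feature_value=messages)
print(pooled)  # shape [3, 2]: aggregated neighbor states per paper
```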

Magic-Me: ID-Specific Video
#ByteDance VCD: with just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt.
Review: https://t.ly/qjJ2O
Paper: arxiv.org/pdf/2402.09368.pdf
Project: magic-me-webpage.github.io
Code: github.com/Zhen-Dong/Magic-Me

Breaking: Gemini 1.5 is out
Gemini 1.5 just announced: a standard 128,000-token context window, with up to 1 MILLION tokens via AI Studio and #Vertex AI in private preview. A hedged API sketch follows below.
Review: https://t.ly/Vblvx
More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
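
For a sense of what the long-context workflow looks like from Python, a minimal sketch using the google-generativeai SDK. The model identifier and input file are assumptions (1.5 was in private preview at announcement); substitute whatever your account exposes:

```python
# Minimal long-context sketch with the google-generativeai SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-pro-latest")  # assumed model id

# The point of the big context window: pass an entire document at once
# instead of chunking it through a retrieval pipeline.
with open("long_report.txt") as f:  # hypothetical input file
    document = f.read()

response = model.generate_content(
    ["Summarize the key findings of this report:", document]
)
print(response.text)
```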

Quoting the earlier post — Seeing Through Occlusions: novel NSF to see through occlusions, reflection suppression & shadow removal. Review: https://t.ly/5jcIG · Project: https://light.princeton.edu/publication/nsf · Paper: https://arxiv.org/pdf/2312.14235.pdf
Seeing Through Occlusions: code is out
Repo: https://github.com/princeton-computational-imaging/NSF

One2Avatar: Pic -> 3D Avatar
#Google presents a new approach for generating animatable, photo-realistic avatars from only one or a few images. Impressive results.
Review: https://t.ly/AS1oc
Paper: arxiv.org/pdf/2402.11909.pdf
Project: zhixuany.github.io/one2avatar_webpage/

BOG: Fine Geometric Views
#Google (+Tübingen) unveils Binary Opacity Grids, a novel method for reconstructing triangle meshes from multi-view images, able to capture fine geometric detail such as leaves, branches & grass. New SOTA, real-time on a Google Pixel 8 Pro (and similar devices).
Review: https://t.ly/E6T0W
Paper: https://lnkd.in/dQEq3zy6
Project: https://lnkd.in/dYYCadx9
Demo: https://lnkd.in/d92R6QME

Neuromorphic Video Binarization
The University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU-only, up to 10,000 FPS!
Review: https://t.ly/V-NFa
Paper: arxiv.org/pdf/2402.12644.pdf
Project: github.com/eleboss/EBR

Pose via Ray Diffusion
A novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited to set-level transformers, it is the new SOTA in camera pose estimation. Source code released. A conceptual sketch of the ray-bundle idea follows below.
Review: https://t.ly/qBsFK
Paper: arxiv.org/pdf/2402.14817.pdf
Project: jasonyzhang.com/RayDiffusion
Code: github.com/jasonyzhang/RayDiffusion
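
To make the "camera as a bundle of rays" idea concrete, here is a toy conversion of a pinhole camera pose into per-patch rays in Plücker form (direction d, moment m = o × d), a common parameterization for this kind of distributed representation. This is an illustrative sketch, not the authors' code; the grid size and centered principal point are assumptions:

```python
# Conceptual sketch: one pinhole camera pose -> a bundle of per-patch
# rays, each stored as (direction, moment) in Pluecker form.
import numpy as np

def camera_to_ray_bundle(K, R, t, grid=8):
    """K: 3x3 intrinsics; R, t: world-to-camera rotation/translation.
    Returns a (grid*grid, 6) array of rays as (direction, moment)."""
    o = -R.T @ t                                  # camera center, world frame
    w, h = 2 * K[0, 2], 2 * K[1, 2]               # image size, assuming centered principal point
    us = (np.arange(grid) + 0.5) * w / grid       # patch-center pixel coords
    vs = (np.arange(grid) + 0.5) * h / grid
    uu, vv = np.meshgrid(us, vs)
    pix = np.stack([uu.ravel(), vv.ravel(), np.ones(grid * grid)], axis=0)
    dirs = (R.T @ (np.linalg.inv(K) @ pix)).T     # unproject, rotate to world frame
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    moments = np.cross(o[None, :], dirs)          # m = o x d
    return np.concatenate([dirs, moments], axis=1)

# Toy usage: identity pose, simple intrinsics.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
rays = camera_to_ray_bundle(K, np.eye(3), np.zeros(3))
print(rays.shape)  # (64, 6); the diffusion model denoises such ray sets
```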

MATH-Vision Dataset
MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions. Dataset released.
Review: https://t.ly/gmIAu
Paper: arxiv.org/pdf/2402.14804.pdf
Project: mathvision-cuhk.github.io/
Code: github.com/mathvision-cuhk/MathVision

FlowMDM: Human Composition
FlowMDM, a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.
Review: https://t.ly/pr2g_
Paper: https://lnkd.in/daYRftdF
Project: https://lnkd.in/dcRkv5Pc
Repo: https://lnkd.in/dw-3JJks

EMO: talking/singing Gen-AI
EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions and various head poses. Input: a single frame; video duration equals the length of the input audio.
Review: https://t.ly/4IYj5
Paper: https://lnkd.in/dGPX2-Yc
Project: https://lnkd.in/dyf6p_N3
Repo (empty): github.com/HumanAIGC/EMO

Multi-LoRA Composition
Two novel training-free image-composition techniques, LoRA Switch and LoRA Composite, for integrating any number of elements into an image through multi-LoRA composition. Source code released. A sketch of the baseline these improve on follows below.
Review: https://t.ly/GFy3Z
Paper: arxiv.org/pdf/2402.16843.pdf
Code: github.com/maszhongming/Multi-LoRA-Composition
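
For context, this is what plain multi-LoRA weighted merging looks like in diffusers (load_lora_weights and set_adapters are real diffusers APIs). Note this static merge is the baseline the paper improves on: LoRA Switch alternates the active LoRA across denoising steps, and LoRA Composite combines per-LoRA guidance instead. The checkpoint and LoRA paths below are placeholders:

```python
# Baseline multi-LoRA merging in diffusers; the paper's LoRA Switch /
# LoRA Composite replace this static weighted merge with per-step
# scheduling (see the repo). File paths are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("path/to/character_lora", adapter_name="character")  # placeholder
pipe.load_lora_weights("path/to/style_lora", adapter_name="style")          # placeholder

# Static merge: both LoRAs active with fixed weights at every step.
pipe.set_adapters(["character", "style"], adapter_weights=[0.8, 0.7])

image = pipe("a portrait of the character in the given style").images[0]
image.save("multi_lora.png")
```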

MM-AU: Video Accident
MM-AU - Multi-Modal Accident Understanding: 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & code announced.
Review: https://t.ly/a-jKI
Paper: arxiv.org/pdf/2403.00436.pdf
Dataset: http://www.lotvsmmau.net/MMAU/demo

SOTA: Stable Diffusion 3 is out!
Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human-preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding and spelling capabilities. Weights & source code to be released. A conceptual sketch of the MMDiT idea follows below.
Review: https://t.ly/a1koo
Paper: https://lnkd.in/d4i-9Bte
Blog: https://lnkd.in/d-bEX-ww
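
The key architectural idea, separate per-modality weights with joint attention over the concatenated token streams, can be sketched as a toy block. This is a conceptual illustration, not Stability's implementation; dimensions and module layout are assumptions:

```python
# Toy MMDiT-style block: text and image tokens get their own projection
# weights, but attention runs jointly over the concatenated sequence so
# the two modalities can exchange information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMMDiTBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.dim, self.heads = dim, heads
        # Separate weight sets per modality, per the MMDiT description.
        self.qkv_img = nn.Linear(dim, 3 * dim)
        self.qkv_txt = nn.Linear(dim, 3 * dim)
        self.out_img = nn.Linear(dim, dim)
        self.out_txt = nn.Linear(dim, dim)

    def _heads(self, x):
        b, n, _ = x.shape
        return x.view(b, n, self.heads, self.dim // self.heads).transpose(1, 2)

    def forward(self, img, txt):
        n_img = img.shape[1]
        q_i, k_i, v_i = self.qkv_img(img).chunk(3, dim=-1)
        q_t, k_t, v_t = self.qkv_txt(txt).chunk(3, dim=-1)
        # Joint attention over the concatenated token streams.
        q = self._heads(torch.cat([q_i, q_t], dim=1))
        k = self._heads(torch.cat([k_i, k_t], dim=1))
        v = self._heads(torch.cat([v_i, v_t], dim=1))
        mixed = F.scaled_dot_product_attention(q, k, v)
        mixed = mixed.transpose(1, 2).reshape(img.shape[0], -1, self.dim)
        # Back through modality-specific output weights, with residuals.
        img = img + self.out_img(mixed[:, :n_img])
        txt = txt + self.out_txt(mixed[:, n_img:])
        return img, txt

block = ToyMMDiTBlock()
img = torch.randn(2, 64, 256)   # 64 latent-image tokens
txt = torch.randn(2, 16, 256)   # 16 text tokens
img_out, txt_out = block(img, txt)
print(img_out.shape, txt_out.shape)
```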

E-LoFTR: new Feature-Matching SOTA
A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5x faster than LoFTR and superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced.
Review: https://t.ly/7SPmC
Paper: https://arxiv.org/pdf/2403.04765.pdf
Project: https://zju3dv.github.io/efficientloftr/
Repo: https://github.com/zju3dv/efficientloftr

StableDrag: Point-based Editing
#Tencent unveils StableDrag, a novel point-based image-editing framework combining a discriminative point-tracking method with a confidence-based latent-enhancement strategy for motion supervision. Source code announced, but still no repo.
Review: https://t.ly/eUI05
Paper: https://lnkd.in/dz8-ymck
Project: stabledrag.github.io/

PIXART-Σ: 4K Generation
PixArt-Σ is a novel Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced.
Review: https://t.ly/Cm2Qh
Paper: arxiv.org/pdf/2403.04692.pdf
Project: pixart-alpha.github.io/PixArt-sigma-project/
Repo (empty): github.com/PixArt-alpha/PixArt-sigma
HF Demo: https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha