Bootstrapping TAP
#DeepMind shows how large-scale, unlabeled, uncurated real-world data can improve Tracking Any Point (TAP) with minimal architectural changes, via a self-supervised student-teacher setup. Source code released.
Review https://t.ly/-S_ZL
Paper arxiv.org/pdf/2402.00847.pdf
Code https://github.com/google-deepmind/tapnet
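For intuition, here is a toy, self-contained sketch of a student-teacher bootstrapping loop of the kind described above: an EMA teacher pseudo-labels tracks on the clean clip, and the student must reproduce them on a corrupted view. `PointTracker`, the corruption, and the loss are illustrative stand-ins, not the actual BootsTAP implementation.

```python
import copy
import torch
import torch.nn as nn

class PointTracker(nn.Module):
    """Toy stand-in for a TAP model: maps (video, queries) -> per-frame tracks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3, 2)  # placeholder; a real tracker is far larger

    def forward(self, video, queries):
        # video: (B, T, H, W, 3); queries: (B, N, 3) as (t, x, y)
        B, T = video.shape[:2]
        N = queries.shape[1]
        feat = video.mean(dim=(2, 3))  # (B, T, 3) crude global feature
        return self.net(feat).unsqueeze(2).expand(B, T, N, 2)  # (B, T, N, 2)

student = PointTracker()
teacher = copy.deepcopy(student)  # teacher starts as a frozen copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

def ema_update(teacher, student, decay=0.99):
    """Slowly drag teacher weights toward the student's (exponential moving average)."""
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(decay).add_(ps, alpha=1 - decay)

video = torch.rand(2, 8, 32, 32, 3)   # unlabeled, uncurated clip
queries = torch.rand(2, 16, 3)

with torch.no_grad():                 # teacher pseudo-labels the clean clip
    pseudo_tracks = teacher(video, queries)
corrupted = video + 0.1 * torch.randn_like(video)  # stand-in for real augmentations
loss = nn.functional.huber_loss(student(corrupted, queries), pseudo_tracks)

opt.zero_grad(); loss.backward(); opt.step()
ema_update(teacher, student)
```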
Py4AI: 2x Speakers, 2x Tickets
- Doubling the speakers (6 -> 12!)
- A new track (2 tracks in parallel)
- A new batch of 100 tickets!
More: https://t.ly/WmVrM
HASSOD Object Detection
HASSOD: fully self-supervised object detection and instance segmentation. The new SOTA, able to understand part-to-whole object composition the way humans do.
Review https://t.ly/66qHF
Paper arxiv.org/pdf/2402.03311.pdf
Project hassod-neurips23.github.io/
Repo github.com/Shengcao-Cao/HASSOD
G-Splatting Portraits
From monocular/casual video captures, Rig3DGS rigs 3D Gaussian Splatting to enable the creation of re-animatable portrait videos with control over facial expressions, head pose, and viewing direction.
Review https://t.ly/fq71w
Paper https://arxiv.org/pdf/2402.03723.pdf
Project shahrukhathar.github.io/2024/02/05/Rig3DGS.html
Up to 69x Faster SAM
EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia.
Review https://t.ly/zGiE9
Paper arxiv.org/pdf/2402.05008.pdf
Code github.com/mit-han-lab/efficientvit
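The pattern is easy to picture with a toy sketch: keep the prompt encoder and mask decoder untouched, swap only the image encoder for a lighter backbone that emits SAM-shaped embeddings. Every class below is a hypothetical stand-in, not the repo's API.

```python
import torch
import torch.nn as nn

class LightImageEncoder(nn.Module):
    """Stand-in for EfficientViT: must emit embeddings in SAM's expected shape."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)

    def forward(self, x):        # x: (B, 3, 1024, 1024)
        return self.proj(x)      # (B, 256, 64, 64), a SAM-style embedding grid

class DummyPromptEncoder(nn.Module):
    def forward(self, points):   # points: (B, N, 2) click coordinates
        return points.mean(dim=1)

class DummyMaskDecoder(nn.Module):
    def forward(self, image_emb, prompt_emb):
        return image_emb.mean(dim=1, keepdim=True)  # (B, 1, 64, 64) fake logits

class SwappedSAM(nn.Module):
    """Keep SAM's prompt encoder + mask decoder, replace only the image encoder."""
    def __init__(self, prompt_encoder, mask_decoder):
        super().__init__()
        self.image_encoder = LightImageEncoder()  # the only replaced component
        self.prompt_encoder = prompt_encoder      # reused from SAM
        self.mask_decoder = mask_decoder          # reused from SAM

    def forward(self, image, prompts):
        return self.mask_decoder(self.image_encoder(image),
                                 self.prompt_encoder(prompts))

model = SwappedSAM(DummyPromptEncoder(), DummyMaskDecoder())
mask = model(torch.rand(1, 3, 1024, 1024), torch.rand(1, 5, 2))
print(mask.shape)  # torch.Size([1, 1, 64, 64])
```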
Direct-a-Video Generation
Direct-a-Video is a text-to-video generation framework that lets users individually or jointly control camera movement and/or object motion.
Review https://t.ly/dZSLs
Paper arxiv.org/pdf/2402.03162.pdf
Project https://direct-a-video.github.io/
Graph Neural Networks in TF
#Google TensorFlow-GNN: a novel library for building Graph Neural Networks on TensorFlow. Source code released under the Apache 2.0 license.
Review https://t.ly/TQfg-
Code github.com/tensorflow/gnn
Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
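As a taste of the library, here is a minimal sketch that builds a `GraphTensor` and runs one round of mean-pooled message passing with TF-GNN's broadcast/pool primitives; the "papers"/"cites" schema and the feature values are made up for illustration.

```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

# A tiny homogeneous graph: 3 "papers" nodes, 2 "cites" edges.
graph = tfgnn.GraphTensor.from_pieces(
    node_sets={
        "papers": tfgnn.NodeSet.from_fields(
            sizes=tf.constant([3]),
            features={"hidden_state": tf.random.uniform([3, 8])},
        )
    },
    edge_sets={
        "cites": tfgnn.EdgeSet.from_fields(
            sizes=tf.constant([2]),
            adjacency=tfgnn.Adjacency.from_indices(
                source=("papers", tf.constant([0, 1])),
                target=("papers", tf.constant([1, 2])),
            ),
        )
    },
)

# One round of message passing: copy source-node states onto edges, then
# mean-pool them back onto the target nodes.
messages = tfgnn.broadcast_node_to_edges(
    graph, "cites", tfgnn.SOURCE, feature_name="hidden_state")
pooled = tfgnn.pool_edges_to_node(
    graph, "cites", tfgnn.TARGET, "mean", feature_value=messages)
print(pooled.shape)  # (3, 8)
```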
Magic-Me: ID-Specific Video
#ByteDance VCD: with just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt.
Review https://t.ly/qjJ2O
Paper arxiv.org/pdf/2402.09368.pdf
Project magic-me-webpage.github.io
Code github.com/Zhen-Dong/Magic-Me
Breaking: Gemini 1.5 is out
Gemini 1.5 just announced: a standard 128,000-token context window, with up to 1 MILLION tokens via AI Studio and #Vertex AI in private preview.
Review https://t.ly/Vblvx
More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
Seeing Through Occlusions: code is out
Repo: https://github.com/princeton-computational-imaging/NSF
One2Avatar: Pic -> 3D Avatar
#Google presents a new approach to generating animatable, photo-realistic avatars from only one or a few images. Impressive results.
Review https://t.ly/AS1oc
Paper arxiv.org/pdf/2402.11909.pdf
Project zhixuany.github.io/one2avatar_webpage/
BOG: Fine Geometric Views
#Google (+Tübingen) unveils Binary Opacity Grids, a novel method to reconstruct triangle meshes from multi-view images, able to capture fine geometric detail such as leaves, branches & grass. New SOTA, running in real time on a Google Pixel 8 Pro (and similar).
Review https://t.ly/E6T0W
Paper https://lnkd.in/dQEq3zy6
Project https://lnkd.in/dYYCadx9
Demo https://lnkd.in/d92R6QME
Neuromorphic Video Binarization
The University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real time, CPU only, up to 10,000 FPS!
Review https://t.ly/V-NFa
Paper arxiv.org/pdf/2402.12644.pdf
Project github.com/eleboss/EBR
Pose via Ray Diffusion
A novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited to set-level transformers, it is the new SOTA in camera pose estimation. Source code released.
Review https://t.ly/qBsFK
Paper arxiv.org/pdf/2402.14817.pdf
Project jasonyzhang.com/RayDiffusion
Code github.com/jasonyzhang/RayDiffusion
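To make the "bundle of rays" idea concrete, here is a small NumPy sketch that unprojects a pixel grid through a pinhole camera and stores each ray in 6-D Plücker coordinates (direction d, moment m = p × d), one common choice for such ray representations. The intrinsics, grid size, and function name are made up, not the paper's code.

```python
import numpy as np

def camera_to_ray_bundle(K, R, t, grid=4, image_size=64):
    """Return (grid*grid, 6) Plücker rays for a camera with pose x_cam = R @ x_world + t."""
    center = -R.T @ t                               # camera origin in world coords
    u, v = np.meshgrid(np.linspace(0, image_size, grid),
                       np.linspace(0, image_size, grid))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(grid * grid)], axis=0)
    dirs = R.T @ (np.linalg.inv(K) @ pix)           # back-project, rotate to world
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    moments = np.cross(center[None, :], dirs.T)     # m = p x d
    return np.concatenate([dirs.T, moments], axis=1)

K = np.array([[50.0, 0.0, 32.0],                    # toy pinhole intrinsics
              [0.0, 50.0, 32.0],
              [0.0, 0.0, 1.0]])
rays = camera_to_ray_bundle(K, np.eye(3), np.zeros(3))
print(rays.shape)  # (16, 6)
```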
MATH-Vision Dataset
MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions. Dataset released.
Review https://t.ly/gmIAu
Paper arxiv.org/pdf/2402.14804.pdf
Project mathvision-cuhk.github.io/
Code github.com/mathvision-cuhk/MathVision
FlowMDM: Human Composition
FlowMDM is a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.
Review https://t.ly/pr2g_
Paper https://lnkd.in/daYRftdF
Project https://lnkd.in/dcRkv5Pc
Repo https://lnkd.in/dw-3JJks
EMO: Talking/Singing Gen-AI
EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions and various head poses. Input: a single frame; video duration = length of the input audio.
Review https://t.ly/4IYj5
Paper https://lnkd.in/dGPX2-Yc
Project https://lnkd.in/dyf6p_N3
Repo (empty) github.com/HumanAIGC/EMO
Multi-LoRA Composition
Two novel training-free approaches to image composition, LoRA Switch and LoRA Composite, for integrating any number of elements into an image through multi-LoRA composition. Source code released.
Review https://t.ly/GFy3Z
Paper arxiv.org/pdf/2402.16843.pdf
Code github.com/maszhongming/Multi-LoRA-Composition
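The LoRA Switch half of the idea is easy to sketch: exactly one LoRA is active per denoising step, rotating round-robin (LoRA Composite instead aggregates guidance from all adapters at every step). The snippet below simulates the adapter patching with a string tag; all names are illustrative, not the repo's API.

```python
def lora_switch_denoise(latent, num_steps, lora_names, denoise_step):
    """Round-robin over adapters: exactly one LoRA drives each denoising step."""
    for i in range(num_steps):
        active = lora_names[i % len(lora_names)]
        latent = denoise_step(latent, i, active)
    return latent

def fake_denoise_step(latent, step_index, active_lora):
    # Stand-in denoiser: records which adapter was "patched in" for this step.
    return latent + [f"step {step_index}: {active_lora}"]

trace = lora_switch_denoise([], 6, ["character", "clothing", "style"],
                            fake_denoise_step)
print("\n".join(trace))  # character, clothing, style, character, ...
```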
MM-AU: Video Accident
MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & code announced.
Review https://t.ly/a-jKI
Paper arxiv.org/pdf/2403.00436.pdf
Dataset http://www.lotvsmmau.net/MMAU/demo
SOTA: Stable Diffusion 3 is out!
Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). Its new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image and language, improving text understanding and spelling capabilities. Weights & source code to be released.
Review https://t.ly/a1koo
Paper https://lnkd.in/d4i-9Bte
Blog https://lnkd.in/d-bEX-ww
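A condensed sketch of the MMDiT idea as described: text and image tokens get separate projection weights but attend jointly over the concatenated sequence. Dimensions are toy values and the class is illustrative, not Stability's implementation.

```python
import torch
import torch.nn as nn

class MMDiTAttention(nn.Module):
    """Joint attention with per-modality weights, in the spirit of MMDiT."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv_img = nn.Linear(dim, 3 * dim)   # image-stream projections
        self.qkv_txt = nn.Linear(dim, 3 * dim)   # separate text-stream projections
        self.out_img = nn.Linear(dim, dim)
        self.out_txt = nn.Linear(dim, dim)

    def forward(self, img, txt):                 # (B, Ni, D), (B, Nt, D)
        B, Ni, D = img.shape
        Nt = txt.shape[1]
        qkv = torch.cat([self.qkv_img(img), self.qkv_txt(txt)], dim=1)
        q, k, v = qkv.chunk(3, dim=-1)
        # one joint attention pass over the concatenated image+text sequence
        q, k, v = (x.view(B, Ni + Nt, self.heads, -1).transpose(1, 2)
                   for x in (q, k, v))
        out = nn.functional.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(B, Ni + Nt, D)
        # split back and project with per-stream output weights
        return self.out_img(out[:, :Ni]), self.out_txt(out[:, Ni:])

block = MMDiTAttention()
img_out, txt_out = block(torch.rand(1, 16, 64), torch.rand(1, 8, 64))
print(img_out.shape, txt_out.shape)  # (1, 16, 64) (1, 8, 64)
```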