↘️ SEELE: "moving" the subjects ➡️
👉Subject repositioning: manipulating an input image to move one of its subjects to a desired location while preserving the image’s fidelity. SEELE is a single diffusion model that addresses all of the resulting generative sub-tasks (see the sketch below).
👉Review https://t.ly/4FS4H
👉Paper arxiv.org/pdf/2401.16861.pdf
👉Project yikai-wang.github.io/seele/
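👉How the pieces could fit together, as a minimal sketch: one diffusion model steered by per-task prompts handles removal, completion and harmonization. Every function and prompt name below is a hypothetical placeholder, not SEELE's actual API:
```python
import numpy as np

def reposition(image, subject_mask, shift, model):
    """Hypothetical repositioning pipeline: `model` is one diffusion
    model whose behavior is selected by a learned task prompt."""
    dy, dx = shift

    # 1) Remove the subject and inpaint the hole it leaves behind.
    background = model.generate(image, mask=subject_mask, task="remove")

    # 2) Paste the subject pixels at the target location.
    moved_mask = np.roll(subject_mask, (dy, dx), axis=(0, 1))
    moved_subject = np.roll(image, (dy, dx), axis=(0, 1))
    composite = np.where(moved_mask[..., None], moved_subject, background)

    # 3) Complete newly revealed subject regions, then harmonize
    #    lighting and shadows so the paste blends into the scene.
    composite = model.generate(composite, mask=moved_mask, task="complete")
    return model.generate(composite, mask=moved_mask, task="harmonize")
```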
🎉 ADΔER: Event-Camera Suite 🎉
👉ADΔER: a novel, unified framework for event-based video, providing an encoder, transcoder & decoder for ADΔER (Address, Decimation, Δt Event Representation) video streams. Source code (Rust) released 💙
👉Review https://t.ly/w5_KC
👉Paper arxiv.org/pdf/2401.17151.pdf
👉Repo github.com/ac-freeman/adder-codec-rs
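👉Going by the acronym, each event carries a pixel address, a decimation level and a Δt. A minimal sketch of such an event; the intensity-recovery formula is an illustrative assumption, so consult the repo for the real spec:
```python
from dataclasses import dataclass

@dataclass
class AdderEvent:
    x: int          # pixel address (column)
    y: int          # pixel address (row)
    c: int          # channel
    d: int          # decimation: event fires after 2**d intensity units
    delta_t: int    # clock ticks elapsed while accumulating them

    def intensity(self) -> float:
        # Assumed recovery: accumulated units divided by elapsed time.
        return (2 ** self.d) / self.delta_t

# Example: a pixel that accumulated 2**7 = 128 units in 255 ticks.
ev = AdderEvent(x=10, y=4, c=0, d=7, delta_t=255)
print(ev.intensity())
```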
🚦(add) Anything in Any Video🚦
👉 XPeng Motors announced Anything in Any Scene: novel #AI for realistic video simulation that seamlessly inserts any object into an existing dynamic video. Strong emphasis on realism: the objects inside the bounding boxes don't actually exist. Source code released 💙
👉Review https://t.ly/UYhl0
👉Code https://lnkd.in/gyi7Dhkn
👉Paper https://lnkd.in/gXyAJ6GZ
👉Project https://lnkd.in/gVA5vduD
🍬 ABS: SOTA collision-free 🍬
👉ABS (Agile But Safe): a learning-based control framework for agile, collision-free locomotion in quadrupedal robots (toy sketch below). Source code announced (coming) 💙
👉Review https://t.ly/AYu-Z
👉Paper arxiv.org/pdf/2401.17583.pdf
👉Project agile-but-safe.github.io/
👉Repo github.com/LeCAR-Lab/ABS
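👉A toy sketch of the agile-but-safe pattern referenced above: run a fast policy by default and switch to a conservative recovery policy whenever a learned safety value flags collision risk. All components here are illustrative stand-ins, not the paper's networks:
```python
import numpy as np

def control_step(obs, agile_policy, recovery_policy, safety_value,
                 threshold=0.0):
    """Gate between two policies with a learned safety (reach-avoid)
    value; convention assumed here: value > threshold means danger."""
    if safety_value(obs) > threshold:
        return recovery_policy(obs)   # conservative, collision-avoiding
    return agile_policy(obs)          # fast, agile locomotion

# Toy stand-ins so the sketch runs end to end:
agile = lambda o: np.array([1.0, 0.0])                # full speed ahead
recover = lambda o: np.array([0.0, 0.5])              # slow down, turn away
risk = lambda o: 1.0 if o["obstacle_dist"] < 0.5 else -1.0

print(control_step({"obstacle_dist": 0.3}, agile, recover, risk))
```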
🏇 Bootstrapping TAP 🏇
👉#DeepMind shows how large-scale, unlabeled, uncurated real-world data can improve TAP (Tracking Any Point) with minimal architectural changes, via a self-supervised student-teacher setup. Source code released 💙
👉Review https://t.ly/-S_ZL
👉Paper arxiv.org/pdf/2402.00847.pdf
👉Code https://github.com/google-deepmind/tapnet
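👉A compact sketch of a student-teacher bootstrap step of the kind described: the teacher pseudo-labels tracks on raw video, the student must match them under augmentation, and the teacher follows the student via EMA. Tensor shapes and the augmentation interface are assumptions for illustration:
```python
import torch

def bootstrap_step(student, teacher, video, queries, optimizer,
                   augment, ema_decay=0.999):
    """One self-supervised update on unlabeled video."""
    with torch.no_grad():
        pseudo_tracks = teacher(video, queries)        # teacher pseudo-labels

    aug_video, aug_queries, warp = augment(video, queries)
    pred_tracks = student(aug_video, aug_queries)

    # Student must reproduce the teacher's tracks, mapped through the
    # same spatial transformation that was applied to the input.
    loss = torch.nn.functional.huber_loss(pred_tracks, warp(pseudo_tracks))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # EMA teacher update keeps the pseudo-labels stable over training.
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1 - ema_decay)
    return loss.item()
```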
💥Py4AI 2x Speakers, 2x Tickets💥
✅Doubling the speakers (6 -> 12!)
✅A new track (2 tracks in parallel)
✅A new batch of 100 tickets!
👉 More: https://t.ly/WmVrM
🪵 HASSOD Object Detection 🪵
👉 HASSOD: fully self-supervised object detection and instance segmentation. The new SOTA, able to understand part-to-whole object composition the way humans do.
👉Review https://t.ly/66qHF
👉Paper arxiv.org/pdf/2402.03311.pdf
👉Project hassod-neurips23.github.io/
👉Repo github.com/Shengcao-Cao/HASSOD
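👉The part-to-whole flavor can be approximated with hierarchical clustering of self-supervised patch features: coarse clusters as objects, finer clusters as parts. A rough off-the-shelf illustration (HASSOD's adaptive clustering is more involved):
```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Assume `feats` are self-supervised ViT patch features for one image,
# on a 14x14 grid (e.g. DINO-style), flattened to (196, 384).
feats = np.random.randn(196, 384).astype(np.float32)

# Coarse level: few clusters, roughly whole objects / background.
objects = AgglomerativeClustering(n_clusters=4).fit_predict(feats)

# Finer level: more clusters over the same features, roughly parts,
# yielding a part-to-whole hierarchy nested under the coarse labels.
parts = AgglomerativeClustering(n_clusters=12).fit_predict(feats)

print(objects.reshape(14, 14))
print(parts.reshape(14, 14))
```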
🌵 G-Splatting Portraits 🌵
👉From monocular, casually captured video, Rig3DGS rigs 3D Gaussian Splatting to enable the creation of re-animatable portrait videos with control over facial expressions, head pose and viewing direction.
👉Review https://t.ly/fq71w
👉Paper https://arxiv.org/pdf/2402.03723.pdf
👉Project shahrukhathar.github.io/2024/02/05/Rig3DGS.html
🌆 Up to 69x Faster SAM 🌆
👉EfficientViT-SAM: a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT (see the sketch below). Up to 69x faster, source code released. Authors: Tsinghua, MIT & #Nvidia
👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit
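👉The split that makes the speedup possible, at the shape level: the heavy encoder runs once per image, the light prompt encoder and mask decoder once per prompt. The module wiring below is illustrative, not the repo's actual classes:
```python
import torch
import torch.nn as nn

class SamStyleSegmenter(nn.Module):
    """SAM-style pipeline: a heavy image encoder runs once per image;
    the light prompt encoder + mask decoder run once per prompt."""
    def __init__(self, image_encoder, prompt_encoder, mask_decoder):
        super().__init__()
        self.image_encoder = image_encoder      # <- EfficientViT goes here
        self.prompt_encoder = prompt_encoder    # kept from original SAM
        self.mask_decoder = mask_decoder        # kept from original SAM

    def forward(self, image, points):
        embedding = self.image_encoder(image)   # the accelerated part
        prompt_tokens = self.prompt_encoder(points)
        return self.mask_decoder(embedding, prompt_tokens)
```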
🌴 Direct-a-Video Generation 🌴
👉Direct-a-Video is a text-to-video generation framework that allows users to individually or jointly control the camera movement and/or object motion
👉Review https://t.ly/dZSLs
👉Paper arxiv.org/pdf/2402.03162.pdf
👉Project https://direct-a-video.github.io/
🍇 Graph Neural Network in TF 🍇
👉#Google TensorFlow-GNN: a novel library for building Graph Neural Networks in TensorFlow. Source code released under the Apache 2.0 license 💙
👉Review https://t.ly/TQfg-
👉Code github.com/tensorflow/gnn
👉Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
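👉A minimal sketch of building a graph and running one round of message passing with TF-GNN. This follows the 1.0-era API from memory; treat the exact names as assumptions and check the repo docs:
```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

# A tiny homogeneous graph: 3 "paper" nodes, 2 "cites" edges (0->1, 1->2).
graph = tfgnn.GraphTensor.from_pieces(
    node_sets={"paper": tfgnn.NodeSet.from_fields(
        sizes=tf.constant([3]),
        features={"hidden": tf.random.normal([3, 16])})},
    edge_sets={"cites": tfgnn.EdgeSet.from_fields(
        sizes=tf.constant([2]),
        adjacency=tfgnn.Adjacency.from_indices(
            source=("paper", tf.constant([0, 1])),
            target=("paper", tf.constant([1, 2]))))})

# One message-passing round: copy source-node states onto the edges,
# then mean-pool the incoming messages at each target node.
messages = tfgnn.broadcast_node_to_edges(
    graph, "cites", tfgnn.SOURCE, feature_name="hidden")
pooled = tfgnn.pool_edges_to_node(
    graph, "cites", tfgnn.TARGET, "mean", feature_value=messages)
print(pooled.shape)  # (3, 16): per-node aggregate of incoming messages
```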
🆔 Magic-Me: ID-Specific Video 🆔
👉#ByteDance VCD: given just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt.
👉Review https://t.ly/qjJ2O
👉Paper arxiv.org/pdf/2402.09368.pdf
👉Project magic-me-webpage.github.io
👉Code github.com/Zhen-Dong/Magic-Me
🔥 Breaking: GEMINI 1.5 is out 🔥
👉Gemini 1.5 just announced: a standard 128,000-token context window, with up to 1 MILLION tokens via AI Studio and #Vertex AI in private preview 🫠
👉Review https://t.ly/Vblvx
👉More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
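👉For context, long-context calls through the Python SDK look roughly like this; the model identifier and availability were preview-gated at announcement time, so treat both as assumptions:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Model id assumed from the preview announcement; check AI Studio
# for the identifier actually enabled on your account.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

with open("whole_codebase.txt") as f:   # hypothetical long input
    context = f.read()                  # can approach 1M tokens in preview

response = model.generate_content(
    [context, "Summarize the architecture described above."])
print(response.text)
```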
AI with Papers - Artificial Intelligence & Deep Learning
🈚 Seeing Through Occlusions 🈚
👉Novel NSF to see through occlusions, reflection suppression & shadow removal.
👉Review https://t.ly/5jcIG
👉Project https://light.princeton.edu/publication/nsf
👉Paper https://arxiv.org/pdf/2312.14235.pdf
👉Repo https://gi…
🔥 Seeing Through Occlusions: code is out 🔥
👉Repo: https://github.com/princeton-computational-imaging/NSF
☀️ One2Avatar: Pic -> 3D Avatar ☀️
👉#Google presents a new approach to generating animatable, photo-realistic avatars from just one or a few images. Impressive results.
👉Review https://t.ly/AS1oc
👉Paper arxiv.org/pdf/2402.11909.pdf
👉Project zhixuany.github.io/one2avatar_webpage/
🪟 BOG: Fine Geometric Views 🪟
👉 #Google (+Tübingen) unveils Binary Opacity Grids, a novel method for reconstructing triangle meshes from multi-view images that captures fine geometric detail such as leaves, branches & grass (toy example below). New SOTA, running in real time on the Google Pixel 8 Pro (and similar devices).
👉Review https://t.ly/E6T0W
👉Paper https://lnkd.in/dQEq3zy6
👉Project https://lnkd.in/dYYCadx9
👉Demo https://lnkd.in/d92R6QME
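👉The core intuition in toy form: once opacities are pushed to be binary, the surface becomes a crisp inside/outside boundary you can mesh directly, e.g. with marching cubes. The paper's actual extraction pipeline is far more sophisticated:
```python
import numpy as np
from skimage.measure import marching_cubes

# Toy opacity field: a soft sphere of radius 20 in a 64^3 volume.
z, y, x = np.mgrid[:64, :64, :64]
dist2 = (x - 32)**2 + (y - 32)**2 + (z - 32)**2
opacity = 1.0 / (1.0 + np.exp((dist2 - 20**2) / 50))

# Binarize: every voxel is either fully opaque or fully transparent,
# so the 0.5 level set is a hard, well-defined surface.
binary = (opacity > 0.5).astype(np.float32)

verts, faces, normals, _ = marching_cubes(binary, level=0.5)
print(verts.shape, faces.shape)  # triangle mesh of the opaque region
```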
🦥Neuromorphic Video Binarization🦥
👉 The University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU-only, up to 10,000 FPS! (illustration below)
👉Review https://t.ly/V-NFa
👉Paper arxiv.org/pdf/2402.12644.pdf
👉Project github.com/eleboss/EBR
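👉A toy numpy take on the underlying idea: integrate per-pixel event activity with a leak, then threshold into a binary image. The actual EBR method is considerably more refined:
```python
import numpy as np

def binarize_events(events, shape, decay=0.9):
    """events: iterable of (x, y, polarity) tuples in time order.
    Leaky per-pixel accumulation followed by a global threshold."""
    acc = np.zeros(shape, dtype=np.float32)
    for x, y, polarity in events:
        acc *= decay                        # leak old activity
        acc[y, x] += 1.0 if polarity else -1.0
    return (acc > 0).astype(np.uint8)       # 1 = bright, 0 = dark

# Tiny example: two ON events at (1,1), one OFF event at (0,0).
img = binarize_events([(1, 1, 1), (1, 1, 1), (0, 0, 0)], shape=(4, 4))
print(img)
```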
🩻 Pose via Ray Diffusion 🩻
👉A novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited to set-level transformers, it's the new SOTA in camera pose estimation. Source code released 💙
👉Review https://t.ly/qBsFK
👉Paper arxiv.org/pdf/2402.14817.pdf
👉Project jasonyzhang.com/RayDiffusion
👉Code github.com/jasonyzhang/RayDiffusion
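👉The ray-bundle representation is easy to make concrete: each image patch gets the ray through its center, stored as Plücker coordinates (unit direction d, moment m = c × d). A numpy sketch under standard pinhole conventions (world-to-camera R, t):
```python
import numpy as np

def camera_to_rays(K, R, t, pixels):
    """Convert a pinhole camera plus pixel coordinates into
    Plücker rays (direction d, moment m = c x d)."""
    c = -R.T @ t                                   # camera center in world
    rays = []
    for u, v in pixels:
        d = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])
        d /= np.linalg.norm(d)                     # unit ray direction
        m = np.cross(c, d)                         # Plücker moment
        rays.append(np.concatenate([d, m]))        # 6-D ray token
    return np.stack(rays)                          # (num_patches, 6)

K = np.diag([500.0, 500.0, 1.0]); K[:2, 2] = 128   # toy intrinsics
rays = camera_to_rays(K, np.eye(3), np.zeros(3), [(64, 64), (192, 192)])
print(rays.shape)  # one 6-D ray per patch center
```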
🗃️ MATH-Vision Dataset 🗃️
👉MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions (loading sketch below). Dataset released 💙
👉Review https://t.ly/gmIAu
👉Paper arxiv.org/pdf/2402.14804.pdf
👉Project mathvision-cuhk.github.io/
👉Code github.com/mathvision-cuhk/MathVision
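👉Loading should be a couple of lines once the repo is cloned; the path and field names below are guesses for illustration, so check the actual layout:
```python
import json

# Hypothetical path: consult github.com/mathvision-cuhk/MathVision
# for the real file layout and field names.
with open("MathVision/data/test.jsonl") as f:
    problems = [json.loads(line) for line in f]

print(len(problems))        # expected: 3,040 problems
print(problems[0].keys())   # e.g. question, image, answer fields
```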
🫅FlowMDM: Human Composition🫅
👉FlowMDM is a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.
👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks