AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

Diffutoon: new SOTA video

👉Diffutoon is a cartoon shading approach that aims to transform photorealistic videos into anime styles. It can handle exceptionally high resolutions and rapid motions. Source code released!

👉Review https://t.ly/sim2O
👉Paper https://lnkd.in/dPcSnAUu
👉Code https://lnkd.in/d9B_dGrf
👉Project https://lnkd.in/dpcsJcX2

🥓 RANSAC -> PARSAC (neural) 🥓

👉Neural PARSAC: estimating multiple vanishing points (V), fundamental matrices (F) or homographies (H) at the speed of light! Source Code released 💙

👉Review https://t.ly/r9ngg
👉Paper https://lnkd.in/dadQ4Qec
👉Code https://lnkd.in/dYp6gADd
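
For context on what PARSAC replaces, here is a minimal sketch of classic sequential RANSAC on a toy 2D line-fitting problem. It is not the PARSAC method: the helper names (`fit_line`, `ransac_line`), the threshold, and the toy data are all assumptions made for illustration.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: the two points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, iters=200, threshold=0.1, seed=0):
    """Classic RANSAC: sample a minimal set, fit, count inliers, keep the best."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        p, q = rng.sample(points, 2)  # minimal sample for a line
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # With a^2 + b^2 = 1, |a*x + b*y + c| is the point-to-line distance.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy data (hypothetical): 20 points on y = 2x + 1 plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -15.0), (12.0, 90.0), (15.0, -30.0), (18.0, 5.0)]
model, inliers = ransac_line(points)
```

The same sample, fit, count-inliers loop applies to homographies or fundamental matrices, with a larger minimal sample and a different residual.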
❀14πŸ‘3⚑1πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM

↘️ SEELE: "moving" the subjects ➡️

👉Subject repositioning: manipulating an input image to reposition one of its subjects to a desired location while preserving the image's fidelity. SEELE is a single diffusion model that addresses this novel generative sub-task

👉Review https://t.ly/4FS4H
👉Paper arxiv.org/pdf/2401.16861.pdf
👉Project yikai-wang.github.io/seele/

🎉 ADΔER: Event-Camera Suite 🎉

👉ADΔER: a novel/unified framework for event-based video. Encoder / transcoder / decoder for ADΔER (Address, Decimation, Δt Event Representation) video streams. Source code (Rust) released 💙

👉Review https://t.ly/w5_KC
👉Paper arxiv.org/pdf/2401.17151.pdf
👉Repo github.com/ac-freeman/adder-codec-rs
❀7πŸ‘3πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM

🚦(add) Anything in Any Video🚦

👉XPeng Motors announced Anything in Any Scene: novel #AI for realistic video simulation that seamlessly inserts any object into an existing dynamic video. Strong emphasis on realism: the objects in the bounding boxes don't actually exist. Source Code released 💙

👉Review https://t.ly/UYhl0
👉Code https://lnkd.in/gyi7Dhkn
👉Paper https://lnkd.in/gXyAJ6GZ
👉Project https://lnkd.in/gVA5vduD
🍬 ABS: SOTA collision-free 🍬

πŸ‘‰ABS (Agile But Safe): learning-based control framework for agile and collision-free locomotion for quadrupedal robot. Source Code announced (coming) πŸ’™

πŸ‘‰Review https://t.ly/AYu-Z
πŸ‘‰Paper arxiv.org/pdf/2401.17583.pdf
πŸ‘‰Project agile-but-safe.github.io/
πŸ‘‰Repo github.com/LeCAR-Lab/ABS

Bootstrapping TAP

👉#Deepmind shows how large-scale, unlabeled, uncurated real-world data can improve TAP with minimal architectural changes, via a self-supervised student-teacher setup. Source Code released 💙

👉Review https://t.ly/-S_ZL
👉Paper arxiv.org/pdf/2402.00847.pdf
👉Code https://github.com/google-deepmind/tapnet
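
A student-teacher setup is often implemented with a teacher whose weights are an exponential moving average (EMA) of the student's, plus a loss that pulls student predictions toward the teacher's pseudo-labels. The sketch below assumes that recipe; the post does not spell out the details, and `ema_update`, `consistency_loss`, and the decay value are hypothetical names and settings, not the tapnet API.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight a small step toward the matching student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def consistency_loss(student_pts, teacher_pts):
    """Mean squared distance between student and teacher 2D point predictions."""
    sq = [(sx - tx) ** 2 + (sy - ty) ** 2
          for (sx, sy), (tx, ty) in zip(student_pts, teacher_pts)]
    return sum(sq) / len(sq)

# Toy weights (hypothetical): the teacher drifts toward the student over 3 steps.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)

# Toy tracks (hypothetical): the teacher's predictions act as pseudo-labels.
loss = consistency_loss([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)])
```

In a real pipeline only the student would receive gradients from the loss, and the teacher would change solely through the EMA step.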

💥Py4AI 2x Speakers, 2x Tickets💥

✅Doubling the speakers (6 -> 12!)
✅A new track (2 tracks in parallel)
✅A new batch of 100 tickets!

👉More: https://t.ly/WmVrM
❀7πŸ‘2🀯1
This media is not supported in your browser
VIEW IN TELEGRAM

🪵 HASSOD Object Detection 🪵

👉HASSOD: fully self-supervised detection and instance segmentation. The new SOTA, able to understand part-to-whole object composition as humans do.

👉Review https://t.ly/66qHF
👉Paper arxiv.org/pdf/2402.03311.pdf
👉Project hassod-neurips23.github.io/
👉Repo github.com/Shengcao-Cao/HASSOD

🌵 G-Splatting Portraits 🌵

👉From monocular/casual video captures, Rig3DGS rigs 3D Gaussian Splatting to enable the creation of re-animatable portrait videos with control over facial expressions, head-pose and viewing direction

👉Review https://t.ly/fq71w
👉Paper https://arxiv.org/pdf/2402.03723.pdf
👉Project shahrukhathar.github.io/2024/02/05/Rig3DGS.html

🌆 Up to 69x Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster, source code released. Authors: Tsinghua, MIT & #Nvidia

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit

🌴 Direct-a-Video Generation 🌴

👉Direct-a-Video is a text-to-video generation framework that allows users to individually or jointly control the camera movement and/or object motion

👉Review https://t.ly/dZSLs
👉Paper arxiv.org/pdf/2402.03162.pdf
👉Project https://direct-a-video.github.io/

Graph Neural Network in TF

👉#Google TensorFlow-GNN: novel library to build Graph Neural Networks on TensorFlow. Source Code released under Apache 2.0 license 💙

👉Review https://t.ly/TQfg-
👉Code github.com/tensorflow/gnn
👉Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
❀17πŸ‘4πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM

🆔 Magic-Me: ID-Specific Video 🆔

👉#ByteDance VCD: with just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt

👉Review https://t.ly/qjJ2O
👉Paper arxiv.org/pdf/2402.09368.pdf
👉Project magic-me-webpage.github.io
👉Code github.com/Zhen-Dong/Magic-Me
❀6πŸ₯°1🀯1🀣1
This media is not supported in your browser
VIEW IN TELEGRAM

🔥 Breaking: GEMINI 1.5 is out 🔥

👉Gemini 1.5 just announced: standard 128,000 token context window, up to 1 MILLION tokens via AI-Studio and #Vertex AI in private preview 🫠

👉Review https://t.ly/Vblvx
👉More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment

☀️ One2Avatar: Pic -> 3D Avatar ☀️

👉#Google presents a new approach to generate animatable photo-realistic avatars from only one or a few images. Impressive results.

👉Review https://t.ly/AS1oc
👉Paper arxiv.org/pdf/2402.11909.pdf
👉Project zhixuany.github.io/one2avatar_webpage/

🪟 BOG: Fine Geometric Views 🪟

👉#Google (+Tübingen) unveils Binary Opacity Grids, a novel method to reconstruct triangle meshes from multi-view images able to capture fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar).

👉Review https://t.ly/E6T0W
👉Paper https://lnkd.in/dQEq3zy6
👉Project https://lnkd.in/dYYCadx9
👉Demo https://lnkd.in/d92R6QME

🦥Neuromorphic Video Binarization🦥

👉University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU only, up to 10,000 FPS!

👉Review https://t.ly/V-NFa
👉Paper arxiv.org/pdf/2402.12644.pdf
👉Project github.com/eleboss/EBR
❀15πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM

🩻 Pose via Ray Diffusion 🩻

👉Novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited for set-level transformers, it's the new SOTA on camera pose estimation. Source code released 💙

👉Review https://t.ly/qBsFK
👉Paper arxiv.org/pdf/2402.14817.pdf
👉Project jasonyzhang.com/RayDiffusion
👉Code github.com/jasonyzhang/RayDiffusion
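
To make "a camera as a bundle of rays" concrete, here is a minimal sketch that turns a pinhole camera into one ray per image patch. Encoding each ray as a Plücker pair (unit direction `d`, moment `m = center x d`) is an assumption about one standard ray parameterization, not something the post states; the function names, the world-to-camera convention `x_cam = R @ x_world + t`, and the toy intrinsics are also hypothetical, not the RayDiffusion API.

```python
def mat_t_vec(R, v):
    """Multiply the transpose of the 3x3 matrix R by the 3-vector v."""
    return [sum(R[k][i] * v[k] for k in range(3)) for i in range(3)]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def pixel_to_ray(u, v, fx, fy, cx, cy, R, t):
    """Ray through pixel (u, v) as (unit direction d, moment m), world frame."""
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]   # back-project the pixel
    d = mat_t_vec(R, d_cam)                       # rotate into the world frame
    norm = sum(x * x for x in d) ** 0.5
    d = [x / norm for x in d]
    center = [-x for x in mat_t_vec(R, t)]        # camera center: -R^T t
    return d, cross(center, d)

def camera_to_rays(width, height, fx, fy, cx, cy, R, t, grid=2):
    """One ray per cell of a grid x grid patch layout, through each patch center."""
    rays = []
    for j in range(grid):
        for i in range(grid):
            u = (i + 0.5) * width / grid
            v = (j + 0.5) * height / grid
            rays.append(pixel_to_ray(u, v, fx, fy, cx, cy, R, t))
    return rays

# Toy camera (hypothetical): identity rotation, center at (0, 0, 2).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -2.0]
rays = camera_to_rays(640, 480, 500.0, 500.0, 320.0, 240.0, R, t)
center_ray = pixel_to_ray(320.0, 240.0, 500.0, 500.0, 320.0, 240.0, R, t)
```

Each camera becomes a set of per-patch rays, which is the kind of set-valued input the post describes as a natural fit for transformers.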