AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI, with papers. Fresh daily updates on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ‡ Bootstrapping TAP πŸ‡

πŸ‘‰#DeepMind shows how large-scale, unlabeled, uncurated real-world data can improve Tracking Any Point (TAP) with minimal architectural changes, via a self-supervised student-teacher setup. Source Code released πŸ’™

πŸ‘‰Review https://t.ly/-S_ZL
πŸ‘‰Paper arxiv.org/pdf/2402.00847.pdf
πŸ‘‰Code https://github.com/google-deepmind/tapnet
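The student-teacher idea can be sketched in a few lines. This is a toy scalar model for illustration only, not DeepMind's BootsTAP code: the teacher produces pseudo-labels on unlabeled data, the student regresses onto them, and the teacher tracks the student via an exponential moving average (EMA).

```python
# Toy student-teacher self-supervision: the "model" is a single scalar
# weight, standing in for a full point-tracking network.

def ema_update(teacher, student, decay=0.99):
    """Exponential moving average of student weights into the teacher."""
    return {k: decay * teacher[k] + (1 - decay) * student[k] for k in teacher}

def train_step(student, teacher, unlabeled_batch, lr=0.1):
    """One pass over unlabeled data: teacher predicts pseudo-labels,
    student takes gradient steps toward them (squared error)."""
    loss = 0.0
    for x in unlabeled_batch:
        target = teacher["w"] * x          # teacher pseudo-label
        pred = student["w"] * x            # student prediction
        err = pred - target
        loss += err * err
        student["w"] -= lr * 2 * err * x   # gradient step on squared error
    return loss / len(unlabeled_batch)

student = {"w": 0.0}
teacher = {"w": 1.0}
for _ in range(50):
    train_step(student, teacher, [0.5, 1.0, 1.5])
    teacher = ema_update(teacher, student)
```

In the real setup the student additionally sees augmented views of the clips, so matching the teacher forces equivariance to those augmentations.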
πŸ”₯5πŸ‘3πŸ₯°1🀩1
πŸ’₯Py4AI 2x Speakers, 2x TicketsπŸ’₯

βœ…Doubling the speakers (6 -> 12!)
βœ…A new track (2 tracks in parallel)
βœ…A new batch of 100 tickets!

πŸ‘‰ More: https://t.ly/WmVrM
❀7πŸ‘2🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸͺ΅ HASSOD Object Detection πŸͺ΅

πŸ‘‰HASSOD: fully self-supervised object detection and instance segmentation. The new SOTA, able to understand part-to-whole object composition the way humans do.

πŸ‘‰Review https://t.ly/66qHF
πŸ‘‰Paper arxiv.org/pdf/2402.03311.pdf
πŸ‘‰Project hassod-neurips23.github.io/
πŸ‘‰Repo github.com/Shengcao-Cao/HASSOD
πŸ”₯13❀5πŸ‘3πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌡 G-Splatting Portraits 🌡

πŸ‘‰From monocular/casual video captures, Rig3DGS rigs 3D Gaussian Splatting to enable the creation of re-animatable portrait videos with control over facial expressions, head pose, and viewing direction.

πŸ‘‰Review https://t.ly/fq71w
πŸ‘‰Paper https://arxiv.org/pdf/2402.03723.pdf
πŸ‘‰Project shahrukhathar.github.io/2024/02/05/Rig3DGS.html
πŸ”₯13❀3πŸ‘1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸŒ† Up to 69x Faster SAM πŸŒ†

πŸ‘‰EfficientViT-SAM is a new family of accelerated Segment Anything Models. It retains SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia

πŸ‘‰Review https://t.ly/zGiE9
πŸ‘‰Paper arxiv.org/pdf/2402.05008.pdf
πŸ‘‰Code github.com/mit-han-lab/efficientvit
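The design is pure component swapping: the image encoder is the only part that changes, so anything downstream (prompt encoder, mask decoder) is untouched. A toy sketch of that composition, with stand-in classes rather than the released code:

```python
# Toy stand-ins: both encoders expose the same interface, so the
# SAM-style pipeline does not care which one is plugged in.

class HeavyViTEncoder:
    cost = 100  # relative compute units (illustrative numbers only)
    def encode(self, image):
        return [p * 2 for p in image]

class EfficientViTEncoder:
    cost = 2    # far cheaper backbone, same output interface
    def encode(self, image):
        return [p * 2 for p in image]

class SegmentAnything:
    def __init__(self, image_encoder):
        self.image_encoder = image_encoder  # the only swapped component
    def predict(self, image, prompt):
        feats = self.image_encoder.encode(image)
        # toy "mask decoder": threshold features around the prompted index
        return [1 if f > feats[prompt] - 1 else 0 for f in feats]

slow = SegmentAnything(HeavyViTEncoder())
fast = SegmentAnything(EfficientViTEncoder())
image = [0, 1, 2, 3]
speedup = HeavyViTEncoder.cost / EfficientViTEncoder.cost
```

Because the interface is identical, masks match while the encoder cost (the bulk of SAM's runtime) drops.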
πŸ”₯19πŸ‘7❀4πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🌴 Direct-a-Video Generation 🌴

πŸ‘‰Direct-a-Video is a text-to-video generation framework that allows users to individually or jointly control the camera movement and/or object motion

πŸ‘‰Review https://t.ly/dZSLs
πŸ‘‰Paper arxiv.org/pdf/2402.03162.pdf
πŸ‘‰Project https://direct-a-video.github.io/
πŸ”₯7πŸ‘3❀1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‡ Graph Neural Network in TF πŸ‡

πŸ‘‰#Google TensorFlow-GNN: a novel library for building Graph Neural Networks in TensorFlow. Source Code released under the Apache 2.0 license πŸ’™

πŸ‘‰Review https://t.ly/TQfg-
πŸ‘‰Code github.com/tensorflow/gnn
πŸ‘‰Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
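The core primitive such libraries build on is message passing: each node aggregates its neighbors' features and combines them with its own state. A plain-Python concept sketch (deliberately not the tfgnn API, whose GraphTensor types handle batching and heterogeneous graphs):

```python
# One message-passing step over a directed graph given as an edge list.

def message_passing_step(node_feats, edges):
    msgs = {n: [] for n in node_feats}
    for src, dst in edges:                      # messages flow src -> dst
        msgs[dst].append(node_feats[src])
    updated = {}
    for n, feat in node_feats.items():
        agg = sum(msgs[n]) / len(msgs[n]) if msgs[n] else 0.0  # mean aggregation
        updated[n] = 0.5 * feat + 0.5 * agg     # simple state/neighbor mix
    return updated

feats = {"a": 1.0, "b": 0.0, "c": 0.0}
edges = [("a", "b"), ("b", "c")]
step1 = message_passing_step(feats, edges)      # "b" now knows about "a"
step2 = message_passing_step(step1, edges)      # "c" now knows about "a"
```

Stacking k such steps lets information travel k hops, which is why GNN depth is discussed in terms of receptive field over the graph.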
❀17πŸ‘4πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ†” Magic-Me: ID-Specific Video πŸ†”

πŸ‘‰#ByteDance VCD: with just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt.

πŸ‘‰Review https://t.ly/qjJ2O
πŸ‘‰Paper arxiv.org/pdf/2402.09368.pdf
πŸ‘‰Project magic-me-webpage.github.io
πŸ‘‰Code github.com/Zhen-Dong/Magic-Me
πŸ”₯ Breaking: GEMINI 1.5 is out πŸ”₯

πŸ‘‰Gemini 1.5 just announced: a standard 128,000-token context window, up to 1 MILLION tokens via AI Studio and #Vertex AI in private preview 🫠

πŸ‘‰Review https://t.ly/Vblvx
πŸ‘‰More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
🀯17πŸ‘4😱2
This media is not supported in your browser
VIEW IN TELEGRAM
β˜€οΈ One2Avatar: Pic -> 3D Avatar β˜€οΈ

πŸ‘‰#Google presents a new approach to generate animatable photo-realistic avatars from only a few/one image. Impressive results.

πŸ‘‰Review https://t.ly/AS1oc
πŸ‘‰Paper arxiv.org/pdf/2402.11909.pdf
πŸ‘‰Project zhixuany.github.io/one2avatar_webpage/
πŸ‘12❀3🀩3πŸ”₯2
This media is not supported in your browser
VIEW IN TELEGRAM
πŸͺŸ BOG: Fine Geometric Views πŸͺŸ

πŸ‘‰ #Google (+TΓΌbingen) unveils Binary Opacity Grids, a novel method to reconstruct triangle meshes from multi-view images able to capture fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar).

πŸ‘‰Review https://t.ly/E6T0W
πŸ‘‰Paper https://lnkd.in/dQEq3zy6
πŸ‘‰Project https://lnkd.in/dYYCadx9
πŸ‘‰Demo https://lnkd.in/d92R6QME
πŸ”₯8🀯4πŸ‘3πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ¦₯Neuromorphic Video BinarizationπŸ¦₯

πŸ‘‰University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes, & text. Real-time, CPU-only, up to 10,000 FPS!

πŸ‘‰Review https://t.ly/V-NFa
πŸ‘‰Paper arxiv.org/pdf/2402.12644.pdf
πŸ‘‰Project github.com/eleboss/EBR
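To see why event cameras suit binary targets like QR codes, here is a toy sketch of the basic idea (not the EBR code): an event camera emits sparse (x, y, polarity) events, and integrating polarities per pixel then thresholding already yields a binary image.

```python
# Accumulate signed events into a per-pixel count, then threshold.

def binarize_events(events, width, height, threshold=0):
    acc = [[0] * width for _ in range(height)]
    for x, y, polarity in events:      # polarity is +1 or -1
        acc[y][x] += polarity
    return [[1 if v > threshold else 0 for v in row] for row in acc]

events = [(0, 0, +1), (0, 0, +1), (1, 0, -1), (1, 1, +1)]
img = binarize_events(events, width=2, height=2)
```

Because events arrive asynchronously and the accumulation is a handful of integer adds per event, this kind of pipeline can run at very high frame rates on a CPU.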
❀15πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🩻 Pose via Ray Diffusion 🩻

πŸ‘‰Novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited for set-level transformers, it's the new SOTA on camera pose estimation. Source code released πŸ’™

πŸ‘‰Review https://t.ly/qBsFK
πŸ‘‰Paper arxiv.org/pdf/2402.14817.pdf
πŸ‘‰Project jasonyzhang.com/RayDiffusion
πŸ‘‰Code github.com/jasonyzhang/RayDiffusion
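The "bundle of rays" representation can be sketched simply. This is an illustrative toy (the paper uses a richer per-patch parameterisation and denoises noisy ray bundles with a diffusion model): each image patch maps to a ray (origin, direction), and the camera center falls out as the bundle's shared origin.

```python
import math

def rays_from_camera(center, focal, patch_coords):
    """One ray per patch center: origin = camera center, direction via a
    pinhole model (camera looks down +z, identity rotation for simplicity)."""
    rays = []
    for (u, v) in patch_coords:
        d = (u / focal, v / focal, 1.0)
        norm = math.sqrt(sum(c * c for c in d))
        rays.append((center, tuple(c / norm for c in d)))
    return rays

def recover_center(rays):
    """With a shared-origin parameterisation, recovering the camera center
    is just averaging the ray origins."""
    origins = [o for (o, _) in rays]
    n = len(origins)
    return tuple(sum(o[i] for o in origins) / n for i in range(3))

rays = rays_from_camera((1.0, 2.0, 3.0), focal=2.0,
                        patch_coords=[(0, 0), (1, 1), (-1, 2)])
```

Representing pose as a set of per-patch rays (rather than one global rotation + translation) is what makes it natural for set-level transformers: rays align one-to-one with image patches.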
πŸ”₯17❀6🀯3πŸ‘1πŸ‘1🍾1
πŸ—ƒοΈ MATH-Vision Dataset πŸ—ƒοΈ

πŸ‘‰MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions. Dataset released πŸ’™

πŸ‘‰Review https://t.ly/gmIAu
πŸ‘‰Paper arxiv.org/pdf/2402.14804.pdf
πŸ‘‰Project mathvision-cuhk.github.io/
πŸ‘‰Code github.com/mathvision-cuhk/MathVision
🀯8πŸ”₯4πŸ‘2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ«…FlowMDM: Human CompositionπŸ«…

πŸ‘‰FlowMDM, a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.

πŸ‘‰Review https://t.ly/pr2g_
πŸ‘‰Paper https://lnkd.in/daYRftdF
πŸ‘‰Project https://lnkd.in/dcRkv5Pc
πŸ‘‰Repo https://lnkd.in/dw-3JJks
❀9πŸ”₯6πŸ‘1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🎷EMO: talking/singing Gen-AI 🎷

πŸ‘‰EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions and various head poses. Input: a single frame; video duration = length of the input audio.

πŸ‘‰Review https://t.ly/4IYj5
πŸ‘‰Paper https://lnkd.in/dGPX2-Yc
πŸ‘‰Project https://lnkd.in/dyf6p_N3
πŸ‘‰Repo (empty) github.com/HumanAIGC/EMO
❀18πŸ”₯7πŸ‘4🀯3πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ’Œ Multi-LoRA Composition πŸ’Œ

πŸ‘‰Two novel training-free methods for image composition: LoRA Switch and LoRA Composite, which integrate any number of elements into an image through multi-LoRA composition. Source Code released πŸ’™

πŸ‘‰Review https://t.ly/GFy3Z
πŸ‘‰Paper arxiv.org/pdf/2402.16843.pdf
πŸ‘‰Code github.com/maszhongming/Multi-LoRA-Composition
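The two strategies differ in when each adapter acts during denoising. A hedged toy sketch, with simple functions standing in for LoRA-injected diffusion UNets (not the released code): Switch rotates a single active adapter across steps, while Composite averages all adapters' outputs at every step.

```python
# Toy "adapters": each transforms the latent in its own way.
def lora_a(x): return x + 1.0   # stand-in for one concept's LoRA
def lora_b(x): return x * 2.0   # stand-in for another concept's LoRA

def lora_switch(x, loras, steps):
    """LoRA Switch: exactly one adapter is active per denoising step,
    rotating through the list step by step."""
    for t in range(steps):
        active = loras[t % len(loras)]
        x = active(x)
    return x

def lora_composite(x, loras, steps):
    """LoRA Composite: every step averages the guidance from all adapters."""
    for _ in range(steps):
        x = sum(l(x) for l in loras) / len(loras)
    return x

loras = [lora_a, lora_b]
```

Neither strategy needs training: both only change how already-trained adapters are scheduled or combined at inference time.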
πŸ‘11❀6πŸ”₯2πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ’₯ MM-AU: Video Accident πŸ’₯

πŸ‘‰MM-AU - Multi-Modal Accident Understanding: 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & Code announced πŸ’™

πŸ‘‰Review https://t.ly/a-jKI
πŸ‘‰Paper arxiv.org/pdf/2403.00436.pdf
πŸ‘‰Dataset http://www.lotvsmmau.net/MMAU/demo
πŸ‘11❀2πŸ”₯2🀯2
πŸ”₯ SOTA: Stable Diffusion 3 is out! πŸ”₯

πŸ‘‰Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). New Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding/spelling capabilities. Weights & Source Code to be released πŸ’™

πŸ‘‰Review https://t.ly/a1koo
πŸ‘‰Paper https://lnkd.in/d4i-9Bte
πŸ‘‰Blog https://lnkd.in/d-bEX-ww
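The "separate sets of weights for image & language" idea can be sketched with scalars standing in for weight matrices. This is a toy illustration of the described MMDiT block, not SD3 code: each modality gets its own projection, then attention runs over the joint token sequence (here, "attention" is just a mean over the joint sequence).

```python
def mmdit_block(img_tokens, txt_tokens, w_img, w_txt):
    # modality-specific projections: separate weight sets per modality
    q_img = [w_img * t for t in img_tokens]
    q_txt = [w_txt * t for t in txt_tokens]
    # joint "attention": both modalities are mixed in one sequence,
    # so text can steer image tokens and vice versa
    joint = q_img + q_txt
    ctx = sum(joint) / len(joint)
    return ([t + ctx for t in q_img], [t + ctx for t in q_txt])

img_out, txt_out = mmdit_block([1.0], [1.0], w_img=2.0, w_txt=3.0)
```

Separate projections let each modality keep its own feature statistics, while joint attention is what improves the model's text understanding during image generation.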
πŸ”₯19❀5πŸ‘3⚑1πŸ‘1😱1