AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Fresh daily updates on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🌵 G-Splatting Portraits 🌵

👉From monocular/casual video captures, Rig3DGS rigs 3D Gaussian Splatting to create re-animatable portrait videos with control over facial expressions, head pose and viewing direction.

👉Review https://t.ly/fq71w
👉Paper https://arxiv.org/pdf/2402.03723.pdf
👉Project shahrukhathar.github.io/2024/02/05/Rig3DGS.html
🌆 Up to 69x Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster, source code released. Authors: Tsinghua, MIT & #Nvidia

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit
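EfficientViT's speed comes largely from ReLU-based linear attention: replacing softmax with a ReLU kernel lets the key/value summary be computed once and reused for every query, so the cost grows linearly with the number of tokens instead of quadratically. A minimal single-head sketch under that assumption (no multi-scale aggregation, no learned projections); the function names are hypothetical and this is not the mit-han-lab implementation:

```python
from typing import List

Matrix = List[List[float]]


def relu(vec: List[float]) -> List[float]:
    return [max(0.0, x) for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def relu_linear_attention(queries: Matrix, keys: Matrix, values: Matrix, eps: float = 1e-9) -> Matrix:
    """O(N) form: summarize all keys/values once, then reuse the summary for each query."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j relu(k_j) v_j^T, a fixed d_k x d_v matrix
    k_sum = [0.0] * d_k                      # sum_j relu(k_j), used for normalization
    for k, v in zip(keys, values):
        rk = relu(k)
        for a in range(d_k):
            k_sum[a] += rk[a]
            for b in range(d_v):
                kv[a][b] += rk[a] * v[b]
    out = []
    for q in queries:
        rq = relu(q)
        denom = dot(rq, k_sum) + eps
        out.append([sum(rq[a] * kv[a][b] for a in range(d_k)) / denom for b in range(d_v)])
    return out


def relu_pairwise_attention(queries: Matrix, keys: Matrix, values: Matrix, eps: float = 1e-9) -> Matrix:
    """Reference O(N^2) form with explicit pairwise weights relu(q_i) . relu(k_j)."""
    out = []
    for q in queries:
        rq = relu(q)
        weights = [dot(rq, relu(k)) for k in keys]
        denom = sum(weights) + eps
        out.append([sum(w * v[b] for w, v in zip(weights, values)) / denom for b in range(len(values[0]))])
    return out
```

Both functions return the same values up to floating-point rounding; only the order of summation changes, which is what replaces an N x N attention map with a fixed-size d_k x d_v summary.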
🌴 Direct-a-Video Generation 🌴

👉Direct-a-Video is a text-to-video generation framework that lets users control camera movement and/or object motion, either individually or jointly.

👉Review https://t.ly/dZSLs
👉Paper arxiv.org/pdf/2402.03162.pdf
👉Project https://direct-a-video.github.io/
Graph Neural Networks in TF

👉#Google TensorFlow-GNN: a new library for building Graph Neural Networks on TensorFlow. Source code released under the Apache 2.0 license 💙

👉Review https://t.ly/TQfg-
👉Code github.com/tensorflow/gnn
👉Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
🆔 Magic-Me: ID-Specific Video 🆔

👉#ByteDance VCD: from just a few images of a specific identity, it generates temporally consistent videos aligned with the given prompt.

👉Review https://t.ly/qjJ2O
👉Paper arxiv.org/pdf/2402.09368.pdf
👉Project magic-me-webpage.github.io
👉Code github.com/Zhen-Dong/Magic-Me
🔥 Breaking: GEMINI 1.5 is out 🔥

👉Gemini 1.5 has just been announced: a standard 128,000-token context window, and up to 1 MILLION tokens via AI Studio and #Vertex AI in private preview 🫠

👉Review https://t.ly/Vblvx
👉More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
☀️ One2Avatar: Pic -> 3D Avatar ☀️

👉#Google presents a new approach to generating animatable, photo-realistic avatars from only one or a few images. Impressive results.

👉Review https://t.ly/AS1oc
👉Paper arxiv.org/pdf/2402.11909.pdf
👉Project zhixuany.github.io/one2avatar_webpage/
🪟 BOG: Fine Geometric Views 🪟

👉#Google (+Tübingen) unveils Binary Opacity Grids, a novel method that reconstructs triangle meshes from multi-view images and captures fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar devices).

👉Review https://t.ly/E6T0W
👉Paper https://lnkd.in/dQEq3zy6
👉Project https://lnkd.in/dYYCadx9
👉Demo https://lnkd.in/d92R6QME
🦥Neuromorphic Video Binarization🦥

👉The University of Hong Kong unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU-only, up to 10,000 FPS!

👉Review https://t.ly/V-NFa
👉Paper arxiv.org/pdf/2402.12644.pdf
👉Project github.com/eleboss/EBR
🩻 Pose via Ray Diffusion 🩻

👉A novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited to set-level transformers, it is the new SOTA in camera pose estimation. Source code released 💙

👉Review https://t.ly/qBsFK
👉Paper arxiv.org/pdf/2402.14817.pdf
👉Project jasonyzhang.com/RayDiffusion
👉Code github.com/jasonyzhang/RayDiffusion
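Treating a camera as a bundle of rays means converting its intrinsics and extrinsics into one ray per image patch. A common parameterization, and as we read the paper the one RayDiffusion uses, is Plücker coordinates: a unit direction d plus a moment m = c × d, where c is any point on the ray (here the camera center). The sketch below assumes a pinhole camera with world-to-camera extrinsics x_cam = R·x_world + t; the helper names and conventions are illustrative assumptions, not code from the RayDiffusion repo.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = List[float]


def rotate_transpose(R: Sequence[Sequence[float]], v: Sequence[float]) -> Vec3:
    """Compute R^T v for a 3x3 rotation given as a list of rows."""
    return [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def camera_to_plucker_rays(R, t, fx, fy, cx, cy, pixels) -> List[Tuple[Vec3, Vec3]]:
    """Return one (direction, moment) Plücker ray per (u, v) pixel, e.g. patch centers.

    Assumes world-to-camera extrinsics: x_cam = R @ x_world + t.
    """
    center = [-x for x in rotate_transpose(R, t)]  # camera center c = -R^T t
    rays = []
    for u, v in pixels:
        d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]       # back-project through K^-1
        direction = normalize(rotate_transpose(R, d_cam))  # camera frame -> world frame
        moment = cross(center, direction)                  # m = c x d, same for any point on the ray
        rays.append((direction, moment))
    return rays
```

For example, with R = identity and t = (0, 0, -2) the camera center is (0, 0, 2); the ray through the principal point has direction (0, 0, 1) and passes through the world origin, so its moment is zero. Because m is orthogonal to d and does not change when c slides along the ray, each 6D ray describes its line uniquely.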
🗃️ MATH-Vision Dataset 🗃️

👉MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions. Dataset released 💙

👉Review https://t.ly/gmIAu
👉Paper arxiv.org/pdf/2402.14804.pdf
👉Project mathvision-cuhk.github.io/
👉Code github.com/mathvision-cuhk/MathVision
🫅FlowMDM: Human Composition🫅

👉FlowMDM is a diffusion-based approach that generates seamless, continuous sequences of human motion from textual descriptions.

👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
🎷EMO: talking/singing Gen-AI 🎷

👉EMO: audio-driven portrait-video generation. It produces vocal avatar videos with expressive facial expressions and varied head poses. Input: a single frame; the video duration matches the length of the input audio.

👉Review https://t.ly/4IYj5
👉Paper https://lnkd.in/dGPX2-Yc
👉Project https://lnkd.in/dyf6p_N3
👉Repo (empty) github.com/HumanAIGC/EMO
💌 Multi-LoRA Composition 💌

👉Two novel training-free image-composition methods, LoRA Switch and LoRA Composite, integrate any number of elements into an image through multi-LoRA composition. Source code released 💙

👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
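As we read the paper, LoRA Switch enables only one LoRA at each denoising step and rotates through them every τ steps, while LoRA Composite runs every LoRA at each step and averages their classifier-free-guided noise predictions. The sketch below shows only that scheduling and averaging logic on toy vectors; the function names, the round-robin order and the plain list-of-floats "noise predictions" are assumptions, not the authors' code.

```python
from typing import List, Sequence

Vector = List[float]


def lora_switch_schedule(num_steps: int, num_loras: int, tau: int) -> List[int]:
    """Index of the single LoRA enabled at each denoising step, rotating every `tau` steps."""
    return [(step // tau) % num_loras for step in range(num_steps)]


def lora_composite_prediction(
    cond_preds: Sequence[Vector],    # text-conditioned noise prediction with LoRA i enabled
    uncond_preds: Sequence[Vector],  # unconditional prediction with the same LoRA i
    guidance_scale: float,
) -> Vector:
    """Average the classifier-free-guided predictions of all LoRAs at one step."""
    n, dim = len(cond_preds), len(cond_preds[0])
    combined = [0.0] * dim
    for cond, uncond in zip(cond_preds, uncond_preds):
        for i in range(dim):
            guided = uncond[i] + guidance_scale * (cond[i] - uncond[i])
            combined[i] += guided / n
    return combined
```

For instance, `lora_switch_schedule(7, 3, 2)` returns `[0, 0, 1, 1, 2, 2, 0]`: each of three LoRAs stays active for two steps before handing over to the next.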
💥 MM-AU: Video Accident 💥

👉MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes and 58,650 pairs of video-based accident reasons. Data & code announced 💙

👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset http://www.lotvsmmau.net/MMAU/demo
🔥 SOTA: Stable Diffusion 3 is out! 🔥

👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding and spelling. Weights & source code to be released 💙

👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
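MMDiT keeps separate weights for the image and text streams but joins the two token sequences for the attention operation, so each modality can attend to the other. The single-head sketch below illustrates only that idea: it omits normalization, MLPs, timestep modulation and multi-head splitting, and the weight layout (dicts holding "q", "k", "v" matrices) is a hypothetical choice, not Stability AI's code.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(img_tokens: Matrix, txt_tokens: Matrix,
                    img_w: Dict[str, Matrix], txt_w: Dict[str, Matrix]) -> Tuple[Matrix, Matrix]:
    """Single-head attention over [image; text] tokens with modality-specific Q/K/V weights."""
    def project(tokens: Matrix, w: Dict[str, Matrix]):
        return ([matvec(w["q"], t) for t in tokens],
                [matvec(w["k"], t) for t in tokens],
                [matvec(w["v"], t) for t in tokens])

    qi, ki, vi = project(img_tokens, img_w)  # image stream uses its own weights
    qt, kt, vt = project(txt_tokens, txt_w)  # text stream uses a separate set
    q, k, v = qi + qt, ki + kt, vi + vt      # join the two sequences for attention
    scale = 1.0 / math.sqrt(len(k[0]))
    out = []
    for qv in q:
        weights = softmax([scale * sum(a * b for a, b in zip(qv, kv)) for kv in k])
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))])
    n_img = len(img_tokens)
    return out[:n_img], out[n_img:]          # split back into the two streams
```

With identity projections, an image token [1, 0] and a text token [0, 1], the image output gains a non-zero second component that can only come from the text token: that cross-modal flow is what the joined sequence provides.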
🧵E-LoFTR: new Feats-Matching SOTA🧵

👉A novel LoFTR-inspired algorithm that efficiently produces semi-dense matches across images: up to 2.5× faster than LoFTR and better than the previous SOTA pipeline (SuperPoint + LightGlue). Code announced.

👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
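LoFTR, which E-LoFTR builds on, selects coarse matches with a dual-softmax over the feature-similarity matrix, followed by a mutual-nearest-neighbour check and a confidence threshold. A minimal pure-Python sketch of that selection step; the temperature and threshold defaults are illustrative assumptions, and E-LoFTR's own coarse matching and refinement stages may differ.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax_mutual_nn(feats_a: Matrix, feats_b: Matrix,
                           temperature: float = 0.1,
                           threshold: float = 0.2) -> List[Tuple[int, int, float]]:
    """Return (index_in_a, index_in_b, confidence) for confident mutual nearest neighbours."""
    sim = [[sum(x * y for x, y in zip(fa, fb)) / temperature for fb in feats_b] for fa in feats_a]
    row_soft = [softmax(row) for row in sim]  # softmax over B for each feature of A
    col_soft = [softmax([sim[i][j] for i in range(len(feats_a))])  # softmax over A for each feature of B
                for j in range(len(feats_b))]
    conf = [[row_soft[i][j] * col_soft[j][i] for j in range(len(feats_b))]
            for i in range(len(feats_a))]
    matches = []
    for i, row in enumerate(conf):
        j = max(range(len(row)), key=lambda c: row[c])            # best B for A[i]
        i_back = max(range(len(conf)), key=lambda r: conf[r][j])  # best A for that B
        if i_back == i and row[j] > threshold:                    # mutual and confident
            matches.append((i, j, row[j]))
    return matches
```

When one feature is equally similar to two candidates, the dual-softmax splits its probability mass, the confidence drops, and a high enough threshold rejects the ambiguous match.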
StableDrag: Point-based Editing

👉#Tencent unveils StableDrag, a novel point-based image editing framework built on a discriminative point-tracking method and a confidence-based latent enhancement strategy for motion supervision. Source code announced, but still no repo.

👉Review https://t.ly/eUI05
👉Paper https://lnkd.in/dz8-ymck
👉Project stabledrag.github.io/
🏛️ PIXART-Σ: 4K Generation 🏛️

👉PixArt-Σ is a novel Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced 💙

👉Review https://t.ly/Cm2Qh
👉Paper arxiv.org/pdf/2403.04692.pdf
👉Project pixart-alpha.github.io/PixArt-sigma-project/
👉Repo (empty) github.com/PixArt-alpha/PixArt-sigma
🤗-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha