AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🪟 BOG: Fine Geometric Views 🪟

👉 #Google (+Tübingen) unveils Binary Opacity Grids, a novel method for reconstructing triangle meshes from multi-view images that captures fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar).

👉Review https://t.ly/E6T0W
👉Paper https://lnkd.in/dQEq3zy6
👉Project https://lnkd.in/dYYCadx9
👉Demo https://lnkd.in/d92R6QME
🦥Neuromorphic Video Binarization🦥

👉 The University of Hong Kong unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU only, up to 10,000 FPS!

👉Review https://t.ly/V-NFa
👉Paper arxiv.org/pdf/2402.12644.pdf
👉Project github.com/eleboss/EBR
❀15πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🩻 Pose via Ray Diffusion 🩻

👉Novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited for set-level transformers, it's the new SOTA in camera pose estimation. Source code released 💙

👉Review https://t.ly/qBsFK
👉Paper arxiv.org/pdf/2402.14817.pdf
👉Project jasonyzhang.com/RayDiffusion
👉Code github.com/jasonyzhang/RayDiffusion
🗃️ MATH-Vision Dataset 🗃️

👉MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions. Dataset released 💙

👉Review https://t.ly/gmIAu
👉Paper arxiv.org/pdf/2402.14804.pdf
👉Project mathvision-cuhk.github.io/
👉Code github.com/mathvision-cuhk/MathVision
🫅FlowMDM: Human Composition🫅

👉FlowMDM is a diffusion-based approach that generates seamless, continuous sequences of human motion from textual descriptions.

👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
❀9πŸ”₯6πŸ‘1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🎷EMO: talking/singing Gen-AI 🎷

πŸ‘‰EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions, and various head poses. Input: 1 single frame, video duration = length of input audio

πŸ‘‰Review https://t.ly/4IYj5
πŸ‘‰Paper https://lnkd.in/dGPX2-Yc
πŸ‘‰Project https://lnkd.in/dyf6p_N3
πŸ‘‰Repo (empty) github.com/HumanAIGC/EMO
❀18πŸ”₯7πŸ‘4🀯3πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
💌 Multi-LoRA Composition 💌

👉Two novel training-free image composition methods, LoRA Switch and LoRA Composite, for integrating any number of elements in an image through multi-LoRA composition. Source code released 💙

👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
💥 MM-AU: Video Accident 💥

👉MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & code announced 💙

👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset http://www.lotvsmmau.net/MMAU/demo
🔥 SOTA: Stable Diffusion 3 is out! 🔥

👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding/spelling capabilities. Weights & source code to be released 💙

👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
🧵E-LoFTR: new Feature-Matching SOTA🧵

👉A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5× faster than LoFTR and superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced.

👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
🦁StableDrag: Point-based Editing🦁

πŸ‘‰#Tencent unveils StableDrag, a novel point-based image editing framework via discriminative point tracking method + confidence-based latent enhancement strategy for motion supervision. Source Code announced but still no repo.

πŸ‘‰Review https://t.ly/eUI05
πŸ‘‰Paper https://lnkd.in/dz8-ymck
πŸ‘‰Project stabledrag.github.io/
❀2πŸ‘1πŸ”₯1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
PIXART-Σ: 4K Generation

👉PixArt-Σ is a novel Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced 💙

👉Review https://t.ly/Cm2Qh
👉Paper arxiv.org/pdf/2403.04692.pdf
👉Project pixart-alpha.github.io/PixArt-sigma-project/
👉Repo (empty) github.com/PixArt-alpha/PixArt-sigma
🤗-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
👺 Can GPT-4 play DOOM? 👺

👉Apparently yes, GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released.

👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
🪖RT Humanoid from Head-Mounted Sensors🪖

👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar using information obtained from AR/VR headsets.

👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
❀12⚑1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🏷️ Face Foundation Model 🏷️

πŸ‘‰Arc2Face, the first foundation model for human faces. Source Code released πŸ’™

πŸ‘‰Review https://t.ly/MfAFI
πŸ‘‰Paper https://lnkd.in/dViE_tCd
πŸ‘‰Project https://lnkd.in/d4MHdEZK
πŸ‘‰Code https://lnkd.in/dv9ZtDfA
❀12πŸ‘3πŸ‘1🀩1
🪼FaceXFormer: Unified Face-Transformer🪼

👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head pose estimation, attribute recognition, and estimation of age, gender, race, and landmark visibility.

👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer
🦕 DINO-based Video Tracking 🦕

👉The Weizmann Institute announced the new SOTA in point tracking via pre-trained DINO features. Source code announced (not yet released) 💙

👉Review https://t.ly/_GIMT
👉Paper https://lnkd.in/dsGVDcar
👉Project dino-tracker.github.io/
👉Code https://github.com/AssafSinger94/dino-tracker
🦖 T-Rex 2: a new SOTA is out! 🦖

👉A novel (VERY STRONG) open-set object detection model. Strong zero-shot capabilities, suitable for various scenarios with a single set of weights. Demo and source code released 💙

👉Review https://t.ly/fYw8D
👉Paper https://lnkd.in/dpmRh2zh
👉Project https://lnkd.in/dnR_jPcR
👉Code https://lnkd.in/dnZnGRUn
👉Demo https://lnkd.in/drDUEDYh
💄TinyBeauty: 460 FPS Make-up💄

👉TinyBeauty: only 80K parameters to achieve SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!

👉Review https://t.ly/LG5ok
👉Paper https://arxiv.org/pdf/2403.15033.pdf
👉Project https://tinybeauty.github.io/TinyBeauty/
☔ AiOS: All-in-One-Stage Humans ☔

👉An all-in-one-stage framework for SOTA recovery of the expressive pose and shape of multiple humans, without an additional human detection step.

👉Review https://t.ly/ekNd4
👉Paper https://arxiv.org/pdf/2403.17934.pdf
👉Project https://ttxskk.github.io/AiOS/
👉Code/Demo (announced)
❀6πŸ‘1πŸ‘1