AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🗃️ MATH-Vision Dataset 🗃️

👉MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions. Dataset released 💙

👉Review https://t.ly/gmIAu
👉Paper arxiv.org/pdf/2402.14804.pdf
👉Project mathvision-cuhk.github.io/
👉Code github.com/mathvision-cuhk/MathVision
🫅FlowMDM: Human Composition🫅

👉FlowMDM is a diffusion-based approach that generates seamless, continuous sequences of human motion from textual descriptions.

👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
🎷EMO: talking/singing Gen-AI 🎷

👉EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions and various head poses. Input: a single frame; video duration = length of the input audio.

👉Review https://t.ly/4IYj5
👉Paper https://lnkd.in/dGPX2-Yc
👉Project https://lnkd.in/dyf6p_N3
👉Repo (empty) github.com/HumanAIGC/EMO
💌 Multi-LoRA Composition 💌

👉Two novel training-free image composition methods, LoRA Switch and LoRA Composite, for integrating any number of elements in an image through multi-LoRA composition. Source Code released 💙

👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
💥 MM-AU: Video Accident 💥

👉MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & Code announced 💙

👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset http://www.lotvsmmau.net/MMAU/demo
🔥 SOTA: Stable Diffusion 3 is out! 🔥

👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding/spelling capabilities. Weights & Source Code to be released 💙

👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
🧵E-LoFTR: new Feats-Matching SOTA🧵

👉A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5× faster than LoFTR, superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced.

👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
StableDrag: Point-based Editing

👉#Tencent unveils StableDrag, a novel point-based image editing framework built on a discriminative point tracking method plus a confidence-based latent enhancement strategy for motion supervision. Source Code announced, but no repo yet.

👉Review https://t.ly/eUI05
👉Paper https://lnkd.in/dz8-ymck
👉Project stabledrag.github.io/
PIXART-Σ: 4K Generation

👉PixArt-Σ is a novel Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced 💙

👉Review https://t.ly/Cm2Qh
👉Paper arxiv.org/pdf/2403.04692.pdf
👉Project pixart-alpha.github.io/PixArt-sigma-project/
👉Repo (empty) github.com/PixArt-alpha/PixArt-sigma
🤗-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
👺 Can GPT-4 play DOOM? 👺

👉Apparently yes, GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released.

👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
🪖RT Humanoid from Head-Mounted Sensors🪖

👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar from information obtained from AR/VR headsets.

👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
Face Foundation Model

👉Arc2Face, the first foundation model for human faces. Source Code released 💙

👉Review https://t.ly/MfAFI
👉Paper https://lnkd.in/dViE_tCd
👉Project https://lnkd.in/d4MHdEZK
👉Code https://lnkd.in/dv9ZtDfA
🪼FaceXFormer: Unified Face-Transformer🪼

👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head pose estimation, attribute recognition, and estimation of age, gender, race, and landmark visibility.

👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer
🦕 DINO-based Video Tracking 🦕

👉The Weizmann Institute announced the new SOTA in point-tracking via pre-trained DINO features. Source code announced (not yet released) 💙

👉Review https://t.ly/_GIMT
👉Paper https://lnkd.in/dsGVDcar
👉Project dino-tracker.github.io/
👉Code https://github.com/AssafSinger94/dino-tracker
🦖 T-Rex 2: a new SOTA is out! 🦖

👉A novel (VERY STRONG) open-set object detection model. Strong zero-shot capabilities, suitable for various scenarios with only one suite of weights. Demo and Source Code released 💙

👉Review https://t.ly/fYw8D
👉Paper https://lnkd.in/dpmRh2zh
👉Project https://lnkd.in/dnR_jPcR
👉Code https://lnkd.in/dnZnGRUn
👉Demo https://lnkd.in/drDUEDYh
💄TinyBeauty: 460 FPS Make-up💄

👉TinyBeauty: only 80K parameters to achieve the SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!

👉Review https://t.ly/LG5ok
👉Paper https://arxiv.org/pdf/2403.15033.pdf
👉Project https://tinybeauty.github.io/TinyBeauty/
☔ AiOS: All-in-One-Stage Humans ☔

👉An all-in-one-stage framework for SOTA expressive pose and shape recovery of multiple people, without an additional human detection step.

👉Review https://t.ly/ekNd4
👉Paper https://arxiv.org/pdf/2403.17934.pdf
👉Project https://ttxskk.github.io/AiOS/
👉Code/Demo (announced)
MAVOS Object Segmentation

👉MAVOS is a transformer-based video object segmentation (VOS) approach with a novel, optimized and dynamic long-term modulated cross-attention memory. Code & Models announced (BSD 3-Clause) 💙

👉Review https://t.ly/SKaRG
👉Paper https://lnkd.in/dQyifKa3
👉Project github.com/Amshaker/MAVOS
💦 ObjectDrop: automagical object removal 💦

👉#Google unveils ObjectDrop, the new SOTA in photorealistic object removal and insertion. Focus on shadows and reflections, impressive!

👉Review https://t.ly/ZJ6NN
👉Paper https://arxiv.org/pdf/2403.18818.pdf
👉Project https://objectdrop.github.io/
🪼 Universal Mono Metric Depth 🪼

👉ETH unveils UniDepth: metric 3D scenes from single images alone, across domains. A novel, universal and flexible monocular metric depth estimation (MMDE) solution. Source code released 💙

👉Review https://t.ly/5C8eq
👉Paper arxiv.org/pdf/2403.18913.pdf
👉Code github.com/lpiccinelli-eth/unidepth