AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ›οΈ PIXART-Ξ£: 4K Generation πŸ›οΈ

👉PixArt-Σ is a novel Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced 💙

👉Review https://t.ly/Cm2Qh
👉Paper arxiv.org/pdf/2403.04692.pdf
👉Project pixart-alpha.github.io/PixArt-sigma-project/
👉Repo (empty) github.com/PixArt-alpha/PixArt-sigma
🤗-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
👺 Can GPT-4 play DOOM? 👺

👉Apparently yes, GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released

👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
🪖RT Humanoid from Head-Mounted Sensors🪖

👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar from info obtained from AR/VR headsets

👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
🏷️ Face Foundation Model 🏷️

👉Arc2Face, the first foundation model for human faces. Source Code released 💙

👉Review https://t.ly/MfAFI
👉Paper https://lnkd.in/dViE_tCd
👉Project https://lnkd.in/d4MHdEZK
👉Code https://lnkd.in/dv9ZtDfA
🪼FaceXFormer: Unified Face-Transformer🪼

👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head pose estimation, attribute recognition, and estimation of age, gender, race, and landmark visibility.

👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer
🦕 DINO-based Video Tracking 🦕

👉The Weizmann Institute announced the new SOTA in point-tracking via pre-trained DINO features. Source code announced (not yet released) 💙

👉Review https://t.ly/_GIMT
👉Paper https://lnkd.in/dsGVDcar
👉Project dino-tracker.github.io/
👉Code https://github.com/AssafSinger94/dino-tracker
🦖 T-Rex 2: a new SOTA is out! 🦖

👉A novel (VERY STRONG) open-set object detector model. Strong zero-shot capabilities, suitable for various scenarios with a single set of weights. Demo and Source Code released 💙

👉Review https://t.ly/fYw8D
👉Paper https://lnkd.in/dpmRh2zh
👉Project https://lnkd.in/dnR_jPcR
👉Code https://lnkd.in/dnZnGRUn
👉Demo https://lnkd.in/drDUEDYh
💄TinyBeauty: 460 FPS Make-up💄

👉TinyBeauty: only 80K parameters to achieve the SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!

👉Review https://t.ly/LG5ok
👉Paper https://arxiv.org/pdf/2403.15033.pdf
👉Project https://tinybeauty.github.io/TinyBeauty/
☔ AiOS: All-in-One-Stage Humans ☔

👉All-in-one-stage framework for SOTA multi-person expressive pose and shape recovery without an additional human detection step.

👉Review https://t.ly/ekNd4
👉Paper https://arxiv.org/pdf/2403.17934.pdf
👉Project https://ttxskk.github.io/AiOS/
👉Code/Demo (announced)
πŸ€ MAVOS Object Segmentation πŸ€

👉MAVOS is a transformer-based VOS w/ a novel, optimized and dynamic long-term modulated cross-attention memory. Code & Models announced (BSD 3-Clause) 💙

👉Review https://t.ly/SKaRG
👉Paper https://lnkd.in/dQyifKa3
👉Project github.com/Amshaker/MAVOS
💦 ObjectDrop: automagical object removal 💦

👉#Google unveils ObjectDrop, the new SOTA in photorealistic object removal and insertion. Focus on shadows and reflections, impressive!

👉Review https://t.ly/ZJ6NN
👉Paper https://arxiv.org/pdf/2403.18818.pdf
👉Project https://objectdrop.github.io/
🪼 Universal Mono Metric Depth 🪼

👉ETH unveils UniDepth: metric 3D scenes from single images alone, across domains. A novel, universal and flexible solution for monocular metric depth estimation (MMDE). Source code released 💙

👉Review https://t.ly/5C8eq
👉Paper arxiv.org/pdf/2403.18913.pdf
👉Code github.com/lpiccinelli-eth/unidepth
🔘 RELI11D: Multimodal Humans 🔘

👉RELI11D is a high-quality multimodal human motion dataset combining LiDAR, an IMU system, an RGB camera, and an event camera. Dataset & Source Code to be released soon 💙

👉Review https://t.ly/5EG6X
👉Paper https://lnkd.in/ep6Utcik
👉Project https://lnkd.in/eDhNHYBb
🔥 ECoDepth: SOTA Diffusive Mono-Depth 🔥

👉New single-image depth estimation (SIDE) model using a diffusion backbone conditioned on ViT embeddings. It's the new SOTA in SIDE. Source Code released 💙

👉Review https://t.ly/s2pbB
👉Paper https://lnkd.in/eYt5yr_q
👉Code https://lnkd.in/eEcyPQcd
πŸ•·οΈ Gen-NeRF2NeRF Translation πŸ•·οΈ

πŸ‘‰GenN2N: unified NeRF-to-NeRF translation for editing tasks such as text-driven NeRF editing, colorization, super-resolution, inpainting, etc.

πŸ‘‰Review https://t.ly/VMWAH
πŸ‘‰Paper arxiv.org/pdf/2404.02788.pdf
πŸ‘‰Project xiangyueliu.github.io/GenN2N/
πŸ‘‰Code github.com/Lxiangyue/GenN2N
👆iSeg: Interactive 3D Segmentation👆

👉iSeg: interactive segmentation technique for 3D shapes operating entirely in 3D. It accepts both positive and negative clicks directly on the shape's surface, indicating inclusion and exclusion of regions.

👉Review https://t.ly/tyFnD
👉Paper https://lnkd.in/dydAz8zp
👉Project https://lnkd.in/de-h6SRi
👉Code (coming)
👗 Neural Bodies with Clothes 👗

👉Neural-ABC is a novel parametric model based on neural implicit functions that can represent clothed human bodies with disentangled latent spaces for ID, clothing, shape, and pose.

👉Review https://t.ly/Un1wc
👉Project https://lnkd.in/dhDG6FF5
👉Paper https://lnkd.in/dhcfK7jZ
👉Code https://lnkd.in/dQvXWysP
🔌 BodyMAP: human body & pressure 🔌

👉#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source Code released, Dataset coming 💙

👉Review https://t.ly/8926S
👉Project bodymap3d.github.io/
👉Paper https://lnkd.in/gCxH4ev3
👉Code https://lnkd.in/gaifdy3q
🧞 XComposer2: 4K Vision-Language 🧞

👉InternLM-XComposer2-4KHD brings LVLM resolution capabilities up to 4K HD (3840×1600) and beyond. Authors: Shanghai AI Lab, CUHK, SenseTime & Tsinghua. Source Code & Models released 💙

👉Review https://t.ly/GCHsz
👉Paper arxiv.org/pdf/2404.06512.pdf
👉Code github.com/InternLM/InternLM-XComposer