PIXART-Σ: 4K Generation
PixArt-Σ is a novel Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced.
Review: https://t.ly/Cm2Qh
Paper: arxiv.org/pdf/2403.04692.pdf
Project: pixart-alpha.github.io/PixArt-sigma-project/
Repo (empty): github.com/PixArt-alpha/PixArt-sigma
Demo: https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
Can GPT-4 play DOOM?
Apparently yes: GPT-4 can play the game to a passable degree, manipulating doors, fighting enemies, and performing pathing. Code released (with licensing restrictions).
Review: https://t.ly/W8-0F
Paper: https://lnkd.in/dmsB7bjA
Project: https://lnkd.in/ddDPwjQB
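The setup described above — the model reads a textual description of the game state and replies with an action — can be sketched schematically. Everything below (`fake_llm`, the state fields, the action names) is an illustrative stand-in, not the paper's actual interface:

```python
def describe(state):
    """Render a toy game state to the kind of text an LLM agent would read."""
    parts = [f"You face {state['enemies']} enemies."]
    if state["door_ahead"]:
        parts.append("There is a door ahead.")
    return " ".join(parts)

def fake_llm(prompt):
    """Stand-in policy for the real GPT-4 call: attack if enemies are
    mentioned, otherwise open doors, otherwise walk forward."""
    if "enemies" in prompt and "0 enemies" not in prompt:
        return "ATTACK"
    return "USE" if "door" in prompt else "FORWARD"

def step(state, llm=fake_llm):
    action = llm(describe(state))
    assert action in {"ATTACK", "USE", "FORWARD"}  # constrain the action space
    return action

print(step({"enemies": 2, "door_ahead": False}))  # ATTACK
print(step({"enemies": 0, "door_ahead": True}))   # USE
```

The loop (observe, describe, query, constrain, act) is the generic LLM-agent pattern; the paper's value is in making it work against a real game engine.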
Real-Time Humanoid from Head-Mounted Sensors
#META (+CMU) announced SimXR, a method for controlling a simulated avatar from information captured by AR/VR headsets.
Review: https://t.ly/Si2Mp
Paper: arxiv.org/pdf/2403.06862.pdf
Project: www.zhengyiluo.com/SimXR/
Face Foundation Model
Arc2Face, the first foundation model for human faces. Source code released.
Review: https://t.ly/MfAFI
Paper: https://lnkd.in/dViE_tCd
Project: https://lnkd.in/d4MHdEZK
Code: https://lnkd.in/dv9ZtDfA
FaceXFormer: Unified Face Transformer
FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head pose estimation, attribute recognition, and age, gender, and race estimation.
Review: https://t.ly/MfAFI
Paper: https://arxiv.org/pdf/2403.12960.pdf
Project: kartik-3004.github.io/facexformer_web/
Code: github.com/Kartik-3004/facexformer
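The unified multi-task idea — one shared face representation feeding a lightweight head per task — can be pictured with a toy sketch. All dimensions, task names, and weights below are made up for illustration; they are not FaceXFormer's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
shared = rng.normal(size=(1, 64))             # backbone feature for one face

# One small linear head per task, all reading the same shared feature.
heads = {
    "head_pose": rng.normal(size=(64, 3)),    # yaw, pitch, roll
    "age":       rng.normal(size=(64, 1)),
    "landmarks": rng.normal(size=(64, 68 * 2)),
}

outputs = {task: shared @ w for task, w in heads.items()}
for task, out in outputs.items():
    print(task, out.shape)
```

Because every task shares the backbone, adding a task costs only one extra head rather than a whole new model — the property the post's "unified" claim refers to.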
DINO-based Video Tracking
The Weizmann Institute announced DINO-Tracker (ECCV 2024), the new SOTA in point tracking via pre-trained DINO features. Source code announced (not yet released).
Review: https://t.ly/_GIMT
Paper: https://lnkd.in/dsGVDcar
Project: dino-tracker.github.io/
Code: https://github.com/AssafSinger94/dino-tracker
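The core intuition behind DINO-feature point tracking is that a query pixel can be located in another frame by matching its feature vector against the target frame's feature map. A minimal nearest-neighbor sketch, with random arrays standing in for real DINO features (DINO-Tracker itself refines this with learned components):

```python
import numpy as np

def track_point(feat_a, feat_b, query_yx):
    """Track a query pixel from frame A to frame B by nearest-neighbor
    matching of per-pixel feature vectors (cosine similarity)."""
    h, w, c = feat_b.shape
    q = feat_a[query_yx]                      # (c,) feature at the query pixel
    flat = feat_b.reshape(-1, c)
    # cosine similarity between the query feature and every pixel of frame B
    sim = flat @ q / (np.linalg.norm(flat, axis=1) * np.linalg.norm(q) + 1e-8)
    idx = int(sim.argmax())
    return divmod(idx, w)                     # (y, x) of the best match

# Toy demo: frame B is frame A shifted right by 3 pixels.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(32, 32, 16)).astype(np.float32)
feat_b = np.roll(feat_a, shift=3, axis=1)
print(track_point(feat_a, feat_b, (10, 10)))  # -> (10, 13)
```

With real DINO features the similarity map is far noisier than in this toy, which is why the paper adds test-time training on the specific video.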
T-Rex 2: a new SOTA is out!
A novel (very strong) open-set object detection model with strong zero-shot capabilities, suitable for varied scenarios with a single set of weights. Demo and source code released.
Review: https://t.ly/fYw8D
Paper: https://lnkd.in/dpmRh2zh
Project: https://lnkd.in/dnR_jPcR
Code: https://lnkd.in/dnZnGRUn
Demo: https://lnkd.in/drDUEDYh
TinyBeauty: 460 FPS Makeup
TinyBeauty: only 80K parameters to achieve the SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!
Review: https://t.ly/LG5ok
Paper: https://arxiv.org/pdf/2403.15033.pdf
Project: https://tinybeauty.github.io/TinyBeauty/
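To put the 80K-parameter budget in perspective, here is the standard parameter count for a convolution layer, plus a hypothetical three-layer stack — the layer sizes are made up for illustration and are not TinyBeauty's architecture:

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k 2D convolution layer."""
    return c_out * (c_in * k * k + (1 if bias else 0))

# Hypothetical 3-layer RGB-to-RGB network, chosen only to show the scale:
layers = [conv_params(3, 32, 3),    # 896
          conv_params(32, 32, 3),   # 9,248
          conv_params(32, 3, 3)]    # 867
total = sum(layers)
print(total, total < 80_000)  # 11011 True
```

Even a handful of 32-channel 3x3 convolutions fits comfortably in 80K parameters, which is what makes mobile-rate inference plausible.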
AiOS: All-in-One-Stage Humans
An all-in-one-stage framework for SOTA expressive multi-person pose and shape recovery, with no additional human-detection step.
Review: https://t.ly/ekNd4
Paper: https://arxiv.org/pdf/2403.17934.pdf
Project: https://ttxskk.github.io/AiOS/
Code/Demo: (announced)
MAVOS Object Segmentation
MAVOS is a transformer-based video object segmentation (VOS) model with a novel, optimized, dynamic long-term modulated cross-attention memory. Code & models announced (BSD 3-Clause).
Review: https://t.ly/SKaRG
Paper: https://lnkd.in/dQyifKa3
Project: github.com/Amshaker/MAVOS
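The memory mechanism itself is specific to the paper, but its building block — queries from the current frame attending over a bank of memory keys and values — is plain cross-attention. A minimal single-head sketch (MAVOS's modulated, optimized variant differs in the details):

```python
import numpy as np

def cross_attention_read(queries, memory_keys, memory_values):
    """Single-head cross-attention: each query attends over a memory bank.
    queries: (n, d); memory_keys, memory_values: (m, d) -> output (n, d)."""
    d = queries.shape[-1]
    scores = queries @ memory_keys.T / np.sqrt(d)     # (n, m) dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over memory slots
    return weights @ memory_values

rng = np.random.default_rng(1)
q = rng.normal(size=(4, 8))     # 4 current-frame queries
k = rng.normal(size=(16, 8))    # 16 memory keys
v = rng.normal(size=(16, 8))    # 16 memory values
out = cross_attention_read(q, k, v)
print(out.shape)  # (4, 8)
```

The long-video problem MAVOS targets is that the memory bank (m above) grows with every frame; keeping it fixed-size and dynamic is the paper's contribution.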
ObjectDrop: automagical object removal
#Google unveils ObjectDrop, the new SOTA in photorealistic object removal and insertion, with a focus on shadows and reflections. Impressive!
Review: https://t.ly/ZJ6NN
Paper: https://arxiv.org/pdf/2403.18818.pdf
Project: https://objectdrop.github.io/
Universal Monocular Metric Depth
ETH unveils UniDepth: metric 3D scenes from single images alone, across domains. A novel, universal and flexible monocular metric depth estimation (MMDE) solution. Source code released.
Review: https://t.ly/5C8eq
Paper: arxiv.org/pdf/2403.18913.pdf
Code: github.com/lpiccinelli-eth/unidepth
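What "metric" buys you: a metric depth map plus camera intrinsics yields a metric 3D point cloud via standard pinhole back-projection. This is textbook geometry rather than UniDepth-specific code, and the intrinsics below are made up:

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (H, W) into camera-frame 3D points
    (H, W, 3) with the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Toy demo: a flat wall 2 m away, tiny image, arbitrary intrinsics.
depth = np.full((4, 6), 2.0)
pts = unproject(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
print(pts.shape)   # (4, 6, 3)
print(pts[2, 3])   # ray through the principal point -> [0. 0. 2.]
```

A relative-depth model would only give these points up to an unknown scale; UniDepth's pitch is predicting Z in actual meters from a single image.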
RELI11D: Multimodal Humans
RELI11D is a high-quality multimodal human motion dataset combining LiDAR, an IMU system, an RGB camera, and an event camera. Dataset & source code to be released soon.
Review: https://t.ly/5EG6X
Paper: https://lnkd.in/ep6Utcik
Project: https://lnkd.in/eDhNHYBb
ECoDepth: SOTA Diffusive Mono-Depth
A new single-image depth estimation (SIDE) model using a diffusion backbone conditioned on ViT embeddings; it is the new SOTA in SIDE. Source code released.
Review: https://t.ly/s2pbB
Paper: https://lnkd.in/eYt5yr_q
Code: https://lnkd.in/eEcyPQcd
Gen-NeRF2NeRF Translation
GenN2N: a unified NeRF-to-NeRF translation framework for editing tasks such as text-driven NeRF editing, colorization, super-resolution, and inpainting.
Review: https://t.ly/VMWAH
Paper: arxiv.org/pdf/2404.02788.pdf
Project: xiangyueliu.github.io/GenN2N/
Code: github.com/Lxiangyue/GenN2N
iSeg: Interactive 3D Segmentation
iSeg: an interactive segmentation technique for 3D shapes that operates entirely in 3D. It accepts both positive and negative clicks directly on the shape's surface to indicate regions to include or exclude.
Review: https://t.ly/tyFnD
Paper: https://lnkd.in/dydAz8zp
Project: https://lnkd.in/de-h6SRi
Code: (coming)
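A crude way to picture the positive/negative click interface: include a surface point when its nearest positive click is closer than its nearest negative one. This Euclidean toy is only an intuition pump; iSeg itself learns the segmentation in 3D rather than thresholding distances:

```python
import numpy as np

def click_segment(points, pos_clicks, neg_clicks):
    """Label each 3D point as included (True) when it lies closer to some
    positive click than to every negative click (Euclidean distance)."""
    d_pos = np.linalg.norm(points[:, None] - pos_clicks[None], axis=-1).min(1)
    d_neg = np.linalg.norm(points[:, None] - neg_clicks[None], axis=-1).min(1)
    return d_pos < d_neg

pts = np.array([[0.0, 0, 0], [1, 0, 0], [5, 0, 0], [6, 0, 0]])
pos = np.array([[0.0, 0, 0]])   # click inside the desired region
neg = np.array([[6.0, 0, 0]])   # click on a region to exclude
print(click_segment(pts, pos, neg))  # [ True  True False False]
```

Adding more clicks of either kind refines the boundary — the same interaction loop the demo video shows, minus the learned part.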
Neural Bodies with Clothes
Neural-ABC is a novel parametric model based on neural implicit functions that represents clothed human bodies with disentangled latent spaces for identity, clothing, shape, and pose.
Review: https://t.ly/Un1wc
Project: https://lnkd.in/dhDG6FF5
Paper: https://lnkd.in/dhcfK7jZ
Code: https://lnkd.in/dQvXWysP
BodyMAP: human body & pressure
#Nvidia (+CMU) unveils BodyMAP, the new SOTA in jointly predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source code released, dataset coming.
Review: https://t.ly/8926S
Project: bodymap3d.github.io/
Paper: https://lnkd.in/gCxH4ev3
Code: https://lnkd.in/gaifdy3q
XComposer2: 4K Vision-Language
InternLM-XComposer2-4KHD brings LVLM resolution capabilities up to 4K HD (3840×1600) and beyond. Authors: Shanghai AI Lab, CUHK, SenseTime & Tsinghua. Source code & models released.
Review: https://t.ly/GCHsz
Paper: arxiv.org/pdf/2404.06512.pdf
Code: github.com/InternLM/InternLM-XComposer
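Handling 4K inputs in an LVLM typically means cutting the image into a dynamic grid of fixed-size vision-encoder tiles. A toy tiling sketch, assuming a 336-pixel base tile; the padding and grid logic here are illustrative, not the model's exact dynamic-resolution scheme:

```python
import numpy as np

def tile_image(img, tile=336):
    """Split an image (H, W, C) into a grid of fixed-size tiles, zero-padding
    the bottom/right edges so H and W become multiples of the tile size."""
    h, w, c = img.shape
    ph, pw = -h % tile, -w % tile                    # padding needed per axis
    img = np.pad(img, ((0, ph), (0, pw), (0, 0)))
    gh, gw = img.shape[0] // tile, img.shape[1] // tile
    tiles = img.reshape(gh, tile, gw, tile, c).swapaxes(1, 2)
    return tiles.reshape(gh * gw, tile, tile, c)     # (num_tiles, tile, tile, C)

img = np.zeros((1600, 3840, 3), dtype=np.uint8)      # the 4K HD size from the post
tiles = tile_image(img)
print(tiles.shape)  # (60, 336, 336, 3)
```

At 3840×1600 this already yields 60 tiles per image, which is why the tile count and layout must be chosen dynamically rather than fixed.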