MATH-Vision Dataset
MATH-V is a curated dataset of 3,040 high-quality math problems with visual contexts, sourced from real math competitions. Dataset released.
Review https://t.ly/gmIAu
Paper arxiv.org/pdf/2402.14804.pdf
Project mathvision-cuhk.github.io/
Code github.com/mathvision-cuhk/MathVision
FlowMDM: Human Motion Composition
FlowMDM, a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.
Review https://t.ly/pr2g_
Paper https://lnkd.in/daYRftdF
Project https://lnkd.in/dcRkv5Pc
Repo https://lnkd.in/dw-3JJks
EMO: Talking/Singing Gen-AI
EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions and varied head poses. Input: a single frame; the video duration matches the length of the input audio.
Review https://t.ly/4IYj5
Paper https://lnkd.in/dGPX2-Yc
Project https://lnkd.in/dyf6p_N3
Repo (empty) github.com/HumanAIGC/EMO
Multi-LoRA Composition
Two novel training-free image-composition methods, LoRA Switch and LoRA Composite, for integrating any number of elements in an image through multi-LoRA composition. Source code released.
Review https://t.ly/GFy3Z
Paper arxiv.org/pdf/2402.16843.pdf
Code github.com/maszhongming/Multi-LoRA-Composition
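The two strategies can be caricatured in a few lines: LoRA Switch keeps exactly one LoRA active per denoising step, cycling through them, while LoRA Composite runs guidance once per LoRA and averages the scores instead of merging weights. A toy sketch (the function names and the scalar "scores" are illustrative stand-ins, not the authors' API):

```python
def lora_switch(loras, num_steps):
    """LoRA Switch: activate exactly one LoRA per denoising step,
    cycling through them in round-robin order."""
    return [loras[t % len(loras)] for t in range(num_steps)]

def lora_composite(scores_per_lora):
    """LoRA Composite: denoise once per LoRA and average the resulting
    guidance scores element-wise, rather than merging LoRA weights."""
    n = len(scores_per_lora)
    length = len(scores_per_lora[0])
    return [sum(s[i] for s in scores_per_lora) / n for i in range(length)]

schedule = lora_switch(["style", "character", "background"], 7)
# step 0 uses "style", step 1 "character", step 2 "background", then repeats
combined = lora_composite([[1.0, 2.0], [3.0, 4.0]])  # -> [2.0, 3.0]
```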
MM-AU: Video Accident
MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & code announced.
Review https://t.ly/a-jKI
Paper arxiv.org/pdf/2403.00436.pdf
Dataset http://www.lotvsmmau.net/MMAU/demo
SOTA: Stable Diffusion 3 is out!
Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image and language, improving text understanding and spelling capabilities. Weights & source code to be released.
Review https://t.ly/a1koo
Paper https://lnkd.in/d4i-9Bte
Blog https://lnkd.in/d-bEX-ww
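The "separate weights per modality, joint attention" idea behind MMDiT can be sketched in pure Python. This is a deliberate caricature under stated assumptions: scalar tokens, scalar "weights", and uniform mixing stand in for the real learned projections and full attention.

```python
# Toy sketch of the MMDiT idea: each modality gets its OWN projection
# weights, but attention runs over the CONCATENATED (joint) sequence.

def project(tokens, weight):
    # per-modality linear map (scalar weight on scalar "tokens")
    return [weight * t for t in tokens]

def mmdit_block(text_tokens, image_tokens, w_text, w_image):
    # 1) separate weight sets per modality
    q_text = project(text_tokens, w_text)
    q_image = project(image_tokens, w_image)
    # 2) joint sequence: information flows between modalities
    joint = q_text + q_image
    # toy "attention": every token mixes with the joint-sequence mean
    mean = sum(joint) / len(joint)
    return [0.5 * (t + mean) for t in joint]

out = mmdit_block([1.0, 2.0], [3.0], w_text=1.0, w_image=2.0)
# joint = [1.0, 2.0, 6.0]; mean = 3.0 -> out = [2.0, 2.5, 4.5]
```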
E-LoFTR: New Feature-Matching SOTA
A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5× faster than LoFTR and superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced.
Review https://t.ly/7SPmC
Paper https://arxiv.org/pdf/2403.04765.pdf
Project https://zju3dv.github.io/efficientloftr/
Repo https://github.com/zju3dv/efficientloftr
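At the core of LoFTR-style coarse matching is scoring feature similarities between the two images and keeping only mutual nearest neighbors. A minimal version of that standard filter (the similarity matrix here is made up for illustration):

```python
def mutual_nn_matches(sim):
    """Given a similarity matrix sim[i][j] between features of image A
    (rows) and image B (cols), keep pairs that are each other's nearest
    neighbor -- the usual filter in coarse feature-matching pipelines."""
    best_for_a = [max(range(len(row)), key=row.__getitem__) for row in sim]
    cols = list(zip(*sim))
    best_for_b = [max(range(len(col)), key=col.__getitem__) for col in cols]
    return [(i, j) for i, j in enumerate(best_for_a) if best_for_b[j] == i]

sim = [[0.9, 0.1],
       [0.2, 0.8],
       [0.7, 0.3]]
print(mutual_nn_matches(sim))  # -> [(0, 0), (1, 1)]; row 2 is dropped
```

Row 2 prefers column 0, but column 0 prefers row 0, so that pair fails the mutual check and is discarded.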
StableDrag: Point-based Editing
#Tencent unveils StableDrag, a novel point-based image-editing framework built on a discriminative point-tracking method plus a confidence-based latent-enhancement strategy for motion supervision. Source code announced, but still no repo.
Review https://t.ly/eUI05
Paper https://lnkd.in/dz8-ymck
Project stabledrag.github.io/
PIXART-Σ: 4K Generation
PixArt-Σ is a novel Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced.
Review https://t.ly/Cm2Qh
Paper arxiv.org/pdf/2403.04692.pdf
Project pixart-alpha.github.io/PixArt-sigma-project/
Repo (empty) github.com/PixArt-alpha/PixArt-sigma
Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
Can GPT-4 play DOOM?
Apparently yes: GPT-4 can play the game to a passable degree. It is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released.
Review https://t.ly/W8-0F
Paper https://lnkd.in/dmsB7bjA
Project https://lnkd.in/ddDPwjQB
Real-Time Humanoid Control from Head-Mounted Sensors
#META (+ CMU) announced SimXR, a method for controlling a simulated avatar from information obtained from AR/VR headsets.
Review https://t.ly/Si2Mp
Paper arxiv.org/pdf/2403.06862.pdf
Project www.zhengyiluo.com/SimXR/
Face Foundation Model
Arc2Face, the first foundation model for human faces. Source code released.
Review https://t.ly/MfAFI
Paper https://lnkd.in/dViE_tCd
Project https://lnkd.in/d4MHdEZK
Code https://lnkd.in/dv9ZtDfA
FaceXFormer: Unified Face Transformer
FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head-pose estimation, attribute recognition, and age, gender, and race estimation.
Review https://t.ly/MfAFI
Paper https://arxiv.org/pdf/2403.12960.pdf
Project kartik-3004.github.io/facexformer_web/
Code github.com/Kartik-3004/facexformer
DINO-based Video Tracking
The Weizmann Institute announced the new SOTA in point tracking via pre-trained DINO features. Source code announced (not yet released).
Review https://t.ly/_GIMT
Paper https://lnkd.in/dsGVDcar
Project dino-tracker.github.io/
Code https://github.com/AssafSinger94/dino-tracker
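The core intuition of feature-based point tracking: locate a query point in a new frame by finding the position whose descriptor is most similar to the query's. A minimal sketch with made-up 2-D descriptors standing in for DINO features (the real tracker adds learned refinement on top of this):

```python
import math

def cosine(u, v):
    # cosine similarity between two descriptor vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def track_point(query_desc, frame_descs):
    """frame_descs: {(x, y): descriptor} for candidate positions in the
    new frame; return the position best matching the query descriptor."""
    return max(frame_descs, key=lambda p: cosine(query_desc, frame_descs[p]))

frame = {(0, 0): [1.0, 0.0], (5, 5): [0.6, 0.8], (9, 9): [0.0, 1.0]}
print(track_point([1.0, 0.05], frame))  # -> (0, 0)
```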
T-Rex 2: a new SOTA is out!
A novel (very strong) open-set object-detection model. Strong zero-shot capabilities, suitable for various scenarios with a single set of weights. Demo and source code released.
Review https://t.ly/fYw8D
Paper https://lnkd.in/dpmRh2zh
Project https://lnkd.in/dnR_jPcR
Code https://lnkd.in/dnZnGRUn
Demo https://lnkd.in/drDUEDYh
TinyBeauty: 460 FPS Make-up
TinyBeauty: only 80K parameters to achieve the SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!
Review https://t.ly/LG5ok
Paper https://arxiv.org/pdf/2403.15033.pdf
Project https://tinybeauty.github.io/TinyBeauty/
AiOS: All-in-One-Stage Humans
An all-in-one-stage framework for SOTA expressive multi-person pose and shape recovery, with no additional human-detection step.
Review https://t.ly/ekNd4
Paper https://arxiv.org/pdf/2403.17934.pdf
Project https://ttxskk.github.io/AiOS/
Code/Demo (announced)
MAVOS Object Segmentation
MAVOS is a transformer-based VOS (video object segmentation) model with a novel, optimized, dynamic long-term modulated cross-attention memory. Code & models announced (BSD 3-Clause).
Review https://t.ly/SKaRG
Paper https://lnkd.in/dQyifKa3
Project github.com/Amshaker/MAVOS
ObjectDrop: automagical object removal
#Google unveils ObjectDrop, the new SOTA in photorealistic object removal and insertion. The focus on shadows and reflections is impressive!
Review https://t.ly/ZJ6NN
Paper https://arxiv.org/pdf/2403.18818.pdf
Project https://objectdrop.github.io/
Universal Mono Metric Depth
ETH unveils UniDepth: metric 3D scenes from solely single images, across domains. A novel, universal, and flexible MMDE (monocular metric depth estimation) solution. Source code released.
Review https://t.ly/5C8eq
Paper arxiv.org/pdf/2403.18913.pdf
Code github.com/lpiccinelli-eth/unidepth
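What makes metric (as opposed to relative) depth useful: each pixel's depth in meters, combined with camera intrinsics, back-projects to a 3D point. A minimal pinhole back-projection (the intrinsic values are illustrative, not from UniDepth):

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with metric depth (meters) -> 3D point in camera
    coordinates, under the standard pinhole model with focal lengths
    (fx, fy) and principal point (cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at the principal point maps to a point straight ahead:
print(backproject(u=320, v=240, depth=2.0,
                  fx=500.0, fy=500.0, cx=320.0, cy=240.0))
# -> (0.0, 0.0, 2.0)
```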