☀️ Universal Relightable Avatars ☀️
👉#Meta unveils URAvatar: photorealistic, relightable avatars built from a phone scan captured under unknown illumination. Stunning results!
👉Review https://t.ly/U-ESX
👉Paper arxiv.org/pdf/2410.24223
👉Project junxuan-li.github.io/urgca-website
🏣 CityGaussianV2: Large-Scale City 🏣
👉A novel approach to large-scale scene reconstruction that addresses the critical challenges of geometric accuracy and efficiency: 10x compression, 25% faster & 50% less memory! Source code released 💙
👉Review https://t.ly/Xgn59
👉Paper arxiv.org/pdf/2411.00771
👉Project dekuliutesla.github.io/CityGaussianV2/
👉Code github.com/DekuLiuTesla/CityGaussian
💪 Muscles in Time Dataset 💪
👉Muscles in Time (MinT) is a large-scale synthetic muscle activation dataset containing 9+ hours of simulation data covering 227 subjects and 402 simulated muscle strands. Code & dataset available soon 💙
👉Review https://t.ly/108g6
👉Paper arxiv.org/pdf/2411.00128
👉Project davidschneider.ai/mint
👉Code github.com/simplexsigil/MusclesInTime
🧠 Single Neuron Reconstruction 🧠
👉SIAT unveils NeuroFly, a framework for large-scale single-neuron reconstruction that formulates the task as a streamlined three-stage workflow: automatic segmentation, connection, and manual proofreading (toy sketch below). Bridging computer vision and neuroscience 💙
👉Review https://t.ly/Y5Xu0
👉Paper https://arxiv.org/pdf/2411.04715
👉Repo github.com/beanli161514/neurofly
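A minimal toy sketch of that three-stage workflow, with a thresholding segmenter and a distance-based connector standing in for the learned components; every function here is an illustrative placeholder, not the actual neurofly API:

```python
# Toy segment -> connect -> proofread pipeline (placeholders, NOT the neurofly API).
import numpy as np
from scipy import ndimage

def segment(volume, threshold=0.5):
    # Stage 1 (automatic): thresholding + connected components stand in
    # for the real learned segmenter; returns one centroid per fragment.
    labels, n = ndimage.label(volume > threshold)
    return np.array(ndimage.center_of_mass(volume, labels, range(1, n + 1)))

def connect(centroids, max_gap=8.0):
    # Stage 2 (automatic): greedily link fragments whose centroids lie
    # within max_gap voxels of each other.
    return [(i, j)
            for i in range(len(centroids))
            for j in range(i + 1, len(centroids))
            if np.linalg.norm(centroids[i] - centroids[j]) <= max_gap]

def proofread(edges, rejected=()):
    # Stage 3 (manual): a human removes spurious links; simulated here
    # with an explicit reject list.
    return [e for e in edges if e not in set(rejected)]

volume = np.zeros((32, 32, 32))
volume[10:14, 10, 10] = 1.0   # two nearby neurite fragments with a gap
volume[16:20, 10, 10] = 1.0
print(proofread(connect(segment(volume))))  # -> [(0, 1)]: fragments linked
```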
🫠 X-Portrait 2: SOTA(?) Portrait Animation 🫠
👉ByteDance unveils a preview of X-Portrait 2, the new SOTA expression-encoder model that implicitly encodes every minuscule expression from the input, trained on large-scale datasets. Impressive results, but no paper or code announced.
👉Review https://t.ly/8Owh9 [UPDATE]
👉Paper ?
👉Project byteaigc.github.io/X-Portrait2/
👉Repo ?
❄️Don’t Look Twice: ViT by RLT❄️
👉CMU unveils RLT (Run-Length Tokenization): speeding up video transformers with a scheme inspired by run-length encoding for data compression. It speeds up training and reduces the token count by up to 80% (toy sketch below)! Source code announced 💙
👉Review https://t.ly/ccSwN
👉Paper https://lnkd.in/d6VXur_q
👉Project https://lnkd.in/d4tXwM5T
👉Repo TBA
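The gist in a toy PyTorch sketch (my reading of the abstract, not the authors' code): patches that barely change from the previous frame are dropped, and each surviving token carries the length of the static run it stands for:

```python
# Toy run-length tokenization of video patches (illustrative sketch).
import torch

def run_length_tokenize(video, patch=16, tau=0.1):
    """video: (T, C, H, W). Returns kept patch tokens plus, for each one,
    how many consecutive frames it represents (its run length)."""
    T, C, H, W = video.shape
    p = video.unfold(2, patch, patch).unfold(3, patch, patch)
    p = p.permute(0, 2, 3, 1, 4, 5).reshape(T, -1, C * patch * patch)  # (T, N, D)
    N = p.shape[1]
    keep = torch.ones(T, N, dtype=torch.bool)
    # drop a patch when it barely differs from the same patch one frame earlier
    keep[1:] = (p[1:] - p[:-1]).abs().mean(-1) > tau
    # each kept token absorbs the dropped frames that follow it
    lengths = torch.ones(T, N)
    for n in range(N):
        last = 0
        for t in range(1, T):
            if keep[t, n]:
                last = t
            else:
                lengths[last, n] += 1
    return p[keep], lengths[keep]

video = torch.zeros(8, 3, 32, 32)
video[4:, :, :16, :16] = 1.0              # one quadrant changes at frame 4
tokens, lengths = run_length_tokenize(video)
print(tokens.shape)                        # torch.Size([5, 768]) vs 32 dense tokens
print(lengths.tolist())                    # [4.0, 8.0, 8.0, 8.0, 4.0]
```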
🐔SeedEdit: foundational T2I🐔
👉ByteDance unveils a novel T2I foundation model capable of delivering stable, high-aesthetic image edits that maintain image quality through unlimited rounds of editing instructions. No code announced, but a demo is online 💙
👉Review https://t.ly/hPlnN
👉Paper https://arxiv.org/pdf/2411.06686
👉Project team.doubao.com/en/special/seededit
🤗Demo https://huggingface.co/spaces/ByteDance/SeedEdit-APP
🔥 4 NanoSeconds inference 🔥
👉LogicTreeNet: convolutional differentiable logic gate networks with logic-gate tree kernels, bringing computer vision into differentiable LGNs. Up to 61x smaller than SOTA, with inference in 4 nanoseconds (sketch of the core idea below)!
👉Review https://t.ly/GflOW
👉Paper https://lnkd.in/dAZQr3dW
👉Full clip https://lnkd.in/dvDJ3j-u
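For intuition, a minimal sketch of the differentiable logic gate idea such networks build on: each unit is a softmax mixture over real-valued relaxations of binary gates, so the gate choice is learned by gradient descent and can be hardened into a discrete circuit for inference. Illustrative only, not the paper's convolutional tree architecture:

```python
# A single learnable logic gate via soft relaxations (illustrative).
import torch

def soft_gates(a, b):
    # Real-valued relaxations of four two-input gates on [0, 1].
    return torch.stack([
        a * b,               # AND
        a + b - a * b,       # OR
        a + b - 2 * a * b,   # XOR
        1 - a * b,           # NAND
    ])

class DiffGate(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(4))  # one logit per gate

    def forward(self, a, b):
        w = torch.softmax(self.logits, dim=0)
        return (w[:, None] * soft_gates(a, b)).sum(0)

# Train one gate to behave like XOR; at inference the argmax logit is
# hardened into a discrete gate, which is what makes LGNs so fast.
gate = DiffGate()
opt = torch.optim.Adam(gate.parameters(), lr=0.1)
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([0., 1., 1., 0.])
for _ in range(200):
    opt.zero_grad()
    ((gate(x[:, 0], x[:, 1]) - y) ** 2).mean().backward()
    opt.step()
print(gate.logits.argmax().item())  # -> 2 (XOR wins)
```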
🛥️ Global Tracklet Association MOT 🛥️
👉A novel universal, model-agnostic method that refines and enhances tracklet association for single-camera MOT; suitable for datasets such as SportsMOT, SoccerNet & similar (toy sketch below). Source code released 💙
👉Review https://t.ly/gk-yh
👉Paper https://lnkd.in/dvXQVKFw
👉Repo https://lnkd.in/dEJqiyWs
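A toy sketch of the general recipe (not the authors' exact algorithm): repeatedly fuse temporally disjoint tracklets whose mean appearance embeddings are close in cosine similarity:

```python
# Toy appearance-based tracklet merging for single-camera MOT.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def merge_tracklets(tracklets, sim_thresh=0.8):
    # tracklets: list of {"frames": (start, end), "emb": mean appearance vector}
    merged = True
    while merged:
        merged = False
        for i in range(len(tracklets)):
            for j in range(i + 1, len(tracklets)):
                a, b = tracklets[i], tracklets[j]
                disjoint = (a["frames"][1] < b["frames"][0]
                            or b["frames"][1] < a["frames"][0])
                if disjoint and cosine(a["emb"], b["emb"]) >= sim_thresh:
                    # fuse: union of time spans, average of embeddings
                    tracklets[i] = {
                        "frames": (min(a["frames"][0], b["frames"][0]),
                                   max(a["frames"][1], b["frames"][1])),
                        "emb": (a["emb"] + b["emb"]) / 2,
                    }
                    del tracklets[j]
                    merged = True
                    break
            if merged:
                break
    return tracklets

rng = np.random.default_rng(0)
e = rng.normal(size=16)
tracks = [
    {"frames": (0, 50), "emb": e},                     # identity A
    {"frames": (60, 90), "emb": e + 0.05},             # identity A, re-appearing
    {"frames": (0, 90), "emb": rng.normal(size=16)},   # identity B
]
print(len(merge_tracklets(tracks)))  # -> 2 (the two A-fragments get fused)
```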
🧶 MagicQuill: super-easy Diffusion Editing 🧶
👉MagicQuill is a novel system designed to support users in smart image editing: a robust UI/UX (e.g., inserting/erasing objects, editing colors, etc.) backed by a multimodal LLM that anticipates user intentions in real time. Code & demos released 💙
👉Review https://t.ly/hJyLa
👉Paper https://arxiv.org/pdf/2411.09703
👉Project https://magicquill.art/demo/
👉Repo https://github.com/magic-quill/magicquill
👉Demo https://huggingface.co/spaces/AI4Editing/MagicQuill
🧰 EchoMimicV2: Semi-body Human 🧰
👉Alipay (Ant Group) unveils EchoMimicV2, novel SOTA in half-body human animation via APD-Harmonization. See the clip with audio (ZH/ENG). Code & demo announced 💙
👉Review https://t.ly/enLxJ
👉Paper arxiv.org/pdf/2411.10061
👉Project antgroup.github.io/ai/echomimic_v2/
👉Repo-v2 github.com/antgroup/echomimic_v2
👉Repo-v1 https://github.com/antgroup/echomimic
⚔️SAMurai: SAM for Tracking⚔️
👉UW unveils SAMURAI, an enhanced adaptation of SAM 2 designed specifically for visual object tracking (motion-scoring sketch below). New SOTA! Code under Apache 2.0 💙
👉Review https://t.ly/yGU0P
👉Paper https://arxiv.org/pdf/2411.11922
👉Repo https://github.com/yangchris11/samurai
👉Project https://yangchris11.github.io/samurai/
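A hedged sketch of the motion-aware selection idea from the abstract: re-score SAM 2's candidate masks by mixing model confidence with agreement against a motion prediction. A constant-velocity predictor stands in for the Kalman filter below; this is not the authors' implementation:

```python
# Toy motion-aware candidate selection (constant velocity as Kalman stand-in).
import numpy as np

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def select_candidate(history, candidates, alpha=0.5):
    """history: past target boxes; candidates: [(box, confidence), ...]."""
    velocity = np.subtract(history[-1], history[-2])
    predicted = np.add(history[-1], velocity)   # where the target should be
    scores = [alpha * conf + (1 - alpha) * iou(box, predicted)
              for box, conf in candidates]
    return int(np.argmax(scores))

history = [(0, 0, 10, 10), (5, 0, 15, 10)]      # target drifting right
candidates = [((40, 40, 50, 50), 0.95),         # confident but implausible jump
              ((10, 0, 20, 10), 0.80)]          # fits the motion model
print(select_candidate(history, candidates))    # -> 1
```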
🦖Dino-X: Unified Obj-Centric LVM🦖
👉A unified object-centric vision model for open-world detection, segmentation, phrase grounding, visual counting, pose estimation, prompt-free detection/recognition, dense captioning & more. Demo & API announced 💙
👉Review https://t.ly/CSQon
👉Paper https://lnkd.in/dc44ZM8v
👉Project https://lnkd.in/dehKJVvC
👉Repo https://lnkd.in/df8Kb6iz
🌎All Languages Matter: LMMs vs. 100 Lang.🌎
👉ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & Dataset 💙
👉Review https://t.ly/VsoJB
👉Paper https://lnkd.in/ddVVZfi2
👉Project https://lnkd.in/dpssaeRq
👉Code https://lnkd.in/dnbaJJE4
👉Dataset https://lnkd.in/drw-_95v
🦙 EdgeCape: SOTA Agnostic Pose 🦙
👉EdgeCape: new SOTA in Category-Agnostic Pose Estimation (CAPE), finding keypoints across diverse object categories using only one or a few annotated support images (toy sketch below). Source code released 💙
👉Review https://t.ly/4TpAs
👉Paper https://arxiv.org/pdf/2411.16665
👉Project https://orhir.github.io/edge_cape/
👉Code https://github.com/orhir/EdgeCape
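To make the CAPE setting concrete, a toy sketch of the baseline recipe: transfer keypoints from one annotated support image to a query by nearest-neighbor matching over dense features. EdgeCape's actual contribution (learned skeleton/edge structure refining this) is not reproduced here:

```python
# Toy keypoint transfer via dense feature matching (baseline CAPE idea).
import numpy as np

def transfer_keypoints(support_feats, query_feats, support_kps):
    """support_feats, query_feats: (H, W, D) L2-normalized feature grids.
    support_kps: (row, col) keypoints on the support grid."""
    H, W, D = query_feats.shape
    flat = query_feats.reshape(-1, D)                # (H*W, D)
    out = []
    for r, c in support_kps:
        sims = flat @ support_feats[r, c]            # cosine similarities
        out.append(divmod(int(np.argmax(sims)), W))  # best-matching cell
    return out

rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 8, 32))
feats /= np.linalg.norm(feats, axis=-1, keepdims=True)
query = np.roll(feats, shift=2, axis=1)              # same object, shifted 2 cells
print(transfer_keypoints(feats, query, [(3, 3), (5, 1)]))  # -> [(3, 5), (5, 3)]
```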
🛟 StableAnimator: ID-aware Humans 🛟
👉StableAnimator: the first end-to-end ID-preserving diffusion model for HQ human videos without any post-processing. Input: a single image + a sequence of poses. Insane results!
👉Review https://t.ly/JDtL3
👉Paper https://arxiv.org/pdf/2411.17697
👉Project francis-rings.github.io/StableAnimator/
👉Code github.com/Francis-Rings/StableAnimator
🧶SOTA track-by-propagation🧶
👉SambaMOTR is a novel e2e model (built on Samba) that captures long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code in Jan. '25 💙
👉Review https://t.ly/QSQ8L
👉Paper arxiv.org/pdf/2410.01806
👉Project sambamotr.github.io/
👉Repo https://lnkd.in/dRDX6nk2
👺HiFiVFS: Extreme Face Swapping👺
👉HiFiVFS: HQ video face swapping even in extremely challenging scenarios (occlusions, makeup, lighting, extreme poses, etc.). Impressive results; no code announced 😢
👉Review https://t.ly/ea8dU
👉Paper https://arxiv.org/pdf/2411.18293
👉Project https://cxcx1996.github.io/HiFiVFS
🔥Video Depth without Video Models🔥
👉RollingDepth: turning a single-image latent diffusion model (LDM) into a novel SOTA video depth estimator that outperforms dedicated video-depth models 🤯 (alignment sketch below). Code under Apache 💙
👉Review https://t.ly/R4LqS
👉Paper https://arxiv.org/pdf/2411.19189
👉Project https://rollingdepth.github.io/
👉Repo https://github.com/prs-eth/rollingdepth
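One ingredient, sketched under my reading of the paper: each snippet's depth prediction is only defined up to an affine scale/shift, so frames shared between overlapping snippets are used to solve a per-snippet least-squares (s, t) that stitches snippets into one consistent video. Illustrative, not the authors' optimizer:

```python
# Affine (scale, shift) alignment of overlapping depth snippets.
import numpy as np

def align_snippet(ref_depth, new_depth):
    """Solve min_{s,t} || s * new_depth + t - ref_depth ||^2 over the
    frames the two snippets share."""
    x, y = new_depth.ravel(), ref_depth.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

rng = np.random.default_rng(0)
overlap = rng.uniform(1, 10, size=(2, 16, 16))  # 2 frames both snippets saw
snippet_b = (overlap - 0.7) / 2.5               # same scene, different scale/shift
s, t = align_snippet(overlap, snippet_b)
print(round(s, 3), round(t, 3))                 # -> 2.5 0.7
aligned = s * snippet_b + t                     # now consistent with the reference
```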
⚽Universal Soccer Foundation Model⚽
👉Universal Soccer Video Understanding: SoccerReplay-1988, the largest multi-modal soccer dataset, and MatchVision, the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released 💙
👉Review https://t.ly/-X90B
👉Paper https://arxiv.org/pdf/2412.01820
👉Project https://jyrao.github.io/UniSoccer/
👉Repo https://github.com/jyrao/UniSoccer