REM: Segment What You Describe
REM is a framework for segmenting concepts in video that can be described through natural language. Suitable for rare and non-object dynamic concepts such as waves, smoke, etc. Code & data announced.
Review https://t.ly/OyVtV
Paper arxiv.org/pdf/2410.23287
Project https://miccooper9.github.io/projects/ReferEverything/

Universal Relightable Avatars
#Meta unveils URAvatar: photorealistic, relightable avatars built from a phone scan captured under unknown illumination. Stunning results!
Review https://t.ly/U-ESX
Paper arxiv.org/pdf/2410.24223
Project junxuan-li.github.io/urgca-website

CityGaussianV2: Large-Scale City
A novel approach for large-scale scene reconstruction that addresses critical challenges in geometric accuracy and efficiency: 10× compression, 25% faster, and 50% less memory! Source code released.
Review https://t.ly/Xgn59
Paper arxiv.org/pdf/2411.00771
Project dekuliutesla.github.io/CityGaussianV2/
Code github.com/DekuLiuTesla/CityGaussian

Muscles in Time Dataset
Muscles in Time (MinT) is a large-scale synthetic muscle-activation dataset containing over nine hours of simulation data covering 227 subjects and 402 simulated muscle strands. Code & dataset available soon.
Review https://t.ly/108g6
Paper arxiv.org/pdf/2411.00128
Project davidschneider.ai/mint
Code github.com/simplexsigil/MusclesInTime

Single Neuron Reconstruction
SIAT unveils NeuroFly, a framework for large-scale single-neuron reconstruction. It formulates neuron reconstruction as a streamlined three-stage workflow: automatic segmentation, connection, and manual proofreading. Bridging computer vision and neuroscience.
Review https://t.ly/Y5Xu0
Paper https://arxiv.org/pdf/2411.04715
Repo github.com/beanli161514/neurofly

X-Portrait 2: SOTA(?) Portrait Animation
ByteDance unveils a preview of X-Portrait 2, a new SOTA expression encoder that implicitly captures every minuscule expression from the input, trained on large-scale datasets. Impressive results, but no paper or code announced.
Review https://t.ly/8Owh9 [UPDATE]
Paper ?
Project byteaigc.github.io/X-Portrait2/
Repo ?

Don't Look Twice: ViT by RLT
CMU unveils RLT, speeding up video transformers with a scheme inspired by run-length encoding for data compression: it accelerates training and cuts the token count by up to 80%! Source code announced. A minimal sketch of the idea follows the links below.
Review https://t.ly/ccSwN
Paper https://lnkd.in/d6VXur_q
Project https://lnkd.in/d4tXwM5T
Repo TBA

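Below is a minimal NumPy sketch of the run-length idea as the blurb describes it, not the authors' implementation: patch tokens that barely change from the previous frame are dropped, so static regions are represented once per run. The patch size and the change threshold tau are illustrative assumptions.

```python
import numpy as np

def run_length_tokenize(video, patch=16, tau=0.1):
    """Keep a patch token only when it differs from the previous frame.

    video: (T, H, W, C) float array in [0, 1].
    Returns (t, row, col) indices of retained tokens; static patches are
    represented once per run, echoing run-length encoding.
    """
    T, H, W, C = video.shape
    rows, cols = H // patch, W // patch
    # (T, rows, cols, patch*patch*C): one flattened vector per patch
    p = (video[:, :rows * patch, :cols * patch]
         .reshape(T, rows, patch, cols, patch, C)
         .transpose(0, 1, 3, 2, 4, 5)
         .reshape(T, rows, cols, -1))
    kept = [(0, r, c) for r in range(rows) for c in range(cols)]  # full first frame
    change = np.abs(p[1:] - p[:-1]).mean(axis=-1)  # (T-1, rows, cols)
    for t, r, c in zip(*np.nonzero(change > tau)):
        kept.append((t + 1, r, c))  # patch moved/changed, so keep a new token
    return kept
```

On mostly static footage this alone removes the bulk of the tokens a vanilla video ViT would process; only the retained indices (with their positional/temporal encodings) reach the transformer.
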
SeedEdit: Foundational T2I
ByteDance unveils a novel foundational T2I model capable of delivering stable, high-aesthetic image edits that preserve image quality through unlimited rounds of editing instructions. No code announced, but a demo is online.
Review https://t.ly/hPlnN
Paper https://arxiv.org/pdf/2411.06686
Project team.doubao.com/en/special/seededit
Demo https://huggingface.co/spaces/ByteDance/SeedEdit-APP

4-Nanosecond Inference
LogicTreeNet: convolutional differentiable logic gate networks with logic-gate tree kernels, bringing computer vision into differentiable LGNs. Up to 61× smaller than SOTA, with inference in 4 nanoseconds! A toy sketch of a differentiable gate follows the links below.
Review https://t.ly/GflOW
Paper https://lnkd.in/dAZQr3dW
Full clip https://lnkd.in/dvDJ3j-u

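For intuition only, a toy sketch of the building block behind differentiable logic gate networks (not the paper's code): each gate holds learnable logits over the 16 two-input boolean functions, relaxed to real-valued probabilities, so the choice of gate trains by gradient descent and can be hardened to a single discrete gate for inference.

```python
import numpy as np

def soft_gate(a, b, logits):
    """Differentiable relaxation of a two-input logic gate.

    a, b: activations in [0, 1] (scalars or same-shape arrays).
    logits: (16,) learnable scores over the 16 boolean functions.
    Each function is replaced by its real-valued (multilinear) extension,
    e.g. AND -> a*b, OR -> a+b-ab, XOR -> a+b-2ab.
    """
    ops = np.array([
        0 * a,                 a * b,                   a - a * b,         a + 0 * b,
        b - a * b,             b + 0 * a,               a + b - 2 * a * b, a + b - a * b,
        1 - (a + b - a * b),   1 - (a + b - 2 * a * b), 1 - b + 0 * a,     1 - b + a * b,
        1 - a + 0 * b,         1 - a + a * b,           1 - a * b,         1 + 0 * a * b,
    ])
    w = np.exp(logits - logits.max())
    w /= w.sum()              # softmax: probability of each gate type
    return w @ ops            # expected gate output

# Tiny usage check: logits biased toward AND give roughly a*b.
logits = np.zeros(16); logits[1] = 5.0
print(soft_gate(0.9, 0.2, logits))  # ~0.21 (dominated by AND = 0.18)
```
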
Global Tracklet Association for MOT
A novel universal, model-agnostic method designed to refine and enhance tracklet association for single-camera MOT. Suitable for datasets such as SportsMOT, SoccerNet, and similar. Source code released. A rough illustration follows the links below.
Review https://t.ly/gk-yh
Paper https://lnkd.in/dvXQVKFw
Repo https://lnkd.in/dEJqiyWs

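As a rough, hypothetical illustration of post-hoc tracklet association (the paper's exact procedure may differ): fragments that do not overlap in time and whose appearance embeddings agree are greedily fused into one global identity.

```python
import numpy as np

def merge_tracklets(tracklets, sim_thresh=0.8):
    """Greedily fuse tracklet fragments into longer identities.

    tracklets: dicts with 'start'/'end' frame indices and 'emb',
    an L2-normalized mean appearance embedding of shape (D,).
    """
    merged = []
    for t in sorted(tracklets, key=lambda x: x["start"]):
        best, best_sim = None, sim_thresh
        for m in merged:
            if m["end"] < t["start"]:             # no temporal overlap
                sim = float(m["emb"] @ t["emb"])  # cosine on unit vectors
                if sim > best_sim:
                    best, best_sim = m, sim
        if best is None:
            merged.append(dict(t))                # new global identity
        else:
            best["end"] = t["end"]                # extend the identity
            e = best["emb"] + t["emb"]
            best["emb"] = e / np.linalg.norm(e)   # refresh its appearance
    return merged
```
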
MagicQuill: Super-Easy Diffusion Editing
MagicQuill is a novel system designed to support smart image editing. Robust UI/UX (e.g., inserting/erasing objects, changing colors) backed by a multimodal LLM that anticipates user intentions in real time. Code & demos released.
Review https://t.ly/hJyLa
Paper https://arxiv.org/pdf/2411.09703
Project https://magicquill.art/demo/
Repo https://github.com/magic-quill/magicquill
Demo https://huggingface.co/spaces/AI4Editing/MagicQuill

EchoMimicV2: Semi-Body Human Animation
Alipay (Ant Group) unveils EchoMimicV2, a novel SOTA approach to half-body human animation via APD-Harmonization. See the clip with audio (ZH/ENG). Code & demo announced.
Review https://t.ly/enLxJ
Paper arxiv.org/pdf/2411.10061
Project antgroup.github.io/ai/echomimic_v2/
Repo-v2 github.com/antgroup/echomimic_v2
Repo-v1 https://github.com/antgroup/echomimic

SAMURAI: SAM for Tracking
UW unveils SAMURAI, an enhanced adaptation of SAM 2 designed specifically for visual object tracking. New SOTA! Code under Apache 2.0. A hedged sketch of the motion-aware selection idea follows the links below.
Review https://t.ly/yGU0P
Paper https://arxiv.org/pdf/2411.11922
Repo https://github.com/yangchris11/samurai
Project https://yangchris11.github.io/samurai/

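SAMURAI's twist is motion-aware selection; here is a hedged sketch of that flavor of scoring, with the blend weight and interface as assumptions rather than the repo's API: each candidate mask's confidence is mixed with its box's IoU against a motion-predicted box (e.g., from a Kalman filter).

```python
import numpy as np

def select_mask(candidates, predicted_box, alpha=0.25):
    """Pick the candidate that agrees with both the tracker and motion.

    candidates: list of (confidence, box) with box = (x1, y1, x2, y2).
    predicted_box: box extrapolated by a motion model for this frame.
    Returns the index of the winning candidate.
    """
    def iou(p, q):
        x1, y1 = max(p[0], q[0]), max(p[1], q[1])
        x2, y2 = min(p[2], q[2]), min(p[3], q[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(p) + area(q) - inter + 1e-9)

    scores = [(1 - alpha) * conf + alpha * iou(box, predicted_box)
              for conf, box in candidates]
    return int(np.argmax(scores))
```
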
DINO-X: Unified Object-Centric LVM
A unified vision model for open-world detection, segmentation, phrase grounding, visual counting, pose estimation, prompt-free detection/recognition, dense captioning, and more. Demo & API announced.
Review https://t.ly/CSQon
Paper https://lnkd.in/dc44ZM8v
Project https://lnkd.in/dehKJVvC
Repo https://lnkd.in/df8Kb6iz

All Languages Matter: LMMs vs. 100 Languages
ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs toward better cultural understanding and inclusivity. Code & dataset released.
Review https://t.ly/VsoJB
Paper https://lnkd.in/ddVVZfi2
Project https://lnkd.in/dpssaeRq
Code https://lnkd.in/dnbaJJE4
Dataset https://lnkd.in/drw-_95v

EdgeCape: SOTA Category-Agnostic Pose
EdgeCape: new SOTA in Category-Agnostic Pose Estimation (CAPE), finding keypoints across diverse object categories using only one or a few annotated support images. Source code released.
Review https://t.ly/4TpAs
Paper https://arxiv.org/pdf/2411.16665
Project https://orhir.github.io/edge_cape/
Code https://github.com/orhir/EdgeCape

StableAnimator: ID-Aware Humans
StableAnimator: the first end-to-end ID-preserving diffusion model for HQ human animation videos without any post-processing. Input: a single image plus a sequence of poses. Insane results!
Review https://t.ly/JDtL3
Paper https://arxiv.org/pdf/2411.17697
Project francis-rings.github.io/StableAnimator/
Code github.com/Francis-Rings/StableAnimator

SOTA Track-by-Propagation
SambaMOTR is a novel end-to-end model (based on Samba) that exploits long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code coming in Jan. '25.
Review https://t.ly/QSQ8L
Paper arxiv.org/pdf/2410.01806
Project sambamotr.github.io/
Repo https://lnkd.in/dRDX6nk2

HiFiVFS: Extreme Face Swapping
HiFiVFS: HQ face-swapping videos even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results; no code announced.
Review https://t.ly/ea8dU
Paper https://arxiv.org/pdf/2411.18293
Project https://cxcx1996.github.io/HiFiVFS

Video Depth without Video Models
RollingDepth: turning a single-image latent diffusion model (LDM) into a novel SOTA video depth estimator that outperforms dedicated video-depth models. Code under Apache 2.0. A minimal sketch of the rolling-window idea follows the links below.
Review https://t.ly/R4LqS
Paper https://arxiv.org/pdf/2411.19189
Project https://rollingdepth.github.io/
Repo https://github.com/prs-eth/rollingdepth

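A minimal sketch of the rolling-window idea under stated assumptions: run_model is a hypothetical stand-in for the single-image LDM applied to a short frame snippet, and overlapping predictions, defined only up to an affine transform, are stitched by solving for a per-snippet scale and shift.

```python
import numpy as np

def align_scale_shift(d_ref, d_new):
    """Least-squares (scale, shift) mapping d_new onto d_ref."""
    A = np.stack([d_new.ravel(), np.ones(d_new.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_ref.ravel(), rcond=None)
    return s, t

def rolling_depth(frames, run_model, win=3, stride=1):
    """Depth for a clip from overlapping snippets of `win` frames."""
    depths = [None] * len(frames)
    for i in range(0, len(frames) - win + 1, stride):
        snippet = run_model(frames[i:i + win])   # (win, H, W) affine depth
        for j, d in enumerate(snippet):
            k = i + j
            if depths[k] is None:
                depths[k] = d
            else:                                 # reconcile on the overlap
                s, t = align_scale_shift(depths[k], d)
                depths[k] = 0.5 * (depths[k] + (s * d + t))
    return depths
```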