X-Portrait 2: SOTA(?) Portrait Animation
ByteDance unveils a preview of X-Portrait 2, a new SOTA expression encoder model that implicitly encodes every minuscule expression from the input, trained on large-scale datasets. Impressive results, but no paper or code announced.
Review https://t.ly/8Owh9 [UPDATE]
Paper ?
Project byteaigc.github.io/X-Portrait2/
Repo ?
Don't Look Twice: ViT by RLT
CMU unveils RLT (Run-Length Tokenization): speeding up video transformers, inspired by run-length encoding for data compression. It speeds up training and cuts the token count by up to 80%! Source code announced; a toy sketch of the run-length idea follows below.
Review https://t.ly/ccSwN
Paper https://lnkd.in/d6VXur_q
Project https://lnkd.in/d4tXwM5T
Repo TBA
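Since the repo is still TBA, here is a minimal sketch of the run-length idea as described: compare each patch token with the same patch one frame earlier, keep only the tokens that changed, and record a run length for each kept token so the model knows how many frames it stands in for. The L2 change test and the threshold `tau` are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def run_length_tokenize(patches, tau=0.1):
    """Toy run-length tokenization for video patch tokens.

    patches: (T, N, D) tensor of T frames, N patch tokens per frame, D dims.
    Returns the kept tokens and their run lengths (how many consecutive
    frames each kept token stands in for).
    """
    T, N, D = patches.shape
    # A token is "static" if it barely differs from the same patch one frame earlier.
    diff = (patches[1:] - patches[:-1]).norm(dim=-1)          # (T-1, N)
    static = torch.cat([torch.zeros(1, N, dtype=torch.bool),
                        diff < tau])                          # (T, N); frame 0 always kept
    keep = ~static
    kept_tokens = patches[keep]                               # (K, D)
    # Run length of a kept token = number of frames until the next kept token
    # at the same patch index; computed with a backward sweep over time.
    run_len = torch.ones(T, N, dtype=torch.long)
    for t in range(T - 2, -1, -1):
        run_len[t] += run_len[t + 1] * static[t + 1].long()
    return kept_tokens, run_len[keep]
```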
SeedEdit: Foundational T2I
ByteDance unveils a novel T2I foundation model capable of delivering stable, high-aesthetic image edits that maintain image quality through unlimited rounds of editing instructions. No code announced, but a demo is online; a client sketch follows below.
Review https://t.ly/hPlnN
Paper https://arxiv.org/pdf/2411.06686
Project team.doubao.com/en/special/seededit
Demo https://huggingface.co/spaces/ByteDance/SeedEdit-APP
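Since only the demo is public, a hedged way to script it is through `gradio_client` against the Hugging Face Space linked above. The Space name is real; the endpoint name and argument layout in the commented call are assumptions, and `view_api()` reports the actual ones.

```python
# Sketch: drive the public SeedEdit Space programmatically with gradio_client.
from gradio_client import Client

client = Client("ByteDance/SeedEdit-APP")   # the Space linked above
client.view_api()                           # prints the real endpoints + parameters

# Hypothetical call shape for one editing round (adjust to view_api() output):
# result = client.predict(
#     "input.png",                 # source image (path or URL) - assumed arg
#     "make the sky look stormy",  # editing instruction - assumed arg
#     api_name="/edit",            # assumed endpoint name
# )
# print(result)
```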
4 Nanoseconds Inference
LogicTreeNet: convolutional differentiable logic gate networks with logic gate tree kernels, bringing computer vision into differentiable LGNs. Up to 61x smaller than SOTA, with inference in 4 nanoseconds! A toy differentiable-gate sketch follows below.
Review https://t.ly/GflOW
Paper https://lnkd.in/dAZQr3dW
Full clip https://lnkd.in/dvDJ3j-u
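For intuition on differentiable logic gate networks: each gate mixes real-valued relaxations of binary ops via a softmax over learnable logits, so gradient descent can pick the gate; after training, the argmax op is kept as a hard gate. The four-op menu below is a simplification of my choosing; the paper's networks use the full set of 16 two-input ops plus convolutional tree kernels.

```python
import torch
import torch.nn as nn

class SoftLogicGate(nn.Module):
    """One differentiable two-input logic gate (toy version)."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(4))  # one logit per candidate op

    def forward(self, a, b):
        # Probabilistic relaxations for inputs a, b in [0, 1]:
        ops = torch.stack([
            a * b,              # AND
            a + b - a * b,      # OR
            a + b - 2 * a * b,  # XOR
            1 - a * b,          # NAND
        ])                                          # (4, ...)
        w = torch.softmax(self.logits, dim=0)
        return (w[:, None] * ops).sum(dim=0)

gate = SoftLogicGate()
a, b = torch.rand(8), torch.rand(8)
out = gate(a, b)          # differentiable; gradients flow into the gate choice
out.sum().backward()
print(gate.logits.grad)
```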
Global Tracklet Association MOT
A novel universal, model-agnostic method designed to refine and enhance tracklet association for single-camera MOT. Suitable for datasets such as SportsMOT, SoccerNet & similar. Source code released; a merging sketch follows below.
Review https://t.ly/gk-yh
Paper https://lnkd.in/dvXQVKFw
Repo https://lnkd.in/dEJqiyWs
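The repo has the real pipeline; as a flavor of what global tracklet association does, here is a toy greedy merger that fuses temporally disjoint tracklets whose mean appearance embeddings agree. The field names, the cosine rule, and `sim_thresh` are illustrative assumptions, not the paper's exact method (which also handles splitting contaminated tracklets).

```python
import numpy as np

def merge_tracklets(tracklets, sim_thresh=0.8):
    """Greedy appearance-based tracklet merging (toy global association).

    tracklets: list of dicts with 'emb' (list of per-frame appearance
    embeddings) and 'frames' (sorted frame indices).
    """
    def mean_emb(t):
        e = np.mean(t["emb"], axis=0)
        return e / (np.linalg.norm(e) + 1e-8)

    merged = True
    while merged:
        merged = False
        for i in range(len(tracklets)):
            for j in range(i + 1, len(tracklets)):
                a, b = tracklets[i], tracklets[j]
                if set(a["frames"]) & set(b["frames"]):
                    continue  # co-occurring tracklets must be different IDs
                if float(mean_emb(a) @ mean_emb(b)) > sim_thresh:
                    a["emb"] += b["emb"]
                    a["frames"] = sorted(a["frames"] + b["frames"])
                    tracklets.pop(j)
                    merged = True
                    break
            if merged:
                break
    return tracklets
```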
MagicQuill: Super-Easy Diffusion Editing
MagicQuill is a novel system designed to support users in smart image editing. A robust UI/UX (e.g., inserting/erasing objects, changing colors, etc.) sits on top of a multimodal LLM that anticipates user intentions in real time. Code & demos released.
Review https://t.ly/hJyLa
Paper https://arxiv.org/pdf/2411.09703
Project https://magicquill.art/demo/
Repo https://github.com/magic-quill/magicquill
Demo https://huggingface.co/spaces/AI4Editing/MagicQuill
EchoMimicV2: Semi-body Human
Alipay (Ant Group) unveils EchoMimicV2, a novel SOTA half-body human animation model via APD-Harmonization. See the clip with audio (ZH/ENG). Code & demo announced.
Review https://t.ly/enLxJ
Paper arxiv.org/pdf/2411.10061
Project antgroup.github.io/ai/echomimic_v2/
Repo-v2 github.com/antgroup/echomimic_v2
Repo-v1 https://github.com/antgroup/echomimic
SAMurai: SAM for Tracking
UW unveils SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking via motion-aware memory selection. New SOTA! Code under Apache 2.0; a minimal motion-model sketch follows below.
Review https://t.ly/yGU0P
Paper https://arxiv.org/pdf/2411.11922
Repo https://github.com/yangchris11/samurai
Project https://yangchris11.github.io/samurai/
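SAMURAI's key addition is Kalman-filter-based motion modeling used to score SAM 2's candidate masks alongside their affinity scores. Here is a minimal constant-velocity Kalman sketch that scores candidate boxes by IoU against the motion prediction; the noise settings and the plain-IoU score are my simplifications, not the released implementation.

```python
import numpy as np

class ConstantVelocityKF:
    """Toy constant-velocity Kalman filter over box state [x, y, w, h, vx, vy, vw, vh]."""
    def __init__(self, box, q=1e-2, r=1e-1):
        self.x = np.r_[box, np.zeros(4)]                 # state: box + velocities
        self.P = np.eye(8)
        self.F = np.eye(8); self.F[:4, 4:] = np.eye(4)   # x' = x + v
        self.H = np.eye(4, 8)                            # we observe the box only
        self.Q, self.R = q * np.eye(8), r * np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]                                # predicted box

    def update(self, box):
        y = box - self.H @ self.x                        # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P

def iou(a, b):
    """IoU between two [x, y, w, h] boxes; used as the motion score."""
    iw = max(0.0, min(a[0]+a[2], b[0]+b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1]+a[3], b[1]+b[3]) - max(a[1], b[1]))
    inter = iw * ih
    return inter / (a[2]*a[3] + b[2]*b[3] - inter + 1e-8)

kf = ConstantVelocityKF(np.array([10., 10., 50., 80.]))
pred = kf.predict()
candidates = [np.array([12., 11., 50., 80.]), np.array([200., 5., 40., 90.])]
best = max(candidates, key=lambda c: iou(pred, c))   # motion-consistent pick
kf.update(best)
```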
DINO-X: Unified Obj-Centric LVM
A unified vision model for open-world detection, segmentation, phrase grounding, visual counting, pose estimation, prompt-free detection/recognition, dense captioning, & more. Demo & API announced.
Review https://t.ly/CSQon
Paper https://lnkd.in/dc44ZM8v
Project https://lnkd.in/dehKJVvC
Repo https://lnkd.in/df8Kb6iz
All Languages Matter: LMMs vs. 100 Languages
ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & dataset released; a per-language evaluation sketch follows below.
Review https://t.ly/VsoJB
Paper https://lnkd.in/ddVVZfi2
Project https://lnkd.in/dpssaeRq
Code https://lnkd.in/dnbaJJE4
Dataset https://lnkd.in/drw-_95v
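Once predictions are collected, this kind of benchmark reduces to per-language aggregation. A toy evaluation loop, assuming a hypothetical record schema (`lang`, `pred`, `answer`) rather than ALM-Bench's actual files (those live in the repo and dataset card):

```python
from collections import defaultdict

def per_language_accuracy(records):
    """Aggregate benchmark results per language (toy evaluation loop)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["lang"]] += 1
        hits[r["lang"]] += int(r["pred"].strip() == r["answer"].strip())
    return {lang: hits[lang] / totals[lang] for lang in totals}

demo = [
    {"lang": "sw", "pred": "B", "answer": "B"},
    {"lang": "sw", "pred": "A", "answer": "C"},
    {"lang": "ur", "pred": "D", "answer": "D"},
]
print(per_language_accuracy(demo))   # {'sw': 0.5, 'ur': 1.0}
```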
EdgeCape: SOTA Category-Agnostic Pose
EdgeCape: new SOTA in Category-Agnostic Pose Estimation (CAPE), finding keypoints across diverse object categories using only one or a few annotated support images. Source code released; a correlation-matching sketch follows below.
Review https://t.ly/4TpAs
Paper https://arxiv.org/pdf/2411.16665
Project https://orhir.github.io/edge_cape/
Code https://github.com/orhir/EdgeCape
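As background for CAPE: the generic baseline localizes each annotated support keypoint in the query image by dense feature correlation, and EdgeCape's contribution is learned skeleton/edge structure on top of that. A sketch of the baseline step, assuming dense feature maps from some frozen backbone:

```python
import torch
import torch.nn.functional as F

def transfer_keypoints(support_feats, query_feats, support_kps):
    """Locate support keypoints in a query image by feature correlation.

    support_feats / query_feats: (C, H, W) dense feature maps.
    support_kps: (K, 2) integer (row, col) positions in feature-map coords.
    Returns (K, 2) best-match positions in the query feature map.
    """
    C, H, W = query_feats.shape
    # Descriptor of each support keypoint = feature vector at its location.
    desc = support_feats[:, support_kps[:, 0], support_kps[:, 1]].T   # (K, C)
    desc = F.normalize(desc, dim=1)
    q = F.normalize(query_feats.reshape(C, -1), dim=0)                # (C, H*W)
    sim = desc @ q                                                    # (K, H*W)
    idx = sim.argmax(dim=1)                                           # best match per keypoint
    return torch.stack([idx // W, idx % W], dim=1)                    # (K, 2)

# Toy usage with random features and 3 support keypoints:
s, qf = torch.randn(64, 32, 32), torch.randn(64, 32, 32)
kps = torch.tensor([[4, 5], [10, 20], [31, 0]])
print(transfer_keypoints(s, qf, kps))
```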
StableAnimator: ID-aware Humans
StableAnimator: the first end-to-end ID-preserving diffusion model for HQ human videos without any post-processing. Input: a single image + a sequence of poses. Insane results!
Review https://t.ly/JDtL3
Paper https://arxiv.org/pdf/2411.17697
Project francis-rings.github.io/StableAnimator/
Code github.com/Francis-Rings/StableAnimator
SOTA Track-by-Propagation
SambaMOTR is a novel end-to-end model (based on Samba) that exploits long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code expected in January 2025.
Review https://t.ly/QSQ8L
Paper arxiv.org/pdf/2410.01806
Project sambamotr.github.io/
Repo https://lnkd.in/dRDX6nk2
HiFiVFS: Extreme Face Swapping
HiFiVFS: HQ face-swapping videos even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, no code announced.
Review https://t.ly/ea8dU
Paper https://arxiv.org/pdf/2411.18293
Project https://cxcx1996.github.io/HiFiVFS
Video Depth without Video Models
RollingDepth: turning a single-image latent diffusion model (LDM) into a novel SOTA video depth estimator that outperforms dedicated video-depth models. Code under Apache 2.0; a snippet-alignment sketch follows below.
Review https://t.ly/R4LqS
Paper https://arxiv.org/pdf/2411.19189
Project https://rollingdepth.github.io/
Repo https://github.com/prs-eth/rollingdepth
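RollingDepth predicts depth for short overlapping frame snippets, each only up to an affine scale and shift, then stitches them by aligning the overlaps (the paper solves this alignment globally over all snippets). Here is a minimal per-pair version of that least-squares step:

```python
import numpy as np

def align_snippet(ref, src, mask=None):
    """Least-squares scale/shift aligning one depth snippet to a reference.

    ref, src: depth arrays over the overlapping frames, same shape.
    Solves min_{a,b} || a*src + b - ref ||^2 in closed form.
    """
    r, s = ref.ravel(), src.ravel()
    if mask is not None:
        keep = mask.ravel()
        r, s = r[keep], s[keep]
    A = np.stack([s, np.ones_like(s)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, r, rcond=None)
    return a * src + b

# Toy check: src is ref under an affine distortion (+ noise).
ref = np.random.rand(2, 64, 64)                 # 2 overlapping frames of depth
src = 3.0 * ref - 0.5 + 0.01 * np.random.randn(*ref.shape)
aligned = align_snippet(ref, src)
print(np.abs(aligned - ref).mean())             # ~0.008, i.e. aligned
```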
Universal Soccer Foundation Model
Universal Soccer Video Understanding: SoccerReplay-1988, the largest multi-modal soccer dataset, and MatchVision, the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released.
Review https://t.ly/-X90B
Paper https://arxiv.org/pdf/2412.01820
Project https://jyrao.github.io/UniSoccer/
Repo https://github.com/jyrao/UniSoccer
Motion Prompting Video Generation
DeepMind unveils Motion Prompting: a ControlNet-based video generation model conditioned on spatio-temporally sparse or dense motion trajectories. Amazing results, but no code announced. A conditioning sketch follows below.
Review https://t.ly/VyKbv
Paper arxiv.org/pdf/2412.02700
Project motion-prompting.github.io
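One plausible way to picture trajectory conditioning is rasterizing sparse point tracks into a space-time tensor that a ControlNet-style branch can consume. The encoding below (a visibility channel plus normalized coordinate channels) is my illustration, not the paper's exact scheme:

```python
import torch

def rasterize_tracks(tracks, T, H, W):
    """Encode sparse point trajectories as a space-time conditioning tensor.

    tracks: (N, T, 2) pixel coordinates (x, y) of N tracked points over T
    frames; NaN marks frames where a point is invisible/unspecified.
    Returns (T, 3, H, W): visibility + normalized (x, y) at each track location.
    """
    cond = torch.zeros(T, 3, H, W)
    for n in range(tracks.shape[0]):
        for t in range(T):
            x, y = tracks[n, t]
            if torch.isnan(x) or torch.isnan(y):
                continue
            xi, yi = int(x.clamp(0, W - 1)), int(y.clamp(0, H - 1))
            cond[t, 0, yi, xi] = 1.0            # visibility
            cond[t, 1, yi, xi] = x / W          # normalized coords
            cond[t, 2, yi, xi] = y / H
    return cond

tracks = torch.tensor([[[4., 5.], [6., 7.], [float('nan')] * 2]])  # 1 point, 3 frames
print(rasterize_tracks(tracks, T=3, H=16, W=16).shape)             # (3, 3, 16, 16)
```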
AniGS: Single Pic Animatable Avatar
#Alibaba unveils AniGS: given a single human image as input, it rebuilds a hi-fi 3D avatar in a canonical pose, usable for both photorealistic rendering & real-time animation. Source code announced, to be released.
Review https://t.ly/4yfzn
Paper arxiv.org/pdf/2412.02684
Project lingtengqiu.github.io/2024/AniGS/
Repo github.com/aigc3d/AniGS
GigaHands: Massive #3D Hands
A novel massive #3D bimanual-activities dataset: 34 hours of activities, 14k hand-motion clips paired with 84k text annotations, and 183M+ unique hand images.
Review https://t.ly/SA0HG
Paper www.arxiv.org/pdf/2412.04244
Repo github.com/brown-ivl/gigahands
Project ivl.cs.brown.edu/research/gigahands.html
Track4Gen: Diffusion + Tracking
Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results, but no code announced. A loss-combination sketch follows below.
Review https://t.ly/9ujhc
Paper arxiv.org/pdf/2412.06016
Project hyeonho99.github.io/track4gen/
Gallery hyeonho99.github.io/track4gen/full.html
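Track4Gen's recipe, as summarized above, is a standard diffusion loss plus a tracking term that makes intermediate diffusion features agree across corresponding points. A toy PyTorch combination, where the shapes, the cosine form, and the weight `lam` are all my assumptions:

```python
import torch
import torch.nn.functional as F

def track4gen_style_loss(eps_pred, eps_true, feats, corr, lam=0.5):
    """Toy combination of a diffusion loss with a point-tracking loss.

    eps_pred / eps_true: predicted vs. target noise, (B, C, T, H, W).
    feats: intermediate diffusion features, (B, D, T, H, W).
    corr: (M, 2, 3) index pairs, M corresponding points given as (t, y, x)
    in two frames, e.g. from an off-the-shelf point tracker.
    """
    diff_loss = F.mse_loss(eps_pred, eps_true)
    # Gather features at corresponding points and pull them together.
    a = feats[0, :, corr[:, 0, 0], corr[:, 0, 1], corr[:, 0, 2]]  # (D, M)
    b = feats[0, :, corr[:, 1, 0], corr[:, 1, 1], corr[:, 1, 2]]  # (D, M)
    track_loss = (1 - F.cosine_similarity(a, b, dim=0)).mean()
    return diff_loss + lam * track_loss

B, C, D, T, H, W = 1, 4, 16, 3, 8, 8
eps_p, eps_t = torch.randn(B, C, T, H, W), torch.randn(B, C, T, H, W)
feats = torch.randn(B, D, T, H, W, requires_grad=True)
corr = torch.tensor([[[0, 2, 3], [1, 2, 4]], [[0, 5, 5], [2, 6, 5]]])  # (2, 2, 3)
loss = track4gen_style_loss(eps_p, eps_t, feats, corr)
loss.backward()
```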