✨WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation
📝 Summary:
WEAVE introduces a suite with a large dataset and benchmark to assess multi-turn context-dependent image generation and editing in multimodal models. It enables new capabilities like visual memory in models while exposing current limitations in these complex tasks.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11434
• PDF: https://arxiv.org/pdf/2511.11434
• Project Page: https://weichow23.github.io/weave/
• Github: https://github.com/weichow23/weave
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultimodalAI #ImageGeneration #GenerativeAI #ComputerVision #AIResearch
✨A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
📝 Summary:
CoTyle introduces code-to-style image generation, creating consistent visual styles from numerical codes. It is the first open-source academic method for this task, using a discrete style codebook and a text-to-image diffusion model for diverse, reproducible styles.
🔹 Publication Date: Published on Nov 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10555
• PDF: https://arxiv.org/pdf/2511.10555
• Project Page: https://Kwai-Kolors.github.io/CoTyle/
• Github: https://github.com/Kwai-Kolors/CoTyle
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Kwai-Kolors/CoTyle
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DiffusionModels #NeuralStyle #ComputerVision #DeepLearning
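The core idea of code-to-style generation can be sketched as a deterministic lookup from a numerical code to a style embedding that then conditions a text-to-image model. The codebook size, embedding dimension, and seeded-RNG lookup below are illustrative assumptions, not CoTyle's actual learned codebook:

```python
import random

CODEBOOK_SIZE = 1024   # hypothetical number of discrete style codes
STYLE_DIM = 8          # toy embedding dimension

def style_embedding(code: int, dim: int = STYLE_DIM) -> list:
    """Deterministically map a numerical style code to a style vector.

    A real system would use a learned lookup table; here a seeded RNG
    stands in, so the same code always yields the same style vector.
    """
    rng = random.Random(code % CODEBOOK_SIZE)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

# The same code reproduces the same style; different codes differ.
a = style_embedding(42)
b = style_embedding(42)
c = style_embedding(7)
```

The point of the discrete space is exactly this reproducibility: a style is addressable by a single integer, so it can be shared and re-applied across prompts.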
✨Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
📝 Summary:
Kandinsky 5.0 is a family of state-of-the-art foundation models for high-resolution image and video generation. It includes Lite and Pro versions with varying parameter counts and uses advanced training techniques for superior quality and speed. This publicly available framework aims to advance generative modeling.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14993
• PDF: https://arxiv.org/pdf/2511.14993
• Project Page: https://kandinskylab.ai/
• Github: https://github.com/kandinskylab/kandinsky-5
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FoundationModels #ImageGeneration #VideoGeneration #AI #DeepLearning
✨DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
📝 Summary:
DeCo is a frequency-decoupled pixel diffusion framework that improves image generation by separating high-frequency details from low-frequency semantics. It uses a lightweight pixel decoder for details and a DiT for semantics, achieving superior efficiency and quality over existing pixel diffusion methods.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19365
• PDF: https://arxiv.org/pdf/2511.19365
• Project Page: https://zehong-ma.github.io/DeCo/
• Github: https://github.com/Zehong-Ma/DeCo
🔹 Models citing this paper:
• https://huggingface.co/zehongma/DeCo
✨ Spaces citing this paper:
• https://huggingface.co/spaces/zehongma/DeCo
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DiffusionModels #ComputerVision #DeepLearning #DeCo
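The frequency decoupling that DeCo relies on can be illustrated on a 1-D signal: a smoothing filter extracts the low-frequency component, and the residual carries the high-frequency detail; summing the two recovers the original. The moving-average filter and toy signal below are assumptions for illustration, not DeCo's actual decomposition:

```python
def low_pass(signal, k=3):
    # Simple moving average as a stand-in for the low-frequency branch.
    half = k // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def decompose(signal, k=3):
    """Split a signal into low-frequency and high-frequency parts."""
    low = low_pass(signal, k)
    high = [s - l for s, l in zip(signal, low)]
    return low, high

sig = [0.0, 1.0, 0.0, 1.0, 4.0, 1.0]
low, high = decompose(sig)
# Adding both branches back together reconstructs the signal.
recon = [l + h for l, h in zip(low, high)]
```

The design intuition is that the two components have very different statistics, so a large model (the DiT) can focus on the smooth semantic component while a cheap decoder restores the residual detail.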
✨Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
📝 Summary:
Controllable Layer Decomposition (CLD) enables fine-grained, controllable separation of raster images into editable RGBA layers, overcoming traditional compositing limitations. Using LD-DiT and MLCA, CLD surpasses existing methods in quality and control, producing layers directly usable in design workflows.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16249
• PDF: https://arxiv.org/pdf/2511.16249
• Github: https://github.com/monkek123King/CLD
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DeepLearning #ComputerVision #ImageEditing #LayerDecomposition
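The "reversible" property of RGBA layer decomposition comes down to standard alpha compositing: if the layers are recovered correctly, blending them with the Porter-Duff "over" operator reproduces the original image. The single-pixel sketch below shows that operator; it is a textbook definition, not CLD's model:

```python
def over(fg, bg):
    """Porter-Duff 'over': composite one RGBA pixel onto another.

    Pixels are (r, g, b, a) tuples with channels in [0, 1].
    """
    fr, fgr, fb, fa = fg
    br, bgr, bb, ba = bg
    a = fa + ba * (1.0 - fa)
    if a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    blend = lambda f, b: (f * fa + b * ba * (1.0 - fa)) / a
    return (blend(fr, br), blend(fgr, bgr), blend(fb, bb), a)

# Half-transparent red composited over opaque white gives pink.
red = (1.0, 0.0, 0.0, 0.5)
white = (1.0, 1.0, 1.0, 1.0)
px = over(red, white)
```

Decomposition is the inverse problem: given only `px`, recover plausible `red` and `white` layers, which is ill-posed and is what makes learned, controllable methods like CLD necessary.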
✨iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
📝 Summary:
iMontage repurposes pre-trained video models to generate high-quality, diverse image sets. It uses a unified framework and minimal adaptation, combining temporal coherence with image diversity for natural transitions and expanded dynamics across many tasks.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20635
• PDF: https://arxiv.org/pdf/2511.20635
• Project Page: https://kr1sjfu.github.io/iMontage-web/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DeepLearning #ComputerVision #AIMethods #VideoModels
✨OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation
📝 Summary:
OmniAlpha is the first unified multi-task generative framework for RGBA image generation and editing. It uses a Diffusion Transformer with a novel MSRoPE-BiL method and a new AlphaLayers dataset. OmniAlpha consistently outperforms specialized models across 21 tasks.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20211
• PDF: https://arxiv.org/pdf/2511.20211
• Github: https://github.com/Longin-Yu/OmniAlpha
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GenerativeAI #DiffusionModels #ImageGeneration #ComputerVision #DeepLearning
✨Canvas-to-Image: Compositional Image Generation with Multimodal Controls
📝 Summary:
Canvas-to-Image unifies diverse controls like text, poses, and layouts into a single canvas image for high-fidelity compositional image generation. Its multi-task training helps it understand and integrate these controls, outperforming existing methods in adherence and identity.
🔹 Publication Date: Published on Nov 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21691
• PDF: https://arxiv.org/pdf/2511.21691
• Project Page: https://snap-research.github.io/canvas-to-image/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #GenerativeAI #MultimodalAI #ComputerVision #DeepLearning
✨Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
📝 Summary:
Z-Image is an efficient 6B-parameter diffusion transformer achieving state-of-the-art image generation with significantly reduced computational cost. It enables sub-second inference and consumer hardware compatibility, challenging the scale-at-all-costs paradigm.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22699
• PDF: https://arxiv.org/pdf/2511.22699
• Project Page: https://tongyi-mai.github.io/Z-Image-blog/
• Github: https://github.com/Tongyi-MAI/Z-Image
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DiffusionModels #EfficientAI #FoundationModels #MachineLearning
✨DiP: Taming Diffusion Models in Pixel Space
📝 Summary:
DiP is an efficient pixel space diffusion framework addressing the quality-efficiency trade-off without VAEs. It combines a Diffusion Transformer for global structure and a Patch Detailer Head for local details, achieving high-quality images up to 10x faster.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18822
• PDF: https://arxiv.org/pdf/2511.18822
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #GenerativeAI #ImageGeneration #DeepLearning #ComputerVision
✨OmniRefiner: Reinforcement-Guided Local Diffusion Refinement
📝 Summary:
OmniRefiner enhances reference-guided image generation by overcoming fine detail loss. It uses a two-stage framework: a fine-tuned diffusion editor for global coherence, then reinforcement learning for localized detail accuracy. This significantly improves detail preservation and consistency.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19990
• PDF: https://arxiv.org/pdf/2511.19990
• Github: https://github.com/yaoliliu/OmniRefiner
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #ImageGeneration #ReinforcementLearning #GenerativeAI #ComputerVision
✨The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
📝 Summary:
ImageCritic corrects inconsistent fine-grained details in generated images using a reference-guided post-editing approach. It employs attention alignment loss and a detail encoder to precisely rectify inconsistencies and improve accuracy.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20614
• PDF: https://arxiv.org/pdf/2511.20614
• Project Page: https://ouyangziheng.github.io/ImageCritic-Page/
• Github: https://github.com/HVision-NKU/ImageCritic
🔹 Models citing this paper:
• https://huggingface.co/ziheng1234/ImageCritic
✨ Datasets citing this paper:
• https://huggingface.co/datasets/ziheng1234/Critic-10K
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ziheng1234/ImageCritic
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #ComputerVision #DeepLearning #AI #ImageEditing
✨Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
📝 Summary:
Flash-DMD accelerates generative diffusion models via efficient timestep-aware distillation and joint reinforcement learning. This framework achieves faster convergence, high-fidelity few-step generation, and stabilizes RL training by using distillation as a regularizer, all with reduced computational cost.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20549
• PDF: https://arxiv.org/pdf/2511.20549
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #ImageGeneration #ReinforcementLearning #ModelDistillation #GenerativeAI
✨CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation
📝 Summary:
CookAnything is a diffusion framework generating coherent, multi-step recipe image sequences from instructions. It uses step-wise regional control, flexible positional encoding, and cross-step consistency for consistent, high-quality visual synthesis.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03540
• PDF: https://arxiv.org/pdf/2512.03540
• Github: https://github.com/zhangdaxia22/CookAnything
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#CookAnything #ImageGeneration #DiffusionModels #AI #RecipeGeneration
✨Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
📝 Summary:
Echo-4o-Image is a 180K-sample synthetic dataset generated with GPT-4o. It enhances image generation by covering rare scenarios and providing clean text-to-image supervision, improving model performance and transferability across various foundation models.
🔹 Publication Date: Published on Aug 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09987
• PDF: https://arxiv.org/pdf/2508.09987
• Project Page: https://yejy53.github.io/Echo-4o/
• Github: https://yejy53.github.io/Echo-4o
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Yejy53/Echo-4o-Image
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #GPT4o #SyntheticData #AIResearch #FoundationModels
✨Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
📝 Summary:
Semantic-First Diffusion (SFD) asynchronously denoises semantic and texture latents for image generation. This method prioritizes semantic formation, providing clearer guidance for texture refinement. SFD significantly improves convergence speed by up to 100x and enhances image quality.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04926
• PDF: https://arxiv.org/pdf/2512.04926
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #ImageGeneration #SemanticAI #GenerativeAI #DeepLearning
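The asynchronous denoising idea can be sketched as two timestep schedules where the semantic latent runs a few steps ahead of the texture latent, so semantics are already partially formed when texture refinement happens. The step counts and the fixed offset below are illustrative assumptions, not SFD's actual schedule:

```python
def asynchronous_schedule(num_steps, offset):
    """Pair semantic and texture timesteps so semantics lead by `offset` steps.

    Timesteps run from num_steps - 1 (pure noise) down to 0 (clean).
    The semantic latent is clamped at 0 once fully denoised.
    """
    pairs = []
    for i in range(num_steps):
        t_semantic = max(num_steps - 1 - i - offset, 0)
        t_texture = num_steps - 1 - i
        pairs.append((t_semantic, t_texture))
    return pairs

sched = asynchronous_schedule(5, 2)
# At every step the semantic latent is at least as denoised as the texture one.
```

A usage note: in a real sampler, each pair would drive one joint denoising step, with the cleaner semantic latent conditioning the texture branch.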
✨UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
📝 Summary:
UltraImage tackles content repetition and quality degradation in high-resolution image generation by correcting dominant frequency periodicity and applying entropy-guided attention. It achieves extreme extrapolation, producing high-fidelity images up to 6K×6K without low-resolution guidance.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04504
• PDF: https://arxiv.org/pdf/2512.04504
• Project Page: https://thu-ml.github.io/ultraimage.github.io/
• Github: https://thu-ml.github.io/ultraimage.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DiffusionModels #Transformers #HighResolution #DeepLearning
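The "entropy-guided" part refers to measuring how spread out an attention distribution is: a uniform distribution has maximal entropy, a sharply peaked one has low entropy, and that signal can guide where attention needs correction at large resolutions. The sketch below just computes that quantity; it is a standard definition, not UltraImage's guidance rule:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of attention logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def entropy(p):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

uniform = softmax([0.0, 0.0, 0.0, 0.0])   # maximally spread attention
peaked = softmax([10.0, 0.0, 0.0, 0.0])   # nearly one-hot attention
```

For n tokens the entropy of uniform attention is log(n), which gives a natural reference point for deciding whether an attention map is too diffuse.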
✨PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
📝 Summary:
PaCo-RL is a reinforcement learning framework for consistent image generation. It introduces PaCo-Reward for human-aligned consistency evaluation and PaCo-GRPO for efficient RL optimization. The framework achieves state-of-the-art consistency with improved training efficiency.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04784
• PDF: https://arxiv.org/pdf/2512.04784
• Project Page: https://x-gengroup.github.io/HomePage_PaCo-RL/
• Github: https://x-gengroup.github.io/HomePage_PaCo-RL
🔹 Models citing this paper:
• https://huggingface.co/X-GenGroup/PaCo-Reward-7B
• https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora
• https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #ImageGeneration #AI #DeepLearning #GenerativeAI
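Pairwise reward modeling of the kind PaCo-Reward performs is commonly formalized with a Bradley-Terry model: each candidate gets a scalar consistency score, and the probability that one candidate beats another is a logistic function of the score difference. The sketch below shows that standard formulation, not PaCo-RL's actual reward network:

```python
import math

def pairwise_preference(score_a, score_b):
    """Bradley-Terry probability that candidate A is preferred over B."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def rank(candidates):
    """Order (name, consistency_score) pairs best-first."""
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)

# A candidate scoring 2.0 beats one scoring 0.0 about 88% of the time.
p = pairwise_preference(2.0, 0.0)
best = rank([("img_a", 0.1), ("img_b", 2.0), ("img_c", 1.3)])[0]
```

Training on pairwise comparisons rather than absolute scores is what makes such rewards easier to align with human judgments, since annotators are generally more reliable at "which of these two is more consistent" than at absolute rating.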
✨Vibe Spaces for Creatively Connecting and Expressing Visual Concepts
📝 Summary:
Vibe Blending uses Vibe Space, a hierarchical graph manifold, to create coherent and creative image hybrids. It learns geodesics in feature spaces, outperforming current methods in creativity and coherence as rated by humans.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14884
• PDF: https://arxiv.org/pdf/2512.14884
• Project Page: https://huzeyann.github.io/VibeSpace-webpage/
• Github: https://github.com/huzeyann/VibeSpace
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #ComputerVision #AI #MachineLearning #CreativeAI
✨Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing
📝 Summary:
This paper proposes a framework using a semantic-pixel reconstruction objective to adapt encoder features for generation. It creates a compact, semantically rich latent space, leading to state-of-the-art image reconstruction and improved text-to-image generation and editing.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17909
• PDF: https://arxiv.org/pdf/2512.17909
• Project Page: https://jshilong.github.io/PS-VAE-PAGE/
• Github: https://jshilong.github.io/PS-VAE-PAGE/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#TextToImage #ImageGeneration #DeepLearning #ComputerVision #AIResearch