✨Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
📝 Summary:
Pico-Banana-400K is a new 400K-image dataset for text-guided image editing, built from real photos. It offers diverse edit types, high quality, and specialized subsets for multi-turn, preference-based, and long-short instruction editing, enabling comprehensive model development. A hedged annotation-loading sketch follows after this post.
🔹 Publication Date: Published on Oct 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19808
• PDF: https://arxiv.org/pdf/2510.19808
• Github: https://github.com/apple/pico-banana-400k
🔹 Models citing this paper:
• https://huggingface.co/eigen-ai-labs/eigen-banana-qwen-image-edit
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #TextGuidedEditing #Dataset #ComputerVision #AI
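A hedged sketch of how one might iterate such a dataset's instruction-edit annotations. The field names ("source_image", "edited_image", "instruction", "edit_type") and the JSONL layout are assumptions for illustration only; the actual schema is defined in the GitHub repo linked above.
```python
# Hypothetical sketch: iterating instruction-edit pairs from a JSONL
# annotation file. Field names and layout are assumptions, not the
# released schema; check the GitHub repo for the real format.
import json
from collections import Counter

def iter_edit_pairs(annotation_path: str):
    """Yield (source, target, instruction) triples from a JSONL file."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield (record["source_image"],   # original real photo
                   record["edited_image"],   # edited result
                   record["instruction"])    # natural-language edit instruction

def edit_type_histogram(annotation_path: str) -> Counter:
    """Count examples per (assumed) edit-type field, e.g. to check edit diversity."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        return Counter(json.loads(line).get("edit_type", "unknown") for line in f)
```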
✨ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
📝 Summary:
ChronoEdit ensures physical consistency in image editing by reframing it as a video generation problem. It uses pretrained video models and temporal reasoning tokens to imagine plausible physical transformations between the input and edited images. This approach significantly improves realism and visual fidelity. A conceptual edit-as-video sketch follows after this post.
🔹 Publication Date: Published on Oct 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.04290
• PDF: https://arxiv.org/pdf/2510.04290
• Project Page: https://research.nvidia.com/labs/toronto-ai/chronoedit
• Github: https://github.com/nv-tlabs/ChronoEdit
🔹 Models citing this paper:
• https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers
• https://huggingface.co/vantagewithai/ChronoEdit-GGUF
• https://huggingface.co/vantagewithai/ChronoEdit-fp8-scaled
✨ Spaces citing this paper:
• https://huggingface.co/spaces/nvidia/ChronoEdit
• https://huggingface.co/spaces/JarlJarle/nvidia-ChronoEdit-14B-Diffusers
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #VideoGeneration #TemporalReasoning #ComputerVision #AIResearch
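A conceptual sketch of the edit-as-video framing described above. `video_model` is a placeholder for a pretrained image-to-video generator (such as the released ChronoEdit-14B Diffusers pipeline); its call signature and the frame count here are assumptions, not the actual API.
```python
# Conceptual sketch only: an instruction-guided edit treated as a short
# image-to-video generation problem, following the paper's framing.
import torch

def edit_as_video(video_model, source_image: torch.Tensor,
                  instruction: str, num_frames: int = 9) -> torch.Tensor:
    """Imagine a plausible physical transformation and keep the final frame."""
    frames = video_model(
        image=source_image,     # the photo to edit, used as the first frame
        prompt=instruction,     # the text edit instruction
        num_frames=num_frames,  # a short trajectory from source to edit
    )
    # Intermediate frames act as temporal-reasoning steps; the edited image
    # is read off as the last frame of the generated clip.
    return frames[-1]
```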
🤖🧠 Pico-Banana-400K: The Breakthrough Dataset Advancing Text-Guided Image Editing
🗓️ 09 Nov 2025
📚 AI News & Trends
Text-guided image editing has rapidly evolved with powerful multimodal models capable of transforming images using simple natural-language instructions. These models can change object colors, modify lighting, add accessories, adjust backgrounds, or even convert real photographs into artistic styles. However, research progress has been limited by one crucial bottleneck: the lack of large-scale, high-quality datasets built from real images.
#TextGuidedEditing #MultimodalAI #ImageEditing #AIResearch #ComputerVision #DeepLearning
✨Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
📝 Summary:
Controllable Layer Decomposition (CLD) enables fine-grained, controllable separation of raster images into editable RGBA layers, overcoming the limitations of traditional compositing. Using LD-DiT and MLCA, CLD surpasses existing methods in quality and control, and it produces layers directly usable in design workflows. A short compositing sketch of the reversibility idea follows after this post.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16249
• PDF: https://arxiv.org/pdf/2511.16249
• Github: https://github.com/monkek123King/CLD
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DeepLearning #ComputerVision #ImageEditing #LayerDecomposition
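A minimal sketch of the reversibility idea under stated assumptions: if an image is decomposed into ordered RGBA layers, compositing them back to front with the standard "over" operator should reproduce the original raster. The decomposition model itself (LD-DiT, MLCA) is not shown here.
```python
# Sanity-check sketch: recompose RGBA layers with standard alpha compositing.
import numpy as np

def composite_rgba_layers(layers: list[np.ndarray]) -> np.ndarray:
    """Composite layers back to front; each layer is HxWx4 with values in [0, 1]."""
    h, w, _ = layers[0].shape
    out = np.zeros((h, w, 3), dtype=np.float64)
    for layer in layers:                            # back-to-front order
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)     # the "over" operator
    return out

# Usage: np.allclose(composite_rgba_layers(predicted_layers), original_rgb)
# is one way to check that a predicted decomposition is approximately reversible.
```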
✨MIRA: Multimodal Iterative Reasoning Agent for Image Editing
📝 Summary:
MIRA is a multimodal iterative reasoning agent that enhances diffusion-based image editing. It tackles complex instructions by breaking them into atomic edits via a perception-reasoning-action loop with visual feedback. This improves semantic consistency and perceptual quality, outperforming other approaches. An illustrative loop sketch follows after this post.
🔹 Publication Date: Published on Nov 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21087
• PDF: https://arxiv.org/pdf/2511.21087
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #ImageEditing #MultimodalAI #DiffusionModels #ComputerVision
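An illustrative sketch of a perception-reasoning-action loop of the kind the summary describes. `mllm` and `editor` are hypothetical callables standing in for the multimodal reasoner and the diffusion editor; the real MIRA agent is more elaborate.
```python
# Illustrative sketch, not the authors' code: decompose a complex instruction
# into atomic edits and use visual feedback to decide the next step.
def iterative_edit(mllm, editor, image, instruction, max_steps: int = 5):
    current = image
    for _ in range(max_steps):
        # Perception + reasoning: inspect the current image and propose the
        # next atomic edit, or declare the instruction satisfied.
        plan = mllm(image=current, instruction=instruction)
        if plan["done"]:
            break
        # Action: apply one atomic edit with the diffusion-based editor.
        current = editor(image=current, prompt=plan["atomic_edit"])
    return current
```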
✨Test-time scaling of diffusions with flow maps
📝 Summary:
The Flow Map Trajectory Tilting (FMTT) algorithm enhances diffusion models at test time by using flow maps to align sampling with user-specified rewards. This approach addresses the ill-posedness of reward gradients, achieving superior reward ascent for improved sampling and novel image editing. A rough reward-ascent sketch follows after this post.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22688
• PDF: https://arxiv.org/pdf/2511.22688
• Project Page: https://flow-map-trajectory-tilting.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DiffusionModels #GenerativeAI #ImageEditing #MachineLearning #FlowMaps
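A rough sketch of reward tilting through a flow map, under the assumption that `flow_map(x_t, t)` maps a noisy latent directly to a clean sample and `reward` scores clean samples. This is a generic reward-ascent illustration, not the FMTT algorithm itself.
```python
# Generic sketch: nudge a noisy latent so the flow map's clean prediction
# scores higher under the reward. `flow_map` and `reward` are placeholders.
import torch

def tilt_latent(flow_map, reward, x_t: torch.Tensor, t: float,
                steps: int = 10, lr: float = 0.05) -> torch.Tensor:
    x = x_t.clone().requires_grad_(True)
    for _ in range(steps):
        x0_hat = flow_map(x, t)      # clean sample predicted directly from x_t
        r = reward(x0_hat).sum()     # scalar reward on the predicted sample
        (grad,) = torch.autograd.grad(r, x)
        with torch.no_grad():
            x = (x + lr * grad).requires_grad_(True)  # plain reward-ascent step
    return x.detach()
```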
✨REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
📝 Summary:
REASONEDIT integrates MLLM reasoning (thinking and reflection) into image editing models. This enables a thinking-editing-reflection loop, improving instruction understanding and editing accuracy by interpreting abstract instructions and correcting results. The approach achieves significant performance gains.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22625
• PDF: https://arxiv.org/pdf/2511.22625
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #AIReasoning #MLLM #ComputerVision #AI
✨The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
📝 Summary:
ImageCritic corrects inconsistent fine-grained details in generated images using a reference-guided post-editing approach. It employs an attention alignment loss and a detail encoder to precisely rectify inconsistencies and improve accuracy. A simplified loss sketch follows after this post.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20614
• PDF: https://arxiv.org/pdf/2511.20614
• Project Page: https://ouyangziheng.github.io/ImageCritic-Page/
• Github: https://github.com/HVision-NKU/ImageCritic
🔹 Models citing this paper:
• https://huggingface.co/ziheng1234/ImageCritic
✨ Datasets citing this paper:
• https://huggingface.co/datasets/ziheng1234/Critic-10K
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ziheng1234/ImageCritic
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #ComputerVision #DeepLearning #AI #ImageEditing
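A simplified sketch of an attention-alignment objective, written as a plain loss between attention maps computed from reference and generated features. The shapes, normalization, and omission of the detail encoder are assumptions for illustration; see the repo above for the actual loss.
```python
# Simplified attention-alignment loss between generated and reference features.
import torch
import torch.nn.functional as F

def attention_alignment_loss(q_gen: torch.Tensor, k_gen: torch.Tensor,
                             q_ref: torch.Tensor, k_ref: torch.Tensor) -> torch.Tensor:
    """q_*, k_*: (batch, tokens, dim) feature maps from the two images."""
    d = q_gen.shape[-1]
    attn_gen = torch.softmax(q_gen @ k_gen.transpose(-1, -2) / d ** 0.5, dim=-1)
    attn_ref = torch.softmax(q_ref @ k_ref.transpose(-1, -2) / d ** 0.5, dim=-1)
    # Pull the generated image's attention pattern toward the reference's,
    # so fine-grained details attend to the same regions.
    return F.l1_loss(attn_gen, attn_ref)
```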
✨WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
📝 Summary:
WiseEdit is a new benchmark for evaluating image editing models, focusing on cognition and creativity. It decomposes editing into Awareness, Interpretation, and Imagination tasks, assessing declarative, procedural, and metacognitive knowledge. This reveals limitations in current models.
🔹 Publication Date: Published on Nov 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00387
• PDF: https://arxiv.org/pdf/2512.00387
• Project Page: https://qnancy.github.io/wiseedit_project_page/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #ComputerVision #AIResearch #CognitiveAI #CreativeAI
✨UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits
📝 Summary:
This paper tackles image editing performance gaps caused by data scarcity by introducing UnicEdit-10M, a 10M-scale, high-quality dataset built with a lightweight unified-verification pipeline. It also proposes UnicBench, a new benchmark with novel metrics to diagnose reasoning limitations in models.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02790
• PDF: https://arxiv.org/pdf/2512.02790
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #AI #Dataset #Benchmark #ComputerVision
✨Step1X-Edit: A Practical Framework for General Image Editing
📝 Summary:
Step1X-Edit is a new image editing model combining a multimodal LLM with a diffusion decoder. It significantly outperforms open-source models and approaches the quality of proprietary models like GPT-4o, bridging the gap in general image editing capabilities. A high-level architecture sketch follows after this post.
🔹 Publication Date: Published on Apr 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.17761
• PDF: https://arxiv.org/pdf/2504.17761
• Github: https://github.com/stepfun-ai/Step1X-Edit
🔹 Models citing this paper:
• https://huggingface.co/stepfun-ai/Step1X-Edit
• https://huggingface.co/stepfun-ai/Step1X-Edit-v1p2
• https://huggingface.co/stepfun-ai/Step1X-Edit-v1p2-preview
✨ Datasets citing this paper:
• https://huggingface.co/datasets/stepfun-ai/GEdit-Bench
✨ Spaces citing this paper:
• https://huggingface.co/spaces/johnnyclem/stepfun-ai-Step1X-Edit
• https://huggingface.co/spaces/Osuii/stepfun-ai-Step1X-Edit
• https://huggingface.co/spaces/Paus/stepfun-ai-Step1X-Edit
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #AI #LLM #DiffusionModels #ComputerVision
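A high-level sketch of the architecture the summary describes: a multimodal LLM encodes the image and instruction into conditioning embeddings, and a diffusion decoder generates the edited image from them. Both callables are placeholders, not the released Step1X-Edit API.
```python
# Architectural sketch only; `mllm_encoder` and `diffusion_decoder` are
# hypothetical stand-ins for the two components described in the summary.
def edit(mllm_encoder, diffusion_decoder, image, instruction):
    # The MLLM reads the source image plus the instruction and emits
    # conditioning embeddings (its hidden states).
    cond = mllm_encoder(image=image, text=instruction)
    # The diffusion decoder denoises a latent conditioned on those embeddings
    # and on the source image, producing the edited result.
    return diffusion_decoder(condition=cond, reference=image)
```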
✨EditThinker: Unlocking Iterative Reasoning for Any Image Editor
📝 Summary:
EditThinker proposes a deliberative framework for image editing, simulating human iterative critique and refinement of instructions. It uses an MLLM as a reasoning engine to enhance instruction-following capability, significantly improving the performance of any image editor. An illustrative loop sketch follows after this post.
🔹 Publication Date: Published on Dec 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05965
• PDF: https://arxiv.org/pdf/2512.05965
• Project Page: https://appletea233.github.io/think-while-edit/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #MLLM #AI #Reasoning #ComputerVision
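An illustrative sketch of the think-while-editing loop: an MLLM critiques each attempt and rewrites the instruction before retrying with an arbitrary off-the-shelf editor. `mllm_critic` and `editor` are hypothetical callables, not the authors' code.
```python
# Illustrative sketch: critique-and-refine the instruction, then retry the edit.
def think_while_edit(mllm_critic, editor, image, instruction, max_rounds: int = 3):
    prompt = instruction
    result = editor(image=image, prompt=prompt)
    for _ in range(max_rounds):
        # Deliberation: compare the result against the goal and suggest a fix.
        critique = mllm_critic(source=image, result=result, goal=instruction)
        if critique["satisfied"]:
            break
        prompt = critique["revised_prompt"]          # refined instruction
        result = editor(image=image, prompt=prompt)  # retry with the same editor
    return result
```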
✨SpotEdit: Selective Region Editing in Diffusion Transformers
📝 Summary:
SpotEdit is a training-free framework for selective image editing in diffusion transformers. It avoids reprocessing stable regions by reusing their features and combining them with features from the edited area. This reduces computation and preserves unchanged regions, improving both efficiency and precision. A minimal feature-blending sketch follows after this post.
🔹 Publication Date: Published on Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22323
• PDF: https://arxiv.org/pdf/2512.22323
• Project Page: https://biangbiang0321.github.io/SpotEdit.github.io
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageEditing #DiffusionModels #ComputerVision #AIResearch #DeepLearning
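A minimal sketch of the feature-reuse step, under stated assumptions: features cached from the source pass are blended back in by an edit mask so unchanged tokens keep their original features. The actual compute savings in SpotEdit come from skipping work on masked-out tokens, which this blending-only sketch does not show. `compute_block` is a placeholder for one block of the diffusion transformer.
```python
# Minimal blending sketch: preserve cached features outside the edit mask.
import torch

def spot_blend(compute_block, hidden: torch.Tensor,
               cached: torch.Tensor, edit_mask: torch.Tensor) -> torch.Tensor:
    """hidden/cached: (batch, tokens, dim); edit_mask: (batch, tokens, 1) in {0, 1}."""
    updated = compute_block(hidden)  # fresh features for the current step
    # Keep cached source-pass features wherever the mask is 0 (unedited tokens),
    # and take freshly computed features only inside the edited region.
    return edit_mask * updated + (1.0 - edit_mask) * cached
```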