🔥 EasyControl is a framework (a set of tools and methods) designed to add control signals (conditions) to Diffusion Transformer (DiT)-based image generation models.
In essence, this is an attempt to create an analogue of the popular ControlNet (which is mainly used with U-Net architectures) for the new generation of diffusion models built on transformers. Its goal is to make generation control in DiT models flexible, efficient, and easy to plug in.
How does EasyControl work?
EasyControl solves the problems of control signal integration in DiT by using a combination of several key ideas:
▪️ Condition Injection LoRA: Instead of retraining the entire huge DiT model or creating bulky copies of its parts for each new condition (e.g. poses, contours, depth), EasyControl uses LoRA (Low-Rank Adaptation), a technique that "injects" additional information (the control signal) into an existing model by training only a small number of extra parameters. This makes adding new control types very resource-efficient and preserves the original "knowledge" and style of the base DiT model (style-lossless).
▪️ Position-Aware Training Paradigm: Transformers (as in DiT) treat an image as a sequence of patches. To ensure that the control signal (e.g. a pose map) correctly influences the corresponding patches of the generated image, EasyControl uses a special training approach that helps the model learn the spatial correspondence between the control signal and the generated content.
▪️ Attention Optimization and Caching (Causal Attention + KV Cache): To improve efficiency at inference time, EasyControl applies transformer-specific optimizations. Causal Attention together with a KV Cache (caching the keys and values in the attention mechanism) speeds up generation, especially with long patch sequences and additional condition modules.
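To make the first bullet concrete, here is a minimal NumPy sketch of the LoRA idea itself (not EasyControl's actual code; the class and names are illustrative): the base weight stays frozen, and only a low-rank pair of matrices is trained, so each new control type costs a tiny fraction of the base model's parameters.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A.

    Only A and B (rank r << d) would be trained, so adding a new
    control type costs a tiny fraction of the base parameters.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))        # frozen base weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable, zero-init => no-op at start
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + scale * B (A x); with B = 0 this equals the base layer
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

layer = LoRALinear(d_in=8, d_out=8)
x = np.ones((1, 8))
# Zero-initialized LoRA leaves the base model's behavior unchanged,
# which is why the base model's "knowledge" and style are preserved.
assert np.allclose(layer(x), x @ layer.W.T)
```

The zero initialization of B is the standard LoRA trick: training starts from the unmodified base model and only gradually mixes in the condition-specific update.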
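The position-aware idea from the second bullet can also be sketched. This is only an illustration of the underlying principle, not the paper's exact scheme: condition tokens are given position indices in a shifted range, so they never collide with image-patch positions, while the constant relative offset lets the model match condition patch (i, j) to image patch (i, j).

```python
def patch_positions(h, w):
    """2-D (row, col) position index for each patch in an h x w grid."""
    return [(i, j) for i in range(h) for j in range(w)]

def condition_positions(h, w, offset):
    """Place condition tokens in a shifted index range: same 2-D layout,
    disjoint from image-patch positions, constant relative offset."""
    return [(i + offset, j) for i in range(h) for j in range(w)]

img = patch_positions(2, 2)
cond = condition_positions(2, 2, offset=2)
assert set(img).isdisjoint(set(cond))  # no position collisions
# The offset between matching patches is constant, so relative
# position still encodes spatial correspondence:
assert all((ci - ii, cj - ij) == (2, 0)
           for (ii, ij), (ci, cj) in zip(img, cond))
```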
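Finally, a generic KV-cache sketch for the third bullet (again illustrative, not the repository's implementation): condition tokens are encoded once, their keys and values are cached, and every later attention step reuses the cache instead of recomputing them.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Append-only key/value store for one attention layer.

    Condition tokens are processed once; their keys/values are cached
    and reused at every subsequent step instead of being recomputed.
    """

    def __init__(self):
        self.K, self.V = None, None

    def append(self, k, v):
        self.K = k if self.K is None else np.concatenate([self.K, k])
        self.V = v if self.V is None else np.concatenate([self.V, v])

    def attend(self, q):
        # Attention over everything cached so far (causal: a query only
        # sees tokens that have already been appended).
        scores = q @ self.K.T / np.sqrt(q.shape[-1])
        return softmax(scores) @ self.V

cache = KVCache()
cache.append(np.ones((3, 4)), np.ones((3, 4)))  # e.g. condition tokens, cached once
out = cache.attend(np.ones((1, 4)))             # later queries reuse the cache
assert out.shape == (1, 4)
```

Because the condition keys/values never change during sampling, caching them turns the per-step cost of the condition branch into a single lookup.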
🔗 GitHub
🔗 Paper
What is it based on?
Goal: Support a wider range of languages, with a special focus on 40 Eastern languages (East Asia, South Asia, Southeast Asia, Middle East) and 22 Chinese dialects.
How was it trained? A combination of proprietary and open-source datasets was used for training and optimization.
Results: Experiments show that Dolphin significantly outperforms the best existing open-source models in recognition quality for many languages.
Availability: The developers release the trained models and the inference source code publicly to promote reproducibility and community growth.
https://huggingface.co/DataoceanAI/dolphin-base
https://huggingface.co/DataoceanAI/dolphin-small
https://huggingface.co/papers/2503.20212
Python library for finetuning Gemma 3! 🔥
Includes papers on finetuning, sharding, LoRA, PEFT, multimodality, and tokenization in LLMs.
100% open source.
pip install gemma

🔗 Documentation
Have you ever seen a drone working under a waterway?
Build an LLM app with a Mixture of AI Agents, using small open-source LLMs that can beat GPT-4o, in just 40 lines of Python code.