🔥 EasyControl is a framework (a set of tools and methods) designed to add control signals (conditions) to Diffusion Transformer (DiT)-based image generation models.
In essence, this is an attempt to create an analogue of the popular ControlNet (which is mainly used with U-Net architectures) for the new generation of diffusion models built on transformers. Its goal is to make generation control in DiT models flexible, efficient, and easy to plug in.
How does EasyControl work?
EasyControl solves the problems of control signal integration in DiT by using a combination of several key ideas:
▪️ Condition Injection LoRA: Instead of retraining the entire huge DiT model or creating bulky copies of its parts for each new condition (e.g. poses, contours, depth), EasyControl uses LoRA (Low-Rank Adaptation), a technique that "injects" additional information (the control signal) into an existing model by training only a small number of extra parameters. This makes adding new control types very resource-efficient and preserves the original "knowledge" and style of the base DiT model (style-lossless).
▪️ Position-Aware Training Paradigm: Transformers (as in DiT) treat an image as a sequence of patches. To ensure that the control signal (e.g. a pose map) correctly influences the corresponding patches of the generated image, EasyControl uses a special training approach that helps the model learn the spatial correspondence between the control signal and the generated content.
▪️ Attention Optimization and Caching (Causal Attention + KV Cache): To improve efficiency at inference time, EasyControl applies transformer-specific optimizations. Causal Attention together with a KV Cache (caching the keys and values in the attention mechanism) speeds up generation, especially with long patch sequences and additional condition modules.
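To make the first bullet concrete, here is a minimal NumPy sketch of the LoRA idea itself (not EasyControl's actual code; the class and names are illustrative): the base weight stays frozen, and only a low-rank pair of matrices is trained, so each new control type costs a tiny fraction of the base model's parameters.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A.

    Only A and B (rank r << d) would be trained, so adding a new
    control type costs a tiny fraction of the base parameters.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))        # frozen base weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable, zero-init => no-op at start
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + scale * B (A x); with B = 0 this equals the base layer
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

layer = LoRALinear(d_in=8, d_out=8)
x = np.ones((1, 8))
# Zero-initialized LoRA leaves the base model's behavior unchanged,
# which is why the base model's "knowledge" and style are preserved.
assert np.allclose(layer(x), x @ layer.W.T)
```

The zero initialization of B is the standard LoRA trick: training starts from the unmodified base model and only gradually mixes in the condition-specific update.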
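The position-aware idea from the second bullet can also be sketched. This is only an illustration of the underlying principle, not the paper's exact scheme: condition tokens are given position indices in a shifted range, so they never collide with image-patch positions, while the constant relative offset lets the model match condition patch (i, j) to image patch (i, j).

```python
def patch_positions(h, w):
    """2-D (row, col) position index for each patch in an h x w grid."""
    return [(i, j) for i in range(h) for j in range(w)]

def condition_positions(h, w, offset):
    """Place condition tokens in a shifted index range: same 2-D layout,
    disjoint from image-patch positions, constant relative offset."""
    return [(i + offset, j) for i in range(h) for j in range(w)]

img = patch_positions(2, 2)
cond = condition_positions(2, 2, offset=2)
assert set(img).isdisjoint(set(cond))  # no position collisions
# The offset between matching patches is constant, so relative
# position still encodes spatial correspondence:
assert all((ci - ii, cj - ij) == (2, 0)
           for (ii, ij), (ci, cj) in zip(img, cond))
```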
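Finally, a generic KV-cache sketch for the third bullet (again illustrative, not the repository's implementation): condition tokens are encoded once, their keys and values are cached, and every later attention step reuses the cache instead of recomputing them.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Append-only key/value store for one attention layer.

    Condition tokens are processed once; their keys/values are cached
    and reused at every subsequent step instead of being recomputed.
    """

    def __init__(self):
        self.K, self.V = None, None

    def append(self, k, v):
        self.K = k if self.K is None else np.concatenate([self.K, k])
        self.V = v if self.V is None else np.concatenate([self.V, v])

    def attend(self, q):
        # Attention over everything cached so far (causal: a query only
        # sees tokens that have already been appended).
        scores = q @ self.K.T / np.sqrt(q.shape[-1])
        return softmax(scores) @ self.V

cache = KVCache()
cache.append(np.ones((3, 4)), np.ones((3, 4)))  # e.g. condition tokens, cached once
out = cache.attend(np.ones((1, 4)))             # later queries reuse the cache
assert out.shape == (1, 4)
```

Because the condition keys/values never change during sampling, caching them turns the per-step cost of the condition branch into a single lookup.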
🔗 GitHub
🔗 Paper
What is it based on?
Goal: Support a wider range of languages, with a special focus on 40 Eastern languages (East Asia, South Asia, Southeast Asia, Middle East) and 22 Chinese dialects.
How was it trained? A combination of proprietary and open-source datasets was used for training and optimization.
Results: Experiments show that Dolphin significantly outperforms the best existing open-source models in recognition quality for many languages.
Availability: The developers release the trained models and the inference source code publicly to promote reproducibility and community growth.
https://huggingface.co/DataoceanAI/dolphin-base
https://huggingface.co/DataoceanAI/dolphin-small
https://huggingface.co/papers/2503.20212
Python library for finetuning Gemma 3! 🔥
Includes papers on finetuning, sharding, LoRA, PEFT, multimodality, and tokenization in LLMs.
100% open source.
pip install gemma

🔗 Documentation
Have you ever seen a drone working under a waterway?
Build an LLM app with a Mixture of AI Agents, using small open-source LLMs that can beat GPT-4o, in just 40 lines of Python code.