Reinforcement learning implementation in AI Toolkit

I've always wanted to fine-tune models to my own preferences to make them a bit more personalized. A LoRA can learn a certain character or style; this lets you steer model outputs directly without any references at all, or even fine-tune an existing LoRA. In a way, this is what Midjourney does when it gives you two pictures to vote on and then builds your own slightly customized version of their model.

The PR is open here:

https://github.com/ostris/ai-toolkit/pull/808

The default parameters seem quite well tuned for quick results within a few iterations. The only difference between this implementation and the original Flow-GRPO: rewards are binary instead of relying on a ranking model.
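To make the binary-reward idea concrete, here is a rough sketch of how up/down votes on one group of samples could be turned into GRPO-style group-relative advantages. This is not code from the PR: the function name `group_advantages`, the `eps` value, and the choice to return zeros when every vote is identical are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_advantages(votes, eps=1e-6):
    """Map binary votes (1 = liked, 0 = disliked) to group-relative advantages.

    Hypothetical helper: standard GRPO-style normalization, (r - mean) / std,
    applied to the votes cast on one group of samples from the same prompt.
    """
    rewards = [float(v) for v in votes]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group

    # Assumption: if every sample got the same vote, there is no preference
    # signal in this group, so all advantages are zero.
    if sigma < eps:
        return [0.0] * len(rewards)

    return [(r - mu) / (sigma + eps) for r in rewards]


# Four samples from one prompt: the first and last were upvoted.
advantages = group_advantages([1, 0, 0, 1])
```

Upvoted samples end up with positive advantages and downvoted ones with negative advantages, so no ranking model is needed to score them. A group where everything was voted the same way contributes nothing under this sketch.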

There's a new job type dropdown for creating Flow-GRPO tasks, and each GRPO job has a voting interface that lets you generate samples and vote on them.

Stuff yet to do:

Manual checkpoints

Reduce memory usage (Z-Image takes 40+ GB) and improve speed

UI polishing and bug fixing

Keep testing the algorithm on all models

That's why I call it a POC. I'll be pushing updates to my own branch as we go, but I doubt it will ever be merged into AI Toolkit itself, so clone and have fun!

https://redd.it/1syhp27
@rStableDiffusion
Z-Anime - Full Anime Fine-Tune on Z-Image Base

https://huggingface.co/SeeSee21/Z-Anime

"Z-Anime is a full fine-tune of Alibaba's Z-Image Base architecture — not a LoRA merge, but a fully trained anime-focused model family built from the ground up.

Built on the S3-DiT (Single-Stream Diffusion Transformer, 6B parameters), Z-Anime inherits the strong foundation of Z-Image Base: rich diversity, strong controllability, full negative prompt support, and a high ceiling for fine-tuning — now adapted for anime-style generation."

https://preview.redd.it/uh5sfmh5s3yg1.png?width=1536&format=png&auto=webp&s=8753e6768c1157446fcec7f56edc7c4cd564f868

https://preview.redd.it/cmjb5ih5s3yg1.png?width=1536&format=png&auto=webp&s=34f8f94d4ea17f09a59f040ad95ffa1c5ab8ac29



https://redd.it/1syu74k
@rStableDiffusion