Reinforcement learning implementation in AI Toolkit
I've always wanted to fine-tune models to my own preferences to make them a bit more personalized. A LoRA can learn a certain character or style; this thing lets you steer model outputs directly without any references at all, or even fine-tune an existing LoRA. In a way, this is what Midjourney does when it gives you two pictures to vote on and then builds your own slightly custom version of their model.
The PR is open here:
https://github.com/ostris/ai-toolkit/pull/808
Default parameters seem quite well tuned for quick results within a few iterations. The only difference between this implementation and the original is that rewards are binary instead of coming from a ranking model.
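For anyone curious how binary votes can stand in for a ranking model, here's a minimal sketch of the usual GRPO-style group normalization applied to 0/1 rewards. This is my illustration of the general idea, not code from the PR: the function name `group_relative_advantages` and the `eps` value are hypothetical, and the actual implementation may normalize differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(votes, eps=1e-6):
    """Turn binary votes for one prompt's group of samples into advantages.

    Assumption: the standard GRPO recipe, where each sample's reward is
    normalized by the mean and standard deviation of its own group.
    """
    # Upvote -> reward 1.0, downvote -> reward 0.0
    rewards = [1.0 if v else 0.0 for v in votes]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every sample got the same vote
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four samples generated for one prompt; the first and last were upvoted.
advantages = group_relative_advantages([True, False, False, True])
```

Upvoted samples end up with a positive advantage and downvoted ones with a negative advantage. If every sample in a group gets the same vote, all advantages are zero, so that group carries no training signal under this scheme.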
There's a new job type dropdown for creating Flow-GRPO tasks, and a GRPO job has a voting interface that lets you generate samples and vote on them.
Stuff yet to do:
Manual checkpoints
Reduce memory usage (Z-Image takes 40+ GB) and improve speed
UI polishing and bug fixing
Keep testing the algorithm on all models
Thus, I call it a POC. Will be pushing updates to my own branch as we go, but I doubt it will ever be merged into AI-Toolkit itself, so clone and have fun!
https://redd.it/1syhp27
@rStableDiffusion
GitHub
RLHF Flow-GRPO implementation POC by ifilipis · Pull Request #808 · ostris/ai-toolkit
Added reinforcement learning (Flow-GRPO) that seems to work quite universally across models.
It implements Flow-GRPO and lets you vote live and thus create a LoRA
Default parameters seem quite well...
Z-Anime - Full Anime Fine-Tune on Z-Image Base
https://huggingface.co/SeeSee21/Z-Anime
"Z-Anime is a full fine-tune of Alibaba's Z-Image Base architecture — not a LoRA merge, but a fully trained anime-focused model family built from the ground up.
Built on the S3-DiT (Single-Stream Diffusion Transformer, 6B parameters), Z-Anime inherits the strong foundation of Z-Image Base: rich diversity, strong controllability, full negative prompt support, and a high ceiling for fine-tuning — now adapted for anime-style generation."
https://preview.redd.it/uh5sfmh5s3yg1.png?width=1536&format=png&auto=webp&s=8753e6768c1157446fcec7f56edc7c4cd564f868
https://preview.redd.it/cmjb5ih5s3yg1.png?width=1536&format=png&auto=webp&s=34f8f94d4ea17f09a59f040ad95ffa1c5ab8ac29
https://redd.it/1syu74k
@rStableDiffusion