The mysterious science of LoRA training (sdxl)

I find myself still unable to train good looking character loras for illustrious, and I don't know what I'm doing wrong. I'm using a 3D character for this purpose (blender model) and I've tried replicating training settings from other people's lora that I consider great, but I still have questions.

1. Can you actually train a 3D character on illustrious, or is it fighting the model too much? (considering it seems much better at handling 2D visuals)
2. I've noticed most great LoRAs out there use hundreds of images in their dataset, usually 200 to 400. My dataset is more on the side of 50. Is there an actual benefit to such large datasets?
3. Repeats. It sounds like 10 epochs of 10 repeats would be equivalent to 100 epochs of 1 repeat, but is that truly the case? I always struggle to figure out how many repeats I should be using.
4. TE. I noticed some people do not train the text encoder at all. Does anyone have feedback on the benefits of doing this?
5. Batch size. I want to use a batch size of 6 or 8, because I can. But I'm not sure how I need to dial the other settings based on that, in particular learning rate and repeats.
6. Removing backgrounds. Besides the fact that it makes captioning easier, is there an actual benefit? Have you noticed it yielding better results?
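
For questions 3 and 5, the arithmetic can be sketched as below. This is a minimal sketch, assuming a trainer where one epoch visits every image `repeats` times and takes one optimizer step per batch; the function name and the 50-image dataset size are illustrative, not taken from any specific trainer.

```python
import math

def optimizer_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Total optimizer steps, assuming each epoch sees every image `repeats` times
    and a partially filled last batch still counts as one step."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

# 10 epochs x 10 repeats vs. 100 epochs x 1 repeat, batch size 1
ten_by_ten = optimizer_steps(num_images=50, repeats=10, epochs=10, batch_size=1)
hundred_by_one = optimizer_steps(num_images=50, repeats=1, epochs=100, batch_size=1)

# Same schedule as the first one, but with batch size 8
batched = optimizer_steps(num_images=50, repeats=10, epochs=10, batch_size=8)

print(ten_by_ten, hundred_by_one, batched)
```

Under these assumptions the two schedules show the model the same number of samples, so the main practical difference is how often epoch-boundary events (checkpoint saves, sample images) occur. Raising the batch size cuts the number of optimizer steps roughly in proportion, which is why learning rate or step count usually needs revisiting alongside it.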

I have noticed the following issues with my attempt at training, perhaps this will help someone point me in the right direction on what I'm doing wrong here:

* Style locking in too much. For example I like prompting with "dark, dim lighting" keywords which works well with illustrious, but my loras will make the result much brighter than the base model (even when tagging the dataset with "day"). Dataset has a couple night shots but they are mostly bright daylight.
* Faces train fast and seem to overtrain before clothes, making it impossible to find a good balance. Either one is overtrained or the other is undertrained. (I do have fewer full body shots than upper body and portrait shots, but this is apparently a desired ratio?)
* I have settled down on a LR of 2e-4 but have tried higher and lower with no success.
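
On the batch size and learning rate question, two common rules of thumb are linear scaling and square-root scaling of the learning rate with batch size. The sketch below only shows what each rule would suggest; it assumes, purely for illustration, that the 2e-4 value was tuned at batch size 1 (the post does not say), and neither rule is guaranteed to suit LoRA training on illustrious.

```python
import math

def scaled_lr(base_lr: float, base_batch: int, new_batch: int, rule: str = "sqrt") -> float:
    """Heuristic learning-rate adjustment when changing batch size.

    rule="linear": lr grows in proportion to batch size.
    rule="sqrt":   lr grows with the square root of the batch-size ratio.
    Both are starting points for a sweep, not guarantees.
    """
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical baseline: 2e-4 at batch size 1 (an assumption, not from the post)
for batch in (6, 8):
    print(batch, scaled_lr(2e-4, 1, batch, "sqrt"), scaled_lr(2e-4, 1, batch, "linear"))
```

The two rules bracket a fairly wide range, so a short sweep between the square-root and linear values is probably more informative than trusting either number directly.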

If you take the time to answer some of that, thank you =)

https://redd.it/1sjhf1d
@rStableDiffusion
Free open-source tool to instantly rig and animate your illustrations (also with mesh deform)

https://redd.it/1sjj7ta
@rStableDiffusion
Me whenever people on the PC building subreddits ask me why I need >32GB of system RAM.
https://redd.it/1sjsvjk
@rStableDiffusion
Dataset source for AceStep team! XD
https://redd.it/1sjzhva
@rStableDiffusion
Used LTX 2.3 anchor frame injection to maintain brand consistency across AI video — before/after

Working on a brand campaign where consistency was everything — same can, same character, same lighting across all assets including video.

The main technique I used was anchor frame injection, using LTXV guides rather than in-place injection. Three reference frames were injected at key points in the timeline:

* a starting frame to lock the logo specifically,
* a mid-point "consistency anchor" at frame 138 to bridge the gap, with the guide set low and the anchor image designed with high, almost flat contrast in key areas,
* and a hard end frame at reference strength 0.7 to leave enough room for natural movement.

Combined with canny edges, depth map, and pose estimation as control references.
The before GIF is the raw output. The after is the rerender with the anchor method applied.

The environment cleaned up significantly. One thing LTX over-interpreted was the walk — it added a fluidity that felt more runway than competitive player. Tighter pose constraints next pass.

Full case study in comments.


https://i.redd.it/fj2pl5covwug1.gif

https://i.redd.it/p0ubkd5pvwug1.gif



https://redd.it/1sk4051
@rStableDiffusion
LTX2.3 (Distilled) - Updated sigmas for better results (?)

Hey y'all,

Was playing around with the LTX2.3 distilled sigmas for the first Ksampler and tried to tweak them for a bit of fun, and I think I've stumbled upon updated sigmas that give me better quality, detail and prompt adherence.

I've been using LTX2.3 since it came out and never really questioned the "official" sigmas that come with the original workflow, but today, for fun, I tried to tweak them a bit and I'm really liking the results I am getting.

This is all T2V and I have not tried with I2V, so not sure how it would affect the results there.

The original Sigmas for the first Ksampler (8 steps) are: 1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875, 0.0

After a bit of testing, I've settled on these new sigmas: 1.0, 0.995, 0.99, 0.9875, 0.975, 0.65, 0.28, 0.07, 0.0
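
To make the difference between the two schedules easier to see, here is a small sketch that prints how much sigma drops at each of the 8 steps. The two lists are copied from the values above; the helper name `step_sizes` is just illustrative.

```python
# Sigma schedules copied from the post: 9 values each, i.e. 8 sampler steps.
OLD_SIGMAS = [1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875, 0.0]
NEW_SIGMAS = [1.0, 0.995, 0.99, 0.9875, 0.975, 0.65, 0.28, 0.07, 0.0]

def step_sizes(sigmas):
    """Drop in sigma between consecutive schedule entries, rounded for display."""
    return [round(a - b, 6) for a, b in zip(sigmas, sigmas[1:])]

old_steps = step_sizes(OLD_SIGMAS)
new_steps = step_sizes(NEW_SIGMAS)

for i, (o, n) in enumerate(zip(old_steps, new_steps), start=1):
    print(f"step {i}: old drop {o:.6f} | new drop {n:.6f}")
```

Both schedules spend the first four steps between 1.0 and 0.975. The old one then saves its largest drop for the final step (0.421875 to 0.0), while the new one takes its largest drops at steps 5 and 6 and finishes with a small step from 0.07, which may be part of why the fine detail looks different.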

I have made some comparisons that showcase the difference between both old and new sigmas, and I am really liking how things turn out with the new ones.

All results are 1280 x 704 x 24FPS, 5 seconds, Euler A sampler (16GB of VRAM so excuse the lower quality, also Reddit compression hurts a lot).

Left is with old sigmas, right is with new sigmas.

Sound is from the video with the new sigmas.

https://reddit.com/link/1sk8vhq/video/7gsjvdn15yug1/player

a muscular man with rolled-up sleeves and a leather apron leans over a metal workbench in a dimly lit industrial workshop, he presses an angle grinder against a large piece of steel, a cascade of bright orange and white sparks erupts and scatters across the floor, his forearms flex with the effort, face partially lit by the sparks and harsh overhead workshop lamp, sawdust and metal shavings on the floor, dark gritty background with shelving and hanging tools slightly out of focus, cinematic, shallow depth of field, photorealistic

Streamable link: *https://streamable.com/rwt3vl*

https://reddit.com/link/1sk8vhq/video/yn1qv1g55yug1/player

a heavily muscular man with short cropped hair and scarred knuckles wraps his hands in a dimly lit boxing gym, then steps up to a heavy bag and throws a hard combination of punches, the bag swings violently, sweat flying off his arms with each impact, harsh overhead fluorescent light, cinematic, photorealistic

Streamable link: *https://streamable.com/36b5nx*

https://reddit.com/link/1sk8vhq/video/a4ougyv17yug1/player

In a dark theater room, a ballerina wearing a typical ballerina outfit is dancing, moving gracefully on the stage. A spotlight is focused on her.

Streamable link: *https://streamable.com/jwey0a*

https://reddit.com/link/1sk8vhq/video/p8ip8l5d5yug1/player

a tall dark-haired muscular man in a fitted black shirt behind a moody speakeasy bar grabs a shaker, tosses it spinning in the air, catches it smoothly and slams it on the bar, then leans forward on both hands looking directly into camera, neon backlit bottles, dark atmospheric lighting, cinematic, photorealistic

Streamable link: *https://streamable.com/qhycpa*

https://reddit.com/link/1sk8vhq/video/belte2og5yug1/player

A beautiful woman with long blonde hair, wearing a long white dress flowing in the wind is walking by a cliff, looking ethereal, looking in the distance. The sound of waves crashing down below can be heard. She is barefoot, walking through tall grass. The sun is casting beautiful lights and shadows on the scene.

Streamable link: *https://streamable.com/hz2fu5*

These are just some short examples which weren't cherry picked.

Not sure what this is worth, but thought I would share.

https://redd.it/1sk8vhq
@rStableDiffusion