The mysterious science of LoRA training (sdxl)
I find myself still unable to train good looking character loras for illustrious, and I don't know what I'm doing wrong. I'm using a 3D character for this purpose (blender model) and I've tried replicating training settings from other people's lora that I consider great, but I still have questions.
1. Can you actually train a 3D character on illustrious, or is it fighting the model too much? (considering it seems much better at handling 2D visuals)
2. I've noticed most great LoRAs out there are using hundreds of images in their dataset, usually 200 to 400. My dataset is more on the side of 50. Is there an actual benefit to such large datasets?
3. Repeats. Sounds like 10 epochs of 10 repeats would be equivalent to 100 epochs of 1 repeat, but is that truly the case? I always struggle to figure out how many repeats I should be using.
4. TE. I noticed some people do not train the text encoder at all. Does anyone have feedback on the benefits of doing this?
5. Batch size. I want to use a batch size of 6 or 8, because I can. But I'm not sure how I need to dial the other settings based on that, in particular learning rate and repeats.
6. Removing backgrounds. Besides the fact that it makes captioning easier, is there an actual benefit? Have you noticed it yielding better results?
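For the arithmetic behind questions 3 and 5, here is a rough sketch of how repeats, epochs and batch size combine into optimizer steps. It assumes each epoch sees every image `repeats` times and that a partial last batch still counts as one step; the function name and those rules are illustrative assumptions, not taken from any particular trainer, and gradient accumulation and bucketing are ignored.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps for a run (hypothetical helper).

    Assumes every image is seen `repeats` times per epoch and that the
    last, possibly partial, batch of each epoch still costs one step.
    """
    samples_per_epoch = num_images * repeats
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 50 images, as in the post; compare 10 epochs x 10 repeats with 100 epochs x 1 repeat.
    for batch_size in (1, 8):
        ten_by_ten = total_steps(50, repeats=10, epochs=10, batch_size=batch_size)
        hundred_by_one = total_steps(50, repeats=1, epochs=100, batch_size=batch_size)
        print(f"batch {batch_size}: 10x10 -> {ten_by_ten} steps, 100x1 -> {hundred_by_one} steps")
```

Under these assumptions the two setups see the same number of images in total and match exactly at batch size 1; at batch size 8 they drift apart slightly because 50 does not divide evenly into batches. What still differs in practice is how often per-epoch events (checkpoints, sample images) fire.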
I have noticed the following issues with my attempt at training, perhaps this will help someone point me in the right direction on what I'm doing wrong here:
* Style locking in too much. For example I like prompting with "dark, dim lighting" keywords which works well with illustrious, but my loras will make the result much brighter than the base model (even when tagging the dataset with "day"). Dataset has a couple night shots but they are mostly bright daylight.
* Faces train fast and seem to overtrain before clothes, making it impossible to find a good balance. Either one is overtrained or the other is undertrained. (I do have fewer full-body shots than upper-body shots and portraits, but this is apparently a desired ratio?)
* I have settled down on a LR of 2e-4 but have tried higher and lower with no success.
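On tying learning rate to batch size: two common rules of thumb are linear scaling and square-root scaling. The sketch below only shows the arithmetic. It assumes, purely for illustration, that 2e-4 was tuned at batch size 1 (the post does not say), and `scale_lr` is a hypothetical helper, not part of any training tool. Neither rule is guaranteed to suit LoRA training; treat the outputs as starting points for a sweep.

```python
import math


def scale_lr(base_lr: float, base_batch: int, new_batch: int, rule: str = "sqrt") -> float:
    """Rescale a learning rate for a new batch size (heuristic only)."""
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    base_lr = 2e-4   # value from the post
    base_batch = 1   # assumption: the batch size 2e-4 was tuned at is not stated
    for new_batch in (6, 8):
        for rule in ("sqrt", "linear"):
            lr = scale_lr(base_lr, base_batch, new_batch, rule)
            print(f"batch {new_batch}, {rule}: {lr:.2e}")
```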
If you take the time to answer some of that, thank you =)
https://redd.it/1sjhf1d
@rStableDiffusion
Free open-source tool to instantly rig and animate your illustrations (also with mesh deform)
https://redd.it/1sjj7ta
@rStableDiffusion
Greg Rutkowski Anima Lora from Circlestone Labs (Anima makers) with training params
https://civitai.com/models/2536147/greg-rutkowski-style-anima
https://redd.it/1sjk7dc
@rStableDiffusion
Civitai
Greg Rutkowski Style - Anima - v1.0 | Anima LoRA | Civitai
Greg Rutkowski style LoRA for Anima. Trained on preview3. Prefix prompt with "@greg rutkowski. " Natural language prompts work best. All training d...
Suggestions on which model I should train an MC Escher Tessellation LoRA on?
https://redd.it/1sjoe8u
@rStableDiffusion
SD-FORGE EXTENSION
/r/StableDiffusion/comments/1sjty9x/sdforge_extension/
https://redd.it/1sjtyrz
@rStableDiffusion
Reddit
From the sdforall community on Reddit: SD-FORGE EXTENSION
Posted by BusBackground5847 - 1 vote and 0 comments
Me whenever people on the PC building subreddits ask me why I need >32GB of system RAM.
https://redd.it/1sjsvjk
@rStableDiffusion
Does anyone know which model and potentially Lora was used to create these?
https://redd.it/1sjv1ga
@rStableDiffusion
IC-LoRA-Detailer: It's for post-processing, not just rendering (LTX2.3)
https://redd.it/1sjxoz6
@rStableDiffusion
Haven't had more fun than today with subgraphs - Subgraphs are awesome!!!
https://redd.it/1sjs2bq
@rStableDiffusion
Used LTX 2.3 anchor frame injection to maintain brand consistency across AI video — before/after
Working on a brand campaign where consistency was everything — same can, same character, same lighting across all assets including video.
The main technique I used was anchor frame injection, using LTXV guides rather than inplace. Three reference frames are injected at key points in the timeline:
a starting frame to lock the logo specifically,
a mid-point "consistency anchor" at frame 138 to bridge the gap, with the guide strength set low and the anchor image designed with high, almost flat contrast in key areas,
and a hard end frame at reference strength 0.7 to leave enough room for natural movement.
Combined with canny edges, depth map, and pose estimation as control references.
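To make the setup easier to follow, here is a small sketch that writes the three anchors down as plain data. It is not a ComfyUI or LTX API: the `Anchor` class and its field names are hypothetical, frame 0 for the starting frame is an assumption, and every value the post does not give is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Anchor:
    """One injected reference frame (hypothetical structure, not a real node)."""
    role: str
    frame: Optional[int]       # None = frame index not stated in the post
    strength: Optional[float]  # None = strength not stated in the post


ANCHORS = [
    Anchor("start frame (locks the logo)", frame=0, strength=None),  # frame 0 is an assumption
    Anchor("mid-point consistency anchor", frame=138, strength=None),  # described only as "set low"
    Anchor("hard end frame", frame=None, strength=0.7),
]

# Control references named in the post.
CONTROL_REFERENCES = ("canny edges", "depth map", "pose estimation")


def unspecified(anchors: List[Anchor]) -> List[Tuple[str, str]]:
    """List (role, field) pairs that still need a value before reproducing the setup."""
    return [
        (a.role, field)
        for a in anchors
        for field in ("frame", "strength")
        if getattr(a, field) is None
    ]


if __name__ == "__main__":
    for role, field in unspecified(ANCHORS):
        print(f"{role}: {field} not given")
```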
The before GIF is the raw output. The after is the rerender with the anchor method applied.
The environment cleaned up significantly. One thing LTX over-interpreted was the walk — it added a fluidity that felt more runway than competitive player. Tighter pose constraints next pass.
Full case study in comments.
https://i.redd.it/fj2pl5covwug1.gif
https://i.redd.it/p0ubkd5pvwug1.gif
https://redd.it/1sk4051
@rStableDiffusion
Corridor Crew green/blue screening tool: Corridor Key
https://www.youtube.com/watch?v=Y3Dfw969itU
https://redd.it/1sk74mz
@rStableDiffusion
YouTube
I accidentally started a green screen revolution...
Squarespace ► Head to http://squarespace.com/corridorcrew to save 10% off your first purchase!
Our videos are made possible by Members of CorridorDigital, our Exclusive Streaming Service! Try a membership yourself with a 14-Day Free Trial ► http://cor…
LTX2.3 (Distilled) - Updated sigmas for better results (?)
Hey y'all,
Was playing around with the LTX2.3 distilled sigmas for the first Ksampler and tried to tweak them for a bit of fun, and I think I've stumbled upon updated sigmas that give me better quality, detail and prompt adherence.
I've been using LTX2.3 since it came out and never really questioned the "official" sigmas that come with the original workflow, but today, for fun, I tried to tweak them a bit and I'm really liking the results I am getting.
This is all T2V and I have not tried with I2V, so not sure how it would affect the results there.
The original Sigmas for the first Ksampler (8 steps) are: 1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875, 0.0
After a bit of testing, I've settled on these new sigmas: 1.0, 0.995, 0.99, 0.9875, 0.975, 0.65, 0.28, 0.07, 0.0
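A quick way to see how the two schedules differ is to look at how much sigma drops at each of the 8 steps. The sketch below does only that arithmetic on the two lists above; the helper names are made up for illustration and nothing here touches a sampler.

```python
from typing import List

# Sigma lists copied from the post (9 values = 8 steps).
ORIGINAL = [1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875, 0.0]
NEW = [1.0, 0.995, 0.99, 0.9875, 0.975, 0.65, 0.28, 0.07, 0.0]


def step_sizes(sigmas: List[float]) -> List[float]:
    """Sigma drop taken at each step (current value minus the next one)."""
    return [a - b for a, b in zip(sigmas, sigmas[1:])]


def largest_step(sigmas: List[float]) -> int:
    """1-based index of the step with the biggest sigma drop."""
    drops = step_sizes(sigmas)
    return max(range(len(drops)), key=drops.__getitem__) + 1


if __name__ == "__main__":
    for name, sigmas in (("original", ORIGINAL), ("new", NEW)):
        drops = ", ".join(f"{d:.4f}" for d in step_sizes(sigmas))
        print(f"{name}: drops = [{drops}], largest at step {largest_step(sigmas)}")
```

Both schedules crawl through the first four steps and reach 0.975 at the same point. After that, the original saves its biggest drop for the final step (0.421875 to 0), while the new one takes its biggest drop at step 6 (0.65 to 0.28) and finishes with a small 0.07 step.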
I have made some comparisons that showcase the difference between both old and new sigmas, and I am really liking how things turn out with the new ones.
All results are 1280 x 704 x 24FPS, 5 seconds, Euler A sampler (16GB of VRAM so excuse the lower quality, also Reddit compression hurts a lot).
Left is with old sigmas, right is with new sigmas.
Sound is from the video with the new sigmas.
https://reddit.com/link/1sk8vhq/video/7gsjvdn15yug1/player
a muscular man with rolled-up sleeves and a leather apron leans over a metal workbench in a dimly lit industrial workshop, he presses an angle grinder against a large piece of steel, a cascade of bright orange and white sparks erupts and scatters across the floor, his forearms flex with the effort, face partially lit by the sparks and harsh overhead workshop lamp, sawdust and metal shavings on the floor, dark gritty background with shelving and hanging tools slightly out of focus, cinematic, shallow depth of field, photorealistic
Streamable link: *https://streamable.com/rwt3vl*
https://reddit.com/link/1sk8vhq/video/yn1qv1g55yug1/player
a heavily muscular man with short cropped hair and scarred knuckles wraps his hands in a dimly lit boxing gym, then steps up to a heavy bag and throws a hard combination of punches, the bag swings violently, sweat flying off his arms with each impact, harsh overhead fluorescent light, cinematic, photorealistic
Streamable link: *https://streamable.com/36b5nx*
https://reddit.com/link/1sk8vhq/video/a4ougyv17yug1/player
In a dark theater room, a ballerina wearing a typical ballerina outfit is dancing, moving gracefully on the stage. A spotlight is focused on her.
Streamable link: *https://streamable.com/jwey0a*
https://reddit.com/link/1sk8vhq/video/p8ip8l5d5yug1/player
a tall dark-haired muscular man in a fitted black shirt behind a moody speakeasy bar grabs a shaker, tosses it spinning in the air, catches it smoothly and slams it on the bar, then leans forward on both hands looking directly into camera, neon backlit bottles, dark atmospheric lighting, cinematic, photorealistic
Streamable link: *https://streamable.com/qhycpa*
https://reddit.com/link/1sk8vhq/video/belte2og5yug1/player
A beautiful woman with long blonde hair, wearing a long white dress flowing in the wind is walking by a cliff, looking ethereal, looking in the distance. The sound of waves crashing down below can be heard. She is barefoot, walking through tall grass. The sun is casting beautiful lights and shadows on the scene.
Streamable link: *https://streamable.com/hz2fu5*
These are just some short examples which weren't cherry picked.
Not sure what this is worth, but thought I would share.
https://redd.it/1sk8vhq
@rStableDiffusion