Spatial Edit (Apache 2.0)
Has anyone tried this out?
https://github.com/EasonXiao-888/SpatialEdit
https://huggingface.co/EasonXiao-888/SpatialEdit-16B
https://redd.it/1sjcljf
@rStableDiffusion
GitHub - EasonXiao-888/SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
Can you use Qwen3.5 4b & Gemma 4 E4B with Z image/Turbo?
So I was wondering if I could use the latest four-billion-parameter versions of Qwen3.5 and Gemma 4 with the Z Image Turbo and base versions?
https://redd.it/1sje2ag
@rStableDiffusion
The mysterious science of LoRA training (SDXL)
I find myself still unable to train good-looking character LoRAs for Illustrious, and I don't know what I'm doing wrong. I'm using a 3D character for this purpose (a Blender model) and I've tried replicating training settings from other people's LoRAs that I consider great, but I still have questions.
1. Can you actually train a 3D character on Illustrious, or is it fighting the model too much? (It seems much better at handling 2D visuals.)
2. I've noticed most great LoRAs out there use hundreds of images in their dataset, usually 200 to 400. Mine is more on the side of 50; is there an actual benefit to such large datasets?
3. Repeats. It sounds like 10 epochs of 10 repeats would be equivalent to 100 epochs of 1 repeat, but is that truly the case? I always struggle to figure out how many repeats I should be using. (See the sketch after this list.)
4. TE. I noticed some people do not train the text encoder at all; does anyone have feedback on the benefits of doing this?
5. Batch size. I want to use a batch size of 6 or 8, because I can. But I'm not sure how to dial in the other settings based on that, in particular learning rate and repeats (also covered in the sketch below).
6. Removing backgrounds. Besides the fact that it makes captioning easier, is there an actual benefit? Have you noticed it yielding better results?
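A minimal sketch addressing questions 3 and 5, assuming a kohya-style trainer where total optimizer steps = images x repeats x epochs / batch size. The linear learning-rate scaling at the end is a common heuristic, and the batch-size-1 baseline for the 2e-4 LR is my assumption (not stated in the post), so treat it as a starting point, not a rule.

```python
# Sanity-check the repeats-vs-epochs equivalence and batch-size scaling.
# Assumptions: steps = images * repeats * epochs / batch_size (kohya-style),
# and the post's LR of 2e-4 was tuned at batch size 1.

def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    # One epoch sees each image `repeats` times, grouped into batches.
    return (images * repeats // batch_size) * epochs

# 10 epochs x 10 repeats == 100 epochs x 1 repeat in raw step count; they
# differ only in per-epoch behavior (checkpoint saves, LR schedule, shuffle).
assert total_steps(50, 10, 10, 1) == total_steps(50, 1, 100, 1) == 5000

# Heuristic: scale LR roughly linearly with batch size from a known-good
# baseline, then fine-tune by eye.
base_lr, base_bs = 2e-4, 1
for bs in (1, 6, 8):
    print(f"batch={bs}  steps={total_steps(50, 10, 10, bs)}  lr~{base_lr * bs / base_bs:.1e}")
```

With a batch size of 6, the same dataset yields far fewer optimizer steps, which is often why a larger batch "feels" undertrained unless you raise the LR or add repeats/epochs.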
I have noticed the following issues with my training attempts; perhaps this will help someone point me in the right direction on what I'm doing wrong here:
* Style locking in too much. For example, I like prompting with "dark, dim lighting" keywords, which works well with Illustrious, but my LoRAs make the result much brighter than the base model (even when tagging the dataset with "day"). The dataset has a couple of night shots, but it is mostly bright daylight.
* Faces train fast and seem to overtrain before clothes do, making it impossible to find a good balance: either one is overtrained or the other is undertrained. (I do have fewer full-body shots than upper-body and portrait shots, but this is apparently a desirable ratio?)
* I have settled on an LR of 2e-4, but have tried higher and lower with no success.
If you take the time to answer some of these, thank you =)
https://redd.it/1sjhf1d
@rStableDiffusion
Free open-source tool to instantly rig and animate your illustrations (also with mesh deform)
https://redd.it/1sjj7ta
@rStableDiffusion
Greg Rutkowski Anima LoRA from Circlestone Labs (Anima makers) with training params
https://civitai.com/models/2536147/greg-rutkowski-style-anima
https://redd.it/1sjk7dc
@rStableDiffusion
Civitai: Greg Rutkowski Style - Anima - v1.0. Greg Rutkowski style LoRA for Anima, trained on preview3. Prefix prompts with "@greg rutkowski. " Natural language prompts work best. All training d...
Suggestions on which model I should train an MC Escher Tessellation LoRA on?
https://redd.it/1sjoe8u
@rStableDiffusion
SD-FORGE EXTENSION
/r/StableDiffusion/comments/1sjty9x/sdforge_extension/
https://redd.it/1sjtyrz
@rStableDiffusion
Me whenever people on the PC building subreddits ask me why I need >32GB of system RAM.
https://redd.it/1sjsvjk
@rStableDiffusion
Does anyone know which model, and potentially which LoRA, was used to create these?
https://redd.it/1sjv1ga
@rStableDiffusion
IC-LoRA-Detailer: It's for post-processing, not just rendering (LTX2.3)
https://redd.it/1sjxoz6
@rStableDiffusion
Haven't had more fun than today with subgraphs - Subgraphs are awesome!!!
https://redd.it/1sjs2bq
@rStableDiffusion
Used LTX 2.3 anchor frame injection to maintain brand consistency across AI video — before/after
Working on a brand campaign where consistency was everything — same can, same character, same lighting across all assets including video.
The main technique I used was anchor frame injection, using LTXV guides over in-place editing. Three reference frames are injected at key points in the timeline (see the sketch below):
* a starting frame to lock the logo specifically;
* a mid-point "consistency anchor" at frame 138 to bridge the gap; the guide strength is set low, and the anchor image is designed with high, almost flat contrast in key areas;
* a hard end frame at reference strength 0.7 to leave enough room for natural movement.
This is combined with canny edges, a depth map, and pose estimation as control references.
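A minimal, runnable sketch of that anchor schedule, assuming a 192-frame clip and a 0.3 mid-point strength (neither is stated in the post). The AnchorFrame structure and the one-shot blend are illustrative stand-ins, not the actual LTXV or ComfyUI guide API, where injection happens inside the sampling loop.

```python
# Illustrative anchor-frame schedule: start (logo lock), low-strength
# mid-point bridge at frame 138, hard end frame at 0.7. All names, the
# clip length, and the mid strength are assumptions for demonstration.
from dataclasses import dataclass
import numpy as np

@dataclass
class AnchorFrame:
    frame_index: int   # where in the timeline the reference is injected
    strength: float    # how strongly the guide constrains that frame
    ref: np.ndarray    # encoded reference image (toy latent stand-in)

TOTAL_FRAMES, LATENT_DIM = 192, 16  # assumed clip length / toy latent size
rng = np.random.default_rng(0)
latents = rng.normal(size=(TOTAL_FRAMES, LATENT_DIM))

anchors = [
    AnchorFrame(0, 1.0, rng.normal(size=LATENT_DIM)),                 # lock the logo
    AnchorFrame(138, 0.3, rng.normal(size=LATENT_DIM)),               # low-strength bridge
    AnchorFrame(TOTAL_FRAMES - 1, 0.7, rng.normal(size=LATENT_DIM)),  # hard end frame
]

# One-shot linear blend per anchored frame; the real pipeline re-applies
# guidance at every denoising step rather than once up front.
for a in anchors:
    latents[a.frame_index] = (1 - a.strength) * latents[a.frame_index] + a.strength * a.ref
```

The end-frame strength of 0.7 (rather than 1.0) is what leaves the model room for natural motion into the final pose.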
The before GIF is the raw output. The after is the rerender with the anchor method applied.
The environment cleaned up significantly. One thing LTX over-interpreted was the walk — it added a fluidity that felt more runway than competitive player. Tighter pose constraints next pass.
Full case study in comments.
https://i.redd.it/fj2pl5covwug1.gif
https://i.redd.it/p0ubkd5pvwug1.gif
https://redd.it/1sk4051
@rStableDiffusion