Hi everyone, I need some info: what can I use to generate sounds (sound effects)? I have a GPU with 6 GB of video memory and 32 GB of RAM.



https://redd.it/1lb39i8
@rStableDiffusion
I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch: miniDiffusion

Hello Everyone,

I'm happy to share a project I've been working on over the past few months: miniDiffusion. It's a from-scratch reimplementation of Stable Diffusion 3.5, built entirely in PyTorch with minimal dependencies. What miniDiffusion includes:

1. Multi-Modal Diffusion Transformer Model (MM-DiT) Implementation

2. Implementations of core image generation modules: VAE, T5 encoder, and CLIP encoder

3. Flow Matching Scheduler & Joint Attention implementation

The goal behind miniDiffusion is to make it easier to understand how modern image generation diffusion models work by offering a clean, minimal, and readable implementation.
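
To give a feel for one of those pieces, here is a minimal sketch of the kind of flow-matching Euler sampling loop SD3-style models use. It is illustrative only, not code from the repo; `model` (a velocity predictor) and the linear timestep schedule are assumptions:

```python
import torch

def euler_flow_matching_sample(model, shape, num_steps=28, device="cuda"):
    """Illustrative Euler sampler for a flow-matching (rectified flow) model.

    Assumes `model(x, t)` predicts the velocity field v_t; integration runs
    from t=1 (pure noise) down to t=0 (clean latent).
    """
    x = torch.randn(shape, device=device)                # start from Gaussian noise at t=1
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1, device=device)
    for i in range(num_steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        v = model(x, t.expand(shape[0]))                  # predicted velocity at the current time
        x = x + (t_next - t) * v                          # one Euler step along the flow
    return x
```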

Check it out here: https://github.com/yousef-rafat/miniDiffusion

I'd love to hear your thoughts, feedback, or suggestions.

https://redd.it/1lb9ubp
@rStableDiffusion
Nvidia presents Efficient Part-level 3D Object Generation via Dual Volume Packing
https://redd.it/1lbaxhu
@rStableDiffusion
Unlock Perplexity AI PRO – Full Year Access – 90% OFF! [LIMITED OFFER]
https://redd.it/1lbcn3v
@rStableDiffusion
How do I train a character LoRA that won’t conflict with style LoRAs? (consistent identity, flexible style)

Hi everyone, I’m a beginner who recently started working with AI-generated images, and I have a few questions I’d like to ask.

I’ve already experimented with training style LoRAs, and the results were quite good. I also tried training character LoRAs. My goal with anime character LoRAs is to remove the need for specific character tags—so ideally, when I use the prompt “1girl,” it would automatically generate the intended character. I only want to use extra tags when the character has variant outfits or hairstyles.

So my ideal generation flow is:

Base model → Character LoRA → Style LoRA
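
For context, here is roughly what that stacking looks like at inference time in diffusers; the model and LoRA paths and adapter names are hypothetical, and this is a sketch rather than my exact setup:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# hypothetical base model and LoRA paths
pipe = StableDiffusionXLPipeline.from_pretrained(
    "some/sdxl-base-model", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("loras/my_character.safetensors", adapter_name="character")
pipe.load_lora_weights("loras/my_style.safetensors", adapter_name="style")

# both at 1.0 is where the oversaturation appears;
# lowering either weight trades identity against style
pipe.set_adapters(["character", "style"], adapter_weights=[1.0, 0.8])

image = pipe("1girl, looking at viewer").images[0]
```

The `adapter_weights` pair is exactly the trade-off described below.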

However, I ran into issues when combining these two LoRAs.
* When both weights are set to 1.0, the colors become overly saturated and distorted.
* If I reduce the character LoRA weight, the result deviates from the intended character design.
* If I reduce the style LoRA weight, the art style no longer matches what I want.

For training the character LoRA, I prepared 50–100 images of the same character across various styles and angles.
I’ve seen conflicting advice about how to prepare datasets and captions for character LoRAs:

* Some say you should use a dataset with a single consistent art style per character. I haven't tried this, but I worry it might lead to style conflicts anyway (i.e., the character LoRA "bakes in" the training art style).
* Some say you should include the character name tag in the captions; others say you shouldn't. I chose not to use the tag.
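
For reference, here is the per-image caption layout I prepared, shown as a minimal sketch in the kohya-style convention of one `.txt` caption per image. The folder name, filenames, and tags are placeholder examples, with the character-name tag left out as mentioned above:

```python
from pathlib import Path

# hypothetical kohya-style dataset folder: "<repeats>_<name>"
dataset = Path("train/10_mychar")
dataset.mkdir(parents=True, exist_ok=True)

captions = {
    "img_001.png": "1girl, standing, white background, smiling",
    "img_002.png": "1girl, from side, outdoors, variant_outfit",  # extra tag only for the variant outfit
}
for image_name, caption in captions.items():
    # each image gets a sibling .txt file with its caption; no character-name tag
    (dataset / image_name).with_suffix(".txt").write_text(caption)
```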

# TL;DR

How can I train a character LoRA that works consistently with different style LoRAs without creating conflicts—ensuring the same character identity while freely changing the art style?
(Yes, I know I could just prompt famous anime characters by name, but I want to generate original or obscure characters that base models don’t recognize.)

https://redd.it/1lb97ut
@rStableDiffusion
How do I train my own model? (Not a very tech-savvy person)

Hello everyone!

I want to create a model where I can upload different angles of a pair of eyewear, and the output is a realistic model wearing those glasses.
Right now I have tried everything: Flux, Veo 2, etc. By far, Veo 2 has the highest accuracy on the product, but I want to build a streamlined and reliable workflow for the future. How do I do this?
If someone could help me with the process, it would mean a lot.

Thanks a lot:)

https://redd.it/1lbfkal
@rStableDiffusion
I built a tool to turn any video into a perfect LoRA dataset.

One thing I noticed is that creating a good LoRA starts with a good dataset. The process of scrubbing through videos, taking screenshots, trying to find a good mix of angles, and then weeding out all the blurry or near-identical frames can be **incredibly tedious.**

With the goal of learning how to use pose detection models, I ended up building a tool to automate that whole process. I don't have experience creating LoRAs myself, but this was a fun learning project, and I figured it might actually be helpful to the community.

**TO BE CLEAR:** this tool does not create LoRAs. It extracts frame images from video files.

It's a command-line tool called `personfromvid`. You give it a video file, and it does the hard work for you:

* **Analyzes for quality:** It automatically finds the sharpest, best-lit frames and skips the blurry or poorly exposed ones.
* **Sorts by pose and angle:** It categorizes the good frames by pose (standing, sitting) and head direction (front, profile, looking up, etc.), which is perfect for getting the variety needed for a robust model.
* **Outputs ready-to-use images:** It saves everything to a folder of your choice, giving you full frames and (optionally) cropped faces, ready for training.

The goal is to let you go from a video clip to a high-quality, organized dataset with a single command.
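
As an aside, the quality check mentioned above usually boils down to something like the variance of the Laplacian. The snippet below is a generic sketch of that idea, not the tool's actual code, and the threshold is a made-up value:

```python
import cv2

def is_sharp(frame_bgr, threshold=100.0):
    """Simple blur check: blurry frames have few edges, so the
    variance of the Laplacian response is low."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() > threshold
```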

It's free, open-source, and all the technical details are in the `README`.

* **GitHub Link:** [https://github.com/codeprimate/personfromvid](https://github.com/codeprimate/personfromvid)
* **Install with:** `pip install personfromvid`

Hope this is helpful! I'd love to hear what you think or if you have any feedback. Since I'm still new to the LoRA side of things, I'm sure there are features that could make it even better for your workflow. Let me know!

**CAVEAT EMPTOR**: I've only tested this on a Mac


https://redd.it/1lbfo0c
@rStableDiffusion
I unintentionally scared myself by using the I2V generation model

While experimenting with the video generation model, I had the idea of taking a picture of my room and using it in the ComfyUI workflow. I thought it could be fun.

So, I decided to take a photo with my phone and transfer it to my computer. Apart from the furniture and walls, nothing else appeared in the picture. I selected the image in the workflow and wrote a very short prompt to test: "A guy in the room." My main goal was to see if the room would maintain its consistency in the generated video.

Once the rendering was complete, I felt the onset of a panic attack. Why? The man generated in the AI video was none other than myself. I jumped up from my chair, completely panicked and plunged into total confusion as all the most extravagant theories raced through my mind.

Once I had calmed down, though still perplexed, I started analyzing the photo I had taken. After a few minutes of investigation, I finally discovered a faint reflection of myself taking the picture.

https://redd.it/1lbgtwq
@rStableDiffusion
3 ComfyUI Settings I Wish I Changed Sooner

# 1. ⚙️ Lock the Right Seed

Open the settings menu (bottom left) and use the search bar. Search for "widget control mode" and change it to Before.
By default, the KSampler uses the current seed for the next generation, not the one that made your last image.
Switching this setting means you can lock in the exact seed that generated your current image. Just set it from increment or randomize to fixed, and now you can test prompts, settings, or LoRAs against the same starting point.

# 2. 🎨 Slick Dark Theme

The default ComfyUI theme looks like wet concrete.
Go to Settings → Appearance → Color Palettes and pick one you like. I use GitHub.
Now everything looks like slick black marble instead of a construction site. 🙂

# 3. 🧩 Perfect Node Alignment

Use the search bar in settings and look for "snap to grid", then turn it on. Set "snap to grid size" to 10 (or whatever feels best to you).
By default, you can place nodes anywhere, even a pixel off. This keeps everything clean and locked in for neater workflows.
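
All three of these persist to ComfyUI's user settings JSON, so you can also inspect or back them up outside the UI. A minimal sketch follows; the path assumes a default install and may differ on yours:

```python
import json
from pathlib import Path

# assumed default location of ComfyUI's per-user settings file
settings_path = Path("ComfyUI/user/default/comfy.settings.json")

settings = json.loads(settings_path.read_text())
for key, value in sorted(settings.items()):
    print(f"{key} = {value}")
```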

If you're just getting started, I shared this post over on r/ComfyUI:
👉 Beginner-Friendly Workflows Meant to Teach, Not Just Use 🙏

https://redd.it/1lbgwj8
@rStableDiffusion
What I keep getting locally vs the published image (zoomed in) for Cyberrealistic Pony v11. Exactly the same workflow, no LoRAs, FP16 (no quantization), link in comments. Anyone know what's causing this or how to fix it?
https://redd.it/1lbiyy3
@rStableDiffusion
Why is Stable Diffusion suddenly so slow? No settings changed (Windows).

I was using SD just fine last night and turned my computer off, but today generating images is taking incredibly long. I changed nothing.

I am not looking for band-aid fixes like adding code to the webui to make it faster; I want to get to the bottom of why it's so slow. No other programs seem to be using the GPU or CPU, and I have plenty of storage, so I am stuck.
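
One diagnostic I can run (a generic PyTorch check, not a fix) is confirming that the webui's Python environment still sees the GPU at all, since silently falling back to CPU would explain a sudden slowdown:

```python
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("VRAM allocated (MB):", torch.cuda.memory_allocated() / 1024**2)
```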


Any help appreciated

https://redd.it/1lbkxi5
@rStableDiffusion
I built ChatFlow to make Flux even better on iPhone

https://preview.redd.it/tlmdyqvkqz6f1.jpg?width=2304&format=pjpg&auto=webp&s=1e5fc25630cf5923f0e3ce299751cca4c02adaac


I've been really impressed with the new FLUX model, but found it wasn't the easiest to use on my phone. So, I decided to build a simple app for it, and I'm excited to share my side-project, **ChatFlow**, with you all.

The idea was to make AI image creation as easy as chatting. You just type what you want to see, and the AI brings it to life. You can also tweak existing photos.

Here's a quick rundown of the features:

* **Text-to-Image:** Describe an image, and it appears.
* **Image-to-Image:** Give a new style to one of your photos.
* **Magic Prompt:** It helps optimize your prompts and can even translate them into English automatically. (Powered by OpenRouter)
* **Custom LoRA:** Includes six commonly used built-in LoRAs, and lets you manage your own.
* **Simple Chat Interface:** No complex settings, just create.

**A quick heads-up on how it works:** To keep the app completely **free** for everyone, it runs using your own API keys from Fal (for image generation) and OpenRouter (for the Magic Prompt feature). This way, you have full control and I don't have to charge for server costs.

I'm still actively working on it, so any feedback, ideas, or bug reports would be incredibly helpful! Let me know what you think.

You can grab it on the App Store here: [https://apps.apple.com/app/chatflow-create-now/id6746847699](https://apps.apple.com/app/chatflow-create-now/id6746847699)

https://redd.it/1lbo2lv
@rStableDiffusion
Looking for help turning a burning house photo into a realistic video (flames, smoke, dust, lens flares)
https://redd.it/1lbp9yj
@rStableDiffusion