Chatterbox-TTS fork updated to include Voice Conversion, per-generation JSON settings export, and more.

After seeing this community post here:
https://www.reddit.com/r/StableDiffusion/comments/1ldn88o/chatterbox_audiobook_and_podcast_studio_all_local/

And this other community post:
https://www.reddit.com/r/StableDiffusion/comments/1ldu8sf/video_guide_how_to_sync_chatterbox_tts_with/

Here is my latest updated fork of Chatterbox-TTS.
NEW FEATURES:
It remembers your last settings and they will be reloaded when you restart the script.

Saves a JSON file for each audio generation containing all of your configuration data, including the seed. When you want to reuse the same settings for other generations, load that JSON file into the upload/drag-and-drop box and every setting it contains will be applied automatically.
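A minimal sketch of that save/reload round trip. The field names below are illustrative, not the fork's actual schema:

```python
import json
from pathlib import Path

# Hypothetical per-generation settings -- keys are assumptions,
# not the fork's real export format.
settings = {
    "seed": 42,
    "temperature": 0.8,
    "exaggeration": 0.5,
    "cfg_weight": 0.5,
    "reference_audio": "voices/sample.wav",
}

def save_settings(path: str, cfg: dict) -> None:
    """Write one generation's settings next to its audio file."""
    Path(path).write_text(json.dumps(cfg, indent=2))

def load_settings(path: str) -> dict:
    """Reload a previously exported settings file."""
    return json.loads(Path(path).read_text())

save_settings("gen_0001.json", settings)
restored = load_settings("gen_0001.json")
assert restored == settings  # same seed -> same generation can be reproduced
```

Because the seed is part of the file, loading it back should let you reproduce a generation exactly, then tweak one setting at a time.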

You can now select an alternate Whisper sync validation model (faster-whisper) for faster validation and lower VRAM use. For example, with the largest models: large (~10–13 GB OpenAI / ~4.5–6.5 GB faster-whisper).

Added the VOICE CONVERSION feature that some had asked for, which is already included in the original repo. You can record yourself saying whatever you like, then take another voice and convert yours to theirs saying the same thing in the same way: same intonation, timing, etc.

|Category|Features|
|:-|:-|
|Input|Text, multi-file upload, reference audio, load/save settings|
|Output|WAV/MP3/FLAC, per-gen .json/.csv settings, downloadable & previewable in UI|
|Generation|Multi-gen, multi-candidate, random/fixed seed, voice conditioning|
|Batching|Sentence batching, smart merge, parallel chunk processing, split by punctuation/length|
|Text Preproc|Lowercase, spacing normalization, dot-letter fix, inline ref number removal, sound word edit|
|Audio Postproc|Auto-editor silence trim, threshold/margin, keep original, normalization (ebu/peak)|
|Whisper Sync|Model selection, faster-whisper, bypass, per-chunk validation, retry logic|
|Voice Conversion|Input+target voice, watermark disabled, chunked processing, crossfade, WAV output|
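The "chunked processing, crossfade" entry in the table means long inputs are converted in pieces and blended at the seams. A minimal sketch of a linear crossfade between two chunks of samples (plain lists here; a real pipeline would use audio arrays):

```python
def crossfade(a, b, overlap):
    """Blend the tail of chunk `a` into the head of chunk `b`.

    a, b    : lists of float samples (two adjacent converted chunks)
    overlap : number of samples linearly blended at the seam
    """
    assert 0 < overlap <= len(a) and overlap <= len(b)
    head = a[:-overlap]          # untouched part of the first chunk
    tail = b[overlap:]           # untouched part of the second chunk
    # Ramp a down from 1 -> 0 while ramping b up from 0 -> 1.
    blended = [
        a[len(a) - overlap + i] * (1 - i / overlap) + b[i] * (i / overlap)
        for i in range(overlap)
    ]
    return head + blended + tail

# Two constant chunks: the seam ramps smoothly instead of clicking.
out = crossfade([1.0] * 8, [0.0] * 8, 4)
# len(out) == 12 (8 + 8 - 4 overlapped samples)
```

The blend removes the audible click you would otherwise get at each chunk boundary.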

https://redd.it/1le0194
@rStableDiffusion
NVidia Cosmos Predict2! New txt2img model at 2B and 14B!

ComfyUI Guide for local use

https://docs.comfy.org/tutorials/image/cosmos/cosmos-predict2-t2i

This model just dropped out of the blue, and I have run a few tests:

1) SPEED TEST on an RTX 3090 @ 1MP (unless indicated otherwise)

|Model|Resolution|Speed|
|:-|:-|:-|
|FLUX.1-Dev FP16|1MP|1.45 sec/it|
|Cosmos Predict2 2B|1MP & 1.5MP|1.2 sec/it|
|Cosmos Predict2 2B|2MP|1.8 sec/it|
|HiDream Full FP16|1MP|4.5 sec/it|
|Cosmos Predict2 14B|1MP|4.9 sec/it|
|Cosmos Predict2 14B|1.5MP|7.7 sec/it|
|Cosmos Predict2 14B|2MP|10.65 sec/it|

The thing to note here is that the 2B model produces images at an impressive speed even @ 2MP, while the 14B one becomes atrociously slow.
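To turn the per-iteration figures above into wall-clock time, multiply by the step count. A quick sketch assuming a 20-step sampler (the post does not state the step count used):

```python
# Seconds per iteration measured above, at 2MP.
S_IT_2B_2MP = 1.8
S_IT_14B_2MP = 10.65

def wall_clock(s_per_it: float, steps: int) -> float:
    """Total sampling time in seconds (ignores model load and VAE decode)."""
    return s_per_it * steps

steps = 20  # assumed, for illustration only
t_2b = wall_clock(S_IT_2B_2MP, steps)    # 36 s
t_14b = wall_clock(S_IT_14B_2MP, steps)  # 213 s
print(f"2B: {t_2b:.0f}s  14B: {t_14b:.0f}s  ratio: {t_14b / t_2b:.1f}x")
```

At those speeds the 14B model takes roughly 6x longer per 2MP image than the 2B model.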

Prompt: A photograph of a Russian woman with natural blue eyes and blonde hair is walking on the beach at dusk while wearing a red bikini. She is making the peace sign with one hand and winking.

2B Model

14B Model

2) PROMPT TEST:

Prompt: An ethereal elven woman stands poised in a vibrant springtime valley, draped in an ornate, skimpy armor adorned with one magical gemstone embedded in its chest. A regal cloak flows behind her, lined with pristine white fur at the neck, adding to her striking presence. She wields a mystical spear pulsating with arcane energy, its luminous aura casting shifting colors across the landscape. Western Anime Style

2B Model

Prompt: A muscled Orc stands poised in a springtime valley, draped in an ornate, leather armor adorned with a small animal skulls. A regal black cloak flows behind him, lined with matted brown fur at the neck, adding to his menacing presence. He wields a rustic large Axe with both hands

2B Model

14B Model

Prompt: A massive spaceship glides silently through the void, approaching the curvature of a distant planet. Its sleek metallic hull reflects the light of a distant star as it prepares for orbital entry. The ship’s thrusters emit a faint, glowing trail, creating a mesmerizing contrast against the deep, inky blackness of space. Wisps of atmospheric haze swirl around its edges as it crosses into the planet’s gravitational pull, the moment captured in a cinematic, hyper-realistic style, emphasizing the grand scale and futuristic elegance of the vessel.

2B Model

Prompt: Under the soft pink canopy of a blooming Sakura tree, a man and a woman stand together, immersed in an intimate exchange. The gentle breeze stirs the delicate petals, causing a flurry of blossoms to drift around them like falling snow. The man, dressed in elegant yet casual attire, gazes at the woman with a warm, knowing smile, while she responds with a shy, delighted laugh, her long hair catching the light. Their interaction is subtle yet deeply expressive—an unspoken understanding conveyed through fleeting touches and lingering glances. The setting is painted in a dreamy, semi-realistic style, emphasizing the poetic beauty of the moment, where nature and emotion intertwine in perfect harmony.

2B Model

PERSONAL CONCLUSIONS FROM THE (PRELIMINARY) TEST:

Cosmos-Predict2-2B-Text2Image: a bit weak at understanding styles (maybe it was not trained on them?), but relatively fast even at 2MP, and with good prompt adherence (I'll have to test more).

Cosmos-Predict2-14B-Text2Image doesn't seem to be "better" at first glance than its 2B "mini-me", and it is HiDream-sloooow.

Also, it has a text-to-video brother! But I am not testing it here yet.

The MEME:

Just don't prompt a woman laying on the grass!

Prompt: Photograph of a woman laying on the grass and eating a banana

https://preview.redd.it/9qipubalok7f1.jpg?width=1088&format=pjpg&auto=webp&s=3b7502d820964911e1ec807713ef3014d3d0a417

https://redd.it/1le28bw
@rStableDiffusion
I'm desperate, please help me understand LoRA training

Hello, two weeks ago I created my own realistic AI model (an "influencer"). Since then, I've trained about 8 LoRAs and none of them are good. The only LoRA that gives me the face I want is unable to give me any hairstyles other than those in the training pictures. So I obviously tried to train another one with better pictures, more hairstyles, emotions, shots from every angle; I had about 150 pictures, and it's complete bulls*it. The face resembles her maybe 4 out of 10 times.

Since I'm completely new to the AI world, I've used ChatGPT for everything, and it told me the more pics, the better for training. What I've noticed, though, is that content creators on YouTube usually use only about 20–30 pics, so I'm now confused.

At this point I don't even care if it's Flux or SDXL; I have programs for both. Can someone please give me a definite answer on how many training pics I need? And do I train only the face, or the body too? Or should it be done separately, in two LoRAs?

Thank you so much🙈🙈❤️

https://redd.it/1le961p
@rStableDiffusion
Let's Benchmark! Your GPU against others - Wan Edition

Welcome to Let's Benchmark! Your GPU against others, where we share our generation times to see if we are on the right track compared to the rest of the community!

To do that, please always include at least the following (mine for reference):

Generation time : 4:01min
GPU : RTX 3090 24GB VRAM
RAM : 128GB
Model : Wan2.1 14B 720P GGUF Q8
Speedup Lora(s) : Kijai Self Forcing 14B (https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors)
Steps : 4
Frames : 81 (5sec video)
Resolution : 720x1280
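Since posts will report different frame counts and clip lengths, normalizing the reported time to seconds per frame makes runs comparable. A small sketch using the reference run above:

```python
def seconds_per_frame(minutes: int, seconds: int, frames: int) -> float:
    """Normalize a reported generation time to seconds per frame,
    so runs with different frame counts can be compared fairly."""
    return (minutes * 60 + seconds) / frames

# The reference run above: 4:01 for 81 frames at 720x1280.
spf = seconds_per_frame(4, 1, 81)
print(f"{spf:.2f} s/frame")  # ~2.98 s/frame
```

Resolution still matters, of course, so comparisons are most meaningful between runs at the same resolution and step count.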

I think I'm average, but I'm not sure! That's why I'm creating this post, so everyone can compare and share together!

https://redd.it/1lee9sh
@rStableDiffusion
Qwen2VL-Flux ControlNet has been available since Nov 2024, but most people missed it. Fully compatible with Flux Dev and ComfyUI. Works with Depth and Canny (kinda works with Tile and Realistic Lineart).

https://redd.it/1lefv07
@rStableDiffusion
What is the best video upscaler besides Topaz?

Based on my research, Topaz seems to be the best video upscaler currently. It has been around for several years now, and I am wondering why no newcomer with better quality has appeared yet.

Is your experience the same with video upscaler software, and what is the best open-source video upscaler?

https://redd.it/1ledzsc
@rStableDiffusion
Which UI is better: ComfyUI, Automatic1111, or Forge?

I'm going to start working with AI soon, and I'd like to know which one is the most recommended.

https://redd.it/1lekbm7
@rStableDiffusion
Sources vs Output Comparison: trying to use 3D references from Blender, some with camera motion, to see if I can control the output

https://redd.it/1lensll
@rStableDiffusion