A Time Traveler's VLOG | Google VEO 3 + Downloadable Assets

https://redd.it/1l788rq
@rStableDiffusion
About the 5060 Ti and Stable Diffusion

Am I safe buying it to generate images using Forge UI and Flux? I remember reading, when these cards came out, that some people couldn't use them because of CUDA compatibility issues. I'm fairly new to this, and since I can't find benchmarks on YouTube I'm having doubts about buying it. Thanks to anyone willing to help, and sorry for the broken English.

https://redd.it/1l7a9k3
@rStableDiffusion
5070 Ti vs 4070 Ti Super. Only an $80 difference, but I am seeing a lot of backlash against the 5070 Ti. Should I get the 4070 Ti Super since it's cheaper?

I saw some posts about performance and PCIe compatibility issues with the 5070 Ti. Is anyone here facing issues with image generation? Should I go with the 4070 Ti Super instead? There is only around an 8% performance difference between the two in benchmarks. Are there any other reasons I should go with the 5070 Ti?

https://redd.it/1l7eva5
@rStableDiffusion
Explaining AI Image Generation

Howdy everybody,

I am a college professor. In some of my classes we're using AI image generation as part of the assignments. I'm looking for a good way to explain how it works, and I want to check my own understanding of AI image generation. Below is what I have written for students (college level). Does this all check out?

So how exactly does this work? What is a prompt, what does it mean for an AI to have been trained on your work, and how does an AI create an image? When we create images with AI we’re prompting a Large Language Model (LLM) to make something. The model is built on information called training data. The way the LLM understands the training data is tied to concepts called the Deep Learning system and the Latent Space it produces. The LLM then uses Diffusion to create an image from randomized image noise. Outside of image making we interact with AI systems of many different kinds all the time, usually without being aware of it.

When you prompt an AI you are asking a Large Language Model (LLM) to create an image for you. An LLM is an AI that has been trained on vast amounts of text and image data. That data allows it to understand language and image making. So if something is missing from the data set, or is poorly represented in the data, the LLM will produce nonsense. Similarly, crafting a well-made prompt will make the results more predictable.

The LLM’s ability to understand what you are asking is based in part on the way you interact with it. LLMs are tied to an Application Programming Interface (API), for example the chat window in Midjourney or OpenAI’s ChatGPT. You can also have more complex interfaces like Adobe’s Firefly or DiffusionBee (a Stable Diffusion front end) that, in addition to text prompting, include options for selecting styles, model, art vs. photography, etc.
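
For the technically inclined, here is a minimal sketch of what prompting looks like when you skip the chat window and call an open-source image model from code, using the Hugging Face diffusers library. The checkpoint name, prompt, and settings are illustrative examples, not recommendations.

```python
# A minimal, illustrative sketch of prompting an open-source text-to-image
# model from code with the Hugging Face `diffusers` library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a widely used public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a smiley face, simple children's drawing",
    num_inference_steps=25,  # how many denoising steps to run
    guidance_scale=7.5,      # how strongly to follow the prompt
).images[0]

image.save("smiley.png")
```

Graphical tools like DiffusionBee are essentially wrapping calls like this behind sliders and dropdowns.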

Training data sets can be quite small or quite large. For most of the big-name AI models the training data is vast. However, you can also train an AI on additional, smaller data sets using a technique called Low-Rank Adaptation (LoRA) so that it becomes especially good at producing images of a certain kind. For example, Cindy Sherman has been experimenting with AI generation and may have trained a LoRA on her oeuvre to produce new Cindy Sherman-like images.
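
As a concrete (and hypothetical) illustration, this is roughly what applying an already-trained LoRA on top of a base model looks like with the diffusers library; the folder, file name, and style phrase are placeholders.

```python
# A hypothetical sketch of applying a trained LoRA on top of a base model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the small LoRA weight file trained on the additional data set.
pipe.load_lora_weights("./loras", weight_name="my_style_lora.safetensors")

image = pipe("a portrait, in the style of my_style").images[0]
image.save("lora_portrait.png")
```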

The training data can be Internet text forums, image forums, books, news, videos, movies, really any bit of culture or human interaction that has been fed into it. This can be much more than what is available on the open Internet. If something exists digitally you should assume someone somewhere has fed it or will feed it into a training data set for an LLM. This includes any conversations you have with an AI.

When something is used to train an LLM it influences the possible outcome of a prompt. So if, as an artist, your work features praying mantises and someone prompts for an image of a mantis, your work will influence the result produced. The AI is not copying the work; the randomness in the diffusion step prevents exact copying, though with precise prompting a very strong influence can be reflected in the final image.

In order for the AI to make sense of the training data, it is run through a Deep Learning system. This system identifies, categorizes, and systematizes the data into a Latent Space. To understand what this means, let’s talk about what a digital image actually is. In the digital environment each image is made up of pixels. A pixel is a tiny square of light in a digital display that, when combined with other squares of light, makes up an image. For example, the images in this show started as 1792x2668 pixels in size (I later upscaled them for printing). Each of these squares can be one of 16,777,216 color values.
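
That 16,777,216 figure comes from standard 8-bit color: each pixel stores 256 possible levels for red, green, and blue, and 256 × 256 × 256 = 16,777,216. A quick check in Python:

```python
# Where 16,777,216 comes from: 8-bit color gives each pixel 256 possible
# levels for each of red, green, and blue.
levels_per_channel = 2 ** 8        # 256
print(levels_per_channel ** 3)     # 16777216 possible colors per pixel

# And the number of pixels in a 1792x2668 image:
print(1792 * 2668)                 # 4,781,056 pixels
```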

In the deep learning system the AI learns what pixel values and placements are usually associated with something, for example a smiley face. This allows the LLM to create a latent space where it understands what you mean by a smiley face. It would know what a smile is from data tied to smiling emojis, pictures of people or animals smiling, children’s drawings, and so on. It would associate faces with human and animal faces, but also the face of a cliff or maybe Facebook. However, a ‘smiley face’ usually means an emoji, so if I asked for a smiley face the LLM would probably give me an emoji.

Finally we get to Diffusion. You can think of the latent space as labeled image noise (random pixels) in a great big soup of image noise. In the latent space the LLM can draw images out of that noise based on what it knows something should look like. As it draws the image further out of the noise, more detail emerges.
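
For the technically curious, here is a toy sketch of that loop: start from pure noise and repeatedly remove a little of the estimated noise, guided by the prompt. This is illustrative only; the predict_noise function below is a made-up stand-in for a trained neural network, not any real model's code.

```python
# A toy sketch of the diffusion idea: start from random noise and repeatedly
# subtract a little of the estimated noise, step by step.
import numpy as np

rng = np.random.default_rng(seed=0)

def predict_noise(image, prompt):
    # Stand-in for the trained denoiser. A real model predicts, from the
    # prompt, which parts of the current image are noise and which are not.
    # Here it simply nudges every pixel toward mid-gray so the loop converges.
    return image - 0.5

image = rng.normal(size=(64, 64))         # start from random noise
for step in range(50):
    noise_estimate = predict_noise(image, "a smiley face")
    image = image - 0.1 * noise_estimate  # take one small denoising step

print(round(float(image.mean()), 3))      # pixel values have drifted toward 0.5
```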

Let’s simplify this process with a metaphor. Let’s say you have a box full of dice where half of the sides are painted black and half are painted white (2 possible colors instead of 16+ million). The box holds enough dice that they can lie flat across the bottom of the box, 400 dice by 600 dice. You ask a scientist to make a smiley face with the dice in the box. The scientist picks up the box and gives it a good shake, randomizing the placement of the dice. For the sake of the metaphor, imagine that all of the dice fall flat and fill out the bottom of the box. The scientist looks at the randomly placed dice and decides that some of them are starting to form a smiley face. They then glue those dice to the bottom of the box and give it another shake. Some of the dice complement the dice that were glued down in forming a smiley face. The scientist then glues those dice down as well. Maybe some of the originally glued-down ones no longer make sense; those are broken off from the bottom of the box. They repeat shaking and gluing the dice down until they have a smiley face and all of the dice are glued to the bottom. Once they are all glued they show you the face.

In this metaphor you are prompting the scientist for a smiley face. The scientist knows what a smiley face is from their life experience (training data) and conceptualizes it in their mind (latent space). They then shake the box, creating the first round of random shapes in the box (diffusion). Based on their conceptualization of a smiley face, they look for that pattern in the dice and fix those dice in place. They then continue to refine the smiley face by continuing to shake and glue dice in place. When done, they show you the box (the results). You could further refine your results by asking for a large face, or a small face, or one off to the left, and so on.
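
The metaphor maps onto code fairly directly. The sketch below is simplified (it leaves out the "un-gluing" step) and uses a made-up target pattern standing in for the scientist's mental image of a smiley face.

```python
# The dice metaphor as a tiny program: a 400x600 grid of black/white dice,
# a target pattern (the scientist's mental image), and a shake-and-glue loop.
import numpy as np

rng = np.random.default_rng(seed=0)
H, W = 400, 600

target = np.zeros((H, W), dtype=int)   # the face the scientist has in mind
target[100:140, 150:190] = 1           # left eye
target[100:140, 410:450] = 1           # right eye
target[280:310, 150:450] = 1           # mouth

dice = rng.integers(0, 2, size=(H, W)) # first shake: random black/white dice
glued = np.zeros((H, W), dtype=bool)   # nothing is glued down yet
shakes = 1

while True:
    glued |= (dice == target)                    # glue dice that fit the face
    if glued.all():                              # every die matches: done
        break
    reshaken = rng.integers(0, 2, size=(H, W))   # shake the box again...
    dice = np.where(glued, dice, reshaken)       # ...but only loose dice move
    shakes += 1

print("shakes needed:", shakes)        # typically around 20 at this size
```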

Since the dice are randomized, it is extremely unlikely that any result will perfectly match another result, or that it would perfectly match a smiley face that the scientist had seen in the past. However, since there is a set number of dice, there is a set number of possible combinations. This is true for all digital art. For an 8-bit image (the kind made by most AI) the number of possible combinations is so vast that the likelihood of producing exactly the same image twice is quite low.
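
To put a rough number on "so vast": an 8-bit RGB pixel has 256³ possible colors, so a width × height image has (256³) raised to the power of (width × height) possible combinations. The quick computation below uses the 1792x2668 size mentioned earlier as an example.

```python
# A rough sense of how many distinct images are possible at a fixed size.
import math

width, height = 1792, 2668
colors_per_pixel = 256 ** 3

digits = (width * height) * math.log10(colors_per_pixel)
print(f"roughly 10^{digits:,.0f} possible images at this size")
```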

https://redd.it/1l7hlyk
@rStableDiffusion
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
https://redd.it/1l7jf0z
@rStableDiffusion
Comparison video between Wan 2.1 and Google Veo 2 of two female spies fighting a male enemy agent. This is the first time I have tried two against one in a fight. This is a first generation for each. The prompt basically described the female agents by the color of their clothing for the fighting moves.

https://redd.it/1l7n8k2
@rStableDiffusion
🚀 Everlyn.app – Fast Image/Video Gen with Motion Control, 30s Length, and Free Images (Now Live)

Hey folks!

We just launched [Everlyn.app](http://Everlyn.app) — a new platform for video generation that is fast, powered by our newly developed tech in collaboration with world-class professors, and built with an intuitive UI. You can generate high-quality images and videos up to 30 seconds long, optionally add an image input, use our intelligent prompt enhancement, and control motion.

Key Features:

* Fast inference (typically under 30s)
* 🎬 Long videos (up to 30s, multi-paragraph prompts supported)
* 📸 Free image generation (unlimited, watermark-free)
* 🎯 Fine-grained motion control
* 🤖 AI-powered prompt enhancement

 💬 Since I’ve learned so much from this community and friends here, I’d love to give back. If you leave your email in the comments, I’ll personally send you 50 free credits to try Everlyn.ai.

https://redd.it/1l7omhx
@rStableDiffusion
People who've trained LoRA models on both Kohya and OneTrainer with the same datasets, what differences have you noticed between the two?



https://redd.it/1l7p387
@rStableDiffusion
What's the best Virtual Try-On model today?

I know none of them are perfect at assigning patterns/textures/text. But from what you've researched, which do you think in today's age is the most accurate at them?

I tried Flux Kontext Pro on Fal and it wasn't very accurate in determining what to change and what not to, and the same goes for 4o Image Gen. I wanted to try Google's "dressup" virtual try-on, but I can't seem to find it anywhere.

OSS models would be ideal as I can tweak the workflow rather than just the prompt on ComfyUI.

https://redd.it/1l7roia
@rStableDiffusion
Self Forcing: The new Holy Grail for video generation?

https://self-forcing.github.io/

> Our model generates high-quality 480P videos with an initial latency of ~0.8 seconds, after which frames are generated in a streaming fashion at ~16 FPS on a single H100 GPU and ~10 FPS on a single 4090 with some optimizations.

> Our method has the same speed as CausVid but has much better video quality, free from over-saturation artifacts and having more natural motion. Compared to Wan, SkyReels, and MAGI, our approach is 150–400× faster in terms of latency, while achieving comparable or superior visual quality.

https://redd.it/1l7sxh3
@rStableDiffusion