Are Diffusion Models Fundamentally Limited in 3D Understanding?

So if I understand correctly, Stable Diffusion is essentially a denoising algorithm. This means that all models based on this technology are, in their current form, incapable of truly understanding the 3D geometry of objects.
As a result, they would fail to reliably convert a third-person view into a first-person perspective or to change the viewing angle of a scene without introducing hallucinations or inconsistencies.

Am I wrong in thinking this way?

Edit: does that mean they can't be used for editing existing images/videos, only for generating new content?

Edit: after thinking about it, I think I found where I was wrong. I was imagining a one-step scene-angle transition, like going straight from a 3D scene to a first-person view of someone inside it. Clearly that won't work in one step. But if we let the model render all the steps in between, effectively using the time dimension, then it should be able to do that accurately.

I would be happy if someone could illustrate it with an example.
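The "many small steps" idea from the edit above can be sketched in code: instead of asking a model to jump from a third-person view straight to first-person, split the camera move into small increments and condition each generation step on the previous frame. This is a toy sketch, not a real diffusion pipeline; `render_transition` is a hypothetical placeholder for a per-step img2img call.

```python
# Sketch of stepwise view transition: split one large camera move into many
# small ones, each conditioned on the previous frame. All names hypothetical.

def interpolate_camera(start_deg: float, end_deg: float, n_steps: int) -> list[float]:
    """Evenly spaced camera angles from start to end, inclusive."""
    if n_steps < 2:
        raise ValueError("need at least 2 steps")
    span = end_deg - start_deg
    return [start_deg + span * i / (n_steps - 1) for i in range(n_steps)]

def render_transition(first_frame: str, angles: list[float]) -> list[str]:
    """Placeholder for a per-step img2img loop: each new frame is generated
    conditioned on the previous one plus a small camera delta."""
    frames = [first_frame]
    for angle in angles[1:]:
        prev = frames[-1]
        # A real pipeline would call something like pipe(image=prev, camera=angle)
        frames.append(f"frame@{angle:.1f}deg<-{prev.split('<-')[0]}")
    return frames

angles = interpolate_camera(0.0, 90.0, 10)          # ten views, 10-degree hops
frames = render_transition("frame@0.0deg", angles)
print(len(frames), angles[0], angles[-1])
```

Each hop is small enough that the model only has to hallucinate a sliver of unseen geometry, which is exactly why gradual transitions tend to stay more consistent than one big jump.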

https://redd.it/1kv12pw
@rStableDiffusion
Am I the only one who feels like they have an AI drug addiction?

Seriously. Between all the free online AI resources (GitHub, Discord, YouTube, Reddit) and having a system that can run these apps fairly decently (5800X, 96GB RAM, 4090 with 24GB VRAM), I feel like I'm a kid in a candy store... or a crack addict in a free crack store? I get to download all kinds of amazing AI applications FOR FREE, many of which you can even use commercially for free. I feel almost like I have an AI problem and I need an intervention... but I don't want one :D

https://redd.it/1kv5i6f
@rStableDiffusion
PSA: Flux LoRAs work EXTREMELY well on Chroma. Like very, VERY well

Tried a couple and, well, saying I was mesmerized is an understatement.
Plus Chroma is fully uncensored, so... uh, yeah.

https://redd.it/1kvenmw
@rStableDiffusion
How come Jenga is not talked about here?

https://github.com/dvlab-research/Jenga

This looks like an amazing piece of research, enabling Hunyuan (and soon WAN2.1) at a much lower cost. They managed to speed up Hunyuan t2v generation by 10x and Hunyuan i2v by 4x. Excited to see what's gonna go down with WAN2.1 with this project.

https://redd.it/1kvfauk
@rStableDiffusion
I Just Open-Sourced 10 Camera Control Wan LoRAs & made a free HuggingFace Space

https://redd.it/1kviphp
@rStableDiffusion
Simple, uncensored model sharing site like early Civitai. Would you use it?


What do you guys think about a simple, uncensored model-sharing site like early Civitai: no generation, no paywalls, just filters, tags, and clean search for models?

I want to build it, but to be honest, I can’t fund it right now. I’ve been unemployed for about 3 years and currently live on food stamps.

Estimated costs:

- ~$15/month for 1TB on Cloudflare R2
- Bandwidth depends on traffic
- $10/year for a Namecheap domain
- Everything else runs on free tiers
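A quick back-of-the-envelope total for the figures above (assuming flat usage and ignoring the traffic-dependent bandwidth line):

```python
# Rough annual cost from the estimates above; bandwidth excluded since it
# depends entirely on traffic.
storage_monthly = 15.00   # ~1TB on Cloudflare R2
domain_yearly = 10.00     # Namecheap domain
annual_total = storage_monthly * 12 + domain_yearly
print(f"${annual_total:.2f}/year")  # $190.00/year before bandwidth
```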

I plan to keep it fully transparent, with a /costs page showing real usage, bills, and how long donations cover it. If it grows too big, I’ll ask the community for help or add optional premium features (like faster downloads), but core browsing and downloading will always stay free.

I believe people respect honesty more than surprise paywalls, so I want to be upfront from the start.

Still figuring out if there’s enough interest to make it worth doing. If you’d use it or want to help, please let me know.

PS: very ashamed self-promo: you can check out what I made here: http://lucyradio.com/get. It's a tiny music app for lofi, jazz, deep house, etc., with all images AI-generated locally. So yeah, I owe a debt to the community for the checkpoints and LoRAs I used.


Thanks u/Fresh_Diffusor for encouraging me to post here.

https://redd.it/1kvjt40
@rStableDiffusion
Can I make my SD run slower on purpose?

My GPU is very loud when running Stable Diffusion. SD takes like 30 sec to finish an image.

Is it possible to make SD run at a lower load, like when I'm playing a game, even if that means it takes longer to finish an image?
I don't mind waiting longer.

Thanks a lot!

https://redd.it/1kvu922
@rStableDiffusion
The censorship and paywall gatekeeping behind Video Generative AI is really depressing. So much potential, so little freedom

We live in a world where every corporation desires utmost control over their product. We also live in a world where for every person who sees that as wrong, we have 10-20 people defending these practices and another 100-200 on top of that who neither understand nor notice what is going on.

Google, Kling, Vidu, they all have such amazingly powerful tools, yet all these tools keep getting more and more censored, they keep getting more and more out of reach for the average consumer.

My take is: so what if somebody uses these tools to make illegal "porn" for personal satisfaction? It's all fake; no real human beings are harmed. The training data isn't equivalent to taking images of existing people and putting them in compromising positions or situations, unless celebrity LoRAs with 100% likeness or LoRAs/images of existing people are used. That's difficult to control, sure, but ultimately it's a small price to pay for complete and absolute freedom of choice, freedom of creativity, and freedom of expression.

Artists capable of photorealistic art can still draw photorealism; if they have twisted desires, they will take the time to draw themselves something twisted. If they don't, they won't. But regardless, paint, brushes, paper, canvas, and other art tools are not censored.

AI might have a lower skill barrier on the surface, but creating cohesive, long, well-put-together videos or images with custom framing, colors, lighting, and individual, specific positions and expressions for each character requires time and skill too.

I don't like where AI is going: it's just another amazing thing that is slowly being taken away and destroyed by corporate greed and corporate control.

I have zero interest in the statements of people who defend these practices; not a single word they say interests me, nor will I accept it. All I see is wonderfully creative tools being dangled in front of us and then taken away, while the local and free alternatives fall further and further behind.

https://redd.it/1kw28p7
@rStableDiffusion