r/StableDiffusion

Control FLUX.2 with reference images instead of training a LoRA — demo
https://redd.it/1tjqssg
@rStableDiffusion

r/StableDiffusion

SAM3 added to Comfyui-Angelo (sampler/inpainter/refiner)

https://redd.it/1tjp4ir
@rStableDiffusion

r/StableDiffusion

Decided to actually make Stable Diffusion

https://redd.it/1tjsv9s
@rStableDiffusion

r/StableDiffusion

What happened to Hunyuan?

Hello!

I really liked the Hunyuan model. Did they go closed source with further developments?
Any news about that? I think LTX is okay, but the visual quality of Hunyuan sometimes even exceeded Wan 2.2, IMO.

Best

https://redd.it/1tjvuvq
@rStableDiffusion

r/StableDiffusion

As someone who can already run most of the larger models (RTX 5090) I'm extremely glad I gave Anima Base a chance

I'll be honest. I didn't expect much from a 2B parameter model. I had initially written it off as being not worth the time simply because I had access to such powerful models with much higher parameter counts. I didn't see how it could possibly outdo what I already had. But wow, they really did one hell of a job on this, and I find that it produces better anime images (with easier prompting) than most of what's out there.

It doesn't suffer from a lot of the NLP problems where you get near-identical outputs each time. It reminds me more of the SDXL / Pony era, where you could give a general idea of what you wanted with tags (or, yes, NLP as well) and the model itself would find a way to make it interesting. This is one of those models where you don't even need an LLM to rewrite your prompts. Just give it a general direction and let it go.

The fact that it can understand NLP means it has a lot of the strengths of the older models without the weakness of getting shit confused, like a blue hat, a red hat, and 2 orange hats in the same image.

https://redd.it/1tjymfl
@rStableDiffusion

r/StableDiffusion

LTX 2.3 growing frustration

I have been defending LTX and had moved away from Wan 2.2 since LTX 2.3 came out. Now that I am trying to create a short narrative film, I'm getting very frustrated with LTX's inability to follow prompt directions. For example, I have a shot of two men next to each other, and all I want is for the camera to zoom in on one of the men as he talks. LTX keeps giving me a pullout (zoom out) instead of a zoom in. No matter how I prompt for it, it just won't do it. Should something as simple as that shot be so difficult to achieve? I have also tried different workflows, for example the new LTX Director that has the prompt relay embedded.

Does anyone else get frustrated with this model?

https://redd.it/1tjtdi5
@rStableDiffusion

Update from comfy-flow.com: I made a plugin for ComfyUI

https://redd.it/1tk4kod
@rStableDiffusion

r/StableDiffusion

Microsoft Lens seems to be back.
https://huggingface.co/microsoft/Lens-Turbo

https://redd.it/1tkajke
@rStableDiffusion

r/StableDiffusion

Been testing Krea 2 Large and Medium

https://redd.it/1tkd293
@rStableDiffusion

r/StableDiffusion

I built a free demo for Pixal3D (Tencent's new image-to-3D model)

https://redd.it/1tkepzb
@rStableDiffusion

r/StableDiffusion

LTX 2.3 + LTX Director Testing

https://redd.it/1tkg4r9
@rStableDiffusion

r/StableDiffusion

Phosphene 3.0 — open source AI video + image suite for Apple Silicon. Train your own LTX characters.

Sharing Phosphene 3.0. It's a free panel that runs LTX-Video 2.3 and a couple of image models natively on Apple Silicon. Local, MIT license, no subs, no cloud.

The thing that sets it apart from "yet another LTX wrapper": you can **train your own characters** inside the panel. Drop 30 to 80 photos, click Train, and get a face LoRA back. Add a voice clip and you get a voice LoRA too. Captions are generated automatically and locally with Gemma 3 12B. ~3 hours per character on an M4 Max 64 GB.

**What 3.0 ships**

- Text → video+audio (LTX-2 generates joint audio+video in one pass)
- Image → video+audio
- Audio → video (drive a clip with an audio reference)
- FFLF (first frame + last frame interpolation)
- Extend (continue an existing clip)
- Character training (face + optional voice LoRA, from a single dataset)
- Image Studio with three engines: Qwen-Image-Edit-2511, HiDream-O1, and the FLUX.1 family. Multi-reference composition up to 3 subjects.

**HiDream-O1 ported to MLX**

HiDream released their O1 image model on May 14. Got it running natively on Apple Silicon five days later. Photoreal portraits, instruction edits, multi-subject. ~67 seconds per 1024² image on a 64 GB Mac.

**Hardware**

Apple Silicon only. Capability tiers auto-detected:

- 16 / 24 GB: 512 px video, text-to-image works
- 32 GB: 768 px
- 64 GB+: 1024×576 video, full HD image, character training
- A 7-second character clip with synced audio renders in ~6 min on M4 Max 64 GB
- Character training takes ~3 hours per character
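The post doesn't show how Phosphene auto-detects these tiers. As a reading aid only, here is a minimal sketch that restates the table above as a lookup from unified memory to tier. `Tier`, `TIERS`, and `capability_tier` are hypothetical names, not Phosphene's code. It assumes that memory sizes between listed tiers (e.g. 48 GB) fall back to the next lower tier, and that character training needs 64 GB+, since the post lists it only there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of the capability table from the post (hypothetical type)."""
    min_memory_gb: int
    video_resolution: str
    character_training: bool


# Rows restate the post's table, highest tier first. Image capability is
# left out because the post does not state it for every tier.
TIERS = (
    Tier(64, "1024x576", True),
    Tier(32, "768 px", False),
    Tier(16, "512 px", False),
)


def capability_tier(unified_memory_gb: int) -> Tier:
    """Return the highest tier whose memory floor fits the machine.

    Assumption: sizes between listed tiers (e.g. 48 GB) fall back to the
    next lower tier; the post does not say how Phosphene handles them.
    """
    for tier in TIERS:
        if unified_memory_gb >= tier.min_memory_gb:
            return tier
    raise ValueError(
        f"{unified_memory_gb} GB is below the smallest listed tier (16 GB)"
    )
```

For example, `capability_tier(24).video_resolution` gives `"512 px"`, and `capability_tier(128).character_training` is `True`.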

**Install**

One-click via Pinokio (search Phosphene). Or clone the repo and run the panel directly.

**Credits**

LTX Video 2.3 by Lightricks (their license on the weights). MLX port by `dgrauet/ltx-2-mlx`. HiDream by HiDream AI. Phosphene the panel is MIT.

**Honest limits**

- Apple Silicon only. No Intel Mac, no Windows, no Linux.
- Dialogue audio is hit-or-miss. Ambient/diegetic sound is where LTX-2 shines.
- Character LoRAs are video-only (face + voice). Image LoRAs work in the Studio via Qwen/HiDream + a separate LoRA stack.
- First run downloads ~28 GB of weights. Takes a while.

Repo: github.com/mrbizarro/phosphene

X: x.com/PhospheneAI

Dev: https://x.com/AIBizarrothe

Feedback welcome. Especially curious what people make with the character training side.

https://redd.it/1tkh9c2
@rStableDiffusion
