Control FLUX.2 with reference images instead of training a LoRA — demo
https://redd.it/1tjqssg
@rStableDiffusion
What happened to Hunyuan?

Hello!

I really liked the Hunyuan model. Did they go closed source with further developments?
Any news about that? I think LTX is okay, but the visual quality of Hunyuan sometimes even exceeded Wan 2.2, imo.

Best

https://redd.it/1tjvuvq
@rStableDiffusion
As someone who can already run most of the larger models (RTX 5090) I'm extremely glad I gave Anima Base a chance

I'll be honest. I didn't expect much from a 2B parameter model. I had initially written it off as being not worth the time simply because I had access to such powerful models with much higher parameter counts. I didn't see how it could possibly outdo what I already had. But wow, they really did one hell of a job on this, and I find that it produces better anime images (with easier prompting) than most of what's out there.


It doesn't suffer from a lot of the NLP problems where you get near identical outputs each time. It reminds me more of the SDXL / Pony era where you could give a general idea of what you wanted with tags (or yes NLP as well) and the model itself would find a way to make it interesting. This is one of those models where you don't even need an LLM to rewrite your prompts. Just give it a general direction and let it go.

The fact that it can understand NLP means it has a lot of the strengths of the older models without the weakness of getting shit confused. Like a blue hat and a red hat and 2 orange hats.

https://redd.it/1tjymfl
@rStableDiffusion
LTX 2.3 growing frustration

I have been defending LTX and moved away from Wan 2.2 when LTX 2.3 came out. Now that I am trying to create a short narrative film, I'm getting very frustrated with LTX's inability to follow prompt directions. For example, I have a shot of two men next to each other, and all I want is for the camera to zoom in on one of the men as he talks. LTX keeps giving me a pull-out or zoom-out instead of a zoom-in. No matter how I prompt for it, it just won't do it. Should a shot as simple as that be so difficult to achieve? I have also tried different workflows, for example the new LTX director that has the prompt relay embedded.

Anyone else getting frustrated with this model?

https://redd.it/1tjtdi5
@rStableDiffusion
Phosphene 3.0 — open source AI video + image suite for Apple Silicon. Train your own LTX characters.

Sharing Phosphene 3.0. It's a free panel that runs LTX-Video 2.3 and a couple of image models natively on Apple Silicon. Local, MIT license, no subs, no cloud.



The thing that sets it apart from "yet another LTX wrapper": you can **train your own characters** inside the panel. Drop 30 to 80 photos, click Train, get a face LoRA back. Add a voice clip and you get a voice LoRA too. Auto-captions with Gemma 3 12B locally. \~3 hours per character on an M4 Max 64 GB.



**What 3.0 ships**

\- Text → video+audio (LTX-2 generates joint audio+video in one pass)

\- Image → video+audio

\- Audio → video (drive a clip with an audio reference)

\- FFLF (first frame + last frame interpolation)

\- Extend (continue an existing clip)

\- Character training (face + optional voice LoRA, from a single dataset)

\- Image Studio with three engines: Qwen-Image-Edit-2511, HiDream-O1, and the FLUX.1 family. Multi-reference composition up to 3 subjects.



**HiDream-O1 ported to MLX**

HiDream released their O1 image model on May 14. Got it running natively on Apple Silicon five days later. Photoreal portraits, instruction edits, multi-subject. \~67 seconds per 1024² on a 64 GB Mac.



**Hardware**

Apple Silicon only. Capability tiers auto-detected:

\- 16 / 24 GB: 512 px video, text-to-image works

\- 32 GB: 768 px

\- 64 GB+: 1024×576 video, full HD image, character training

\- A 7-second character clip with synced audio renders in \~6 min on M4 Max 64 GB

\- Character training takes \~3 hours per character
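
For anyone wondering which tier their Mac lands in, here is a rough sketch of the lookup the list above describes. This is not Phosphene's actual detection code: the `Tier` class, the `pick_tier` function, and the choice to round in-between sizes (for example 48 GB) down to the nearest listed tier are all assumptions made for illustration. Only the memory thresholds and capabilities come from the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    """Hypothetical record for one capability tier from the list above."""
    min_gb: int   # smallest unified-memory size that qualifies
    video: str    # video resolution quoted in the post
    extras: str   # other capabilities quoted for that tier


# Thresholds and capabilities copied from the post, highest tier first.
TIERS = [
    Tier(64, "1024x576", "full HD image, character training"),
    Tier(32, "768 px", ""),
    Tier(16, "512 px", "text-to-image"),
]


def pick_tier(unified_memory_gb: int) -> Optional[Tier]:
    """Return the highest tier whose threshold the machine meets.

    Assumption: memory sizes between listed tiers round down, and anything
    under 16 GB gets no tier because the post does not list one.
    """
    for tier in TIERS:
        if unified_memory_gb >= tier.min_gb:
            return tier
    return None


if __name__ == "__main__":
    for gb in (8, 16, 24, 32, 64, 128):
        tier = pick_tier(gb)
        print(gb, "GB ->", tier.video if tier else "below the listed tiers")
```

Checking from the highest tier down keeps the lookup to a single pass and makes adding a tier a one-line change.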



**Install**

One-click via Pinokio (search Phosphene). Or clone the repo and run the panel directly.



**Credits**

LTX Video 2.3 by Lightricks (their license on the weights). MLX port by `dgrauet/ltx-2-mlx`. HiDream by HiDream AI. Phosphene the panel is MIT.



**Honest limits**

\- Apple Silicon only. No Intel Mac, no Windows, no Linux.

\- Dialogue audio is hit-or-miss. Ambient/diegetic sound is where LTX-2 shines.

\- Character LoRAs are video-only (face + voice). Image LoRAs work in the Studio via Qwen/HiDream + a separate LoRA stack.

\- First run downloads \~28 GB of weights. Takes a while.



Repo: github.com/mrbizarro/phosphene

X: x.com/PhospheneAI

Dev: https://x.com/AIBizarrothe



Feedback welcome. Especially curious what people make with the character training side.

https://redd.it/1tkh9c2
@rStableDiffusion