What happened to Hunyuan?
Hello!
I really liked the Hunyuan model. Did they go closed source with further development?
Any news about that? I think LTX is okay, but the visual quality of Hunyuan sometimes even exceeded Wan 2.2, imo.
Best
https://redd.it/1tjvuvq
@rStableDiffusion
As someone who can already run most of the larger models (RTX 5090), I'm extremely glad I gave Anima Base a chance
I'll be honest. I didn't expect much from a 2B parameter model. I had initially written it off as not worth the time simply because I had access to such powerful models with much higher parameter counts. I didn't see how it could possibly outdo what I already had. But wow, they really did one hell of a job on this, and I find that it produces better anime images (with easier prompting) than most of what's out there.
It doesn't suffer from a lot of the NLP problems where you get near-identical outputs each time. It reminds me more of the SDXL / Pony era, where you could give a general idea of what you wanted with tags (or, yes, NLP as well) and the model itself would find a way to make it interesting. This is one of those models where you don't even need an LLM to rewrite your prompts. Just give it a general direction and let it go.
The fact that it understands NLP means it has a lot of the strengths of the older models without the weakness of getting shit confused, like with a blue hat, a red hat and 2 orange hats.
https://redd.it/1tjymfl
@rStableDiffusion
LTX 2.3 growing frustration
I have been defending LTX and had moved away from Wan 2.2 since LTX 2.3 came out. Now that I am trying to create a short narrative film, I'm getting very frustrated with LTX's inability to follow prompt directions. For example, in a shot of two men next to each other, all I want is for the camera to zoom in on one of the men as he talks. LTX keeps giving me a pull-out or zoom-out instead of a zoom-in. No matter how I prompt for it, it just won't do it. Should a shot that simple be so difficult to achieve? And I have used different workflows, for example the new LTX director that has the prompt relay embedded.
Does anyone else get frustrated with this model?
https://redd.it/1tjtdi5
@rStableDiffusion
I built a free demo for Pixal3D (Tencent's new image-to-3D model)
https://redd.it/1tkepzb
@rStableDiffusion
Phosphene 3.0 — open source AI video + image suite for Apple Silicon. Train your own LTX characters.
Sharing Phosphene 3.0. It's a free panel that runs LTX-Video 2.3 and a couple of image models natively on Apple Silicon. Local, MIT license, no subs, no cloud.
The thing that sets it apart from "yet another LTX wrapper": you can **train your own characters** inside the panel. Drop 30 to 80 photos, click Train, get a face LoRA back. Add a voice clip and you get a voice LoRA too. Auto-captions with Gemma 3 12B locally. ~3 hours per character on an M4 Max 64 GB.
**What 3.0 ships**
- Text → video+audio (LTX-2 generates joint audio+video in one pass)
- Image → video+audio
- Audio → video (drive a clip with an audio reference)
- FFLF (first frame + last frame interpolation)
- Extend (continue an existing clip)
- Character training (face + optional voice LoRA, from a single dataset)
- Image Studio with three engines: Qwen-Image-Edit-2511, HiDream-O1, and the FLUX.1 family. Multi-reference composition up to 3 subjects.
**HiDream-O1 ported to MLX**
HiDream released their O1 image model on May 14. Got it running natively on Apple Silicon five days later. Photoreal portraits, instruction edits, multi-subject. ~67 seconds per 1024² image on a 64 GB Mac.
**Hardware**
Apple Silicon only. Capability tiers auto-detected:
- 16 / 24 GB: 512 px video, text-to-image works
- 32 GB: 768 px
- 64 GB+: 1024×576 video, full HD image, character training
- A 7-second character clip with synced audio renders in ~6 min on M4 Max 64 GB
- Character training takes ~3 hours per character
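The tier list above amounts to a lookup from unified memory to capability. Below is a minimal sketch of that mapping. It assumes the listed sizes are lower bounds, so a 48 GB Mac would land in the 32 GB tier. `Tier`, `TIERS` and `detect_tier` are hypothetical names, not Phosphene's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One capability tier from the post's hardware list (hypothetical structure)."""
    min_memory_gb: int        # smallest unified-memory size that qualifies
    max_video: str            # video resolution listed for the tier
    character_training: bool  # listed only for 64 GB+ machines


# Ordered from largest to smallest so the first match is the best tier.
# Treating each size as a lower bound is an assumption, not documented behavior.
TIERS = (
    Tier(64, "1024x576", True),
    Tier(32, "768 px", False),
    Tier(16, "512 px", False),
)


def detect_tier(memory_gb: int) -> Tier:
    """Return the highest tier whose memory floor fits within memory_gb."""
    for tier in TIERS:
        if memory_gb >= tier.min_memory_gb:
            return tier
    raise ValueError(f"{memory_gb} GB is below the smallest listed tier (16 GB)")


if __name__ == "__main__":
    for gb in (16, 24, 36, 64, 128):
        print(gb, detect_tier(gb))
```

Ordering the tuple from largest to smallest keeps the lookup a single linear scan. Anything under 16 GB is rejected rather than silently downgraded.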
**Install**
One-click via Pinokio (search Phosphene). Or clone the repo and run the panel directly.
**Credits**
LTX Video 2.3 by Lightricks (their license on the weights). MLX port by `dgrauet/ltx-2-mlx`. HiDream by HiDream AI. The Phosphene panel itself is MIT.
**Honest limits**
- Apple Silicon only. No Intel Mac, no Windows, no Linux.
- Dialogue audio is hit-or-miss. Ambient/diegetic sound is where LTX-2 shines.
- Character LoRAs are video-only (face + voice). Image LoRAs work in the Studio via Qwen/HiDream + a separate LoRA stack.
- First run downloads ~28 GB of weights. Takes a while.
Repo: github.com/mrbizarro/phosphene
X: x.com/PhospheneAI
Dev: https://x.com/AIBizarrothe
Feedback welcome. Especially curious what people make with the character training side.
https://redd.it/1tkh9c2
@rStableDiffusion
Tencent released Z-Image 6B with pixel-space generation: no VAE, 1K resolution.
https://redd.it/1tkipk6
@rStableDiffusion
Creating character turnaround sheets with Flux 2 Klein in ComfyUI
I made a small ComfyUI workflow for creating multi-angle reference sheets from a single input image.
The main use case is character sheets. You give it one character image, and the workflow tries to generate multiple consistent views: front three-quarter, side profile, rear view, rear three-quarter, high angle, low angle, and a close detail view. The goal is to keep the same face, outfit, pose, expression, proportions, and general design while only changing the camera angle.
I built it mostly with native ComfyUI nodes. The only non-native part, as far as I remember, is the GGUF loader. The prompts are written in a generic way, so it can also work for people, props, vehicles, creatures, or objects, but I mainly made it for character sheet generation.
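The generic prompting described above boils down to one template with a swappable subject and a camera-angle slot. Below is a minimal sketch, assuming a plain string template. `VIEWS`, `TEMPLATE` and `build_view_prompts` are hypothetical, and the wording is illustrative only; it is not the prompts in the shared workflow.

```python
# Hypothetical view list and wording; the real workflow's prompts live in its JSON.
VIEWS = (
    "front three-quarter view",
    "side profile",
    "rear view",
    "rear three-quarter view",
    "high-angle view",
    "low-angle view",
    "close-up detail view",
)

# Only {subject} and {view} vary; everything else stays fixed for consistency.
TEMPLATE = (
    "The same {subject} as in the reference image. Camera angle: {view}. "
    "Keep every detail of the design, colors, proportions and pose unchanged; "
    "only the camera angle changes."
)


def build_view_prompts(subject: str = "character") -> dict[str, str]:
    """Fill the template once per view, swapping only the subject noun."""
    return {view: TEMPLATE.format(subject=subject, view=view) for view in VIEWS}


if __name__ == "__main__":
    for view, prompt in build_view_prompts("vehicle").items():
        print(f"{view}: {prompt}")
```

In a ComfyUI graph, each string could feed its own text-encode branch that shares the same reference image. Keeping the subject noun as the only variable is what would let one template cover characters, props or vehicles.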
I tested it with the Flux 2 Klein 4B Q4 GGUF model because I currently only have access to 4 GB of VRAM. For such a small setup, it gives acceptable results. It is not perfect, especially with difficult rear views or fine clothing continuity, but it is usable for blocking out reference angles and building rough character sheets.
I expect the 9B variant to give much better consistency and detail, especially for faces, costume continuity, proportions, and rear-view inference.
This is not meant to be a final, polished character turnaround solution. It is more of a practical workflow for quickly getting usable angle references from one image, especially when working with AI video, inpainting, first-frame/last-frame generation, or character continuity.
Sharing it in case it is useful to anyone experimenting with Flux 2 Klein on low-VRAM setups.
https://pastebin.com/EyRM0zed
https://preview.redd.it/y8v7v06d4o2h1.png?width=5824&format=png&auto=webp&s=3d7acb275bf8652b68501e9efb33af7d324e75ca
https://redd.it/1tkf9uc
@rStableDiffusion