Cel animation outpainting: Avatar: The Last Airbender 4:3 -> 16:9 with no crop
https://redd.it/1tbjinj
@rStableDiffusion
I built a local GUI + AI builder for creating ComfyUI custom node packs
I've been working on ComfyUI Node Builder, a local app for building custom ComfyUI nodes without hand-writing all the boilerplate every time.
The demo shows:
1. user describes a node idea
2. AI creates the node contract and Python
3. dependencies/files are updated
4. the pack is deployed and tested in ComfyUI
It is open-source and runs locally. The AI Builder can create nodes, edit generated files, explain validation errors, run checks, and request a deploy only when deploy permission is enabled.
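For context on what "the boilerplate" means here: every ComfyUI custom node follows the same class-attribute convention, discovered through a `NODE_CLASS_MAPPINGS` dict in the pack's `__init__.py`. A minimal hand-written sketch (the `BrightnessNode` is a hypothetical example, not one of the tool's generated nodes):

```python
# Minimal ComfyUI custom-node skeleton. ComfyUI finds nodes via the
# NODE_CLASS_MAPPINGS dict exported from a node pack's __init__.py.

class BrightnessNode:
    """Multiply an IMAGE tensor by a brightness factor."""

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the sockets/widgets the node shows in the graph editor.
        return {
            "required": {
                "image": ("IMAGE",),
                "factor": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 3.0}),
            }
        }

    RETURN_TYPES = ("IMAGE",)   # one output socket of type IMAGE
    FUNCTION = "apply"          # the method ComfyUI calls to run the node
    CATEGORY = "image/adjust"   # where the node appears in the add-node menu

    def apply(self, image, factor):
        # ComfyUI passes IMAGE as a torch tensor; a plain scalar multiply
        # works on any array-like, so a float stands in here for testing.
        return (image * factor,)


NODE_CLASS_MAPPINGS = {"BrightnessNode": BrightnessNode}
NODE_DISPLAY_NAME_MAPPINGS = {"BrightnessNode": "Brightness (example)"}
```

Even for a node this trivial, the contract (input dict, return tuple, mappings) has to be exactly right, which is the repetitive part the builder automates.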
GitHub:
https://github.com/caoool/comfyui-node-canvas
Landing page:
https://caoool.github.io/comfyui-node-canvas/
Node ideas and feedback:
https://github.com/caoool/comfyui-node-canvas/issues/2
I'd especially like feedback from people who build custom nodes: what node authoring workflow should this support next?
https://redd.it/1tbk8zv
@rStableDiffusion
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
https://github.com/zghhui/OmniNFT
https://redd.it/1tbmfzm
@rStableDiffusion
LTX 2.3 INT8 Benchmarks (2x Faster on Ampere)
Saw some interest in INT8 for LTX 2.3 after my last post, so here are the resources.
> Quick warning: INT8 acceleration specifically benefits Ampere GPUs (e.g., RTX 3080 Ti). If you're already rocking an RTX 5090, you can safely ignore this.
The setup is easy: only the model-loading part of the workflow changes. Everything else stays the same.
https://preview.redd.it/p1kqwomsgu0h1.png?width=931&format=png&auto=webp&s=626a72c691107d452a492acb4e1f3c169c7490e1
Performance Gain:
Stock: 118.77s
INT8: 66.45s
Result: ~2x speedup 🚀
Links:
weight & comfyui workflow
custom node
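If you're curious what INT8 does under the hood: weights are mapped to 8-bit integers with a per-tensor scale, and Ampere's INT8 tensor cores run the resulting matmuls at roughly double the FP16 throughput. A toy sketch of symmetric per-tensor quantization (illustrative only, not the actual LTX/ComfyUI custom-node code):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: floats -> integers in [-127, 127] + a scale."""
    scale = max(abs(w) for w in weights) / 127.0
    scale = scale or 1.0  # avoid divide-by-zero for an all-zero tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats; per-weight error is bounded by scale/2."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.01]
q, scale = quantize_int8(w)
print(q)  # [50, -127, 1]
print(dequantize_int8(q, scale))
```

Halving the weight bytes also roughly halves memory traffic, which is where much of the wall-clock win comes from on bandwidth-bound GPUs.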
https://redd.it/1tbqxb5
@rStableDiffusion
LTX 2.3 I2V messing up text details, anyone facing the same?
https://redd.it/1tbpd7h
@rStableDiffusion
LTX 2.3 adding unwanted subtitles in generated videos even when not mentioned in prompt
https://redd.it/1tbrsf7
@rStableDiffusion
Scenema Audio: Zero-shot expressive voice cloning and speech generation
https://redd.it/1tbzgi3
@rStableDiffusion
ComfyUI Pixaroma Nodes: New Load Image, Notify & Utility Nodes (Ep17)
https://www.youtube.com/watch?v=dXH7Qx9pzyc
https://redd.it/1tc2fuz
@rStableDiffusion
LTX 2.3 video generation notes after testing H100, RTX 5090, A100, L40, FP8, BF16, and CPU offload
This community helped me a lot in my last post so here's my contribution back. If you're looking to generate LTX 2.3 videos, these notes might save you a few hundred dollars on wasted cloud rentals.
H100:
- 5s distilled FP8, 704x1280, 121f: 48s
- 5s distilled no-quant, 704x1280, 121f: 45s
- 5s HQ/no-quant, 704x1280, 121f, 20 steps: 121s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps: 321s
- 20s HQ/no-quant, 704x1280, 481f, 28 steps: 380-390s
RTX 5090:
- 5s distilled FP8, 704x1280, 121f: 43s
- 5s HQ FP8, 704x1280, 121f, 20 steps: 151s
- 20s distilled FP8, 704x1280, 481f: failed/OOM after 55s
- 20s distilled FP8, 576x1024, 481f: 104s
- 20s distilled, no quantization, CPU offload, 704x1280, 481f: 299s
A100:
- 5s image-conditioned, 704x1280: 401-425s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps, serverless render step: 608s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps, serverless remote total: 713s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps, serverless local wall time: 797s
L40:
(See the note in the lessons below.)
- 5s distilled, no quantization, CPU offload, 704x1280, 121f: 1199s
- 5s distilled FP8, 704x1280, 121f: 197s
- 20s distilled FP8, 704x1280, 481f, max batch 4: failed/OOM after 189s
- 20s distilled FP8 low-memory, 704x1280, 481f, max batch 1: 365s
- 20s distilled FP8 low-memory, 704x1280, 481f, repeated runs: 433-453s
Some lessons:
- For some reason, A100 output was worse than H100 output for the exact same setup. I generated around 20 videos on each GPU from the same cloud host, and the A100 results were consistently less realistic.
- I did not like the 5090 results on distilled + FP8. Distilled with offloading to CPU RAM looks better.
- The L40 cloud I rented could generate 20s 704x1280 clips, but only with a lower-memory FP8 setup for some reason. I am guessing the rental device was not in the best state.
- For spoken words, aim for roughly 45-52 words per 20 seconds.
- Avoid ending on important words; the model sometimes cuts off the final syllable. A short final sentence helps.
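The pacing rule in those last two lessons is easy to check before burning GPU time. A small helper of my own (a sketch, not part of LTX or any workflow above) that scales the ~45-52-words-per-20-seconds target to any clip length:

```python
def speech_pacing_ok(script: str, clip_seconds: float,
                     words_per_20s=(45, 52)) -> bool:
    """Return True if the script's word count fits the comfortable
    speaking pace (~45-52 words per 20 s, scaled to the clip length)."""
    words = len(script.split())
    lo, hi = words_per_20s
    return lo * clip_seconds / 20 <= words <= hi * clip_seconds / 20

# 48 words in a 20-second clip sits inside the target band:
sample = " ".join(["word"] * 48)
print(speech_pacing_ok(sample, 20.0))  # True
```

Running this on a script before generation is much cheaper than discovering mid-render that the audio is rushed or cut off.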
I am still exploring this so feel free to let me know if there's anything additional I can do. Happy to contribute to the community if you're looking for any generated samples or examples.
https://redd.it/1tc5s73
@rStableDiffusion
DramaBox - Most Expressive Voice model ever based on LTX 2.3
https://redd.it/1tc6i8w
@rStableDiffusion
SenseNova-U1 Technical Report: VAE-free Pixel-level Flow Matching with 32x Compression
https://redd.it/1tc2anx
@rStableDiffusion