LTX-2.3 PolarQuant Q5: 88% size reduction, near lossless quality (Cosine Similarity: 0.9986).
https://redd.it/1t7mhaw
@rStableDiffusion
3 years of training with AI tools finally put to use

I have learned so much from this community, and I want to say thank you to everyone who has contributed endlessly to this subreddit. Two other AI users and I teamed up to make children's music videos. Here are some of the clips that used WAN22. Not everything on the YouTube channel is open source, so I won't post the link here unless it's requested. These were all made with a standard WAN22 FFLF workflow that I have tweaked over the years.


The one thing I realized along the way is that WAN can do some amazing things; it's all in the prompt. Block transitions, crash zooms, pans, dollies, tilts, rotations: it can pretty much do it all.

Here is the workflow for the first video.

https://reddit.com/link/1t7nqgz/video/8dsi4qysuzzg1/player

https://reddit.com/link/1t7nqgz/video/01c16z8tuzzg1/player

https://reddit.com/link/1t7nqgz/video/0tz5363vuzzg1/player

https://reddit.com/link/1t7nqgz/video/n1guckfxuzzg1/player

https://reddit.com/link/1t7nqgz/video/plda65pxuzzg1/player




https://redd.it/1t7nqgz
@rStableDiffusion
LTX 2.3 Sulphur vs 10Eros

For those who have tried these models: which one do you prefer, and why? What strengths and weaknesses have you found with each model?

https://redd.it/1t7os5i
@rStableDiffusion
Why did we move away from booru tags?

I’m obviously wrong for this opinion, but I believe booru tags are a far better descriptor of visual media than natural language. Simply listing the contents of an image is far clearer than “the light dramatically plays against blah blah”, which I think is just subjective abstruseness.

Most new models now use massive text encoders, which is excellent for understanding, but there are too many ways to describe the same image in natural language.

The same goes for video: we could have timestamped tags describing scenes in a comma-separated, booru-style format. That removes ambiguity.
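
To make the idea concrete, here is a rough Python sketch of what a timestamped tag prompt could look like and how it might be parsed. The `start-end: tags` syntax, the `|` separator between scenes, and the names `TaggedSegment`, `parse_timestamped_tags`, and `tags_at` are all made up for illustration; as far as I know, no existing model or tool accepts this format.

```python
import re
from typing import NamedTuple


class TaggedSegment(NamedTuple):
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds (exclusive)
    tags: list    # booru-style tags active during the segment


# Hypothetical syntax: "start-end: tag, tag, ..." with scenes separated by "|".
SEGMENT_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*:\s*(.*)$")


def parse_timestamped_tags(prompt):
    """Split a prompt into time-stamped segments of comma-separated tags."""
    segments = []
    for chunk in prompt.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate stray separators
        match = SEGMENT_RE.match(chunk)
        if match is None:
            raise ValueError(f"malformed segment: {chunk!r}")
        start, end = float(match.group(1)), float(match.group(2))
        if end <= start:
            raise ValueError(f"segment ends before it starts: {chunk!r}")
        tags = [t.strip() for t in match.group(3).split(",") if t.strip()]
        segments.append(TaggedSegment(start, end, tags))
    return segments


def tags_at(segments, t):
    """Return the tags active at time t (seconds), or an empty list."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.tags
    return []


prompt = "0-2: 1girl, forest, static shot | 2-4.5: 1girl, forest, crash zoom"
segments = parse_timestamped_tags(prompt)
print(tags_at(segments, 3.0))
```

Each scene is just a time range plus a flat tag list, so there is exactly one way to write a given scene, which is the whole appeal over free-form captions.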

Can anyone tell me why the open-source community chose natural language over booru-style tags?

https://redd.it/1t8150y
@rStableDiffusion