Pixelsmile works in ComfyUI, enabling fine-grained microexpression control. Workflow included.
https://redd.it/1sefy2t
@rStableDiffusion
The people getting the best outputs from AI tools aren't the best prompt engineers. They're the ones who already know what good looks like.
I've been thinking about this after using both Stable Diffusion and Claude/Lovable for a commercial website rebuild, and I think the same principle applies to both.
When I started using SD, I assumed the people getting extraordinary results just had better prompt formulas. But the more I paid attention, the more I noticed something else. The photographers were getting better photorealistic outputs. The illustrators were getting better illustrated outputs. The designers were getting better-designed outputs.
The tool didn't level the playing field. It amplified whatever eyes and taste people already had.
The prompt "a woman in dramatic lighting" gets very different results from someone who understands Rembrandt lighting, split lighting, practical light sources, and motivated shadows — versus someone who just knows the words. Same prompt on paper. Completely different understanding behind it.
I think this is the underappreciated skill gap in AI generation. It's not prompting. It's visual literacy. The ability to look at an output and know precisely what's wrong and why — and then know what words to change to fix it.
A useful exercise I found: when I get an output that's close but not quite right, instead of changing the prompt randomly, I ask Claude to analyse what specific visual or compositional principle is being violated. Then I know exactly what to change. Treating the AI as both executor and critic is more powerful than using it as just one or the other.
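To make that concrete, here's a minimal sketch of what that critic step can look like when scripted against the Anthropic Python SDK. This is my own illustration rather than part of any official workflow; the image path, prompt text, and model name are placeholders to swap for your own.

```python
import base64
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical output image from a generation run; replace with your own file.
with open("render_v3.png", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; any vision-capable Claude model works
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": [
            # Attach the generated image so the model can critique it directly.
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            # Ask for a specific violated principle plus one concrete prompt change.
            {"type": "text",
             "text": "This was generated from the prompt: 'a woman in dramatic lighting'. "
                     "Name the specific lighting or compositional principle being violated, "
                     "and suggest one concrete prompt change to fix it."},
        ],
    }],
)
print(response.content[0].text)
```

The point of scripting it is only to make the critique step habitual: output, named principle, targeted prompt edit, regenerate.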
Curious if others here have found the same pattern. Does domain expertise in visual fields transfer more than people think? Or does the tool actually reduce the advantage of having a trained eye?
https://redd.it/1sej6y0
@rStableDiffusion
LTX2.3 Multi Image reference
https://youtube.com/shorts/XfgfMuNB99g
https://redd.it/1secxzc
@rStableDiffusion
YouTube: LTX2.3 Multi Image Reference Shot #ltx2.3 #comfyui. A test video using the LTX2.3 model's multi-image reference capability to compose consistent scenes.
The tool you've been waiting for: a FREE, LOCAL, ComfyUI-based Full Movie Pipeline Agent. Enter anything in the prompt with a desired scene time and let it go. Plenty of cool features. Enjoy :) KupkaProd Cinema Pipeline. The 9-minute video in the post was created with fewer than 40 words.
https://redd.it/1selevy
@rStableDiffusion
Ace-Step 1.5 XL is already up! I hope it will soon be available in a ComfyUI format! ❤️
https://huggingface.co/collections/ACE-Step/ace-step-15-xl
https://redd.it/1semc1e
@rStableDiffusion
Help me find optimal hyper-parameters for Ultimate Stable Diffusion Upscale and complete my master's degree!
https://redd.it/1semi3t
@rStableDiffusion
Ace Step 1.5 XL is out!!!
https://huggingface.co/ACE-Step/acestep-v15-xl-turbo
https://huggingface.co/ACE-Step/acestep-v15-xl-base
https://huggingface.co/ACE-Step/acestep-v15-xl-sft
Have fun all!
https://redd.it/1ses85i
@rStableDiffusion
Here's a trick you can perform with Depth map + FFLF
https://www.youtube.com/watch?v=1QvTmkXF-HY
https://redd.it/1seqv34
@rStableDiffusion
YouTube: Gourmet Pyramids. This showcases an AI visual technique I devised that enables the creation of food arranged in any desired shape. I think it could be used for TV commercials.
All of my AI demos:
https://www.youtube.com/playlist?list=PLe3OBqR7FeRhZM6SNoIWibQ1PA2JREYtL
Open Sourcing my 10M model for video interpolations with comfy nodes. (FrameFusion)
Hello everyone, today I’m releasing on GitHub the model that I use in my commercial application, FrameFusion Motion Interpolation.
# A bit about me
(You can skip this part if you want.)
Before talking about the model, I just wanted to write a little about myself and this project.
I started learning Python and PyTorch about six years ago, when I developed Rife-App together with Wenbo Bao, who also created the DAIN model for image interpolation.
Even though this is not my main occupation, it is something I had a lot of pleasure developing, and it brought me some extra income during some difficult periods of my life.
Since then, I never really stopped developing and learning about ML. Eventually, I started creating and training my own algorithms. Right now, this model is used in my commercial application, and I think it has reached a good enough point for me to release it as open source. I still intend to keep working on improving the model, since this is something I genuinely enjoy doing.
# About the model and my goals in creating it
My focus with this model has always been to make it run at an acceptable speed on low-end hardware. After hundreds of versions, I think it has reached a reasonable balance between quality and speed, with the final model having a little under 10M parameters and a file size of about 37MB in fp32.
The downside of making a model this small and fast is that sometimes the interpolations are not the best in the world. I made this video with examples so people can get an idea of what to expect from the model. It was trained on both live action and anime, so it works decently for both.
I’m just a solo developer, and the model was fully trained using Kaggle, so I do not have much to share in terms of papers. But if anyone has questions about the architecture, I can try to answer. The source code is very simple, though, so probably any LLM can read it and explain it better than I can.
# Video example:
https://reddit.com/link/1sezpz7/video/qltsdwpzgstg1/player
It seems that Reddit is having some trouble showing the video; the same video can be seen on YouTube:
https://youtu.be/qavwjDj7ei8
# A bit about the architecture
Honestly, the main idea behind the architecture is basically “throw a bunch of things at the wall and see what sticks”, but the main point is that the model outputs motion flows, which are then used to warp the original images.
This limits the result a little, since it does not use RGB information directly, but at the same time it can reduce artifacts, and it is lighter to run.
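For anyone curious what flow-based warping means in practice, here is a minimal sketch using PyTorch's grid_sample. This is my own illustration, not the code from the repo; the model call and the 50/50 blend in the trailing comments are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a frame (B, C, H, W) by a dense flow field (B, 2, H, W), flow in pixels (x, y)."""
    b, _, h, w = frame.shape
    # Base sampling grid: the identity mapping, one (x, y) coordinate per pixel.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow   # (B, 2, H, W), shifted by the flow
    # Normalise coordinates to [-1, 1] as grid_sample expects, then sample the source frame.
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)              # (B, H, W, 2)
    return F.grid_sample(frame, grid, padding_mode="border", align_corners=True)

# Hypothetical usage: the network predicts flows from each input frame toward the
# midpoint in time, and the intermediate frame is a blend of the two warped frames.
# flow_0_to_t, flow_1_to_t = model(frame0, frame1)
# frame_t = 0.5 * warp_with_flow(frame0, flow_0_to_t) + 0.5 * warp_with_flow(frame1, flow_1_to_t)
```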
# Comfy
I do not use ComfyUI that much. I used it a few times to test one thing or another, but with the help of coding agents I tried to put together two nodes to use the model inside it.
Inside the GitHub repo, you can find the folder ComfyUI_FrameFusion with the custom nodes and also the safetensor, since the model is only 32MB and I was able to upload it directly to GitHub.
You can also find the file "FrameFusion Simple Workflow.json" with a very simple workflow using the nodes inside Comfy.
I feel like I may still need to update these nodes a bit, but I’ll wait for some feedback from people who use Comfy more than I do.
# Shameless self-promotion
If you like the model and want an easier way to use it on Windows, take a look at my commercial app on Steam. It uses exactly the same model that I'm releasing on GitHub; it just has more tools and options for working with videos, runs 100% offline, and is still in development, so it may still have some issues that I'm fixing little by little. (There is a link to it in the GitHub repo.)
I hope the model is useful for some people here. I can try to answer any questions you may have. I’m also using an LLM to help format this post a little, so I hope it does not end up looking like slop or anything.
# And finally, the link:
GitHub: