Here's a trick you can perform with Depth map + FFLF
https://www.youtube.com/watch?v=1QvTmkXF-HY
https://redd.it/1seqv34
@rStableDiffusion
Gourmet Pyramids
Showcased here is an AI visual technique I devised that enables the creation of food arranged in any desired shape. I think it can be utilised for TV commercials.
All of my AI demos:
https://www.youtube.com/playlist?list=PLe3OBqR7FeRhZM6SNoIWibQ1PA2JREYtL
Open-sourcing my 10M model for video interpolation, with Comfy nodes (FrameFusion)
Hello everyone, today I’m releasing on GitHub the model that I use in my commercial application, FrameFusion Motion Interpolation.
# A bit about me
(You can skip this part if you want.)
Before talking about the model, I just wanted to write a little about myself and this project.
I started learning Python and PyTorch about six years ago, when I developed Rife-App together with Wenbo Bao, who also created the DAIN model for video frame interpolation.
Even though this is not my main occupation, it is something I had a lot of pleasure developing, and it brought me some extra income during some difficult periods of my life.
Since then, I never really stopped developing and learning about ML. Eventually, I started creating and training my own algorithms. Right now, this model is used in my commercial application, and I think it has reached a good enough point for me to release it as open source. I still intend to keep working on improving the model, since this is something I genuinely enjoy doing.
# About the model and my goals in creating it
My focus with this model has always been to make it run at an acceptable speed on low-end hardware. After hundreds of versions, I think it has reached a reasonable balance between quality and speed, with the final model having a little under 10M parameters and a file size of about 37MB in fp32.
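As a quick sanity check on those two numbers (assuming the usual 4 bytes per fp32 parameter, with no extra metadata):

```python
# fp32 stores each parameter in 4 bytes, so file size and parameter count
# are tied together: size_bytes ≈ params * 4.
file_size_mb = 37
params = file_size_mb * 1024 * 1024 // 4
print(f"{params / 1e6:.1f}M parameters")  # ≈ 9.7M, i.e. "a little under 10M"
```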
The downside of making a model this small and fast is that sometimes the interpolations are not the best in the world. I made this video with examples so people can get an idea of what to expect from the model. It was trained on both live action and anime, so it works decently for both.
I’m just a solo developer, and the model was fully trained using Kaggle, so I do not have much to share in terms of papers. But if anyone has questions about the architecture, I can try to answer. The source code is very simple, though, so probably any LLM can read it and explain it better than I can.
# Video example:
https://reddit.com/link/1sezpz7/video/qltsdwpzgstg1/player
It seems that Reddit is having some trouble showing the video; the same video can be seen on YouTube:
https://youtu.be/qavwjDj7ei8
# A bit about the architecture
Honestly, the main idea behind the architecture is basically “throw a bunch of things at the wall and see what sticks”, but the main point is that the model outputs motion flows, which are then used to warp the original images.
This limits the result a little, since it does not use RGB information directly, but at the same time it can reduce artifacts, besides being lighter to run.
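To illustrate what "outputs motion flows, which are then used to warp the original images" means, here is a minimal pure-Python sketch of backward warping with bilinear sampling. This is illustrative only: the actual model predicts the flow field itself and operates on GPU tensors, not nested lists.

```python
def bilinear_sample(img, x, y):
    """Sample a 2D image (list of rows) at a fractional (x, y), clamping at edges."""
    h, w = len(img), len(img[0])
    x0 = max(0, min(w - 1, int(x)))
    y0 = max(0, min(h - 1, int(y)))
    x1 = min(w - 1, x0 + 1)
    y1 = min(h - 1, y0 + 1)
    fx = min(max(x - x0, 0.0), 1.0)  # fractional part drives the blend weights
    fy = min(max(y - y0, 0.0), 1.0)
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def backward_warp(img, flow):
    """Warp img by a per-pixel flow field: out[y][x] samples img at (x+dx, y+dy)."""
    h, w = len(img), len(img[0])
    return [[bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(w)] for y in range(h)]

# A flow of (1, 0) everywhere shifts the content one pixel to the left.
row = [[0.0, 1.0, 2.0, 3.0]]
shift = [[(1.0, 0.0) for _ in range(4)]]
print(backward_warp(row, shift))  # [[1.0, 2.0, 3.0, 3.0]]
```

Because the output is assembled purely from samples of the input frames, the model can never hallucinate colours that were not there, which is exactly the artifact-reduction trade-off described above.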
# Comfy
I do not use ComfyUI that much. I used it a few times to test one thing or another, but with the help of coding agents I tried to put together two nodes to use the model inside it.
Inside the GitHub repo, you can find the folder ComfyUI_FrameFusion with the custom nodes and also the safetensors file, since the model is only 32MB and I was able to upload it directly to GitHub.
You can also find the file "FrameFusion Simple Workflow.json" with a very simple workflow using the nodes inside Comfy.
I feel like I may still need to update these nodes a bit, but I’ll wait for some feedback from people who use Comfy more than I do.
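For readers curious what such a node looks like, here is a hypothetical minimal skeleton in the usual ComfyUI custom-node shape. The class name and the averaging stand-in are my own inventions for illustration; the real nodes in ComfyUI_FrameFusion load and run the actual model.

```python
class FrameFusionInterpolateSketch:
    """Hypothetical minimal ComfyUI node: blends two IMAGE inputs.

    A real interpolation node would load the FrameFusion model, predict
    motion flows, and warp the input frames; a simple midpoint average
    stands in here so the node structure stays visible.
    """

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI builds the node's input sockets from this dict.
        return {"required": {"frame_a": ("IMAGE",), "frame_b": ("IMAGE",)}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "interpolate"   # name of the method ComfyUI invokes
    CATEGORY = "FrameFusion"

    def interpolate(self, frame_a, frame_b):
        # Stand-in for the model: midpoint of the two frames.
        return ((frame_a + frame_b) / 2,)

# ComfyUI discovers nodes exported via this mapping in the package __init__.py.
NODE_CLASS_MAPPINGS = {"FrameFusionInterpolateSketch": FrameFusionInterpolateSketch}
```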
# Shameless self-promotion
If you like the model and want an easier way to use it on Windows, take a look at my commercial app on Steam. It uses exactly the same model that I'm releasing on GitHub; it just has more tools and options for working with videos, runs 100% offline, and is still in development, so it may still have some issues that I'm fixing little by little. (There is a link to it on the GitHub page.)
I hope the model is useful for some people here. I can try to answer any questions you may have. I’m also using an LLM to help format this post a little, so I hope it does not end up looking like slop or anything.
# And finally, the link:
GitHub:
Anima preview3 was released
For those who have been following Anima, a new preview version was released around two hours ago.
Huggingface: https://huggingface.co/circlestone-labs/Anima
Civitai: https://civitai.com/models/2458426/anima-official?modelVersionId=2836417
The model is still in training. It is made by circlestone-labs.
The changes in preview3 (mentioned by the creator in the links above):
Highres training is in progress. Trained for much longer at 1024 resolution than preview2.
Expanded dataset to help learn less common artists (roughly 50-100 posts each).
https://redd.it/1sf6w2x
@rStableDiffusion
Last week in Generative Image & Video
I curate a weekly multimodal AI roundup, here are the open-source image & video highlights from the last week:
GEMS - Closed-loop system for spatial logic and text rendering in image generation. Outperforms Nano Banana 2 on GenEval2. [GitHub](https://github.com/lcqysl/GEMS) | [Paper](https://arxiv.org/abs/2603.28088)
https://preview.redd.it/16r9ffhd9wtg1.png?width=1456&format=png&auto=webp&s=325ef8a75d23cfa625ac33dfd4d9727c690c11b0
ComfyUI Post-Processing Suite - Photorealism suite by thezveroboy. Simulates sensor noise, analog artifacts, and camera metadata with base64 EXIF transfer and calibrated DNG writing. GitHub
https://preview.redd.it/mhs0fi5f9wtg1.png?width=990&format=png&auto=webp&s=716128b81d8dd091615d3ede8f0acbcb3d1327a6
CutClaw - Open multi-agent video editing framework. Autonomously cuts hours of footage into narrative shorts. [Paper](https://arxiv.org/abs/2603.29664) | [GitHub](https://github.com/GVCLab/CutClaw) | [Hugging Face](https://huggingface.co/papers/2603.29664)
https://reddit.com/link/1sfj9dt/video/uw4oz84j9wtg1/player
Netflix VOID - Video object deletion with physics simulation. Built on CogVideoX-5B and SAM 2. Project | Hugging Face Space
https://reddit.com/link/1sfj9dt/video/1vzz6zck9wtg1/player
Flux FaceIR - Flux-2-klein LoRA for blind or reference-guided face restoration. [GitHub](https://github.com/cosmicrealm/ComfyUI-Flux-FaceIR)
https://preview.redd.it/05o2181m9wtg1.png?width=1456&format=png&auto=webp&s=691420332c1e42d9511c7d1cbecf305a5d885d67
Flux-restoration - Unified face restoration LoRA on FLUX.2-klein-base-4B. GitHub
https://preview.redd.it/l69v7cfn9wtg1.png?width=1456&format=png&auto=webp&s=1711dc1321b997d4247e5db0ac8e13ec4e56180b
LTX2.3 Cameraman LoRA - Transfers camera motion from reference videos to new scenes. No trigger words. [Hugging Face](https://huggingface.co/Cseti/LTX2.3-22B_IC-LoRA-Cameraman_v1)
https://reddit.com/link/1sfj9dt/video/v8jl2nlq9wtg1/player
Honorable Mentions:
Gen-Searcher - Agentic search image generation across styles. Hugging Face | GitHub
https://preview.redd.it/suqsu3et9wtg1.png?width=1268&format=png&auto=webp&s=8008783b5d3e298703a8673b6a15c54f4d2155bd
OmniVoice - 600+ language TTS with voice cloning. [Hugging Face](https://huggingface.co/k2-fsa/OmniVoice) | [ComfyUI](https://github.com/Saganaki22/ComfyUI-OmniVoice-TTS)
https://reddit.com/link/1sfj9dt/video/im1ywh7gcwtg1/player
DreamLite - On-device 1024x1024 image gen and editing in under a second on a smartphone. (I couldn't find the models on HF.) GitHub
Check out the full roundup for more demos, papers, and resources.
https://redd.it/1sfj9dt
@rStableDiffusion
A new SOTA local video model (HappyHorse 1.0) will be released on April 10th.
https://redd.it/1sfo3dq
@rStableDiffusion
Built a tool for anyone drowning in huge image folders: HybridScorer
https://redd.it/1sg5paj
@rStableDiffusion
Anima Preview 3 is out and it's better than Illustrious or Pony.
This has the biggest potential to be the "best diffuser ever" among anime-style diffusion models. Just take a look at it on Civitai and try it, and you will never want to use Illustrious or Pony again.
https://redd.it/1sgfjbs
@rStableDiffusion
Vibe Code Your First ComfyUI Custom Node Step by Step (Ep12)
https://www.youtube.com/watch?v=oiiCkrX8hq4
https://redd.it/1sfvnnz
@rStableDiffusion
ACE-Step 1.5 XL Turbo — BF16 version (converted from FP32)
I converted the ACE-Step 1.5 XL Turbo model from FP32 to BF16.
The original weights were ~18.8 GB in FP32; this BF16 version is ~9.97 GB, with the same quality and lower VRAM usage.
🤗 https://huggingface.co/marcorez8/acestep-v15-xl-turbo-bf16
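BF16 keeps fp32's sign bit and all 8 exponent bits and truncates the mantissa from 23 bits to 7, so each parameter shrinks from 4 bytes to 2, roughly halving the file. A stdlib-only sketch of that conversion at the bit level (a real weight conversion would go through torch/safetensors, not per-value Python):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Reduce an fp32 value to its 16-bit bf16 pattern (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias by 0x7FFF plus the kept LSB so ties round to even, then drop 16 bits.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Expand a bf16 bit pattern back to fp32 (exact: bf16 is a prefix of fp32)."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

# Values with short mantissas round-trip exactly; others lose low mantissa bits.
print(bf16_bits_to_f32(f32_to_bf16_bits(1.0)))      # 1.0 (exact)
print(bf16_bits_to_f32(f32_to_bf16_bits(3.14159)))  # ~3.14, small rounding error
```

Because the exponent range is unchanged, the conversion cannot overflow the way fp16 can, which is why bf16 is the usual choice for shrinking fp32 checkpoints.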
https://redd.it/1sgiqg7
@rStableDiffusion
Qwen 2512 is so underrated; its prompt understanding is really great, and only Flux 2 Dev is better. I'm using Q4KS with 4-6 steps and it is fast (20-30 sec per gen), almost as fast as the Anima model. It just needs that LoRA love from the community.
https://redd.it/1sgnfv0
@rStableDiffusion