Trained a ViT model from scratch for auto-tagging
I recently trained a new anime image tagging model. To prep the data, I used SmilingWolf v3 to fix 300k bad tags and fill in 1M missing ones. I also trained an initial baseline model to help identify and add around 30k low-frequency tags.
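The cleanup step described above — fixing bad tags and filling in missing ones from a tagger's predictions — can be sketched as a simple merge between a post's existing labels and the tagger's confidence scores. The function name and threshold values below are illustrative assumptions, not the actual pipeline:

```python
def clean_tags(existing_tags, tagger_scores,
               remove_below=0.2, add_above=0.6):
    """Merge existing tags with a tagger's confidence scores.

    - Drop ("fix") existing tags the tagger scores very low.
    - Add ("fill in") high-confidence tags that were missing.
    Thresholds here are placeholders, not the values actually used.
    """
    # Keep an existing tag unless the tagger confidently rejects it;
    # tags the tagger doesn't know about are kept (default score 1.0).
    kept = {t for t in existing_tags
            if tagger_scores.get(t, 1.0) >= remove_below}
    # Fill in any high-confidence predictions that were missing.
    filled = {t for t, s in tagger_scores.items() if s >= add_above}
    return sorted(kept | filled)

scores = {"1girl": 0.98, "blue_hair": 0.05, "smile": 0.91}
print(clean_tags({"1girl", "blue_hair"}, scores))
# → ['1girl', 'smile']  (blue_hair dropped, smile filled in)
```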
The current V1 model is a 320x320 ViT. V1.1 is currently training at 448x448, and the higher resolution is already improving accuracy. My next goal is to wait for a 2025 dataset, clean it heavily, and train from scratch with better vocab structures (e.g., artist:name).
You can find the model, card, and demo space on HuggingFace: https://huggingface.co/Grio43/OppaiOracle Live use of the model: https://huggingface.co/spaces/Grio43/OppaiOracle
CPU-based tagger
https://huggingface.co/spaces/Grio43/OppaiCPU
https://redd.it/1t8bzb3
@rStableDiffusion
It's still nuts to me how realistic AI is getting; incredible that I can run it on an RTX 2060 and get these results. (Z-Image-Turbo)
https://redd.it/1t8ehyj
@rStableDiffusion
Anyone else using LTX locally on Mac via Draw Things? Here’s a WWII-style short I made.
https://redd.it/1t8lagy
@rStableDiffusion
How I feel after upvoting a post that got downvoted by bots for mentioning Forge Neo.
https://redd.it/1t8oha2
@rStableDiffusion
Wan 2.2 with LTX 2.3 ID-LoRA
This workflow combines the Comfy Wan 2.2 image-to-video workflow with the Comfy LTX 2.3 ID-LoRA workflow. You use Wan 2.2 to generate your initial video; it then runs automatically through LTX 2.3, which adds audio to the Wan 2.2 clip and extends it with whatever you want to happen next.
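At a high level, the chaining this workflow performs can be sketched as a two-stage pipeline. The function names, return shapes, and file names below are placeholders standing in for the actual ComfyUI node graphs, not a real API:

```python
def wan22_image_to_video(image_path, prompt):
    # Placeholder for the Comfy Wan 2.2 i2v graph: produces a silent clip.
    return {"frames": f"wan22({image_path!r}, {prompt!r})", "audio": None}

def ltx23_extend_with_audio(clip, continuation_prompt):
    # Placeholder for the Comfy LTX 2.3 ID-LoRA graph: adds foley audio
    # (and the ID-LoRA voice) to the clip, then appends new frames.
    return {
        "frames": clip["frames"] + f" + ltx23({continuation_prompt!r})",
        "audio": "foley + id-lora voice",
    }

# Stage 1: silent Wan 2.2 clip; Stage 2: LTX 2.3 audio + extension.
clip = wan22_image_to_video("yacht.png", "christening a yacht")
final = ltx23_extend_with_audio(clip, "crowd cheers")
print(final["audio"])  # the silent Wan clip now carries LTX audio
```

In the actual workflow JSON, the hand-off happens automatically between the two node graphs rather than via Python calls.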
Wan 2.2 image-to-video of Crystal Sparkle throwing a champagne bottle against a yacht to christen the yacht
LTX 2.3 adds the foley audio for the bottle smashing against the boat, and the ID-LoRA adds Crystal Sparkle's actual voice.
Here is a link to the workflow: https://huggingface.co/ussaaron/workflows/blob/main/wan2_2_i2v-with-ltx-id-lora.json
https://redd.it/1t8qloh
@rStableDiffusion
Hi-Dream 01 Out: 2K images in 20 seconds on a 4090 (fp8 dev), ComfyUI
https://redd.it/1t8ypmd
@rStableDiffusion