Trained a ViT model from scratch for auto-tagging

I recently trained a new anime image tagging model. To prep the data, I used SmilingWolf v3 to fix 300k bad tags and fill in 1M missing ones. I also trained an initial baseline model to help identify and add around 30k low-frequency tags.
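The post doesn't show the tag-filling code, but the "fill in missing tags" step of a WD-style multi-label tagger boils down to thresholding per-tag sigmoid scores and unioning the result with the existing tags. A minimal sketch (the function name, threshold value, and tagger output shape are my assumptions, not from the post):

```python
import numpy as np

def fill_missing_tags(logits, tag_names, existing_tags, threshold=0.35):
    """Fill in missing tags from a multi-label tagger's raw per-tag logits.

    logits: raw scores, one per tag (assumed shape: [num_tags])
    tag_names: tag vocabulary aligned with the logits
    existing_tags: tags already present on the image
    threshold: sigmoid-probability cutoff (0.35 is a common default
               for WD-style taggers; the post doesn't state the real value)
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    predicted = {name for name, p in zip(tag_names, probs) if p >= threshold}
    # Only add tags the image doesn't already have; never drop existing ones here.
    return sorted(set(existing_tags) | predicted)

# Example: a strong "smile" score gets added, a weak "dog" score does not.
tags = fill_missing_tags([5.0, -5.0, 0.0], ["1girl", "dog", "smile"], ["1girl"])
# → ["1girl", "smile"]
```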

The current V1 model is a 320x320 ViT. V1.1 is currently training at 448x448, and the higher resolution is already improving accuracy. My next goal is to wait for a 2025 dataset, clean it heavily, and train from scratch with better vocab structures (e.g., artist:name).
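At inference time, the 320 → 448 jump only changes the preprocessing side. A minimal sketch of square-pad-and-resize preprocessing for a ViT tagger — the padding color and normalization constants below are assumptions (check the model card for the values the tagger was actually trained with):

```python
import numpy as np
from PIL import Image

def preprocess(img: Image.Image, size: int = 448) -> np.ndarray:
    """Pad to a square on a white background, resize, normalize to [-1, 1].

    The white padding and [-1, 1] scaling are assumptions for illustration,
    not the model's documented preprocessing.
    """
    img = img.convert("RGB")
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), (255, 255, 255))
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    canvas = canvas.resize((size, size), Image.BICUBIC)
    arr = np.asarray(canvas, dtype=np.float32) / 127.5 - 1.0
    return arr[None]  # add batch dimension: (1, size, size, 3)
```

Padding to a square before resizing preserves aspect ratio, which matters for tags tied to composition.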

You can find the model, card, and demo space on HuggingFace: https://huggingface.co/Grio43/OppaiOracle
Live demo of the model: https://huggingface.co/spaces/Grio43/OppaiOracle

CPU-based tagger:
https://huggingface.co/spaces/Grio43/OppaiCPU

https://redd.it/1t8bzb3
@rStableDiffusion
Anyone else using LTX locally on Mac via Draw Things? Here’s a WWII-style short I made.

https://redd.it/1t8lagy
@rStableDiffusion
How I feel after upvoting a post that got downvoted by bots for mentioning Forge Neo.
https://redd.it/1t8oha2
@rStableDiffusion
Wan 2.2 with LTX 2.3 ID-LoRA

This workflow combines the Comfy Wan 2.2 image-to-video workflow with the Comfy LTX 2.3 ID-LoRA workflow. You generate your initial video with Wan 2.2; the result is then automatically run through LTX 2.3, which adds audio to the Wan 2.2 clip and extends it with whatever you want to happen next.

Wan 2.2 image-to-video of Crystal Sparkle throwing a champagne bottle against a yacht to christen it

LTX 2.3 adds the foley audio for the bottle smashing against the boat, and the ID-LoRA adds Crystal Sparkle's actual voice

Here is a link to the workflow: https://huggingface.co/ussaaron/workflows/blob/main/wan2_2_i2v-with-ltx-id-lora.json
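Beyond loading the JSON in the ComfyUI editor, the graph can also be queued against a local ComfyUI instance over its HTTP API. A minimal sketch — the server address is the default, and I'm assuming the file has been exported in API format (ComfyUI's "Save (API Format)" option), since UI-format saves can't be queued directly:

```python
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str = "wan-ltx-demo") -> bytes:
    """Wrap an API-format ComfyUI workflow graph in the body /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_workflow(path: str, server: str = "http://127.0.0.1:8188") -> dict:
    """POST a saved API-format workflow to a running ComfyUI server."""
    with open(path, "r", encoding="utf-8") as f:
        workflow = json.load(f)
    req = urllib.request.Request(
        f"{server}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

This is how you'd batch-run the Wan → LTX chain without touching the UI for each clip.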

https://redd.it/1t8qloh
@rStableDiffusion