ggufy: easy quantization for the GPU poor
Hello.
I was frustrated by the lack of tooling for image model conversion and quantization, and by the extreme RAM requirements and complexity of the few tools that do exist, so I wrote my own. People have said I should post it here, so here it is:
https://github.com/qskousen/ggufy
It has a CLI and a GUI. The GUI is easy to use: you can drag and drop files in. Both the CLI and the GUI are single-file executables, written in Zig because I like writing in Zig. It's pretty efficient with RAM and takes about 1.5 minutes to quantize ZiT on my machine.
It supports all the main models that I am aware of, and you can convert to/from gguf or safetensors. I think it supports all the commonly used datatypes, such as q3k through q80, f32, bf16, f16, f8e4m3, f8e5m2, scaled fp8, mxfp8, and nvfp4. It doesn't do SDNQ yet, but I would like to add it if I can get some time to figure out the format.
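If you want to check which datatypes a safetensors file already uses before converting it, here is a rough Python sketch. It is not taken from ggufy (which is written in Zig) and says nothing about how ggufy works internally. It only relies on the published safetensors layout: an 8-byte little-endian header length followed by a JSON header that maps each tensor name to its dtype, shape, and data offsets. The function names `read_safetensors_header` and `dtype_counts` are made up for this example.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Only the header is read; the tensor data that follows is never loaded.
        return json.loads(f.read(header_len).decode("utf-8"))


def dtype_counts(header):
    """Count tensors per dtype, skipping the optional "__metadata__" entry."""
    return Counter(
        info["dtype"] for name, info in header.items() if name != "__metadata__"
    )
```

Calling `dtype_counts(read_safetensors_header("model.safetensors"))` (the path is a placeholder) returns a `Counter` keyed by the dtype strings stored in the file, such as `F16` or `BF16`. Because it reads only the header, it stays cheap on RAM even for very large checkpoints.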
It's cross-platform and builds for Linux, Windows, and macOS (both ARM64 and x86). Pre-built binaries from GitHub Actions are available on the releases page.
If there are features you think are in scope and would be useful, or additional models or formats that it doesn't support yet, please open an issue or let me know here. Thanks.
Cross-posted to r/ComfyUI.
https://redd.it/1tj5nhq
@rStableDiffusion