#python #auto_tuning #deep_learning #knowledge_distillation #low_precision #post_training_quantization #pruning #quantization #quantization_aware_training #sparsity
https://github.com/intel/neural-compressor
intel/neural-compressor: SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) and sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime.
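As a rough illustration of what "low-bit quantization" means here, below is a minimal sketch of affine (asymmetric) INT8 quantization in pure Python — the core idea behind post-training quantization tools like this one. It is not Neural Compressor's API; all function names are hypothetical.

```python
# Minimal sketch of affine INT8 quantization: map a float range onto
# the integer range [0, 255] via a scale and zero-point. This is the
# conceptual core of post-training quantization, not the library's API.

def quant_params(values, num_bits=8):
    """Derive scale and zero-point mapping [min, max] onto [0, 2^bits - 1]."""
    qmin, qmax = 0, (1 << num_bits) - 1
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # range must contain 0 so it maps exactly
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Round floats to integers, clamping into the representable range."""
    qmin, qmax = 0, (1 << num_bits) - 1
    return [min(max(round(v / scale) + zero_point, qmin), qmax) for v in values]

def dequantize(qvals, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(q - zero_point) * scale for q in qvals]

weights = [-1.5, -0.3, 0.0, 0.7, 2.1]  # toy tensor
s, z = quant_params(weights)
q = quantize(weights, s, z)
recon = dequantize(q, s, z)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
print(q, round(max_err, 4))
```

The round-trip error per element is bounded by half the scale, which is why wider float ranges (or fewer bits, as in INT4) lose more precision — the accuracy gap that tuning-based tools like this repo try to minimize.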