dipampaul17/KVSplit
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
Language:Python
Total stars: 144
Stars trend:
#python
#applesilicon, #generativeai, #kvcache, #llamacpp, #llm, #m1, #m2, #m3, #memoryoptimization, #metal, #optimization, #quantization
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
Language:Python
Total stars: 144
Stars trend:
16 May 2025
7pm ▏ +1
8pm █████▌ +44
9pm ████▊ +38
10pm ███▋ +29
11pm ██▎ +18
#python
#applesilicon, #generativeai, #kvcache, #llamacpp, #llm, #m1, #m2, #m3, #memoryoptimization, #metal, #optimization, #quantization