hpcaitech/SwiftInfer
Efficient AI Inference & Serving
Language: Python
#artificial_intelligence #deep_learning #gpt #inference #llama #llama2 #llm_inference #llm_serving
Stars: 299 Issues: 3 Forks: 14
https://github.com/hpcaitech/SwiftInfer
Efficient AI Inference & Serving
Language: Python
#artificial_intelligence #deep_learning #gpt #inference #llama #llama2 #llm_inference #llm_serving
Stars: 299 Issues: 3 Forks: 14
https://github.com/hpcaitech/SwiftInfer
GitHub
GitHub - hpcaitech/SwiftInfer: Efficient AI Inference & Serving
Efficient AI Inference & Serving. Contribute to hpcaitech/SwiftInfer development by creating an account on GitHub.
zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
Language: C++
#cpm #cuda #gpt #inference_engine #llama #llm #llm_serving #minicpm #pytorch #qwen
Stars: 192 Issues: 1 Forks: 16
https://github.com/zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
Language: C++
#cpm #cuda #gpt #inference_engine #llama #llm #llm_serving #minicpm #pytorch #qwen
Stars: 192 Issues: 1 Forks: 16
https://github.com/zhihu/ZhiLight
GitHub
GitHub - zhihu/ZhiLight: A highly optimized LLM inference acceleration engine for Llama and its variants.
A highly optimized LLM inference acceleration engine for Llama and its variants. - zhihu/ZhiLight
MoonshotAI/MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
Language: Python
#flash_attention #llm #llm_serving #llm_training #moe #pytorch #transformer
Stars: 521 Issues: 2 Forks: 16
https://github.com/MoonshotAI/MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
Language: Python
#flash_attention #llm #llm_serving #llm_training #moe #pytorch #transformer
Stars: 521 Issues: 2 Forks: 16
https://github.com/MoonshotAI/MoBA
GitHub
GitHub - MoonshotAI/MoBA: MoBA: Mixture of Block Attention for Long-Context LLMs
MoBA: Mixture of Block Attention for Long-Context LLMs - MoonshotAI/MoBA