zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
Language: C++
#cpm #cuda #gpt #inference_engine #llama #llm #llm_serving #minicpm #pytorch #qwen
Stars: 192 Issues: 1 Forks: 16
https://github.com/zhihu/ZhiLight
therealoliver/Deepdive-llama3-from-scratch
Implement Llama 3 inference step by step: grasp the core concepts, work through the derivations, and write the code.
Language: Jupyter Notebook
#attention #attention_mechanism #gpt #inference #kv_cache #language_model #llama #llm_configuration #llms #mask #multi_head_attention #positional_encoding #residuals #rms #rms_norm #rope #rotary_position_encoding #swiglu #tokenizer #transformer
Stars: 388 Issues: 0 Forks: 28
https://github.com/therealoliver/Deepdive-llama3-from-scratch