FMInference/FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Up to 100x faster than other offloading systems.
Language: Python
#chatgpt #deep_learning #gpt_3 #high_throughput #large_language_models #machine_learning #offloading #opt
Stars: 1799 Issues: 11 Forks: 72
https://github.com/FMInference/FlexGen