GenAi, Deep Learning and Computer Vision
How Much GPU Memory Is Needed To Serve an LLM?

This is a common question that consistently comes up in interviews or during discussions with your business stakeholders.

And it's not just a random question: it's a key indicator of how well you understand the deployment and scalability of these powerful models in production.

As a data scientist, understanding and estimating the required GPU memory is essential.

LLMs (Large Language Models) vary in size from 7 billion parameters to trillions of parameters. One size certainly doesn't fit all.

Let's dive into the math that will help you estimate the GPU memory needed for deploying these models effectively.

๐“๐ก๐ž ๐Ÿ๐จ๐ซ๐ฆ๐ฎ๐ฅ๐š ๐ญ๐จ ๐ž๐ฌ๐ญ๐ข๐ฆ๐š๐ญ๐ž ๐†๐๐” ๐ฆ๐ž๐ฆ๐จ๐ซ๐ฒ ๐ข๐ฌ

General formula: M = ((P * size per parameter) / memory density) * overhead factor

Where:
- M is the GPU memory in gigabytes (GB).
- P is the number of parameters in the model.
- size per parameter is the number of bytes needed for each model parameter, typically 4 bytes for float32 precision.
- memory density is 32/Q, where Q is the bit width used to load the model (e.g., 16 for half precision, 8 or 4 for quantized models); dividing by it scales the float32 footprint down to the serving precision.
- overhead factor (e.g., 1.2) accounts for additional memory needed beyond just storing parameters, such as activations, temporary tensors, and any memory fragmentation or padding.

๐’๐ข๐ฆ๐ฉ๐ฅ๐ข๐Ÿ๐ข๐ž๐ ๐…๐จ๐ซ๐ฆ๐ฎ๐ฅ๐š:

M = ((P * 4B)/(32/Q)) * 1.2
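To make the formula concrete, here is a minimal Python sketch of the estimate (the function name and 70B example are illustrative, not from the post):

```python
def estimate_gpu_memory_gb(num_params, bits_per_param, overhead=1.2):
    """Estimate GPU memory (GB) to serve an LLM.

    Implements M = ((P * 4 bytes) / (32 / Q)) * overhead,
    where Q is the bit width used to load the model.
    """
    bytes_needed = (num_params * 4) / (32 / bits_per_param)
    return bytes_needed * overhead / 1e9  # bytes -> GB


# Example: a 70-billion-parameter model served in 16-bit precision
print(estimate_gpu_memory_gb(70e9, 16))  # 168.0 (GB)
```

Note that the expression algebraically reduces to P * (Q / 8) * overhead bytes, so a 70B model at 16 bits needs roughly 140 GB for weights alone, and about 168 GB once the 1.2x overhead is included.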

With this formula in hand, I hope you'll feel more confident when discussing GPU memory requirements with your business stakeholders.

#LLM