GPU by hand ✍️ I drew this to show how a GPU speeds up an array operation of 8 elements in parallel over 4 threads in 2 clock cycles. Read more 👇
CPU
• It has one core.
• Its global memory has 120 locations (0-119).
• To use the GPU, it needs to copy data from the global memory to the GPU.
• After the GPU is done, it copies the results back to global memory.
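The copy-in, compute, copy-back flow above can be sketched as a small Python simulation. This is only an illustration, not the drawing itself: the doubling operation, the source and destination addresses, and the name run_on_gpu are assumptions made for the example. The post only fixes the sizes (120 memory locations, 8 elements, 4 threads).

```python
GLOBAL_MEMORY_SIZE = 120  # locations 0-119, as in the post
NUM_THREADS = 4           # four GPU threads, as in the post
N = 8                     # array length, as in the post


def run_on_gpu(global_memory, src, dst, n, op):
    """Hypothetical helper: simulate one GPU round trip and return the cycle count."""
    # Step 1: copy the input from CPU global memory to the GPU.
    gpu_buffer = global_memory[src:src + n]

    # Step 2: each clock cycle, the threads process NUM_THREADS elements at once.
    result = [None] * n
    cycles = 0
    for base in range(0, n, NUM_THREADS):
        for thread_id in range(NUM_THREADS):  # these run in parallel on real hardware
            i = base + thread_id
            if i < n:
                result[i] = op(gpu_buffer[i])
        cycles += 1

    # Step 3: copy the results back to CPU global memory.
    global_memory[dst:dst + n] = result
    return cycles


global_memory = [0] * GLOBAL_MEMORY_SIZE
global_memory[0:N] = [1, 2, 3, 4, 5, 6, 7, 8]

# Assumed operation: double every element (the post does not name the operation).
cycles = run_on_gpu(global_memory, src=0, dst=8, n=N, op=lambda x: x * 2)

print(cycles)               # 2
print(global_memory[8:16])  # [2, 4, 6, 8, 10, 12, 14, 16]
```

With 8 elements and 4 threads, the loop body runs twice, which matches the 2 clock cycles mentioned in the post. A single core would need 8 steps for the same work.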
GPU
• It has four cores to run four threads (0-3).
• It has a register file of 28 locations (0-27).
• This register file has four banks (0-3).
• All threads share the same register file.
• But they must read/write using the four banks.
• Each bank allows 2 reads (Read 0, Read 1) and 1 write in a single clock cycle.
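The bank limits above can be sketched as a toy model that counts port usage per clock cycle. Several details here are assumptions, not facts from the post: the mapping of a location to bank (location modulo 4), the element-wise add used as the workload, and the names RegisterFile, BankConflict, and tick.

```python
class BankConflict(Exception):
    """Raised when a bank is asked for more accesses than it has ports in one cycle."""


class RegisterFile:
    def __init__(self, size=28, num_banks=4, reads_per_bank=2, writes_per_bank=1):
        self.regs = [0] * size
        self.num_banks = num_banks
        self.reads_per_bank = reads_per_bank
        self.writes_per_bank = writes_per_bank
        self.tick()

    def tick(self):
        # A new clock cycle starts: every bank gets its ports back.
        self.reads_used = [0] * self.num_banks
        self.writes_used = [0] * self.num_banks

    def bank_of(self, loc):
        # Assumed interleaving: location i lives in bank i % num_banks.
        return loc % self.num_banks

    def read(self, loc):
        bank = self.bank_of(loc)
        if self.reads_used[bank] >= self.reads_per_bank:
            raise BankConflict(f"bank {bank} has no read port left this cycle")
        self.reads_used[bank] += 1
        return self.regs[loc]

    def write(self, loc, value):
        bank = self.bank_of(loc)
        if self.writes_used[bank] >= self.writes_per_bank:
            raise BankConflict(f"bank {bank} has no write port left this cycle")
        self.writes_used[bank] += 1
        self.regs[loc] = value


rf = RegisterFile()
rf.regs[0:8] = [1, 2, 3, 4, 5, 6, 7, 8]  # preload inputs, bypassing the ports

# One clock cycle: thread t reads locations t and t + 4 (both in bank t)
# and writes their sum to location t + 8 (also bank t): 2 reads, 1 write per bank.
for t in range(4):
    a = rf.read(t)
    b = rf.read(t + 4)
    rf.write(t + 8, a + b)

print(rf.regs[8:12])  # [6, 8, 10, 12]

# A third read from bank 0 in the same cycle exceeds its two read ports.
try:
    rf.read(12)
    conflict = False
except BankConflict:
    conflict = True
print(conflict)  # True

rf.tick()  # next clock cycle: the ports are free again
value_next_cycle = rf.read(12)
```

Under these assumptions all four threads finish in the same cycle because each one stays inside its own bank. If two threads needed a third read from the same bank, one of them would have to wait for the next cycle.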
#AIEngineering #MachineLearning #DeepLearning #LLMs #RAG #MLOps #Python #GitHubProjects #AIForBeginners #ArtificialIntelligence #NeuralNetworks #OpenSourceAI #DataScienceCareers