Reddit Programming
I will send you the newest posts from the subreddit /r/programming
Part 3: Building LLMs from Scratch – Model Architecture & GPU Training [Follow-up to Part 1 and 2]
https://www.reddit.com/r/programming/comments/1olwg7b/part_3_building_llms_from_scratch_model/

I’m excited to share Part 3 of my series on building an LLM from scratch. This installment dives into the guts of model architecture, multi-GPU training, memory-precision tricks, checkpointing & inference.

What you’ll find inside:
- Two model sizes (117M & 354M parameters) and how we designed the architecture.
- Multi-GPU training setup: how to handle memory constraints, fp16/bf16 precision, distributed training.
- Experiment tracking (thanks Weights & Biases), checkpointing strategies, resume logic for long runs.
- Converting PyTorch checkpoints into a deployable format for inference / sharing.
- Real-world mistakes and learnings: out-of-memory errors, data-shape mismatches, GPU tuning headaches.

Why it matters: Even if your data pipeline and tokenizer (see Part 2) are solid, your model architecture and infrastructure matter just as much — otherwise you’ll spend more time debugging than training. This post shows how to build a robust training pipeline that actually scales.

If you’ve followed along from Part 1 and Part 2, thanks for sticking with it — and if you’re just now jumping in, you can catch up on those earlier posts (links below).

Resources:
🔗 Blog post (https://blog.desigeek.com/post/2025/11/building-llm-from-scratch-part3-model-architecture-gpu-training/)
🔗 GitHub codebase (https://github.com/bahree/helloLondon)
🔗 Part 2: Data Collection & Custom Tokenizers (https://www.reddit.com/r/programming/comments/1o56elg/building_llms_from_scratch_part_2_data_collection/)
🔗 Part 1: Quick Start & Overview (https://www.reddit.com/r/programming/comments/1nq0166/a_step_by_step_guide_on_how_to_build_a_llm_from/)
🔗 LinkedIn Post (https://www.linkedin.com/posts/amitbahree_ai-llm-generativeai-activity-7390442713931767808-xSfS) - if that is your thing.

submitted by /u/amitbahree (https://www.reddit.com/user/amitbahree)
[link] (https://blog.desigeek.com/post/2025/11/building-llm-from-scratch-part3-model-architecture-gpu-training/) [comments] (https://www.reddit.com/r/programming/comments/1olwg7b/part_3_building_llms_from_scratch_model/)
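
For readers skimming before clicking through: the mixed-precision and checkpoint/resume pieces described above boil down to a pattern like the sketch below. This is a minimal illustration only, not the actual helloLondon training code; the toy model, the file name ckpt.pt, and the hyperparameters are placeholders.

# Minimal sketch of mixed-precision training with periodic checkpointing and
# resume logic. Model, paths, and hyperparameters are placeholders, not the
# actual helloLondon code.
import os
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 512).to(device)            # stand-in for the transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
ckpt_path = "ckpt.pt"
start_step = 0

# Resume logic: if an earlier (interrupted) run left a checkpoint, restore the
# model, optimizer, and step counter instead of starting from scratch.
if os.path.exists(ckpt_path):
    ckpt = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_step = ckpt["step"] + 1

for step in range(start_step, 1_000):
    x = torch.randn(8, 512, device=device)        # dummy batch
    # bf16 autocast keeps activations in 16-bit to reduce memory; fp16 would
    # additionally need a GradScaler to avoid gradient underflow.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

    if step % 100 == 0:                           # periodic checkpoint
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "step": step}, ckpt_path)

# Multi-GPU runs typically wrap the same loop in DistributedDataParallel and
# launch it with torchrun; the checkpoint/resume logic stays the same.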
When Logs Become Chains: The Hidden Danger of Synchronous Logging
https://www.reddit.com/r/programming/comments/1omb1wa/when_logs_become_chains_the_hidden_danger_of/

Most applications log synchronously without thinking twice. When your code calls logger.info("User logged in"), it doesn’t just fire-and-forget. It waits. The thread blocks until that log entry hits disk or gets acknowledged by your logging service. In normal times, this takes microseconds. But when your logging infrastructure slows down—perhaps your log aggregator is under load, or your disk is experiencing high I/O wait—those microseconds become milliseconds, then seconds. Your application thread pool drains like water through a sieve.

Here’s the brutal math: if you have 200 worker threads and each log write takes 2 seconds instead of 2 milliseconds, you can only handle 100 requests per second instead of 100,000 (throughput is roughly threads divided by time blocked per request: 200 / 2 s versus 200 / 0.002 s). Your application didn’t break. Your logs did.

https://systemdr.substack.com/p/when-logs-become-chains-the-hidden
https://www.youtube.com/watch?v=pgiHV3Ns0ac&list=PLL6PVwiVv1oR27XfPfJU4_GOtW8Pbwog4

submitted by /u/Extra_Ear_10 (https://www.reddit.com/user/Extra_Ear_10)
[link] (https://systemdr.substack.com/p/when-logs-become-chains-the-hidden) [comments] (https://www.reddit.com/r/programming/comments/1omb1wa/when_logs_become_chains_the_hidden_danger_of/)
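
The standard way out of this trap is to decouple request threads from the log sink: enqueue records in memory and let a background thread do the slow I/O. A minimal Python sketch using the stdlib’s QueueHandler/QueueListener (the handler choice and file name here are illustrative):

# Minimal sketch: asynchronous logging with Python's stdlib QueueHandler /
# QueueListener. Request threads only enqueue records in memory; a background
# listener thread performs the slow disk/network writes.
import logging
import logging.handlers
import queue

log_queue = queue.SimpleQueue()                   # unbounded in-memory queue

# The slow sink (file, socket, log aggregator) sits behind the listener thread.
file_handler = logging.FileHandler("app.log")     # illustrative destination
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# Returns almost immediately even if the disk or aggregator is slow,
# so worker threads are not held hostage by logging I/O.
logger.info("User logged in")

listener.stop()                                   # flush pending records at shutdown

The usual refinement is a bounded queue with an explicit drop or sampling policy, so memory cannot grow without limit if the sink stays slow for a long time.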