A step by step guide on how to build a LLM from scratch
https://www.reddit.com/r/programming/comments/1nq0166/a_step_by_step_guide_on_how_to_build_a_llm_from/
I wanted to share this here in the hope that it helps some folks dig deeper into this and learn. I just published a comprehensive guide on how to build an LLM from scratch using historical London texts from 1500-1850.

What I Built:
- Two identical models (117M & 354M parameters) trained from scratch
- Custom historical tokenizer with 30k vocabulary + 150+ special tokens for archaic English
- Complete data pipeline processing 218+ historical sources (500M+ characters)
- Production-ready training with multi-GPU support, WandB integration, and checkpointing
- Published models on Hugging Face ready for immediate use

Why This Matters: Most LLM guides focus on fine-tuning existing models. This series shows you how to build from the ground up, eliminating modern biases and creating models that truly understand historical language patterns, cultural contexts, and period-specific knowledge.

Resources:
- Blog Series: https://blog.desigeek.com/post/2025/09/building-llm-from-scratch-part1/
- Complete Codebase: https://github.com/bahree/helloLondon
- Published Models: https://huggingface.co/bahree/london-historical-slm
- LinkedIn (if that's your thing): https://www.linkedin.com/feed/update/urn:li:share:7376863225306365952/

The models are already working and generating authentic 18th-century London text. Perfect for developers who want to understand the complete LLM development pipeline.

Shoutout: Big thanks to u/Remarkable-Trick-177 (https://www.reddit.com/user/Remarkable-Trick-177/) for the inspiration!

submitted by /u/amitbahree (https://www.reddit.com/user/amitbahree)
[link] (https://blog.desigeek.com/post/2025/09/building-llm-from-scratch-part1/) [comments] (https://www.reddit.com/r/programming/comments/1nq0166/a_step_by_step_guide_on_how_to_build_a_llm_from/)
Table sorting
https://www.reddit.com/r/programming/comments/1nq05mi/table_sorting/
Yes, that simple table sorting. 10 years ago, when I started my career, that was the "take home" assignment. Today, after trying to sort some simple values from a website, I am amazed this problem has not been solved yet. Just include the god damn sorting in the HTML spec and be done with it. Every table everywhere gets sort capabilities without coding. Thanks for reading my 3AM rant.

submitted by /u/FrostyCartoonist8523 (https://www.reddit.com/user/FrostyCartoonist8523)
[link] (https://localhost.com/) [comments] (https://www.reddit.com/r/programming/comments/1nq05mi/table_sorting/)
Yt-dlp: Soon you'll need Deno or another supported JS runtime, to keep YouTube downloads working as normal.
https://www.reddit.com/r/programming/comments/1nq0shd/ytdlp_soon_youll_need_deno_or_another_supported/
submitted by /u/TheTwelveYearOld (https://www.reddit.com/user/TheTwelveYearOld)
[link] (https://github.com/yt-dlp/yt-dlp/issues/14404) [comments] (https://www.reddit.com/r/programming/comments/1nq0shd/ytdlp_soon_youll_need_deno_or_another_supported/)
Create systems of equations and basic algebra app
https://www.reddit.com/r/programming/comments/1nq0tif/create_systems_of_equations_and_basic_algebra_app/
I want to create an app to:
1. Use letters, Greek characters and subscripts for variables and equations. So typing “/Omega” will make Ω appear in its place. Perhaps there would be a panel of Greek characters I could click on as well.
2. Input known variables.
3. Input the equations that are part of the system of equations.
4. Automatically solve the system of equations.
5. Add additional equations that will utilize the results of solving the system of equations.
6. Store equations I use over and over so that I can quickly select them.
7. The equations and values will need to be changeable at any point of the process.

I want the UI to be clean and the subscripts to actually look like subscripts when they are input and output. What language(s) should I use to create this?

submitted by /u/SkiMtVidGame-aineer (https://www.reddit.com/user/SkiMtVidGame-aineer)
[link] (http://thisisnotareallink.com/) [comments] (https://www.reddit.com/r/programming/comments/1nq0tif/create_systems_of_equations_and_basic_algebra_app/)
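As a rough illustration of items 1 and 4 in the post above, here is a minimal Python sketch using only the standard library. It is not a recommendation from the thread. It assumes the system is linear, and the names `GREEK`, `expand_shortcuts`, and `solve_linear` are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# Hypothetical shortcut table; a real app would cover the full Greek alphabet.
GREEK = {"/Omega": "Ω", "/omega": "ω", "/alpha": "α", "/beta": "β"}


def expand_shortcuts(text):
    """Replace typed shortcuts such as "/Omega" with the matching Greek letter."""
    # Longest keys first so that one shortcut never clobbers a longer one.
    for key in sorted(GREEK, key=len, reverse=True):
        text = text.replace(key, GREEK[key])
    return text


def solve_linear(coeffs, consts):
    """Solve A x = b exactly with Gauss-Jordan elimination.

    coeffs is a square list of rows and consts is the right-hand side.
    Raises ValueError if the system has no unique solution.
    """
    n = len(coeffs)
    # Build the augmented matrix [A | b] using exact fractions.
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(coeffs, consts)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]
```

For example, `solve_linear([[2, 1], [1, -1]], [5, 1])` solves 2x + y = 5 and x - y = 1. Nonlinear systems, subscript rendering, and the UI itself are outside what this sketch covers.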
A smart way to get C++ speed for voice AI in Python: a look at the TEN framework
https://www.reddit.com/r/programming/comments/1nq11of/a_smart_way_to_get_c_speed_for_voice_ai_in_python/
We all know that getting real-time performance in Python can be tricky, especially with I/O-heavy tasks like audio streaming. I've been looking for a good way to tackle this without having to rewrite everything in C++. I recently stumbled upon the TEN framework, and its architecture is clever. It uses a high-performance C++ core for the heavy lifting but has a clean, first-class Python API. Their new v0.10 release really refines this, so you can write all your main logic in Python and let the C++ backend handle the speed-critical parts. It’s the same hybrid approach that makes libraries like NumPy so powerful. They've also built out a whole suite of tools for things like voice activity and turn detection, so you're not starting from scratch. If you're building any application where responsiveness is critical, this project is definitely worth a look. It seems like it's built by engineers who've actually faced these problems before.

submitted by /u/Global-Biscotti-8449 (https://www.reddit.com/user/Global-Biscotti-8449)
[link] (https://github.com/TEN-framework) [comments] (https://www.reddit.com/r/programming/comments/1nq11of/a_smart_way_to_get_c_speed_for_voice_ai_in_python/)
How to implement the Outbox pattern in Go and Postgres
https://www.reddit.com/r/programming/comments/1nq1sob/how_to_implement_the_outbox_pattern_in_go_and/
submitted by /u/der_gopher (https://www.reddit.com/user/der_gopher)
[link] (https://packagemain.tech/p/how-to-implement-the-outbox-pattern-in-golang) [comments] (https://www.reddit.com/r/programming/comments/1nq1sob/how_to_implement_the_outbox_pattern_in_go_and/)
Parallel Streaming Pattern in Go: How to Scan Large S3 or GCS Buckets Significantly Faster
https://www.reddit.com/r/programming/comments/1nq2xc7/parallel_streaming_pattern_in_go_how_to_scan/
submitted by /u/destel116 (https://www.reddit.com/user/destel116)
[link] (https://destel.dev/blog/fast-listing-of-files-from-s3-gcs-and-other-object-storages) [comments] (https://www.reddit.com/r/programming/comments/1nq2xc7/parallel_streaming_pattern_in_go_how_to_scan/)
From Rust to Reality: The Hidden Journey of fetch_max
https://www.reddit.com/r/programming/comments/1nq47v8/from_rust_to_reality_the_hidden_journey_of_fetch/
submitted by /u/_shadowbannedagain (https://www.reddit.com/user/_shadowbannedagain)
[link] (https://questdb.com/blog/rust-fetch-max-compiler-journey/) [comments] (https://www.reddit.com/r/programming/comments/1nq47v8/from_rust_to_reality_the_hidden_journey_of_fetch/)
Windows App SDK 1.8.1 released
https://www.reddit.com/r/programming/comments/1nq4px6/windows_app_sdk_181_released/
submitted by /u/reps_up (https://www.reddit.com/user/reps_up)
[link] (https://learn.microsoft.com/en-us/windows/apps/windows-app-sdk/stable-channel) [comments] (https://www.reddit.com/r/programming/comments/1nq4px6/windows_app_sdk_181_released/)
Knotty: A domain-specific language for knitting patterns
https://www.reddit.com/r/programming/comments/1nq577e/knotty_a_domainspecific_language_for_knitting/
submitted by /u/GarethX (https://www.reddit.com/user/GarethX)
[link] (https://t0mpr1c3.github.io/knotty/index.html) [comments] (https://www.reddit.com/r/programming/comments/1nq577e/knotty_a_domainspecific_language_for_knitting/)
PostgreSQL 18 Released!
https://www.reddit.com/r/programming/comments/1nq6g8p/postgresql_18_released/
submitted by /u/jskatz05 (https://www.reddit.com/user/jskatz05)
[link] (https://www.postgresql.org/about/news/postgresql-18-released-3142/) [comments] (https://www.reddit.com/r/programming/comments/1nq6g8p/postgresql_18_released/)
Zellij's creator on WebAssembly
https://www.reddit.com/r/programming/comments/1nq8xog/zellijs_creator_on_webassembly/
Zellij's creator on WebAssembly https://youtube.com/shorts/epM7hNOg7S8?feature=share

submitted by /u/perecastor (https://www.reddit.com/user/perecastor)
[link] (https://youtube.com/shorts/epM7hNOg7S8?feature=share) [comments] (https://www.reddit.com/r/programming/comments/1nq8xog/zellijs_creator_on_webassembly/)
A Very Early History of Algebraic Data Types
https://www.reddit.com/r/programming/comments/1nqepj3/a_very_early_history_of_algebraic_data_types/
submitted by /u/ketralnis (https://www.reddit.com/user/ketralnis)
[link] (https://www.hillelwayne.com/post/algdt-history/) [comments] (https://www.reddit.com/r/programming/comments/1nqepj3/a_very_early_history_of_algebraic_data_types/)
CHERI and the efforts to get Linux running on it
https://www.reddit.com/r/programming/comments/1nqeps9/cheri_and_the_efforts_to_get_linux_running_on_it/
submitted by /u/ketralnis (https://www.reddit.com/user/ketralnis)
[link] (https://lwn.net/SubscriberLink/1037974/903c6f9a42f7782a/) [comments] (https://www.reddit.com/r/programming/comments/1nqeps9/cheri_and_the_efforts_to_get_linux_running_on_it/)
Tracing JITs in the real world @ CPython Core Dev Sprint
https://www.reddit.com/r/programming/comments/1nqetmw/tracing_jits_in_the_real_world_cpython_core_dev/
submitted by /u/ketralnis (https://www.reddit.com/user/ketralnis)
[link] (https://antocuni.eu/2025/09/24/tracing-jits-in-the-real-world--cpython-core-dev-sprint/) [comments] (https://www.reddit.com/r/programming/comments/1nqetmw/tracing_jits_in_the_real_world_cpython_core_dev/)
Graal Truffle tutorial part 0 – what is Truffle?
https://www.reddit.com/r/programming/comments/1nqeu2d/graal_truffle_tutorial_part_0_what_is_truffle/
submitted by /u/ketralnis (https://www.reddit.com/user/ketralnis)
[link] (https://www.endoflineblog.com/graal-truffle-tutorial-part-0-what-is-truffle) [comments] (https://www.reddit.com/r/programming/comments/1nqeu2d/graal_truffle_tutorial_part_0_what_is_truffle/)
Fundamental of Virtual Memory
https://www.reddit.com/r/programming/comments/1nqevwx/fundamental_of_virtual_memory/
submitted by /u/ketralnis (https://www.reddit.com/user/ketralnis)
[link] (https://nghiant3223.github.io/2025/05/29/fundamental_of_virtual_memory.html) [comments] (https://www.reddit.com/r/programming/comments/1nqevwx/fundamental_of_virtual_memory/)
Specification, speed and (a) schedule
https://www.reddit.com/r/programming/comments/1nqew5e/specification_speed_and_a_schedule/
submitted by /u/ketralnis (https://www.reddit.com/user/ketralnis)
[link] (https://kaleidawave.github.io/posts/specification-speed-schedule/) [comments] (https://www.reddit.com/r/programming/comments/1nqew5e/specification_speed_and_a_schedule/)
Immutable Infrastructure DevOps: Why You Should Replace, Not Patch
https://www.reddit.com/r/programming/comments/1nqg0j9/immutable_infrastructure_devops_why_you_should/
submitted by /u/trolleid (https://www.reddit.com/user/trolleid)
[link] (https://lukasniessen.medium.com/immutable-infrastructure-devops-why-you-should-replace-not-patch-e9a2cf71785e) [comments] (https://www.reddit.com/r/programming/comments/1nqg0j9/immutable_infrastructure_devops_why_you_should/)
Decision Log: Why writing down your technical choices is a game-changer
https://www.reddit.com/r/programming/comments/1nqgn7x/decision_log_why_writing_down_your_technical/
submitted by /u/dmp0x7c5 (https://www.reddit.com/user/dmp0x7c5)
[link] (https://l.perspectiveship.com/re-decl) [comments] (https://www.reddit.com/r/programming/comments/1nqgn7x/decision_log_why_writing_down_your_technical/)
Zero downtime Postgres upgrades using logical replication
https://www.reddit.com/r/programming/comments/1nqj0k7/zero_downtime_postgres_upgrades_using_logical/
submitted by /u/rizzlesaurus_rex (https://www.reddit.com/user/rizzlesaurus_rex)
[link] (https://gadget.dev/blog/zero-downtime-postgres-upgrades-using-logical-replication) [comments] (https://www.reddit.com/r/programming/comments/1nqj0k7/zero_downtime_postgres_upgrades_using_logical/)