Mechanical sympathy for QR codes: making NSW check-in better
QR codes are now critical infrastructure here in NSW, Australia. Let's learn how to make them better.
https://huonw.github.io/blog/2021/10/nsw-covid-qr/
@DevMisc
#qrcode #optimization #misc
Fast character case conversion
...or how to compress sparse arrays.
https://github.com/apankrat/notes/tree/master/fast-case-conversion
@DevMisc
#algorithm #optimization #misc
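The subtitle's idea, compressing a sparse lookup array, can be sketched as a two-level table. Split the array into fixed-size blocks, store each distinct block once, and keep an index of block offsets, so a lookup becomes `data[index[i >> shift] + (i & mask)]`. This is a minimal C++ sketch under stated assumptions: 64-entry blocks, a hypothetical `TwoLevelTable` class, and ASCII-only deltas standing in for real case data. It is not necessarily the scheme used in the linked notes. Storing a delta rather than the target character makes a run like `a`..`z` share a single value (-32), which lets more blocks deduplicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical two-level table: the dense array is cut into fixed-size blocks,
// each distinct block is stored once, and a small index maps block -> offset.
class TwoLevelTable {
 public:
  static constexpr unsigned kShift = 6;  // assumed block size: 2^6 = 64 entries
  static constexpr std::size_t kBlock = std::size_t{1} << kShift;

  explicit TwoLevelTable(const std::vector<std::int32_t>& dense) {
    assert(dense.size() % kBlock == 0);  // sketch only handles whole blocks
    std::map<std::vector<std::int32_t>, std::uint32_t> seen;  // block -> offset in data_
    for (std::size_t start = 0; start < dense.size(); start += kBlock) {
      std::vector<std::int32_t> block(&dense[start], &dense[start] + kBlock);
      auto it = seen.find(block);
      if (it == seen.end()) {
        // First time this block appears: append it and remember where it lives.
        const auto offset = static_cast<std::uint32_t>(data_.size());
        data_.insert(data_.end(), block.begin(), block.end());
        it = seen.emplace(std::move(block), offset).first;
      }
      index_.push_back(it->second);
    }
  }

  // Two dependent loads: block offset from index_, then the value from data_.
  std::int32_t operator[](std::size_t i) const {
    return data_[index_[i >> kShift] + (i & (kBlock - 1))];
  }

  std::size_t stored_entries() const { return data_.size(); }

 private:
  std::vector<std::uint32_t> index_;  // one offset per block of the dense array
  std::vector<std::int32_t> data_;    // concatenated distinct blocks
};

// Stand-in case data: "to upper" deltas for ASCII letters only, over the
// 65536 BMP code points. Real Unicode case tables are much less uniform.
inline TwoLevelTable make_ascii_upper_deltas() {
  std::vector<std::int32_t> deltas(0x10000, 0);
  for (int c = 'a'; c <= 'z'; ++c) deltas[static_cast<std::size_t>(c)] = 'A' - 'a';
  return TwoLevelTable(deltas);
}

inline std::uint32_t to_upper(const TwoLevelTable& table, std::uint32_t c) {
  assert(c < 0x10000);  // the stand-in table only covers the BMP
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(c) + table[c]);
}
```

With this stand-in data, the 1024 blocks collapse to 2 distinct ones, so 128 values are stored instead of 65536. Real case-mapping data would compress less well.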
Understanding why our build got 15x slower with Webpack 5
https://engineering.tines.com/blog/understanding-why-our-build-got-15x-slower-with-webpack
@DevMisc
#javascript #v8 #optimization
Saving a third of our memory by re-ordering Go struct fields
https://wagslane.dev/posts/go-struct-ordering
@DevMisc
#golang #memory #optimization
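The saving in the title comes from alignment padding. Fields are laid out in declaration order, and each one starts at a multiple of its alignment, so a `bool` placed before an 8-byte integer drags 7 bytes of padding along with it. The minimal sketch below is written in C++ for consistency with the other examples here. Go's gc compiler also keeps declaration order and pads to natural alignment, so the equivalent Go struct has the same sizes on 64-bit platforms. The struct and field names are hypothetical and are not taken from the linked post.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record with fields in an unlucky order. On typical 64-bit ABIs
// (x86-64, arm64) std::int64_t needs 8-byte alignment.
struct Padded {
  bool active;         // offset 0, then 7 bytes of padding
  std::int64_t count;  // offset 8
  bool deleted;        // offset 16, then 7 bytes of tail padding
};                     // sizeof == 24 on those ABIs

// Same fields, strictest alignment first: the two bools share one padded slot.
struct Reordered {
  std::int64_t count;  // offset 0
  bool active;         // offset 8
  bool deleted;        // offset 9, then 6 bytes of tail padding
};                     // sizeof == 16 on those ABIs

// Bytes saved per element, e.g. per slot of a large array of records.
constexpr std::size_t kSavedPerElement = sizeof(Padded) - sizeof(Reordered);
```

On x86-64 or arm64 this is 24 bytes versus 16, which is one third less per element. Ordering fields from largest to smallest alignment is the usual rule of thumb.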
Speeding up my AoC solution by a factor of 2700 with Dijkstra’s
https://blog.siraben.dev/2021/12/28/aoc-speedup.html
@DevMisc
#optimization #misc
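For reference, here is a minimal Dijkstra sketch in C++ on an AoC-style weighted grid, where entering a cell costs that cell's value. The grid shape and the `min_path_cost` name are assumptions for illustration and are not the linked post's puzzle or code. A min-heap always expands the cheapest unsettled cell. Because `std::priority_queue` has no decrease-key operation, stale heap entries are skipped when popped instead of being updated in place. All costs must be non-negative for Dijkstra to be correct.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Lowest total cost from the top-left to the bottom-right cell, moving
// up/down/left/right, where entering a cell costs its (non-negative) value.
inline long long min_path_cost(const std::vector<std::vector<int>>& grid) {
  assert(!grid.empty() && !grid[0].empty());
  const int rows = static_cast<int>(grid.size());
  const int cols = static_cast<int>(grid[0].size());
  const long long kInf = std::numeric_limits<long long>::max();
  std::vector<long long> dist(static_cast<std::size_t>(rows) * cols, kInf);

  // Min-heap of (distance, cell index): always expand the closest unsettled cell.
  using Entry = std::pair<long long, int>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
  dist[0] = 0;  // the start cell's own cost is not counted
  pq.emplace(0, 0);

  const int dr[] = {1, -1, 0, 0};
  const int dc[] = {0, 0, 1, -1};
  while (!pq.empty()) {
    auto [d, cell] = pq.top();
    pq.pop();
    if (d > dist[cell]) continue;  // stale entry: a shorter path was already found
    const int r = cell / cols, c = cell % cols;
    if (r == rows - 1 && c == cols - 1) return d;  // target settled: distance is final
    for (int k = 0; k < 4; ++k) {
      const int nr = r + dr[k], nc = c + dc[k];
      if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
      const int next = nr * cols + nc;
      const long long nd = d + grid[nr][nc];
      if (nd < dist[next]) {  // relax the edge and queue the improved distance
        dist[next] = nd;
        pq.emplace(nd, next);
      }
    }
  }
  return kInf;  // not reached: every cell of a rectangular grid is connected
}
```

With the grid `{{1,1,6},{1,3,1},{6,1,1}}`, the cheapest route costs 6, since the start cell is not counted.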
Counting Bytes Faster Than You'd Think Possible
https://blog.mattstuchlik.com/2024/07/21/fastest-memory-read.html
@DevMisc
#asm #cpp #optimization
- The author optimized a byte-counting program to run ~550x faster than a naive implementation.
- The key optimization is an interleaved memory access pattern: instead of reading sequentially, the program reads from different 4KB pages in round-robin fashion.
- This pattern exploits the "Streamer" hardware prefetcher in modern Intel CPUs, which can maintain one forward and one backward access stream for each 4KB page.
- Interleaving 8 pages worked best in the author's tests, giving up to a 30% speedup over sequential access.
- The author also unrolled the inner loop to process 2 cache lines at a time (64 bytes each) and added a prefetch instruction for the next block of data.
- The final solution uses AVX2 SIMD instructions, which operate on 32 bytes per 256-bit register, to do the byte counting.
- The solution reached #13 on the HighLoad leaderboard.
- The author considers interleaved access an under-discussed technique and does not recall seeing it used in other code.
- The author invites readers to share any other memory-related optimizations they know of.
- The post includes the full source code, so readers can study the techniques and apply them in their own work.
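The interleaved access order described above can be sketched in portable C++. Within each 32 KB group, the code reads one cache line from each of 8 consecutive 4 KB pages in turn, so eight page-local forward streams advance together. This sketch shows only the access order and makes no performance claim. It rests on these assumptions: 64-byte lines, a hypothetical `count_interleaved` function, and a page-aligned buffer, so that the 4 KB offsets match real page boundaries. The post's actual kernel uses AVX2, loop unrolling and explicit prefetching instead of this scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed constants: 4 KB pages, 64-byte cache lines, 8 interleaved pages.
constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kLineSize = 64;
constexpr std::size_t kPages = 8;
constexpr std::size_t kGroup = kPageSize * kPages;  // 32 KB handled per outer step

// Counts bytes equal to `target`. Within each 32 KB group it visits one cache
// line of page 0, then the same line of pages 1..7, then the next line of
// page 0, and so on, so eight page-local forward streams advance in lockstep.
inline std::size_t count_interleaved(const std::uint8_t* data, std::size_t n,
                                     std::uint8_t target) {
  assert(data != nullptr || n == 0);
  std::size_t count = 0;
  std::size_t base = 0;
  for (; base + kGroup <= n; base += kGroup) {
    for (std::size_t line = 0; line < kPageSize; line += kLineSize) {
      for (std::size_t page = 0; page < kPages; ++page) {
        const std::uint8_t* p = data + base + page * kPageSize + line;
        for (std::size_t i = 0; i < kLineSize; ++i) count += (p[i] == target);
      }
    }
  }
  // Tail shorter than one group: plain sequential scan.
  for (; base < n; ++base) count += (data[base] == target);
  return count;
}
```

The result matches a sequential count. Only the order of memory reads changes, which is what the prefetcher argument in the post depends on.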