From DNS Failures to Resilience: How NodeLocal DNSCache Saved the Day
https://engineering.mercari.com/en/blog/entry/20250515-from-dns-failures-to-resilience-how-nodelocal-dnscache-saved-the-day
Who the Hell is Going to Pay For This?
https://www.adatosystems.com/2025/02/10/who-the-hell-is-going-to-pay-for-this
I’ve specialized in monitoring and observability for 27 years now, and I’ve seen a lot of tools and techniques come and go (RMon, anyone?), and more than a few come and stay (rumors of the death of SNMP have been, and continue to be, greatly exaggerated). Lately I’ve been exploring one of the more recent improvements in the space: OpenTelemetry (which I’m abbreviating to “OTel” for the remainder of this blog). I recently wrote about my decision to dive into OTel.
For the most part, I’m enjoying the journey. But there’s a problem that has existed with observability for a while now, and it’s something OTel is not helping. The title of this post hints at the issue, but I want to be more explicit. Let’s start with some comparison shopping.
Before I piss off every vendor in town, I want to be clear that these are broad, rough, high-level numbers. I’ve linked to the pricing pages if you want to check the details, and I acknowledge that what you see below isn’t necessarily indicative of the price you might actually pay after getting a quote on a real production environment.
Terraform depends_on: What it is, When to use it, and Best Practices
https://dev.to/techielass/terraform-dependson-what-it-is-when-to-use-it-and-best-practices-5ene
When working with Terraform, managing resource dependencies effectively is key to avoiding deployment issues. Terraform is great at automatically determining the order of resource creation, but sometimes it needs a little help. This is where depends_on comes in.
In this guide, we’ll explain Terraform depends_on, how to use it, when to use it, and best practices for writing clean and efficient Terraform code.
Configuration Management at Ant Group: Generated Manifest & Immutable Desired State
https://blog.kusionstack.io/configuration-management-at-ant-group-generated-manifest-immutable-desired-state-3c50e363a3fb
Readiness vs. Liveness probes: what is the difference? (and startup probes!)
https://medium.com/@jrkessl/readiness-vs-liveness-probes-what-is-the-difference-and-startup-probes-215560f043e4
Optimizing Kubernetes Log Aggregation: Tackling Fluent Bit Buffering and Backpressure Challenges
https://arteraai.medium.com/optimizing-kubernetes-log-aggregation-tackling-fluent-bit-buffering-and-backpressure-challenges-fb3129dc5031
From LB Ingress to ZTM — A New Approach to Cluster Service Exposure
https://addozhang.medium.com/from-lb-ingress-to-ztm-a-new-approach-to-cluster-service-exposure-99d32a3065ec
Debugging Distroless Kubernetes Containers
https://levelup.gitconnected.com/debugging-distroless-kubernetes-containers-74cfde06b196
OpenTelemetry Resource Attributes: Best Practices for Kubernetes
https://www.dash0.com/guides/opentelemetry-kubernetes-attributes-best-practices
Helm Chart Validation Just Got Smarter Thanks to This Google-Powered Tool
https://hackernoon.com/helm-chart-validation-just-got-smarter-thanks-to-this-google-powered-tool
dockprom
https://github.com/stefanprodan/dockprom
Docker hosts and containers monitoring with Prometheus, Grafana, cAdvisor, NodeExporter and AlertManager
pgrwl
https://github.com/hashmap-kz/pgrwl
pgrwl is a PostgreSQL write-ahead log (WAL) receiver written in Go. It’s a drop-in, container-friendly alternative to pg_receivewal, supporting streaming replication, encryption, compression, and remote storage (S3, SFTP).
Designed for disaster recovery and PITR (Point-in-Time Recovery), pgrwl ensures zero data loss (RPO=0) and seamless integration with Kubernetes environments.
Moving on from Nix
https://carlosbecker.com/posts/bye-nix
After using nix in my dotfiles for over 2 years, I’m now moving away from it.
Here’s why.
Staying on Nix
https://pid1.sh/blog/staying-on-nix
I have been using Nix regularly since roughly 2019, when I set up my primary build server to use Nix to manage the various toolchains, though it wasn't until 2022 that I really invested heavily, and I'm now using Nix in combination with other more traditional DevOps tools to provision and manage more than 10 physical machines and 50 VMs in my homelab.
lstr
https://github.com/bgreenwell/lstr
A blazingly fast, minimalist directory tree viewer, written in Rust. Inspired by the command line program tree, with a powerful interactive mode.
canine
https://github.com/czhu12/canine
Canine is an easy-to-use, intuitive deployment platform for Kubernetes clusters.
How We Migrated 30+ Kubernetes Clusters to Terraform
https://medium.com/learnings-from-the-paas/how-we-migrated-30-kubernetes-clusters-to-terraform-cd2b1cef8b84
How We Integrated Native macOS Workloads with Kubernetes
https://medium.com/agoda-engineering/how-we-integrated-native-macos-workloads-with-kubernetes-b4d3c14881a0