DevOps&SRE Library

Configuration Management at Ant Group: Generated Manifest & Immutable Desired State

https://blog.kusionstack.io/configuration-management-at-ant-group-generated-manifest-immutable-desired-state-3c50e363a3fb


Can't DNAT After DNAT?

https://dev.to/reoring/cant-dnat-after-dnat-2eh0


Readiness vs. Liveness probes: what is the difference? (and startup probes!)

https://medium.com/@jrkessl/readiness-vs-liveness-probes-what-is-the-difference-and-startup-probes-215560f043e4


Optimizing Kubernetes Log Aggregation: Tackling Fluent Bit Buffering and Backpressure Challenges

https://arteraai.medium.com/optimizing-kubernetes-log-aggregation-tackling-fluent-bit-buffering-and-backpressure-challenges-fb3129dc5031


From LB Ingress to ZTM — A New Approach to Cluster Service Exposure

https://addozhang.medium.com/from-lb-ingress-to-ztm-a-new-approach-to-cluster-service-exposure-99d32a3065ec


Debugging Distroless Kubernetes Containers

https://levelup.gitconnected.com/debugging-distroless-kubernetes-containers-74cfde06b196


OpenTelemetry Resource Attributes: Best Practices for Kubernetes

https://www.dash0.com/guides/opentelemetry-kubernetes-attributes-best-practices


Helm Chart Validation Just Got Smarter Thanks to This Google-Powered Tool

https://hackernoon.com/helm-chart-validation-just-got-smarter-thanks-to-this-google-powered-tool


dockprom

Docker hosts and containers monitoring with Prometheus, Grafana, cAdvisor, NodeExporter and AlertManager

https://github.com/stefanprodan/dockprom


kl

An interactive Kubernetes log viewer for your terminal.

https://github.com/robinovitch61/kl


pgrwl

pgrwl is a PostgreSQL write-ahead log (WAL) receiver written in Go. It’s a drop-in, container-friendly alternative to pg_receivewal, supporting streaming replication, encryption, compression, and remote storage (S3, SFTP).

Designed for disaster recovery and PITR (Point-in-Time Recovery), pgrwl ensures zero data loss (RPO=0) and seamless integration with Kubernetes environments.

https://github.com/hashmap-kz/pgrwl


Moving on from Nix

After using Nix in my dotfiles for over 2 years, I’m now moving away from it.

Here’s why.

https://carlosbecker.com/posts/bye-nix


Staying on Nix

I have been using Nix regularly since roughly 2019, when I set up my primary build server to use Nix to manage the various toolchains. It wasn't until 2022 that I really invested heavily, though, and I'm now using Nix in combination with other, more traditional DevOps tools to provision and manage more than 10 physical machines and 50 VMs in my homelab.

https://pid1.sh/blog/staying-on-nix


lstr

A blazingly fast, minimalist directory tree viewer, written in Rust. Inspired by the command line program tree, with a powerful interactive mode.

https://github.com/bgreenwell/lstr


canine

Canine is an easy-to-use, intuitive deployment platform for Kubernetes clusters.

https://github.com/czhu12/canine


How We Migrated 30+ Kubernetes Clusters to Terraform

https://medium.com/learnings-from-the-paas/how-we-migrated-30-kubernetes-clusters-to-terraform-cd2b1cef8b84


How We Integrated Native macOS Workloads with Kubernetes

https://medium.com/agoda-engineering/how-we-integrated-native-macos-workloads-with-kubernetes-b4d3c14881a0


Why Our Pods Were Breaking Bad (and How We Fixed Them)

In this article, we’ll walk through the process of diagnosing a memory leak, analyzing the root cause, and implementing effective solutions to mitigate its impact. We’ll explore practical steps that any application, regardless of the underlying stack or architecture, can follow to troubleshoot and optimize performance.

https://kshitij-nawandar.medium.com/why-our-pods-were-breaking-bad-and-how-we-fixed-them-b3c3e9e8003b


FacetController: How we made infrastructure changes at Lyft simple

https://eng.lyft.com/facetcontroller-how-we-made-infrastructure-changes-at-lyft-simple-dab49f5b27c7


Operational Considerations for Managing Stateful Workloads

When managing stateful workloads, whether in Kubernetes or traditional infrastructure, operational concerns like isolation, lifecycle management, security, disaster recovery, scalability, and observability take center stage. While the examples focus on AWS, PostgreSQL, and Kubernetes, the principles and best practices discussed here are broadly applicable to any environment. This article approaches these topics from an operations perspective, prioritizing reliability, maintainability, and resilience. The goal is not just to run a database, but to ensure it operates efficiently, scales properly, and remains secure in real-world conditions. We’ll explore key aspects of running stateful workloads, from managing failure domains to ensuring observability, and how these impact both operations teams and developers. Whether you’re running a database in a cloud-native setup or on bare metal, these strategies will help you build a robust, well-managed system.

https://dev.to/pampatzoglou/operational-considerations-for-managing-stateful-workloads-20c3
