DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Terraform, Feature Flags and Configurability

Terraform has been my bread and butter for Infrastructure as Code for the past few years. I've worked with a variety of Terraform patterns and noticed one that is rarely discussed but very useful: feature flags. A feature flag lets code behave differently depending on how it is configured, and the idea is as old as programming itself. It gives the author the flexibility to keep alternative implementations, to migrate a system safely from one approach or capability to another, and to choose behaviour based on the target context. Mature codebases develop tooling or affordances that let them evolve and adapt, and feature flags are one way to achieve that. However, there are very few public examples of how to do this with Infrastructure as Code tools, Terraform included.


https://ninad.pundaliks.in/blog/2026/02/terraform-and-feature-flags
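A common way to express a feature flag in Terraform is a boolean variable that drives a resource's count through a conditional (count = var.flag ? 1 : 0); this may or may not be the approach the article settles on. To stay in one language, the minimal sketch below builds that pattern as Terraform JSON configuration (the .tf.json syntax) from Python. The variable name enable_new_cache and the resource name new_cache are hypothetical, and terraform_data (built into Terraform since 1.4) is only a placeholder resource.

```python
import json


def render_flagged_config(enable_new_cache: bool) -> dict:
    """Return a Terraform JSON configuration in which a boolean flag gates one resource.

    The variable and resource names are hypothetical; terraform_data is used only
    as a placeholder resource type.
    """
    return {
        "variable": {
            "enable_new_cache": {
                "type": "bool",
                "default": enable_new_cache,
                "description": "Feature flag: create the new cache resource",
            }
        },
        "resource": {
            "terraform_data": {
                "new_cache": {
                    # The conditional count is the flag: 1 creates the resource, 0 removes it.
                    "count": "${var.enable_new_cache ? 1 : 0}",
                    "input": "cache-v2",
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON; redirect it into a file such as feature_flags.tf.json.
    print(json.dumps(render_flagged_config(enable_new_cache=False), indent=2))
```

With the generated file in a Terraform working directory, the flag could then be flipped per environment with terraform apply -var="enable_new_cache=true", without touching the resource definition itself.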
Kubernetes egress control with Squid proxy

Shows how to enforce and observe Kubernetes egress traffic with Squid plus NetworkPolicy without adding a service mesh.


https://interlaye.red/kubernetes_002degress_002dsquid.html
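Conceptually, the Squid side of this setup is a destination allowlist: NetworkPolicy forces pod egress through the proxy, and the proxy decides which hosts are reachable. The sketch below models only that decision in Python. It assumes a leading-dot convention similar to Squid's dstdomain ACLs (".example.com" covers the domain and its subdomains, a bare name matches exactly); it is an illustration, not Squid's matching code, and the function name is hypothetical.

```python
def is_egress_allowed(host: str, allowlist: list[str]) -> bool:
    """Return True if host matches an allowlist entry.

    Entries starting with "." match the domain itself and any subdomain;
    other entries must match the host exactly. Comparison is case-insensitive.
    """
    host = host.lower().rstrip(".")  # normalise case and a trailing root dot
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("."):
            # ".github.com" matches "github.com" and "api.github.com", not "evilgithub.com"
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


if __name__ == "__main__":
    allow = [".github.com", "pypi.org"]
    for h in ["api.github.com", "github.com", "evilgithub.com", "files.pypi.org"]:
        print(h, is_egress_allowed(h, allow))
```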
How We Turned a Forced OS Migration into a 30% Infrastructure Reduction

Scout24 used an Amazon Linux 2 migration window to adopt Karpenter and cut EKS node count by about 30%.


https://scout24.medium.com/infinity-transformation-how-we-turned-a-forced-os-migration-into-a-30-infrastructure-reduction-1a41237307b8
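Part of why a node autoscaler like Karpenter can cut node count is bin-packing: placing pod requests onto fewer, fuller nodes. Karpenter's real provisioning and consolidation logic is considerably more involved (instance-type selection, disruption rules and so on); the Python sketch below only shows the underlying idea with first-fit decreasing over CPU requests in millicores. The request values and node size are made-up inputs.

```python
def first_fit_decreasing(requests_m: list[int], node_capacity_m: int) -> list[list[int]]:
    """Pack CPU requests (millicores) onto equal-sized nodes using first-fit decreasing."""
    nodes: list[list[int]] = []  # requests placed on each node
    free: list[int] = []         # remaining capacity per node
    for req in sorted(requests_m, reverse=True):  # largest requests first
        if req > node_capacity_m:
            raise ValueError(f"request {req}m does not fit on a {node_capacity_m}m node")
        for i, remaining in enumerate(free):
            if req <= remaining:  # first node with room wins
                nodes[i].append(req)
                free[i] -= req
                break
        else:  # no existing node had room: provision a new one
            nodes.append([req])
            free.append(node_capacity_m - req)
    return nodes


if __name__ == "__main__":
    pods = [3000, 2500, 2000, 1500, 1000, 1000, 500, 500]
    packed = first_fit_decreasing(pods, node_capacity_m=4000)
    print(len(packed), packed)
```

Here eight pods fit on three 4000m nodes because their requests add up to exactly three nodes' capacity; a placement that scattered the same pods one or two per node would need more machines.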
Auto-scaling and Load-based Scaling

Explains reactive metric-based scaling versus scheduled scaling and where each approach fits.


https://blog.felipefr.dev/auto-scaling-and-load-based-scaling
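The two approaches can be contrasted in a few lines of Python. The reactive function follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), skipping changes within a tolerance (0.1 by default in Kubernetes). The scheduled function is a hypothetical hour-of-day lookup; both are sketches, not the article's code.

```python
import math


def reactive_replicas(current: int, current_metric: float, target_metric: float,
                      tolerance: float = 0.1) -> int:
    """Proportional, HPA-style scaling: resize so the per-replica metric approaches the target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:  # close enough to target: leave replica count alone
        return current
    return math.ceil(current * ratio)


def scheduled_replicas(hour: int, schedule: dict[int, int], default: int) -> int:
    """Time-based scaling: use the latest schedule entry starting at or before `hour`.

    `schedule` maps a start hour (0-23) to a replica count; hours before the first
    entry fall back to `default`. The schedule format is hypothetical.
    """
    starts = [h for h in schedule if h <= hour]
    return schedule[max(starts)] if starts else default


if __name__ == "__main__":
    print(reactive_replicas(4, current_metric=90, target_metric=60))  # e.g. CPU 90% vs 60% target
    print(scheduled_replicas(9, {8: 10, 20: 3}, default=2))
```

Reactive scaling tracks load as it arrives but trails sudden spikes, while scheduled scaling pre-provisions for known patterns but cannot respond to surprises, which is why the two are often combined.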
rtk

CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies.


https://github.com/rtk-ai/rtk
Integration testing with Kubernetes

Shows a Rust-based integration testing workflow on kind with Terraform and cleanup policies for parallel runs.


https://mikamu.substack.com/p/integration-testing-with-kubernetes
Vault: Secure Kubernetes authentication with HashiCorp Vault OIDC

Explains how to use Vault as an OIDC provider to replace static kubeconfig credentials with short-lived tokens.


https://phuchoang.sbs/posts/gitops-kubernetes-oidc-vault
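The benefit of short-lived credentials is that a leaked token expires on its own. As an illustration, the Python sketch below reads the standard exp claim from a JWT-formatted token (such as an OIDC ID token) to see how long it stays valid. It deliberately does not verify the signature, so it must not be used for authentication decisions; a real client should validate the token against the issuer's published keys. The function names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_claims_unverified(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT checking its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds left before the token's exp claim (negative once expired)."""
    current = time.time() if now is None else now
    return jwt_claims_unverified(token)["exp"] - current
```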
Security Inside Kubernetes: Admission & Runtime Guardrails with Kyverno and KubeArmor

Covers layered Kubernetes security by combining Kyverno admission policies with KubeArmor runtime enforcement.


https://medium.com/globant/security-inside-kubernetes-admission-runtime-guardrails-with-kyverno-and-kubearmor-6d2f97264cbc
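Kyverno expresses admission rules declaratively as Kubernetes resources, so the Python below is not Kyverno syntax; it only sketches the kind of check an admission policy performs on a Pod manifest before the API server accepts it. The two example rules (no privileged containers, CPU and memory limits required) are common choices picked for illustration, not necessarily the article's policies, and validate_pod is a hypothetical name.

```python
def validate_pod(pod: dict) -> list[str]:
    """Return policy violations for a Pod manifest given as a dict (empty list = admit).

    Only regular containers are checked; initContainers and ephemeralContainers are ignored.
    """
    violations: list[str] = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        # Rule 1: reject privileged containers.
        if container.get("securityContext", {}).get("privileged", False):
            violations.append(f"{name}: privileged containers are not allowed")
        # Rule 2: require CPU and memory limits.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                violations.append(f"{name}: missing {resource} limit")
    return violations


if __name__ == "__main__":
    pod = {"spec": {"containers": [{"name": "app", "securityContext": {"privileged": True}}]}}
    print(validate_pod(pod))
```

Admission checks like this only see the manifest; a runtime layer such as KubeArmor is what catches behaviour the manifest does not reveal.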
Crust-Gather - kubectl Cluster Snapshot Plugin

Open-source kubectl plugin for collecting a structured cluster snapshot for debugging and analysis.


https://github.com/crust-gather/crust-gather
Kogaro - Kubernetes Configuration Hygiene Agent

Agent project focused on improving Kubernetes configuration hygiene and reducing misconfiguration risk.


https://github.com/topiaruss/kogaro
llm-d: SOTA inference performance

Project targeting high-performance large language model inference workloads.


https://github.com/llm-d/llm-d
Kthena: Enterprise LLM serving

Enterprise-oriented platform for serving and operating LLM workloads on Kubernetes.


https://github.com/volcano-sh/kthena
Easykube: Local Kubernetes development

Tooling aimed at simplifying local Kubernetes development environments.


https://github.com/torloejborg/easykube
Guardon: Kubernetes security extension

Security-focused extension project for strengthening Kubernetes environments.


https://github.com/guardon-dev/guardon
difftastic

A structural diff that understands syntax.


http://github.com/Wilfred/difftastic
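The point of a structural diff is that it compares syntax trees rather than lines, so reformatting alone does not register as a change. Difftastic does this with tree-sitter parsers across many languages; the Python sketch below only demonstrates the principle for Python source using the standard ast and difflib modules, and the helper names are hypothetical.

```python
import ast
import difflib


def changed_lines(old: str, new: str) -> list[str]:
    """Added/removed lines from a plain line-based unified diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [line for line in diff
            if line[:1] in ("+", "-") and not line.startswith(("+++", "---"))]


def same_syntax(old: str, new: str) -> bool:
    """True if both snippets parse to the same Python AST (layout is ignored)."""
    # ast.dump omits line/column attributes by default, so formatting changes vanish.
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))


if __name__ == "__main__":
    before = "total = compute(x, y)\n"
    after = "total = compute(\n    x,\n    y,\n)\n"
    print(changed_lines(before, after))
    print(same_syntax(before, after))
```

The line diff reports every line as changed, while the tree comparison sees the two versions as identical; swapping the arguments, by contrast, changes the tree.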
We Cut Our Kubernetes Pods by 60% and Doubled Traffic Capacity

This case study explains how JVM tuning, a smaller Hikari pool, and faster HPA scale-up doubled traffic capacity while reducing baseline pods.


https://medium.com/@feridquluzade2002/we-cut-our-kubernetes-pods-by-60-and-doubled-traffic-capacity-b1cfb6850fca
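Pool size matters more once the HPA adds pods, because every replica opens its own pool against the same database. HikariCP's "About Pool Sizing" wiki page cites a starting-point formula, connections = (core_count * 2) + effective_spindle_count, where core_count refers to the database server. The Python sketch below treats that as a total budget and divides it across the maximum replica count; the effective_spindles default of 1 and the example numbers are assumptions, not the article's figures.

```python
def hikari_starting_pool_size(db_cores: int, effective_spindles: int = 1) -> int:
    """Starting-point total connection count from HikariCP's pool-sizing guidance."""
    return db_cores * 2 + effective_spindles


def per_pod_pool_size(total_connections: int, max_replicas: int) -> int:
    """Split a total connection budget evenly across the most pods the HPA may run."""
    if max_replicas < 1:
        raise ValueError("max_replicas must be at least 1")
    # Floor division keeps the fleet within budget; max() guarantees at least one connection.
    return max(1, total_connections // max_replicas)


if __name__ == "__main__":
    budget = hikari_starting_pool_size(db_cores=16)  # hypothetical 16-core database
    print(budget, per_pod_pool_size(budget, max_replicas=4))
```

Note that the one-connection floor can push the total above the budget when max_replicas exceeds it, which is itself a signal that the replica ceiling and pool size need revisiting together.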
Hidden Kubernetes Bad Practices Learned the Hard Way During Incidents

This article distills incident-driven lessons on troubleshooting, configuration mistakes, and operational habits that make Kubernetes outages worse.


https://hackernoon.com/hidden-kubernetes-bad-practices-learned-the-hard-way-during-incidents
From Chaos to 99.9% Uptime: Rebuilding a Kubernetes Platform for GPU Workloads

This article covers rebuilding a Kubernetes platform for GPU workloads to reach 99.9% uptime after operational instability.


https://medium.com/@mateenali66/from-chaos-to-99-9-uptime-rebuilding-a-kubernetes-platform-for-gpu-workloads-4fadb1067a0b
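For context on what a 99.9% target permits, an availability SLO translates directly into an error budget of allowed downtime. The Python helper below is plain arithmetic, with a 30-day window as an assumed default.

```python
def downtime_budget_minutes(slo: float, window_days: float = 30.0) -> float:
    """Minutes of unavailability allowed by an availability SLO over the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    return (1.0 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    print(round(downtime_budget_minutes(0.999), 1))            # 30-day window, minutes
    print(round(downtime_budget_minutes(0.999, 365) / 60, 2))  # one year, hours
```

A 99.9% target leaves about 43.2 minutes of downtime per 30 days, or roughly 8.76 hours per year.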
Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more

At VictoriaMetrics, we built vlagent as a high-performance log collector for VictoriaLogs. To validate its performance and correctness under a real production-like load, we developed a benchmark suite and ran it against 8 popular log collectors. This post covers the methodology, throughput results, resource usage, and delivery correctness.


https://victoriametrics.com/blog/log-collectors-benchmark-2026/index.html
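One common way to check delivery correctness in a log pipeline benchmark is to stamp each generated line with a sequence number and compare what arrives against what was sent. The Python sketch below assumes that scheme (it is not necessarily the methodology used in the post) and reports lost, duplicated and unexpected lines.

```python
from collections import Counter


def delivery_report(sent_count: int, received_seqs: list[int]) -> dict[str, list[int]]:
    """Compare received sequence numbers against the sent range 0..sent_count-1."""
    counts = Counter(received_seqs)
    expected = set(range(sent_count))
    return {
        "missing": sorted(expected - set(counts)),                    # never delivered
        "duplicates": sorted(s for s, n in counts.items() if n > 1),  # delivered more than once
        "unexpected": sorted(s for s in counts if s not in expected), # never sent
    }


if __name__ == "__main__":
    print(delivery_report(5, [0, 1, 1, 3, 7]))
```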
Making and scaling a game server in Kubernetes using Agones

This tutorial walks through building a Go game server with Agones, matchmaking, Fleet allocation, and autoscaling on Kubernetes.


https://noe-t.dev/posts/making-and-scaling-a-game-server-in-k8s-using-agones
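Agones scales a Fleet through a FleetAutoscaler, and its Buffer policy aims to keep a number of Ready game servers available on top of those already Allocated, within minReplicas and maxReplicas bounds. The Python function below is a simplified model of that rule with an integer buffer only (Agones also accepts a percentage); it is not Agones code, and the function name is hypothetical.

```python
def buffer_policy_replicas(allocated: int, buffer_size: int,
                           min_replicas: int, max_replicas: int) -> int:
    """Desired fleet size: allocated servers plus a Ready buffer, clamped to [min, max]."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    desired = allocated + buffer_size
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    for allocated in (0, 7, 18):
        print(allocated, buffer_policy_replicas(allocated, buffer_size=5,
                                                min_replicas=5, max_replicas=20))
```

As matches allocate more servers, the fleet grows to keep the buffer intact until the maxReplicas ceiling stops it.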
PostgreSQL migration with CloudNativePG Logical Replication on Kubernetes - Zero-Downtime

This tutorial shows how to migrate PostgreSQL to CloudNativePG on Kubernetes with logical replication and no downtime.


https://kndoni.medium.com/postgresql-migration-with-cloudnativepg-logical-replication-on-kubernetes-zero-downtime-aef1c33a3a53
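During a logical-replication migration, the usual cutover signal is that the subscriber has caught up with the source. PostgreSQL writes WAL positions (LSNs) as two hexadecimal 32-bit halves separated by a slash, so lag in bytes is a subtraction once both are parsed. The Python sketch below assumes you have already read the source position (for example from pg_current_wal_lsn()) and the subscriber's position (for example latest_end_lsn in pg_stat_subscription); the helper names are hypothetical, and the tutorial may check readiness differently.

```python
def parse_lsn(lsn: str) -> int:
    """Convert PostgreSQL's 'XXXXXXXX/XXXXXXXX' LSN text form to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(source_lsn: str, subscriber_lsn: str) -> int:
    """Bytes of WAL the subscriber still has to catch up on (0 means caught up)."""
    return max(0, parse_lsn(source_lsn) - parse_lsn(subscriber_lsn))


if __name__ == "__main__":
    print(replication_lag_bytes("1/0", "0/FFFFFF00"))
```

In a migration runbook, writes to the old database would typically be paused first and the switch to the new cluster gated on this lag reaching zero.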