DevOps&SRE Library

Auto-scaling and Load-based Scaling

Explains reactive metric-based scaling versus scheduled scaling and where each approach fits.

https://blog.felipefr.dev/auto-scaling-and-load-based-scaling

rtk

CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies.

https://github.com/rtk-ai/rtk

Integration testing with Kubernetes

Shows a Rust-based integration testing workflow on kind with Terraform and cleanup policies for parallel runs.

https://mikamu.substack.com/p/integration-testing-with-kubernetes

Vault: secure Kubernetes authentication with HashiCorp Vault OIDC

Explains how to use Vault as an OIDC provider to replace static kubeconfig credentials with short-lived tokens.

https://phuchoang.sbs/posts/gitops-kubernetes-oidc-vault

Security Inside Kubernetes: Admission & Runtime Guardrails with Kyverno and KubeArmor

Covers layered Kubernetes security by combining Kyverno admission policies with KubeArmor runtime enforcement.

https://medium.com/globant/security-inside-kubernetes-admission-runtime-guardrails-with-kyverno-and-kubearmor-6d2f97264cbc

Crust-Gather - kubectl Cluster Snapshot Plugin

Open-source kubectl plugin for collecting a structured cluster snapshot for debugging and analysis.

https://github.com/crust-gather/crust-gather

Kogaro - Kubernetes Configuration Hygiene Agent

Agent project focused on improving Kubernetes configuration hygiene and reducing misconfiguration risk.

https://github.com/topiaruss/kogaro

llm-d: SOTA inference performance

Project targeting high-performance large language model inference workloads.

https://github.com/llm-d/llm-d

Kthena: Enterprise LLM serving

Enterprise-oriented platform for serving and operating LLM workloads on Kubernetes.

https://github.com/volcano-sh/kthena

Easykube: Local Kubernetes development

Tooling aimed at simplifying local Kubernetes development environments.

https://github.com/torloejborg/easykube

Guardon: Kubernetes security extension

Security-focused extension project for strengthening Kubernetes environments.

https://github.com/guardon-dev/guardon

difftastic

A structural diff that understands syntax.

http://github.com/Wilfred/difftastic

We Cut Our Kubernetes Pods by 60% and Doubled Traffic Capacity

This case study explains how JVM tuning, a smaller Hikari pool, and faster HPA scale-up doubled traffic capacity while reducing baseline pods.

https://medium.com/@feridquluzade2002/we-cut-our-kubernetes-pods-by-60-and-doubled-traffic-capacity-b1cfb6850fca

Hidden Kubernetes Bad Practices Learned the Hard Way During Incidents

This article distills incident-driven lessons on troubleshooting, configuration mistakes, and operational habits that make Kubernetes outages worse.

https://hackernoon.com/hidden-kubernetes-bad-practices-learned-the-hard-way-during-incidents

From Chaos to 99.9% Uptime: Rebuilding a Kubernetes Platform for GPU Workloads

This article covers rebuilding a Kubernetes platform for GPU workloads to reach 99.9% uptime after operational instability.

https://medium.com/@mateenali66/from-chaos-to-99-9-uptime-rebuilding-a-kubernetes-platform-for-gpu-workloads-4fadb1067a0b

Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more

At VictoriaMetrics, we built vlagent as a high-performance log collector for VictoriaLogs. To validate its performance and correctness under a realistic, production-like load, we developed a benchmark suite and ran it against 8 popular log collectors. This post covers the methodology, throughput results, resource usage, and delivery correctness.

https://victoriametrics.com/blog/log-collectors-benchmark-2026/index.html

Making and scaling a game server in Kubernetes using Agones

This tutorial walks through building a Go game server with Agones, matchmaking, Fleet allocation, and autoscaling on Kubernetes.

https://noe-t.dev/posts/making-and-scaling-a-game-server-in-k8s-using-agones

PostgreSQL migration with CloudNativePG Logical Replication on Kubernetes - Zero-Downtime

This tutorial shows how to migrate PostgreSQL to CloudNativePG on Kubernetes with logical replication and no downtime.

https://kndoni.medium.com/postgresql-migration-with-cloudnativepg-logical-replication-on-kubernetes-zero-downtime-aef1c33a3a53

Gateway API setup on GKE with NGINX Gateway Fabric

This tutorial shows how to deploy NGINX Gateway Fabric on GKE with Terraform, split traffic paths, and automate TLS certificates.

https://medium.com/@henrikamirbekyan/gateway-api-setup-on-gke-with-nginx-gateway-fabric-1b0d0ec3bbf3

Migrating Kubernetes Off Big Cloud

This interview compares the cost and operational tradeoffs of moving a Kubernetes workload from GKE Autopilot to Hetzner with Edka.

https://kube.fm/migrating-kubernetes-off-big-cloud-fernando

GoKubeDownscaler

A horizontal autoscaler for Kubernetes workloads that saves cloud costs by scaling workloads down after hours. It is a Go port and successor of the popular (py-)kube-downscaler, with improvements and quality-of-life changes.

https://github.com/caas-team/GoKubeDownscaler
