Crust-Gather - kubectl Cluster Snapshot Plugin
https://github.com/crust-gather/crust-gather
Open-source kubectl plugin for collecting a structured cluster snapshot for debugging and analysis.
Kogaro - Kubernetes Configuration Hygiene Agent
https://github.com/topiaruss/kogaro
Agent project focused on improving Kubernetes configuration hygiene and reducing misconfiguration risk.
llm-d: SOTA inference performance
https://github.com/llm-d/llm-d
Project targeting high-performance large language model inference workloads.
Kthena: Enterprise LLM serving
https://github.com/volcano-sh/kthena
Enterprise-oriented platform for serving and operating LLM workloads on Kubernetes.
Easykube: Local Kubernetes development
https://github.com/torloejborg/easykube
Tooling aimed at simplifying local Kubernetes development environments.
Guardon: Kubernetes security extension
https://github.com/guardon-dev/guardon
Security-focused extension project for strengthening Kubernetes environments.
We Cut Our Kubernetes Pods by 60% and Doubled Traffic Capacity
https://medium.com/@feridquluzade2002/we-cut-our-kubernetes-pods-by-60-and-doubled-traffic-capacity-b1cfb6850fca
This case study explains how JVM tuning, a smaller Hikari pool, and faster HPA scale-up doubled traffic capacity while reducing baseline pods.
Hidden Kubernetes Bad Practices Learned the Hard Way During Incidents
https://hackernoon.com/hidden-kubernetes-bad-practices-learned-the-hard-way-during-incidents
This article distills incident-driven lessons on troubleshooting, configuration mistakes, and operational habits that make Kubernetes outages worse.
From Chaos to 99.9% Uptime: Rebuilding a Kubernetes Platform for GPU Workloads
https://medium.com/@mateenali66/from-chaos-to-99-9-uptime-rebuilding-a-kubernetes-platform-for-gpu-workloads-4fadb1067a0b
This article covers rebuilding a Kubernetes platform for GPU workloads to reach 99.9% uptime after operational instability.
Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more
https://victoriametrics.com/blog/log-collectors-benchmark-2026/index.html
VictoriaMetrics built vlagent as a high-performance log collector for VictoriaLogs. To validate its performance and correctness under production-like load, the team developed a benchmark suite and ran it against 8 popular log collectors. This post covers the methodology, throughput results, resource usage, and delivery correctness.
Making and scaling a game server in Kubernetes using Agones
https://noe-t.dev/posts/making-and-scaling-a-game-server-in-k8s-using-agones
This tutorial walks through building a Go game server with Agones, matchmaking, Fleet allocation, and autoscaling on Kubernetes.
PostgreSQL migration with CloudNativePG Logical Replication on Kubernetes - Zero-Downtime
https://kndoni.medium.com/postgresql-migration-with-cloudnativepg-logical-replication-on-kubernetes-zero-downtime-aef1c33a3a53
This tutorial shows how to migrate PostgreSQL to CloudNativePG on Kubernetes with logical replication and no downtime.
Gateway API setup on GKE with NGINX Gateway Fabric
https://medium.com/@henrikamirbekyan/gateway-api-setup-on-gke-with-nginx-gateway-fabric-1b0d0ec3bbf3
This tutorial shows how to deploy NGINX Gateway Fabric on GKE with Terraform, split traffic paths, and automate TLS certificates.
Migrating Kubernetes Off Big Cloud
https://kube.fm/migrating-kubernetes-off-big-cloud-fernando
This interview compares the cost and operational tradeoffs of moving a Kubernetes workload from GKE Autopilot to Hetzner with Edka.
GoKubeDownscaler
https://github.com/caas-team/GoKubeDownscaler
A horizontal autoscaler for Kubernetes workloads that saves cloud costs by scaling workloads down after hours. It is a Go port and successor of the popular (py-)kube-downscaler, with improvements and quality-of-life changes.
Karpenter Optimizer: cost optimization
https://github.com/kaskol10/karpenter-optimizer
This tool analyzes Karpenter NodePool usage and offers AI-powered recommendations to reduce AWS EC2 costs while maintaining performance.
linnix
https://github.com/linnix-os/linnix
eBPF-powered Linux observability with AI incident detection.
radar
https://github.com/skyhook-io/radar
A single-binary tool, with zero cluster-side installation, for visualizing cluster topology, browsing resources, streaming logs, exec-ing into pods, inspecting container image filesystems, managing Helm releases, monitoring GitOps workflows (FluxCD and ArgoCD), and forwarding ports.