DevOps&SRE Library

012: The MTTI Manifesto

Mean Time to Isolate

https://www.oldschoolburke.com/the-mtti-manifesto
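
As a rough illustration of the arithmetic behind the metric, here is a minimal Python sketch. It assumes each incident records when the problem was detected and when the faulty component was isolated. The manifesto may define the start point differently, and the `Incident` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record: detection time and isolation time are the
    # assumed start and end points of the measurement.
    detected_at: datetime
    isolated_at: datetime


def mean_time_to_isolate(incidents: list[Incident]) -> timedelta:
    """Average detection-to-isolation duration across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.isolated_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=10)),
        Incident(t0, t0 + timedelta(minutes=20)),
    ]
    print(mean_time_to_isolate(incidents))  # 0:15:00
```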

Solving the Terraform Backend Chicken-and-Egg Problem

My preferred way to store Terraform state files is close to the provisioned infrastructure; in my case, that mostly means Azure Blob Storage. This approach offers built-in benefits like RBAC, versioning, locking, and identity-based authentication, making it an excellent solution for state management at almost no cost.

However, there’s a catch: you need to create the storage account before Terraform can use it. This creates a chicken-and-egg problem: how do you provision the state storage with Terraform itself, without manual steps or external scripts?

In this article, I’ll walk through a fully automated solution that deploys Terraform state storage in Azure Blob Storage and then imports its own (“self”) state there, ensuring everything is managed declaratively from the start.

https://cloudchronicles.blog/blog/Solving-the-Terraform-Backend-Chicken-and-Egg-Problem

CI/CD Security: Using Checkov to enforce security with Terraform

The purpose of this tutorial is to provide a solid starting point for enforcing security best practices in your Terraform scripts.

https://igorzhivilo.com/2025/02/11/checkov-ci

Terraform Modules Monorepo On GitLab

After several years of working with GitHub and Azure DevOps on a daily basis, using different tools feels counterintuitive to me. However, one of my clients is deeply integrated with GitLab. Since I was hired to resolve some issues, I saw this as the perfect opportunity to dive deep into GitLab CI and implement a robust, version-controlled approach that supports collaboration while maintaining security and documentation standards.

This guide presents an advanced implementation of a Terraform modules monorepo using GitLab, featuring automated versioning, security scanning, and documentation generation.

https://cloudchronicles.blog/blog/Terraform-Modules-Monorepo-On-GitLab

Steps to Break Up a Terralith

In this follow-up to our "What Is a Terralith?" article, we shift the focus from describing the problem to providing a detailed migration plan, practical guidance, and a handy checklist for breaking up a Terralith into smaller, more manageable root modules.

https://masterpoint.io/blog/steps-to-break-up-a-terralith

hyperfine

A command-line benchmarking tool.

https://github.com/sharkdp/hyperfine
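
To show the basic idea behind a command-line benchmark, here is a minimal Python sketch: run a command a few times to warm up, time repeated runs, and report summary statistics. It is not how hyperfine is implemented. hyperfine itself does much more, such as correcting for shell spawning time and exporting results. The `benchmark` function and its defaults are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def benchmark(cmd: list[str], runs: int = 10, warmup: int = 2) -> dict[str, float]:
    """Time `cmd` over several runs and return summary statistics in seconds."""
    if runs < 2:
        raise ValueError("need at least two runs for a standard deviation")
    quiet = {"check": True, "stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    # Warm-up runs populate OS and disk caches; their timings are discarded.
    for _ in range(warmup):
        subprocess.run(cmd, **quiet)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, **quiet)
        timings.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings),
        "min": min(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Benchmark starting a bare Python interpreter.
    s = benchmark([sys.executable, "-c", "pass"], runs=5, warmup=1)
    print(f"mean {s['mean'] * 1000:.1f} ms ± {s['stdev'] * 1000:.1f} ms "
          f"(min {s['min'] * 1000:.1f} ms, max {s['max'] * 1000:.1f} ms)")
```

hyperfine exposes the same warm-up idea through its `--warmup` option, so cold-cache effects do not skew the first measurements.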

railpack

Railpack is a tool for building images from source code with minimal configuration. It is the successor to Nixpacks and incorporates many of the learnings from running Nixpacks in production at Railway for several years.

https://github.com/railwayapp/railpack

pgdog

PgDog is a transaction pooler and logical replication manager that can shard PostgreSQL. Written in Rust, PgDog is fast, secure and can manage hundreds of databases and hundreds of thousands of connections.

https://github.com/pgdogdev/pgdog
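
To illustrate what sharding by key means for a pooler that routes queries, here is a generic hash-based routing sketch in Python. It is not PgDog's algorithm; the project's own hashing and configuration are described in its repository. The `shard_for` function, the SHA-256 choice, and the shard count are assumptions made for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key (e.g. a customer ID) to a shard index in [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be positive")
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so it would route the same key to a different shard after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for customer_id in ["42", "1001", "abc"]:
        print(customer_id, "->", shard_for(customer_id, 4))
```

With plain modulo routing, changing the shard count remaps most keys, which is one reason sharded systems pair the hashing scheme with a data-migration (resharding) process.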

Unleashing the Power of k3s for Edge Computing: Deploying 3000+ in-store Kubernetes Clusters — Part 1

https://jysk.tech/unleashing-the-power-of-k3s-for-edge-computing-deploying-3000-in-store-kubernetes-clusters-part-77ecc5378d31

3000+ Clusters Part 2: The journey in edge compute with Talos Linux

https://jysk.tech/3000-clusters-part-2-the-journey-in-edge-compute-with-talos-linux-82f42bf9f958

Vertical Pod Autoscaler (VPA): A Deep Dive - Part 1

In this post, I want to dive deep into VPA (version 1.3.0), explain why it could be useful for you, and provide a quick overview in the first section if you're short on time. This article mainly focuses on the Recommender component; I'll cover the other two components (the Updater and the Admission Controller) in a future post.

https://erikzilinsky.com/posts/vpa1.html
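
For a feel of what a usage-based recommendation looks like, here is a simplified Python sketch: take a high percentile of observed CPU usage and add a safety margin. The real Recommender works on decaying histograms of usage samples and also produces lower and upper bounds. The 90th percentile, the 15% margin, and the `recommend_cpu` helper are illustrative assumptions, not values or code taken from the article.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least a fraction p of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def recommend_cpu(samples_millicores: list[float], p: float = 0.9, margin: float = 0.15) -> float:
    """Hypothetical CPU request recommendation: usage percentile plus a safety margin."""
    return percentile(samples_millicores, p) * (1 + margin)


if __name__ == "__main__":
    usage = [120, 150, 90, 300, 180, 160, 140, 200, 170, 130]  # millicores
    print(f"recommended request: {recommend_cpu(usage):.0f}m")  # recommended request: 230m
```

Using a high percentile rather than the maximum keeps a single spike (the 300m sample above) from inflating the request, while the margin leaves headroom for normal variation.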

OPA memory usage considerations and lessons from our transition to Kyverno

https://medium.com/adevinta-tech-blog/opa-memory-usage-considerations-and-lessons-from-our-transition-to-kyverno-bd23bd8a68bf

Kubernetes Best Practices I Wish I Had Known Before

1. Don't Skimp on Resource Requests and Limits
2. Namespace Like Your Life Depends on It
3. Avoid Running Multiple Containers in One Pod Unless Necessary
4. Use a Package Manager for Your YAML Files
5. Ingress and Networking Best Practices
6. Lean On Liveness, Readiness, and Startup Probes
7. Mind Your Security: RBAC, Pod Security, and Secrets
8. Monitor Everything (And Then Monitor Some More)
9. Automate Deployments with CI/CD
10. Keep Your Kubernetes Cluster and Components Updated
11. Use Labels and Annotations Wisely
12. Adopt a Multi-Environment Approach
13. Optimize Your Container Images
14. Implement a Reliable Logging Strategy
15. Treat Kubernetes Like Cattle, Not a Pet
16. Consider a Higher-Level Approach for Complex Deployments
17. Final Thoughts

https://www.pulumi.com/blog/kubernetes-best-practices-i-wish-i-had-known-before

The Ripple Effect: How a Single Push Notification Brought Down Our Kubernetes Cluster

Ever notice how major system failures rarely start with major problems? That's exactly what happened to us when a simple push notification exposed the fragility of our Kubernetes infrastructure. But here's the twist: it wasn’t a bug that took us down—it was our own success.

https://dev.to/aws-builders/the-ripple-effect-how-a-single-push-notification-brought-down-our-kubernetes-cluster-c9i

Quality gate for helm charts

What is a quality gate? A quality gate is a milestone in an IT project at which predefined criteria must be met before the project can proceed to the next phase. For application code, we set quality gates by running unit, integration, and acceptance tests and static code analysis before merging a developer's branch into the main branch. But do we set quality gates for Helm charts? Or should we?

We should, and I will present an example of how to do that.

https://medium.com/@michamarszaek/quality-gate-for-helm-charts-f260f5742198
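
At its simplest, a quality gate is a set of named criteria that must all pass before a change moves to the next phase. The Python sketch below shows that shape. The `QualityGate` class and the check names are hypothetical placeholders for whatever a chart pipeline would actually run (for example `helm lint`), and the article's own implementation may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QualityGate:
    """A set of named criteria that must all pass before promotion."""
    checks: dict[str, Callable[[], bool]] = field(default_factory=dict)

    def add(self, name: str, check: Callable[[], bool]) -> None:
        self.checks[name] = check

    def evaluate(self) -> tuple[bool, list[str]]:
        """Run every check; return (passed, names of failed checks)."""
        failed = [name for name, check in self.checks.items() if not check()]
        return (not failed, failed)


if __name__ == "__main__":
    gate = QualityGate()
    # Hypothetical checks; a real pipeline would shell out to linters and tests.
    gate.add("lint", lambda: True)
    gate.add("schema-validation", lambda: True)
    gate.add("no-latest-image-tag", lambda: False)
    passed, failed = gate.evaluate()
    print("gate passed" if passed else f"gate failed: {', '.join(failed)}")
```

Running every check instead of stopping at the first failure reports all problems in one pass, at the cost of still running slow checks after an earlier one has already failed.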

How to Host a 100 CPU Core, 400 GB RAM Cluster on a Budget

In this article, I will share how I built a computing cluster with around 100 CPU cores and approximately 400 GB of RAM while keeping costs as low as possible.

https://medium.com/@florianmhlhans/how-to-host-a-100-cpu-core-400-gb-ram-cluster-on-a-budget-f6cdf992eae3

Deploying Your AKS Cluster with Terraform: Key Points for a Successful Production Rollout

https://medium.com/h7w/deploying-your-aks-cluster-with-terraform-key-points-for-a-successful-production-rollout-e92f1238906f

My PodDisruptionBudget bible to use with Karpenter and friends

https://dev.to/aws-builders/my-poddisruptionbudget-bible-to-use-with-karpenter-and-friends-59fl

Adrift in the Cloud: A Forensic Dive into Container Drift

Adding container drift detection to Google’s Container Explorer

https://detect.fyi/adrift-in-the-cloud-a-forensic-dive-into-container-drift-f29524f4f6c4

KubeDiagrams

Generate Kubernetes architecture diagrams from Kubernetes manifest files, kustomization files, Helm charts, helmfile descriptors, and actual cluster state.

https://github.com/philippemerle/KubeDiagrams

Mastering the OpenTelemetry Transformation Language (OTTL)

The OpenTelemetry ecosystem continues to evolve with powerful tools that enhance your observability strategy. Among these, the OpenTelemetry Transformation Language (OTTL) stands out as an incredible capability for manipulating and transforming telemetry data.

This guide explores what OTTL is, how it works, and how you can leverage it to maximize the value of your observability data with minimal effort.

https://www.dash0.com/guides/opentelemetry-transformation-language-ottl
