DevOps&SRE Library

Patterns for Terraform Multi-Account Deployments

https://awstip.com/patterns-for-terraform-multi-account-deployments-f47d77d6f250

Group wait, Group interval and Repeat interval explained

https://www.grobinson.net/group-wait-group-interval-and-repeat-interval-explained.html

terraform-target-autocompletion

Press tab after --target and get suggestions for your resources and modules.

terraform-target-autocompletion is a Go program that relies on terraform-config-inspect for the heavy lifting, so it should work with any Terraform version. You need nothing other than the binary and the provided completion scripts. To build it yourself, you currently need Go 1.21.0 installed.

https://github.com/shellwhale/terraform-target-autocompletion

Reducing high cardinality in Prometheus

https://sennasemakula.medium.com/reducing-high-cardinality-in-prometheus-3f110b6d9eb5

Network health overview with mtr, ss, lsof and iperf3

https://raduzaharia.medium.com/network-health-overview-with-mtr-ss-lsof-and-iperf3-8d0d2d191781

Scaling Kafka to Support PayPal’s Data Growth

Today, our Kafka fleet consists of over 1,500 brokers that host over 20,000 topics and close to 2,000 Mirror Maker nodes which are used to mirror the data among the clusters, offering 99.99% availability for our Kafka clusters. During the 2022 Retail Friday, Kafka traffic volume peaked at about 1.3 trillion messages per day! At present, we have 85+ Kafka clusters, and every holiday season we flex up our Kafka infrastructure to handle the traffic surge. The Kafka platform continues to seamlessly scale to support this traffic growth without any impact to our business.

https://medium.com/paypal-tech/scaling-kafka-to-support-paypals-data-growth-a0b4da420fab

Prometheus Certified Associate: A Comprehensive Guide

https://medium.com/@onai.rotich/prometheus-certified-associate-a-comprehensive-guide-9c51638578d2

harden-runner

Harden-Runner provides runtime security for GitHub-hosted and self-hosted environments

https://github.com/step-security/harden-runner

How Cloudflare runs Prometheus at scale

At the time of writing, we run 916 Prometheus instances with a total of around 4.9 billion time series.

https://blog.cloudflare.com/how-cloudflare-runs-prometheus-at-scale

cf-terraforming

cf-terraforming is a command line utility to facilitate terraforming your existing Cloudflare resources. It does this by using your account credentials to retrieve your configurations from the Cloudflare API and converting them to Terraform configurations that can be used with the Terraform Cloudflare provider.

This tool is ideal if you already have Cloudflare resources defined but want to start managing them via Terraform, and don't want to spend the time to manually write the Terraform configuration to describe them.

https://github.com/cloudflare/cf-terraforming

Multi-Cloud Strategies with Crunchy Postgres for Kubernetes

https://www.crunchydata.com/blog/multi-cloud-strategies-with-crunchy-postgres-for-kubernetes

How Agoda Transitioned to Private Cloud

https://medium.com/agoda-engineering/private-cloud-and-you-736d8d99a51e

Understanding Kubernetes Limits and Requests

When working with containers in Kubernetes, it’s important to know which resources are involved and how they are used. Some processes require more CPU or memory than others. Some are critical and should never be starved.

Knowing that, we should configure our containers and Pods properly in order to get the best of both.
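As a rough illustration of the idea (not taken from the linked article), the sketch below models a container's `resources` block as a Python dict and checks that each request does not exceed its limit, which is a condition Kubernetes itself enforces when validating a Pod spec. The container name, image, and values are hypothetical, and the parser only handles the `m` CPU suffix and the `Ki`/`Mi`/`Gi` memory suffixes.

```python
# Illustrative sketch only: container name, image, and values are hypothetical.
# Only a subset of Kubernetes quantity suffixes is handled here.
MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '1' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def requests_within_limits(resources: dict) -> bool:
    """Return True if every request is less than or equal to its limit."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    parsers = {"cpu": parse_cpu, "memory": parse_memory}
    for name, parse in parsers.items():
        if name in requests and name in limits:
            if parse(requests[name]) > parse(limits[name]):
                return False
    return True


# Requests are what the scheduler reserves when placing the Pod;
# limits are the ceiling enforced while the container runs.
container = {
    "name": "web",
    "image": "example/web:1.0",
    "resources": {
        "requests": {"cpu": "250m", "memory": "128Mi"},
        "limits": {"cpu": "500m", "memory": "256Mi"},
    },
}

if __name__ == "__main__":
    print(requests_within_limits(container["resources"]))
```

A check like this could run in CI before manifests are applied, though the API server performs the same validation on its own.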

https://sysdig.com/blog/kubernetes-limits-requests

Kubernetes OOM and CPU Throttling

Troubleshooting Memory and CPU problems

https://sysdig.com/blog/troubleshoot-kubernetes-oom

Exit Codes In Containers & Kubernetes – The Complete Guide

https://komodor.com/learn/exit-codes-in-containers-and-kubernetes-the-complete-guide

Deployment previews on Kubernetes

Deployment previews, made popular by platforms like Vercel and Netlify, are not commonplace in microservice architectures. At Blueground, we brought deployment previews to K8s using ArgoCD, and the result turned out well enough to be worth sharing.

https://engineering.theblueground.com/deployment-previews

Managing Prometheus alerts in Kubernetes at scale using GitOps

https://tanmay-bhat.medium.com/managing-prometheus-alerts-in-kubernetes-at-scale-using-gitops-25d0ab4a2e2d

Sampling Strategies in Distributed Tracing — A Comprehensive Guide

https://medium.com/@varun_0K/sampling-strategies-in-distributed-tracing-a-comprehensive-guide-6e80092068c3

Google Cloud Synthetic Monitoring Tutorial

https://medium.com/google-cloud/google-cloud-synthetic-monitoring-tutorial-ce502f81bb24
