Best practices for avoiding race conditions in inhibition rules
https://www.grobinson.net/best-practices-for-avoiding-race-conditions-in-inhibition-rules.html
Understanding Multi-arch Containers, Benefits and CI/CD Integration
In this blog post, we will learn what multi-arch container images are, how they work, and how to build and promote them. We will also write sample code for building a multi-arch image in a CI/CD pipeline.
https://www.infracloud.io/blogs/multi-arch-containers-ci-cd-integration
skipper
Skipper is an HTTP router and reverse proxy for service composition. It's designed to handle >300k HTTP route definitions with detailed lookup conditions, and flexible augmentation of the request flow with filters. It can be used out of the box or extended with custom lookup, filter logic and configuration sources.
https://github.com/zalando/skipper
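Skipper's own route syntax (eskip) and Go API are not shown here. The Python sketch below only illustrates the general idea described above: each route pairs lookup conditions (predicates) with a chain of filters that rewrite the request, plus a backend. `Route`, `dispatch`, the predicate/filter helpers, the placeholder backends and the "most predicates wins" tie-break are all hypothetical, and the linear scan would not scale to the route counts Skipper is built for.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Requests are modeled as plain dicts: {"path": str, "headers": dict}.
Request = dict
Predicate = Callable[[Request], bool]
Filter = Callable[[Request], Request]


@dataclass
class Route:
    route_id: str
    predicates: list[Predicate]  # all must match for the route to apply
    filters: list[Filter]        # applied in order to the matched request
    backend: str


def path_prefix(prefix: str) -> Predicate:
    return lambda req: req["path"].startswith(prefix)


def header_equals(name: str, value: str) -> Predicate:
    return lambda req: req["headers"].get(name) == value


def strip_prefix(prefix: str) -> Filter:
    def apply(req: Request) -> Request:
        path = req["path"]
        if path.startswith(prefix):
            path = path[len(prefix):] or "/"
        return {**req, "path": path}
    return apply


def set_header(name: str, value: str) -> Filter:
    def apply(req: Request) -> Request:
        return {**req, "headers": {**req["headers"], name: value}}
    return apply


def dispatch(routes: list[Route], req: Request) -> Optional[tuple[str, str, Request]]:
    """Pick the matching route with the most predicates, then run its filters."""
    matches = [r for r in routes if all(p(req) for p in r.predicates)]
    if not matches:
        return None
    best = max(matches, key=lambda r: len(r.predicates))
    for f in best.filters:
        req = f(req)
    return best.route_id, best.backend, req


# Hypothetical route table; the backends are placeholder hostnames.
ROUTES = [
    Route("api", [path_prefix("/api")], [strip_prefix("/api")],
          "http://api.internal:8080"),
    Route("api_beta", [path_prefix("/api"), header_equals("X-Beta", "1")],
          [strip_prefix("/api"), set_header("X-Routed-By", "beta")],
          "http://api-beta.internal:8080"),
    Route("catchall", [], [], "http://web.internal:8080"),
]
```

A request for `/api/users` lands on `api` with its path rewritten to `/users`; adding the `X-Beta: 1` header makes the more specific `api_beta` route win instead.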
Top 10 Cloud Provider Comparison 2023: VM Performance / Price
https://dev.to/dkechag/cloud-vm-performance-value-comparison-2023-perl-more-1kpp
hyperdx
HyperDX helps engineers figure out why production is broken faster by centralizing and correlating logs, metrics, traces, exceptions and session replays in one place. An open source and developer-friendly alternative to Datadog and New Relic.
https://github.com/hyperdxio/hyperdx
The Art of Building Fault-Tolerant Software Systems
https://www.codereliant.io/the-art-of-building-fault-tolerant-software-systems
Eight Pillars of Fault-tolerant Systems:
- Redundancy and Replication
- Load balancing
- Modularity
- Graceful degradation
- Circuit breaker
- Fail-fast
- Retries
- Rate limiting
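Two of the pillars listed above, circuit breaker and fail-fast, fit together naturally. A minimal single-threaded Python sketch, assuming a consecutive-failure threshold and a fixed reset timeout (`CircuitBreaker`, `failure_threshold` and `reset_timeout` are hypothetical names, not taken from the article): after enough consecutive failures the breaker opens and rejects calls immediately; once the timeout has elapsed it lets a trial call through (half-open), which either closes the circuit or re-opens it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open (fail fast)."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock              # injectable so tests can control time
        self.failures = 0               # consecutive failures seen so far
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens.
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

This sketch is not thread-safe; a concurrent version would need a lock and would usually limit half-open to a single in-flight trial call.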
Patterns for Terraform Multi-Account Deployments
https://awstip.com/patterns-for-terraform-multi-account-deployments-f47d77d6f250
Group wait, Group interval and Repeat interval explained
https://www.grobinson.net/group-wait-group-interval-and-repeat-interval-explained.html
terraform-target-autocompletion
Press Tab after --target and get suggestions for your resources and modules.
https://github.com/shellwhale/terraform-target-autocompletion
terraform-target-autocompletion is a Go program that relies on terraform-config-inspect for the heavy lifting, so it should work with any Terraform version. You don't need anything other than the binary and the provided completion scripts. Currently, though, you'll need Go 1.21.0 installed to build it yourself.
Reducing high cardinality in Prometheus
https://sennasemakula.medium.com/reducing-high-cardinality-in-prometheus-3f110b6d9eb5
Network health overview with mtr, ss, lsof and iperf3
https://raduzaharia.medium.com/network-health-overview-with-mtr-ss-lsof-and-iperf3-8d0d2d191781
Scaling Kafka to Support PayPal’s Data Growth
Today, our Kafka fleet consists of over 1,500 brokers that host over 20,000 topics and close to 2,000 Mirror Maker nodes which are used to mirror the data among the clusters, offering 99.99% availability for our Kafka clusters. During the 2022 Retail Friday, Kafka traffic volume peaked at about 1.3 trillion messages per day! At present, we have 85+ Kafka clusters, and every holiday season we flex up our Kafka infrastructure to handle the traffic surge. The Kafka platform continues to seamlessly scale to support this traffic growth without any impact to our business.
https://medium.com/paypal-tech/scaling-kafka-to-support-paypals-data-growth-a0b4da420fab
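To put the quoted peak in per-second terms, a back-of-the-envelope Python calculation. It treats "about 1.3 trillion messages per day" and "over 1,500 brokers" as exact values and assumes traffic is spread evenly over the day and across brokers, so the result is a daily average rather than the intraday peak, and it ignores replication and Mirror Maker traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

peak_day_messages = 1.3e12  # "about 1.3 trillion messages per day"
brokers = 1_500             # "over 1,500 brokers", taken as exactly 1,500

fleet_rate = peak_day_messages / SECONDS_PER_DAY  # messages per second, fleet-wide
per_broker_rate = fleet_rate / brokers            # assumes an even spread

print(f"fleet average:      {fleet_rate:,.0f} msg/s")
print(f"per-broker average: {per_broker_rate:,.0f} msg/s")
```

That works out to roughly 15 million messages per second across the fleet, or about 10,000 per broker, before accounting for replicas and intraday spikes.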
Prometheus Certified Associate: A Comprehensive Guide
https://medium.com/@onai.rotich/prometheus-certified-associate-a-comprehensive-guide-9c51638578d2
harden-runner
Harden-Runner provides runtime security for GitHub-hosted and self-hosted environments.
https://github.com/step-security/harden-runner
How Cloudflare runs Prometheus at scale
At the time of writing this post, we run 916 Prometheus instances with a total of around 4.9 billion time series.
https://blog.cloudflare.com/how-cloudflare-runs-prometheus-at-scale
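The two quoted totals imply an average load per Prometheus instance. A quick Python check, assuming series are spread evenly across instances; the real distribution may be far from even, so this is only a rough scale indicator:

```python
instances = 916
total_series = 4.9e9  # "around 4.9 billion time series"

avg_series_per_instance = total_series / instances
print(f"{avg_series_per_instance / 1e6:.2f} million series per instance (average)")
```

That is roughly 5.35 million series per instance on average.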
cf-terraforming
cf-terraforming is a command line utility to facilitate terraforming your existing Cloudflare resources. It does this by using your account credentials to retrieve your configurations from the Cloudflare API and converting them to Terraform configurations that can be used with the Terraform Cloudflare provider.
https://github.com/cloudflare/cf-terraforming
This tool is ideal if you already have Cloudflare resources defined but want to start managing them via Terraform, and don't want to spend the time to manually write the Terraform configuration to describe them.
Multi-Cloud Strategies with Crunchy Postgres for Kubernetes
https://www.crunchydata.com/blog/multi-cloud-strategies-with-crunchy-postgres-for-kubernetes
How Agoda Transitioned to Private Cloud
https://medium.com/agoda-engineering/private-cloud-and-you-736d8d99a51e
Understanding Kubernetes Limits and Requests
When working with containers in Kubernetes, it's important to know which resources are involved and how much of each is needed. Some processes will require more CPU or memory than others. Some are critical and should never be starved.
https://sysdig.com/blog/kubernetes-limits-requests
Knowing that, we should configure our containers and Pods properly to get the best of both.
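As a rough illustration of how CPU requests and limits are enforced, the Python sketch below is modeled on the conversion kubelet applies on cgroup v1 nodes: a CPU request becomes a relative `cpu.shares` weight (1024 per CPU, minimum 2), and a CPU limit becomes a CFS quota per 100 ms period (minimum 1 ms). Treat the constants as assumptions to verify against your Kubernetes version; cgroup v2 nodes use `cpu.weight` and `cpu.max` instead. The function names are hypothetical, and `parse_cpu` handles only a small subset of Kubernetes quantity syntax.

```python
from typing import Optional

# Constants modeled on kubelet's cgroup v1 CPU conversion (verify for your version).
MILLICPU_PER_CPU = 1000
SHARES_PER_CPU = 1024
MIN_SHARES = 2
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms
MIN_QUOTA_US = 1_000     # quota floor: 1 ms


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as '250m', '1' or '0.5' to millicores.

    Handles only plain numbers and the 'm' suffix, not the full
    Kubernetes quantity grammar.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * MILLICPU_PER_CPU)


def request_to_shares(millicpu: int) -> int:
    """CPU request -> cpu.shares, a relative weight that matters under contention."""
    shares = millicpu * SHARES_PER_CPU // MILLICPU_PER_CPU
    return max(shares, MIN_SHARES)


def limit_to_quota_us(millicpu: int) -> Optional[int]:
    """CPU limit -> CFS quota (microseconds of CPU time per period), a hard cap.

    Returns None when no limit is set (this sketch's convention).
    """
    if millicpu == 0:
        return None
    quota = millicpu * CFS_PERIOD_US // MILLICPU_PER_CPU
    return max(quota, MIN_QUOTA_US)


for request, limit in [("250m", "500m"), ("1", "2")]:
    print(request, limit,
          request_to_shares(parse_cpu(request)),
          limit_to_quota_us(parse_cpu(limit)))
```

The shares value only decides how CPU is divided when the node is contended, while the quota is enforced every period regardless of how idle the node is.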
Kubernetes OOM and CPU Throttling
Troubleshooting memory and CPU problems
https://sysdig.com/blog/troubleshoot-kubernetes-oom
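CPU throttling often surprises people because a CFS quota is shared by all of a container's threads. A simplified Python model, assuming the default 100 ms period, threads that want the CPU for the whole period, and enough cores for them to run in parallel (`throttle_per_period` is a hypothetical helper): with a 500m limit, 4 busy threads exhaust the 50 ms quota after 12.5 ms of wall-clock time and are throttled for the remaining 87.5 ms.

```python
def throttle_per_period(limit_millicpu: int, busy_threads: int,
                        period_ms: float = 100.0) -> tuple[float, float]:
    """Return (running_ms, throttled_ms) of wall-clock time in one CFS period.

    Simplified model: every thread wants the CPU for the whole period and
    each runs on its own core, so together they drain the shared quota.
    """
    quota_ms = limit_millicpu / 1000 * period_ms  # CPU time allowed per period
    demand_ms = busy_threads * period_ms          # CPU time the threads want
    if demand_ms <= quota_ms:
        return period_ms, 0.0                     # never reaches the limit
    running_ms = quota_ms / busy_threads          # quota drained N times faster
    return running_ms, period_ms - running_ms


for limit, threads in [(500, 1), (500, 4), (2000, 1)]:
    running, throttled = throttle_per_period(limit, threads)
    print(f"limit={limit}m threads={threads}: "
          f"run {running} ms, throttled {throttled} ms")
```

Memory behaves differently: it cannot be throttled, so a container that exceeds its memory limit is OOM-killed instead.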