DevOps&SRE Library

Supporting Teams with Different Maturity Levels

https://medium.com/@hans.knechtions/supporting-teams-with-different-maturity-levels-c43f5b5080eb

sre-checklist

A checklist for anyone practicing Site Reliability Engineering

https://github.com/bregman-arie/sre-checklist

How to Get an SRE Role

https://certomodo.substack.com/p/how-to-get-an-sre-role

Why bother with SLI and SLO?

Is there really any value in setting service level indicators and objectives?

https://blog.alexewerlof.com/p/why-bother-with-sli-and-slo

The System Resiliency Pyramid

https://www.codereliant.io/the-system-resiliency-pyramid

Traffic Jams in the Cloud: Are Overloads Sabotaging Your Application's Reliability?

https://blog.fluxninja.com/blog/traffic-jams-in-the-cloud-unveiling-the-true-enemy-of-reliability

Slow Down! Rate Limiting Deep Dive

https://www.codereliant.io/rate-limiting-deep-dive

PostgreSQL: No More VACUUM, No More Bloat

PostgreSQL, a powerful open-source object-relational database system, has been lauded for its robustness, functionality, and flexibility. It is not without its challenges, though, and one of them is the notorious VACUUM process. OrioleDB, a new storage engine designed for PostgreSQL, promises to eliminate the need for the resource-consuming VACUUM.

https://www.orioledata.com/blog/no-more-vacuum-in-postgresql

Identifying GCP’s Hidden Network Inter-Zone Egress Costs

Learn how to identify your Inter-Zone Egress costs in a few easy steps, using commonly available methods.

Ever wondered where those Inter-Zone Egress costs are coming from? Found yourself looking at GCP’s network pricing page many times to break it down? Me too. So I thought I might as well try to help clear things up.

https://www.doit.com/identifying-gcps-hidden-network-inter-zone-egress-costs

faasd

faasd is OpenFaaS reimagined, but without the cost and complexity of Kubernetes. It runs on a single host with very modest requirements, making it fast and easy to manage. Under the hood it uses containerd and Container Networking Interface (CNI) along with the same core OpenFaaS components from the main project.

https://github.com/openfaas/faasd

blazingmq

BlazingMQ is an open source distributed message queueing framework, which focuses on efficiency, reliability, and a rich feature set for modern-day workflows.

At its core, BlazingMQ provides durable, fault-tolerant, highly performant, and highly available queues, along with features such as various message routing strategies (e.g., work queues, priority, fan-out, and broadcast), compression, strong consistency, and poison pill detection.

https://github.com/bloomberg/blazingmq

Scaling Terraform with Terramate

At CWISE we use Terraform a lot. Our most common use cases are cloud resource provisioning, Kubernetes configuration management, and management of SaaS services (such as GitHub and GitLab).

We prefer Terraform over many of its competitors for several reasons:

- A tried and tested tool: it has been around for a long time, and HashiCorp does great work developing it. It can fairly be called a mature, even boring, technology;

- A large number of community resources like providers, modules, and documentation; 

- Good developer experience thanks to support in IDEs and supporting tools;

- Keeps a configuration state (database).

https://www.cwise.eu/post/scaling-terraform-with-terramate

Optimizing AWS Infrastructure: Leveraging Terraform for Low Coupling and High Cohesion

https://medium.com/@itsnarayan/optimizing-aws-infrastructure-leveraging-terraform-for-low-coupling-and-high-cohesion-a5ae6049ab1e

Ultimate Guide to Passing the Terraform Exam

https://www.linkedin.com/pulse/ultimate-guide-passing-terraform-exam-mesut-oezdil

terraform-tui

TFTUI is a powerful textual GUI that empowers users to effortlessly view and interact with their Terraform state.

With its latest version you can easily visualize the complete state tree, gaining deeper insights into your infrastructure's current configuration. Additionally, the ability to inspect individual resource states allows you to focus on specific details for better analysis and management. Lastly, it's now possible to select resources and perform actions such as tainting and untainting.

https://github.com/idoavrah/terraform-tui

Building a Successful SRE Team

Successful techniques to ensure your SRE team delivers value

https://medium.com/@hans.knechtions/building-a-successful-sre-team-283232bc2694

Lesson learned while scaling Kubernetes cluster to 1000 pods in AWS EKS

https://devopslearning.medium.com/lesson-learned-while-scaling-kubernetes-cluster-to-1000-pods-in-aws-eks-d2d399152bc2

How to avoid global outage — Seamlessly migrating DaemonSet labels

As a Site Reliability Engineering team, we continuously strive to improve the systems we operate. One way to do so is to stay up to date with upstream components. One component that needed special care turned out to be a CSI driver, which is installed in the Kubernetes cluster as a DaemonSet. Originally, the driver was installed in the cluster using a YAML manifest and kubectl. As the dev team moved to support Helm, we also wanted to use the Helm chart for the driver to make our lives easier.

https://engineering.prezi.com/intro-4727024fc2c1

Comparing Kubernetes operators for PostgreSQL. Part 2: CloudNativePG

https://blog.palark.com/cloudnativepg-and-other-kubernetes-operators-for-postgresql

Container Security Site

This is a site with some container security resources. It is (and probably always will be) a work in progress, but hopefully you’ll find some useful information.

https://www.container-security.site
