DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
sre-checklist

A checklist for anyone practicing Site Reliability Engineering

https://github.com/bregman-arie/sre-checklist
Why bother with SLI and SLO?

Is there really any value in setting service level indicators and objectives?

https://blog.alexewerlof.com/p/why-bother-with-sli-and-slo
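To make the terms concrete, here is a minimal Python sketch of the arithmetic behind an event-based availability SLI and the error budget an SLO implies, assuming the SLI is the ratio of good events to total events over one window. The names `evaluate_slo` and `SloReport` and the example numbers are hypothetical, not taken from the linked post.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured ratio of good events to total events
    slo: float               # target ratio, e.g. 0.999 for "three nines"
    budget_total: float      # bad events the SLO tolerates in this window
    budget_remaining: float  # tolerated bad events not yet spent; negative means the SLO is breached


def evaluate_slo(good_events: int, total_events: int, slo: float) -> SloReport:
    """Compute an event-based SLI and the error budget implied by an SLO target."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    sli = good_events / total_events
    bad_events = total_events - good_events
    budget_total = (1.0 - slo) * total_events  # error budget = (1 - SLO) * total events
    return SloReport(sli, slo, budget_total, budget_total - bad_events)


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 500 of them failed, target 99.9%.
    report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo=0.999)
    print(f"SLI={report.sli:.3%} budget={report.budget_total:.0f} remaining={report.budget_remaining:.0f}")
```

With these numbers, roughly half of the error budget is spent: about 500 of the 1,000 tolerated failures.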
Traffic Jams in the Cloud: Are Overloads Sabotaging Your Application's Reliability?

https://blog.fluxninja.com/blog/traffic-jams-in-the-cloud-unveiling-the-true-enemy-of-reliability
Slow Down! Rate Limiting Deep Dive

https://www.codereliant.io/rate-limiting-deep-dive
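The token bucket is one common rate-limiting algorithm; the minimal Python sketch below shows the idea and is not necessarily how the linked deep dive implements it. Tokens refill at `rate` per second up to `capacity`, and each request spends one token, so short bursts up to `capacity` pass while the long-run rate is capped. The class name and the injectable `clock` parameter are assumptions, added so the example is deterministic.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject or delay the request


class FakeClock:
    """Manually advanced clock for the demo below."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
bucket = TokenBucket(rate=2, capacity=3, clock=clock)
burst = [bucket.allow() for _ in range(4)]  # 3 tokens available: three pass, the fourth is rejected
clock.t += 1.0                              # one second later, 2 tokens have been refilled
later = [bucket.allow() for _ in range(3)]  # two pass, the third is rejected
print(burst, later)
```

A real limiter shared between threads or processes would also need locking or a shared store; this sketch keeps state in a single object.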
PostgreSQL: No More VACUUM, No More Bloat

PostgreSQL, a powerful open-source object-relational database system, has been lauded for its robustness, functionality, and flexibility. It is not without its challenges, though, and one of the best known is the resource-consuming VACUUM process. A new option has now arrived: OrioleDB, a new storage engine for PostgreSQL that promises to eliminate the need for VACUUM.

https://www.orioledata.com/blog/no-more-vacuum-in-postgresql
Identifying GCP’s Hidden Network Inter-Zone Egress Costs

Learn how to identify your Inter-Zone Egress costs in a few easy steps, using commonly available methods.

Ever wondered where those Inter-Zone Egress costs are coming from? Found yourself looking at GCP’s network pricing page many times to break it down? Me too. So I thought I might as well try to help clear things up.

https://www.doit.com/identifying-gcps-hidden-network-inter-zone-egress-costs
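As a rough illustration of the cost arithmetic, the Python sketch below sums flow records by zone pair and keeps only traffic between different zones of the same region. The flow-record format, the helper names, and the $0.01 per GB rate are assumptions for illustration only; real figures should come from the current GCP network pricing page and your own billing or flow-log data.

```python
from collections import defaultdict

# Assumed price for traffic between zones of the same region, in USD per GB.
# Illustrative only: confirm the current rate on the GCP network pricing page.
ASSUMED_USD_PER_GB = 0.01
BYTES_PER_GB = 10**9  # switch to 2**30 if your rate is quoted per GiB


def region_of(zone: str) -> str:
    # GCP zone names are "<region>-<suffix>", e.g. "us-central1-a" -> "us-central1".
    return zone.rsplit("-", 1)[0]


def inter_zone_costs(flows, usd_per_gb: float = ASSUMED_USD_PER_GB) -> dict:
    """flows: iterable of (src_zone, dst_zone, bytes) tuples.

    Returns {(src_zone, dst_zone): estimated_usd} for traffic that crosses
    zones within one region; same-zone and cross-region flows are skipped.
    """
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        if src != dst and region_of(src) == region_of(dst):
            totals[(src, dst)] += nbytes
    return {pair: n / BYTES_PER_GB * usd_per_gb for pair, n in totals.items()}


if __name__ == "__main__":
    # Hypothetical daily flow summary (e.g. aggregated from VPC Flow Logs).
    flows = [
        ("us-central1-a", "us-central1-b", 200 * 10**9),  # inter-zone: counted
        ("us-central1-a", "us-central1-a", 500 * 10**9),  # same zone: skipped
        ("us-central1-b", "us-central1-a", 50 * 10**9),   # inter-zone: counted
        ("us-central1-a", "europe-west1-b", 10 * 10**9),  # cross-region: skipped
    ]
    for (src, dst), usd in sorted(inter_zone_costs(flows).items()):
        print(f"{src} -> {dst}: ${usd:.2f}")
```

Grouping by direction (source and destination zone) makes it easier to see which service pair drives the bill.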
faasd

faasd is OpenFaaS reimagined, but without the cost and complexity of Kubernetes. It runs on a single host with very modest requirements, making it fast and easy to manage. Under the hood it uses containerd and Container Networking Interface (CNI) along with the same core OpenFaaS components from the main project.

https://github.com/openfaas/faasd
blazingmq

BlazingMQ is an open source distributed message queueing framework, which focuses on efficiency, reliability, and a rich feature set for modern-day workflows.

At its core, BlazingMQ provides durable, fault-tolerant, highly performant, and highly available queues, along with features such as several message routing strategies (e.g., work queues, priority, fan-out, broadcast), compression, strong consistency, and poison pill detection.

https://github.com/bloomberg/blazingmq
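The difference between the work-queue, fan-out, and broadcast routing strategies listed above can be sketched in a few lines of Python. This is a conceptual model only, not BlazingMQ's client API: the function names, the consumer and group names, and the round-robin choice within a group are assumptions.

```python
from itertools import cycle


def work_queue(messages, consumers):
    # Each message is delivered to exactly one consumer (round-robin here).
    if not consumers:
        raise ValueError("at least one consumer is required")
    out = {c: [] for c in consumers}
    turn = cycle(consumers)
    for m in messages:
        out[next(turn)].append(m)
    return out


def broadcast(messages, consumers):
    # Every consumer receives every message.
    return {c: list(messages) for c in consumers}


def fan_out(messages, groups):
    # groups: {app_name: [consumers]}. Each app receives its own copy of the
    # stream; consumers inside one app share that copy like a work queue.
    out = {}
    for app, consumers in groups.items():
        for c, msgs in work_queue(messages, consumers).items():
            out[(app, c)] = msgs
    return out


msgs = ["m1", "m2", "m3"]
print(work_queue(msgs, ["a", "b"]))
print(broadcast(msgs, ["a", "b"]))
print(fan_out(msgs, {"billing": ["b1"], "audit": ["a1", "a2"]}))
```

Fan-out is effectively "broadcast to applications, work queue within each application", which is why it combines the other two functions.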
Scaling Terraform with Terramate

At CWISE we use Terraform a lot. Our most common use cases for it are cloud resource provisioning, Kubernetes configuration management, and management of SaaS services (like GitHub/GitLab).

We prefer Terraform over many of its competitors for multiple reasons:

- A tried and tested tool: it has been around for a long time, and HashiCorp does a great job of developing it. It can be described as mature, even boring, technology;

- A large number of community resources like providers, modules, and documentation;

- Good developer experience thanks to support in IDEs and other tooling;

- It maintains configuration state (a state database);

https://www.cwise.eu/post/scaling-terraform-with-terramate
terraform-tui

TFTUI is a powerful text-based user interface (TUI) that lets users effortlessly view and interact with their Terraform state.

With the latest version, you can visualize the complete state tree and gain deeper insight into your infrastructure's current configuration. You can also inspect individual resource states to focus on specific details for better analysis and management. Finally, it is now possible to select resources and perform actions such as tainting and untainting.

https://github.com/idoavrah/terraform-tui
Building a Successful SRE Team

Successful techniques to ensure your SRE team delivers value

https://medium.com/@hans.knechtions/building-a-successful-sre-team-283232bc2694
How to avoid global outage — Seamlessly migrating DaemonSet labels

As the Site Reliability Engineering team, we continuously strive to improve the systems we operate. One way to do so is to stay up to date with upstream components. One component that turned out to need special care was a CSI driver, which is installed in the Kubernetes cluster as a DaemonSet. Originally, the driver was installed in the cluster using a YAML manifest and kubectl. As the driver's dev team moved to supporting Helm, we also wanted to use its Helm chart to make our lives easier.

https://engineering.prezi.com/intro-4727024fc2c1
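The label-migration problem comes down to how selectors pick pods: with `matchLabels`, a pod is selected only if its labels contain every key/value pair in the selector. In `apps/v1`, a DaemonSet's `spec.selector` also cannot be changed after creation, which is part of why label changes need care. The minimal Python sketch below models only the `matchLabels` rule (it ignores `matchExpressions`); the label keys and pod names are hypothetical, not taken from the Prezi post.

```python
def matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    # matchLabels semantics: every key/value pair in the selector must be present on the pod.
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


def unselected_pods(match_labels: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    # pods: {pod_name: labels}. Returns the pods the selector would NOT pick up.
    return sorted(name for name, labels in pods.items() if not matches(match_labels, labels))


# Hypothetical pods created from the original YAML manifest.
old_pods = {
    "csi-node-a": {"app": "csi-driver"},
    "csi-node-b": {"app": "csi-driver"},
}

# Hypothetical selector from a Helm chart that uses different label keys.
helm_selector = {"app.kubernetes.io/name": "csi-driver"}

print(unselected_pods(helm_selector, old_pods))        # every existing pod is left unselected
print(unselected_pods({"app": "csi-driver"}, old_pods))  # the old selector still matches them all
```

Running a check like this against the real pod labels before switching to the chart shows whether the new DaemonSet would adopt the existing pods or ignore them.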
Comparing Kubernetes operators for PostgreSQL. Part 2: CloudNativePG

https://blog.palark.com/cloudnativepg-and-other-kubernetes-operators-for-postgresql
Container Security Site

This is a site with some container security resources. It is (and probably always will be) a work in progress, but hopefully you’ll find some useful information.

https://www.container-security.site