DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
Dodo Pizza: How we improved our SRE service.

https://youtu.be/OWSebmnuhBw
clutch

Clutch provides everything you need to simplify operations and in turn improve your developer experience and operational capabilities. It comes with several out-of-the-box features for managing cloud-native infrastructure, but is designed to be org-agnostic and easily taught how to find or interact with whatever you run, wherever you run it.

https://github.com/lyft/clutch
SCALING THE HOTTEST APP IN TECH ON AWS AND KUBERNETES

A cloud conversation with Blake Stoddard, Sr SRE at Basecamp / HEY.

https://info.acloud.guru/resources/kubernetes-aws-cloud-scaling-hey
Application Configuration Management in Kubernetes

- Replicate and Customize
- Parameterized Templating
- Overlay Configuration
- Programmatic Configuration

https://www.giantswarm.io/blog/application-configuration-management-in-kubernetes
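
To make three of the approaches listed above concrete, here is a minimal Python sketch. It is not taken from the linked article: the manifest is a simplified dict rather than real Kubernetes YAML, and `render_template`, `apply_overlay`, and `build_manifest` are hypothetical helper names chosen for illustration only.

```python
import copy
from string import Template

# Hypothetical base manifest, kept as a plain dict for illustration.
BASE = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"replicas": 1, "image": "web:1.0"},
}

# Parameterized templating: placeholders are filled from a values mapping.
MANIFEST_TEMPLATE = Template("name: $name\nreplicas: $replicas\n")


def render_template(values):
    """Substitute values into the text template."""
    return MANIFEST_TEMPLATE.substitute(values)


def apply_overlay(base, overlay):
    """Overlay configuration: recursively merge overlay onto a copy of base."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def build_manifest(name, replicas):
    """Programmatic configuration: build the manifest with ordinary code."""
    return {
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {"replicas": replicas},
    }


if __name__ == "__main__":
    print(render_template({"name": "web", "replicas": 3}))
    print(apply_overlay(BASE, {"spec": {"replicas": 5}}))
    print(build_manifest("api", 2))
```

The overlay variant leaves the base untouched and states only the differences per environment, which is the idea behind overlay tools; the templating variant needs every variable point to be anticipated in the template up front.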
What Are the Hardest Parts of Kubernetes to Learn?

Many enterprises have already adopted Kubernetes or have a Kubernetes migration plan in place, making it clear that the platform is here to stay. While it provides a lot of benefits to its users, to take advantage of them, you need to thoroughly learn Kubernetes and how it works in production. Typically, the most difficult aspects of Kubernetes are learned through experience solving real-world problems. This post will focus on providing resources to help you do exactly that, while also explaining some of the core concepts behind Kubernetes.

https://logz.io/blog/what-are-the-hardest-parts-of-kubernetes-to-learn
Keeping Customers Streaming — The Centralized Site Reliability Practice at Netflix

https://netflixtechblog.com/keeping-customers-streaming-the-centralized-site-reliability-practice-at-netflix-205cc37aa9fb
kconmon - Monitoring connectivity between your Kubernetes nodes

A Kubernetes node connectivity tool that performs frequent tests (TCP, UDP, and DNS) and exposes Prometheus metrics enriched with the node name and locality information (such as zone), enabling you to correlate issues between availability zones or nodes.

https://github.com/Stono/kconmon
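
The kind of check described above can be sketched roughly as follows. This is not kconmon's code: the function names, the metric name `probe_success`, and the label names are assumptions made for illustration, and only the Python standard library is used.

```python
import socket
import time


def tcp_probe(host, port, timeout=1.0):
    """Try a TCP connection; return (success, elapsed seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def dns_probe(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def format_metric(name, value, labels):
    """Render one sample in Prometheus text exposition style.

    The metric and label names passed in are hypothetical.
    """
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


if __name__ == "__main__":
    ok, latency = tcp_probe("127.0.0.1", 80)
    labels = {"node": "node-1", "zone": "zone-a"}  # hypothetical values
    print(format_metric("probe_success", int(ok), labels))
    print(format_metric("probe_duration_seconds", round(latency, 4), labels))
```

Attaching the node and zone as labels is what makes it possible to group failures by availability zone when querying the metrics.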