DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
OpenTelemetry Java: All you need to know

https://lightstep.com/blog/opentelemetry-java
tobs

Tobs is a tool that aims to make it as easy as possible to install a full observability stack into a Kubernetes cluster.

https://github.com/timescale/tobs
The Big Little Guide to Message Queues

https://sudhir.io/the-big-little-guide-to-message-queues
Operable Software

In this post, I'll cover views on simplicity and complexity, how people actually approach their systems and form mental models of them, and how we should rather structure things if we want to make systems both observable and operable.

https://ferd.ca/operable-software.html
Writing Runbook Documentation When You’re An SRE

Tips and tricks for writing effective runbook documentation when you aren’t a technical writer

https://www.transposit.com/blog/2020.01.30-writing-runbook-documentation-when-youre-an-sre
Athenz

Athenz is an open source platform for X.509 certificate-based service authentication and fine-grained access control in dynamic infrastructures. It supports provisioning and configuration (centralized authorization) use cases as well as serving/runtime (decentralized authorization) use cases. The Athenz authorization system uses X.509 certificates and industry-standard mutual-TLS-bound OAuth2 access tokens. The name “Athenz” is derived from “AuthNZ” (N for authentication and Z for authorization).

https://github.com/yahoo/athenz
owncast

Take control over your live stream video by running it yourself. Streaming + chat out of the box.

https://github.com/owncast/owncast
Managing technical quality in a codebase

- Trust metrics over intuition
- Keep your intuition fresh
- Listen to, and learn from, your users
- Do fewer things, but do them better
- Don’t hoard impact

https://lethain.com/managing-technical-quality
Top Considerations when Evaluating an Ingress Controller for Kubernetes

1) Traffic protocol support
2) Client management
3) Traffic routing
4) Resiliency
5) Load balancing algorithms
6) Authentication
7) Observability
8) Kubernetes Integration
9) Traffic routing
10) Interface

https://releaseops.io/blog/top-considerations-when-evaluating-an-ingress-controller-for-kubernetes
Lessons learned in incident management

At Dropbox, we view incident management as a central element of our reliability efforts.

https://dropbox.tech/infrastructure/lessons-learned-in-incident-management
The Google Cloud Developer's Visual Notes

Every product in the Google Cloud family, described in a visual sketchnote format so you can grasp the capabilities of the tools quickly and easily.

https://github.com/priyankavergadia/GCPSketchnote
Migrating Your Open Source Builds Off Of Travis CI

https://blog.earthly.dev/migrating-from-travis
How to Prepare for a Site Reliability Engineer Interview

https://victorops.com/blog/preparing-for-a-site-reliability-engineer-interview