DevOps&SRE Library
17.8K subscribers
459 photos
4 videos
2 files
4.75K links
Библиотека статей по теме DevOps и SRE.

Реклама: @ostinostin
Контент: @mxssl

РКН: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
Download Telegram
Azure Verified Module - Azure Landing Zones

In this article, we take a look at the Azure Verified Module for Azure Landing Zones, and how we can customise deployments.


P1: https://mikeguy.co.uk/posts/azure-verified-module-landing-zones-part-1

P2: https://mikeguy.co.uk/posts/azure-verified-module-landing-zones-part-2
What I Really Mean When I Say “Good Communication” in Incident Response

https://uptimelabs.io/good-communication-in-incident-response
As a Seasoned K8s Expert: An In-Depth Analysis of the OpenAI’s Incident and Mitigation Strategies

On December 11, 2024, OpenAI experienced a major outage caused by a failure in the Kubernetes cluster control plane. For outsiders, this may simply seem like an interesting incident, but as an insider, I analyzed this failure from a technical perspective.


https://midbai.com/en/post/how-to-avoid-openai-incident
Taming the Wild West of Research Computing: How Policies Saved Us a Thousand Headaches

Harnessing the power of policy-driven governance in shared computing environments


https://alessandropomponio.medium.com/taming-the-wild-west-of-research-computing-how-policies-saved-us-a-thousand-headaches-9432558f5740
We’re leaving Kubernetes

Kubernetes seems like the obvious choice for building out remote, standardized and automated development environments. We thought so too and have spent six years invested in making the most popular cloud development environment platform at internet scale. That’s 1.5 million users, where we regularly see thousands of development environments per day. In that time, we’ve found that Kubernetes is not the right choice for building development environments.


https://www.gitpod.io/blog/we-are-leaving-kubernetes
Istio-Proxy Chaos in the Middle of a Snowy Morning

December 4th, 2024, started as a peaceful, snowy morning. Around 8 AM, I settled into my work-from-home routine, having freshly brewed coffee. My usual workflow:

1. Check the Production dashboard to ensure everything is running smoothly — and it was.
2. Check my email and Slack to see if any team member needs help.
3. Open JIRA, pick up a task and get ready to dive into work.

There were no pressing issues to address. I opened JIRA and picked up a task to migrate one of our Infrastructures as a Code repository from Terragrunt to Terraform. This is a topic for another post to explain why.

Lucked out! The peace and serenity didn’t last long. An alert popped up: One of our production services had gone down. What started as a calm Wednesday morning quickly turned into a troubleshooting adventure.


https://medium.com/@zehendiaries/istio-proxy-chaos-in-the-middle-of-a-snowy-morning-6fe437cf3996
Mastering Compute Efficiency: Dynamic GPU Partitioning Strategies for Kubernetes-Based ML Systems

https://yashmehra2411.medium.com/mastering-gpu-efficiency-dynamic-partitioning-strategies-for-kubernetes-based-ml-systems-75100c94112b
Standardizing App Delivery with Flux and Generic Helm Charts

In this guide we will explore how Flux can be used to standardize the lifecycle management of applications by leveraging the Generic Helm Chart pattern.

The big promise of this pattern is that it should reduce the cognitive load on developers, as they only need to focus on the service-specific configuration, while the Generic Helm Chart shields them from the complexity of the Kubernetes API.


https://medium.com/@stefanprodan/standardizing-app-delivery-with-flux-and-generic-helm-charts-f66941f399e9
khronoscope

Khronoscope is a tool inspired by k9s that allows you to inspect the state of your Kubernetes cluster and travel back in time to see its state at any point since you started the application using VCR like controls.


https://github.com/hoyle1974/khronoscope
kubectl-switch

kubectl-switch is a command-line tool for managing and switching between multiple Kubernetes configuration files located in the same directory. It simplifies the process of selecting a Kubernetes context from multiple kubeconfig files and updating the active configuration or namespace.

Just dump all your kubeconfigs into a single dir and let kubectl-switch manage them for you!


https://github.com/mirceanton/kubectl-switch
f2

F2 is a cross-platform command-line tool for batch renaming files and directories quickly and safely. Written in Go!


https://github.com/ayoisaiah/f2
lnk

Git-native dotfiles management that doesn't suck.

Move your dotfiles to ~/.config/lnk, symlink them back, and use Git like normal. Supports both common configurations and host-specific setups.


https://github.com/yarlson/lnk
In defence of deployment freezes

Many organizations have periods when they restrict deployments to production. You may find yourself working for one, so it's best to be prepared for it, and protect yourself from the downsides.


https://thefridaydeploy.substack.com/p/in-defence-of-deployment-freezes
012: The MTTI Manifesto

Mean Time to Isolate


https://www.oldschoolburke.com/the-mtti-manifesto
Solving the Terraform Backend Chicken-and-Egg Problem

My preferred way to store Terraform state files is close to the provisioned infrastructure. In my case this is mostly Azure Blob Storage. This approach offers built-in benefits like RBAC, versioning, locking, and identity-based authentication, making it an excellent solution for state management at almost no cost.

However, there’s a catch: you need to create the storage account before Terraform can use it. This creates a chicken and egg problem - how do you provision the state storage using Terraform itself without manual steps or external scripts?

In this article, I’ll walk through a fully automated solution to deploy Terraform state storage in Azure Blob and import “self” state there, ensuring everything is managed declaratively from the start.


https://cloudchronicles.blog/blog/Solving-the-Terraform-Backend-Chicken-and-Egg-Problem