DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

Roskomnadzor (RKN): https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
milvus

A cloud-native vector database, storage for next generation AI applications

https://github.com/milvus-io/milvus
Real-World GitOps with Flux, Flagger, and Linkerd

https://linkerd.io/2023/05/15/real-world-gitops
How to create a cron job docker container using AWS ECS, Fargate, fully automated with Terraform

https://noiselesstech.net/2023/05/04/how-to-create-a-cron-job-docker-container-using-aws-ecs-fargate-fully-automated-with-terraform
Terraform automation for teams. Purpose-built for GitHub

Start running Terraform with cost estimation, security alerts, drift detection, access controls, and OPA policy testing. All within GitHub's UI.

https://terrateam.io
terrap-cli

Terrap - a powerful CLI tool that scans your infrastructure and identifies any required changes.

https://github.com/sirrend/terrap-cli
tfc-workflows-github

This repo includes prescriptive workflows that implement best practices when interacting with Terraform Cloud. These starter workflow templates provide an entry point to integrate your CI/CD pipelines with Terraform Cloud.

https://github.com/hashicorp/tfc-workflows-github
IaC & GitOps with EKS blueprints

TL;DR: Need a cluster up and running fast? Take a close look at eks-blueprints. I got started in minutes and have been working with it for almost 2 years now.

https://medium.com/everything-full-stack/iac-gitops-with-eks-blueprints-7a28ad1f702a
Percentiles don’t work: Analyzing the distribution of response times for web services

https://adrianco.medium.com/percentiles-dont-work-analyzing-the-distribution-of-response-times-for-web-services-ace36a6a2a19
etcd: getting 30% more write/s

My team at Zendesk look after around 30 Kubernetes clusters. These are all self managed, meaning we maintain the API servers, and as you may guess: etcd.

Recently, I had a task to do some performance analysis on our etcd clusters. It had been a while since we ran any sort of benchmarking. Plus I wanted to get my hands dirty as I haven’t got much experience in tuning databases.

While I ended up getting about a 30% increase in performance, I learnt a lot about how databases and, by extension, disks work together.

https://zendesk.engineering/etcd-getting-30-more-write-s-318bcdbf7774
How to Monitor CoreDNS

CoreDNS is a DNS add-on for Kubernetes environments. It is one of the components running in the control plane nodes, and having it fully operational and responsive is key for the proper functioning of Kubernetes clusters. Learning how to monitor CoreDNS, and what its most important metrics are, is a must for operations teams.

https://sysdig.com/blog/how-to-monitor-coredns
Managing Grafana Dashboards With Terraform

We’ve all done it: deleted a graph from a dashboard, realised we still need it, but forgotten the query. Use Terraform to go back in time and save yourself the headache.

https://betterprogramming.pub/managing-grafana-dashboards-with-terraform-ad49ff6bb552
The DevOps Hangover

The greatest irony is that DevOps aimed to help developers talk with operations, and vice versa, yet developers are even further away from understanding how their code operates at runtime.

https://www.linkedin.com/pulse/devops-hangover-pete-cheslock
Moving Terraform Managed Resources Between States for Scaling AWS Infrastructure in Startups

https://fivexl.io/blog/terraform-mv
Why `fsync()`: Losing unsynced data on a single node leads to global data loss

Regardless of the replication mechanism, you must fsync() your data to prevent global data loss in non-Byzantine protocols.

https://redpanda.com/blog/why-fsync-is-needed-for-data-safety-in-kafka-or-non-byzantine-protocols
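
To make the fsync() point above concrete, here is a minimal Python sketch of a write that is only acknowledged after the data has been flushed to stable storage. It is not taken from the linked article: the helper name `durable_append` and the file layout are assumptions for illustration, and a real log implementation would also need to fsync the parent directory when the file is first created.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append record to path and return only after fsync() has succeeded.

    Hypothetical helper for illustration; not the article's implementation.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # os.write may write fewer bytes than requested, so loop until done.
        view = memoryview(record)
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the bytes may still sit in the OS page cache
        # and can be lost if the machine crashes or loses power.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A node that acknowledges a write to its peers should do so only after a call like `durable_append("segment.log", b"record\n")` has returned, since an acknowledgement sent before the fsync() can promise data that a crash would then erase.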