DevOps & SRE notes
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All the ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
This piece, "The MTTI Manifesto," argues for the importance of a new metric in incident response: Mean Time to Isolate. The author contends that the majority of outage time is spent identifying the problem's source, not fixing it, and that focusing on MTTI can drive significant improvements in system architecture and observability.
https://www.oldschoolburke.com/the-mtti-manifesto/
👍5
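To make the metric concrete, here is a minimal Python sketch of how MTTI could be computed next to a recovery-time average. It assumes MTTI is measured from detection to the moment the faulty component is identified; that definition, the field names, and the sample timestamps are illustrative assumptions, not taken from the article.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and timestamps are made up.
incidents = [
    {
        "detected_at": datetime(2025, 1, 10, 9, 0),
        "isolated_at": datetime(2025, 1, 10, 9, 40),  # faulty component identified
        "resolved_at": datetime(2025, 1, 10, 9, 50),
    },
    {
        "detected_at": datetime(2025, 2, 3, 14, 0),
        "isolated_at": datetime(2025, 2, 3, 14, 20),
        "resolved_at": datetime(2025, 2, 3, 14, 30),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average duration in minutes between two timestamps across records."""
    durations = [
        (r[end_key] - r[start_key]).total_seconds() / 60 for r in records
    ]
    return mean(durations)


# Assumed definition: detection -> isolation.
mtti = mean_minutes(incidents, "detected_at", "isolated_at")
# Recovery time measured here as detection -> resolution.
mttr = mean_minutes(incidents, "detected_at", "resolved_at")
# Share of the outage spent finding the source rather than fixing it.
isolation_share = mtti / mttr

print(f"MTTI: {mtti:.0f} min, MTTR: {mttr:.0f} min, share: {isolation_share:.0%}")
```

With this made-up data, isolation accounts for three quarters of the time to recovery, which is the kind of ratio the piece argues is typical.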
This write-up explores the emerging discipline of AI Reliability Engineering (AIRe) as the "Third Age of SRE." It argues that the unique challenges of AI workloads, such as their probabilistic nature and new failure modes like model decay, require an evolution of traditional Site Reliability Engineering principles.
https://thenewstack.io/ai-reliability-engineering-welcome-to-the-third-age-of-sre/
This article offers a detailed walkthrough for backend engineers on creating a Kubernetes Operator using Go and Kubebuilder. The author, Amr Elhewy, simplifies complex DevOps concepts by building a practical "PodTracker" operator that sends a Slack notification whenever a new pod is created.
https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-making-a-kubernetes-operator-with-go
🔥3
Forwarded from AWS Notes (Roman Siewko)
🔥 FREE premium exam prep on AWS Skill Builder until Jan 5, 2026!

https://skillbuilder.aws/

🎓 Covers:
🔸 AWS Certified Cloud Practitioner (CLF-C02)
🔸 AWS AI Practitioner

💡 What you get (normally paid):
✅ Official practice exams
✅ Hands-on labs (SimuLearn)
✅ AWS Escape Room (learning by playing)
✅ Flashcards & learning plans

Plus, there are always-free resources:
• Official practice questions
• Free AWS training events
• AWS Educate (labs + potential free exam vouchers)

#AWS_certification
🔥3
This post compares Amazon EKS Auto Mode and Azure AKS Automatic, evaluating which platform offers a superior managed Kubernetes solution. While acknowledging AWS's progress, the author ultimately argues that AKS Automatic's more comprehensive, end-to-end automation makes it the clear winner for a truly hands-off experience.
https://pixelrobots.co.uk/2024/12/amazon-eks-auto-mode-vs-azure-aks-automatic-the-better-managed-kubernetes-solution/
This article covers disaster recovery architectures that go beyond simple high availability, so that systems can be brought back even when HA fails. Yakaiah Bommishetti outlines various DR strategies, from cold backups to active-active multi-site setups, emphasizing the critical difference between preventing failures and restoring services after a catastrophe.
https://hackernoon.com/beyond-high-availability-disaster-recovery-architectures-that-keep-running-when-ha-fails
❤‍🔥3 ❤2
Cloudflare, again
🤣5 🔥4 👍3
This case study examines the build-versus-buy decision for Terraform CI/CD orchestration by analyzing a custom-built tool called Terraflow. The author reflects on the trade-offs between creating a bespoke solution that perfectly fits a specific workflow and the opportunity cost of diverting engineering resources from core business features.
https://terrateam.io/blog/build-vs-buy-terraflow-case-study
👍4 ❤2
This tutorial guides readers through building a unified OpenTelemetry pipeline in Kubernetes to correlate metrics, logs, and traces. Fatih Koç explains how to deploy the OTel Collector as both a DaemonSet and a gateway to centralize enrichment and sampling, ultimately reducing incident resolution time.
https://fatihkoc.net/posts/opentelemetry-kubernetes-pipeline/
👍5
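Sampling stays consistent across a pipeline when the keep/drop decision is derived from the trace ID, because every component that sees a span from the same trace reaches the same answer. A minimal Python sketch of that general idea follows. It is not the OpenTelemetry Collector's implementation or the configuration from the tutorial; the function name, the hash choice, and the sample rate are assumptions made for illustration.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether to keep a trace.

    The trace ID is hashed to a number in [0, 1); the trace is kept when
    that number falls below sample_rate. The same ID always gives the
    same decision, no matter which process evaluates it.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Hypothetical trace IDs; a real pipeline would take them from span context.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep_trace(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces at a 10% sample rate")
```

Because the decision depends only on the ID, a node-level agent and a central gateway would agree on which traces to keep without coordinating.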
This article demystifies the structure of Kubernetes YAML files by breaking them down into three core components: metadata, spec, and status. It explains how users declare the desired state in the spec, while Kubernetes continuously works through its reconciliation loop to bring the actual state, reported in status, in line with that intent.
https://medium.com/@thisara.weerakoon2001/demystifying-kubernetes-yaml-ef9e92acf3df
👍3
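The spec/status split and the reconciliation loop can be illustrated with a toy Python sketch. This is a simplification under stated assumptions: the dictionary only mimics the shape of a Kubernetes object, and the loop edits status directly, whereas a real controller changes the cluster (for example by creating pods) and status is then updated from what is observed. The names reconcile and run_until_converged are hypothetical.

```python
# Toy object shaped like a Kubernetes resource (not a real API object).
deployment = {
    "metadata": {"name": "web"},
    "spec": {"replicas": 3},    # desired state, declared by the user
    "status": {"replicas": 1},  # observed state, reported by the system
}


def reconcile(obj):
    """Run one pass: move status one step toward spec.

    Returns True when observed state matches desired state.
    """
    desired = obj["spec"]["replicas"]
    actual = obj["status"]["replicas"]
    if actual < desired:
        obj["status"]["replicas"] = actual + 1  # stand-in for "start a pod"
    elif actual > desired:
        obj["status"]["replicas"] = actual - 1  # stand-in for "stop a pod"
    return obj["status"]["replicas"] == desired


def run_until_converged(obj, max_passes=10):
    """Repeat reconcile passes until converged; return the pass count."""
    for n in range(1, max_passes + 1):
        if reconcile(obj):
            return n
    raise RuntimeError("did not converge within max_passes")


passes = run_until_converged(deployment)
print(f"converged after {passes} passes: {deployment['status']}")
```

The point of the sketch is the direction of control: the user only ever edits spec, and the loop keeps comparing and correcting until status matches.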
This engineering publication from DoubleVerify presents a case study on synchronizing database schema updates across multiple projects and environments. The team developed a solution using a shared, standalone schema migrations repository and Kubernetes pre-install hooks to automate and coordinate the process.
https://medium.com/doubleverify-engineering/a-case-study-in-synchronizing-database-schema-updates-between-projects-and-environments-a69a3cc38985
👍3 ❤2