DevOps & SRE notes

@devops_sre_notes

12K subscribers

39 photos

19 files

2.5K links

Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak

All ways to support the channel: https://telegra.ph/How-support-the-channel-02-19

DevOps & SRE notes

This piece, "The MTTI Manifesto," argues for the importance of a new metric in incident response: Mean Time to Isolate. The author contends that the majority of outage time is spent identifying the problem's source, not fixing it, and that focusing on MTTI can drive significant improvements in system architecture and observability.
https://www.oldschoolburke.com/the-mtti-manifesto/

Old School Burke

012: The MTTI Manifesto

Mean Time to Isolate

👍5

1.58K views · tutunak, 09:02
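The manifesto's central claim, that most outage time goes into finding the source rather than fixing it, can be checked against your own incident history. A minimal Python sketch, assuming each incident record carries detection, isolation, and resolution timestamps (the `Incident` type and its fields are hypothetical, not taken from the article):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative.
    detected_at: datetime  # alert fired / incident opened
    isolated_at: datetime  # faulty component identified
    resolved_at: datetime  # service restored


def _mean(deltas: list[timedelta]) -> timedelta:
    # sum() needs a timedelta start value; timedelta / int is supported.
    return sum(deltas, timedelta()) / len(deltas)


def mtti(incidents: list[Incident]) -> timedelta:
    """Mean Time to Isolate: average of detection -> isolation."""
    return _mean([i.isolated_at - i.detected_at for i in incidents])


def mean_time_to_resolve(incidents: list[Incident]) -> timedelta:
    """Average of detection -> resolution, for comparison."""
    return _mean([i.resolved_at - i.detected_at for i in incidents])


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0)
    history = [
        Incident(t0, t0 + timedelta(minutes=40), t0 + timedelta(minutes=50)),
        Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=30)),
    ]
    isolate = mtti(history)
    resolve = mean_time_to_resolve(history)
    print("MTTI:", isolate)             # MTTI: 0:30:00
    print("time to resolve:", resolve)  # time to resolve: 0:40:00
    print(f"share spent isolating: {isolate / resolve:.0%}")
    # share spent isolating: 75%
```

Starting the clock at detection is one possible convention; what matters is applying the same start and end points to every incident so the numbers stay comparable over time.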

DevOps & SRE notes

The Airgap Native Packager Manager for Kubernetes

https://github.com/zarf-dev/zarf

❤1

1.66K views · tutunak, 16:04

DevOps & SRE notes

AWSDoor is a red team automation tool designed to simulate advanced attacker behavior in AWS environments

https://github.com/OtterHacker/AWSDoor

❤2

1.53K views · tutunak, 09:04

DevOps & SRE notes

This write-up explores the emerging discipline of AI Reliability Engineering (AIRe) as the "Third Age of SRE." It argues that the unique challenges of AI workloads, such as their probabilistic nature and new failure modes like model decay, require an evolution of traditional Site Reliability Engineering principles.
https://thenewstack.io/ai-reliability-engineering-welcome-to-the-third-age-of-sre/

AI Reliability Engineering: Welcome to the Third Age of SRE

SREs must build AI we can trust, leveraging the emerging ecosystem of tools and standards.

1.54K views · tutunak, 16:01

DevOps & SRE notes

This blog post offers a detailed walkthrough for backend engineers on building a Kubernetes Operator with Go and Kubebuilder. The author, Amr Elhewy, simplifies complex DevOps concepts by building a practical "PodTracker" operator that sends a Slack notification whenever a new pod is created.
https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-making-a-kubernetes-operator-with-go

🔥3

1.61K views · tutunak, 09:04
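The core of such an operator is deciding, on each reconcile, which pods are new. The article builds this in Go with Kubebuilder; the sketch below only models that bookkeeping in plain Python, with no Kubernetes client. `Pod`, `PodTracker`, and the `notify` callback (standing in for a Slack webhook call) are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Pod:
    namespace: str
    name: str
    uid: str  # Kubernetes gives every pod object a unique UID


@dataclass
class PodTracker:
    # Hypothetical: `notify` stands in for posting to a Slack webhook.
    notify: Callable[[str], None]
    seen_uids: set[str] = field(default_factory=set)

    def reconcile(self, pods: list[Pod]) -> list[Pod]:
        """Notify once per pod not seen in an earlier reconcile."""
        new_pods = [p for p in pods if p.uid not in self.seen_uids]
        for pod in new_pods:
            self.notify(f"New pod created: {pod.namespace}/{pod.name}")
            self.seen_uids.add(pod.uid)
        # Drop UIDs of deleted pods so the set does not grow forever.
        self.seen_uids &= {p.uid for p in pods}
        return new_pods


if __name__ == "__main__":
    messages: list[str] = []
    tracker = PodTracker(notify=messages.append)
    tracker.reconcile([Pod("default", "web-1", "uid-1")])
    tracker.reconcile([Pod("default", "web-1", "uid-1"),
                       Pod("default", "web-2", "uid-2")])
    print(messages)
    # ['New pod created: default/web-1', 'New pod created: default/web-2']
```

Keying on the pod UID rather than the name means a pod deleted and recreated under the same name is still reported. Because the seen set lives in memory, a restart would re-announce existing pods; a real controller would need persisted state or a creation-timestamp check to avoid that.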

DevOps & SRE notes

MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle

https://github.com/polyaxon/polyaxon

👍4

1.46K views · tutunak, 16:02

DevOps & SRE notes

OpenYurt - Extending your native Kubernetes to edge (project under CNCF)

https://github.com/openyurtio/openyurt

👍3

1.42K views · tutunak, 09:03

DevOps & SRE notes

Forwarded from AWS Notes (Roman Siewko)

🔥 FREE premium exam prep on AWS Skill Builder until Jan 5, 2026!

https://skillbuilder.aws/

🎓 Covers:
🔸AWS Certified Cloud Practitioner (CLF-C02)
🔸AWS AI Practitioner

💡 What you get (normally paid):
✅ Official practice exams
✅ Hands-on labs (SimuLearn)
✅ AWS Escape Room (learning by playing)
✅ Flashcards & learning plans

Plus, there are always-free resources:
• Official practice questions
• Free AWS training events
• AWS Educate (labs + potential free exam vouchers)

#AWS_certification

🔥3

1.5K views · tutunak, 14:50

DevOps & SRE notes

This post compares Amazon EKS Auto Mode and Azure AKS Automatic, evaluating which platform offers a superior managed Kubernetes solution. While acknowledging AWS's progress, the author ultimately argues that AKS Automatic's more comprehensive, end-to-end automation makes it the clear winner for a truly hands-off experience.
https://pixelrobots.co.uk/2024/12/amazon-eks-auto-mode-vs-azure-aks-automatic-the-better-managed-kubernetes-solution/

1.49K views · tutunak, 16:05

DevOps & SRE notes

This article examines disaster recovery architectures that go beyond simple high availability to keep systems operational even when HA fails. Yakaiah Bommishetti outlines DR strategies ranging from cold backups to active-active multi-site setups, emphasizing the difference between preventing failures and restoring service after a catastrophe.
https://hackernoon.com/beyond-high-availability-disaster-recovery-architectures-that-keep-running-when-ha-fails

Beyond High Availability: Disaster Recovery Architectures That Keep Running When HA Fails

High Availability is not Disaster Recovery. This in-depth guide explores real-world Disaster Recovery architectures.

❤‍🔥3❤2

1.83K views · tutunak, 09:01

DevOps & SRE notes

Cloudflare, again

🤣5🔥4👏3

1.9K views · tutunak, 11:46

DevOps & SRE notes

Open Source Marketplace For Developer Tools

https://github.com/alexellis/arkade

👍3

1.72K views · tutunak, 16:04

DevOps & SRE notes

In reply to: "Cloudflare, again"

Will the "Code Orange" help Cloudflare?
https://blog.cloudflare.com/fail-small-resilience-plan/

The Cloudflare Blog

Code Orange: Fail Small — our resilience plan following recent incidents

We have declared “Code Orange: Fail Small” to focus everyone at Cloudflare on a set of high-priority workstreams with one simple goal: ensure that the cause of our last two global outages never happens again.

🤣4👍2🔥1

1.74K views · tutunak, 10:13

DevOps & SRE notes

A set of modern Grafana dashboards for Kubernetes.

https://github.com/dotdc/grafana-dashboards-kubernetes

👍7💩1

1.49K views · tutunak, 09:00

DevOps & SRE notes

This case study examines the build-versus-buy decision for Terraform CI/CD orchestration by analyzing a custom-built tool called Terraflow. The author reflects on the trade-offs between creating a bespoke solution that perfectly fits a specific workflow and the opportunity cost of diverting engineering resources from core business features.
https://terrateam.io/blog/build-vs-buy-terraflow-case-study

👍4❤2

1.5K views · tutunak, 16:00

DevOps & SRE notes

This tutorial guides readers through building a unified OpenTelemetry pipeline in Kubernetes to correlate metrics, logs, and traces. Fatih Koç explains how to deploy the OTel Collector as both a DaemonSet and a gateway to centralize enrichment and sampling, ultimately reducing incident resolution time.
https://fatihkoc.net/posts/opentelemetry-kubernetes-pipeline/

Building a Unified OpenTelemetry Pipeline in Kubernetes

Deploy OpenTelemetry Collector in Kubernetes to unify metrics, logs, and traces with correlation, smart sampling, and insights for faster incident resolution.

👍5

1.44K views · tutunak, 09:04
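One reason to centralize sampling in a gateway is tail-based sampling: the keep/drop decision is made per complete trace, so every span of a trace must reach the same collector, which node-local DaemonSet agents cannot guarantee for traces that cross nodes. The sketch below models such a policy in plain Python; it is not the Collector's `tail_sampling` processor, and the thresholds (500 ms latency, 10% baseline) are illustrative assumptions:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool


def keep_by_ratio(trace_id: str, ratio: float) -> bool:
    """Deterministic: a given trace ID always gets the same decision."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Top 53 bits -> a float in [0, 1) without rounding up to 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < ratio


def tail_sample(spans: list[Span], latency_ms: float = 500.0,
                ratio: float = 0.1) -> set[str]:
    """Return the IDs of traces to keep, deciding on each whole trace."""
    traces: dict[str, list[Span]] = {}
    for span in spans:
        traces.setdefault(span.trace_id, []).append(span)

    kept: set[str] = set()
    for trace_id, trace_spans in traces.items():
        if any(s.is_error for s in trace_spans):
            kept.add(trace_id)  # always keep traces containing an error
        elif max(s.duration_ms for s in trace_spans) > latency_ms:
            kept.add(trace_id)  # always keep traces with a slow span
        elif keep_by_ratio(trace_id, ratio):
            kept.add(trace_id)  # baseline sample of healthy traffic
    return kept


if __name__ == "__main__":
    spans = [
        Span("trace-a", 12.0, True),    # error: always kept
        Span("trace-b", 950.0, False),  # slow: always kept
        Span("trace-c", 8.0, False),    # healthy: kept only if hashed in
    ]
    print(sorted(tail_sample(spans)))
```

Hashing the trace ID, rather than drawing a random number, keeps the baseline decision consistent if the same trace is evaluated more than once.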

DevOps & SRE notes

Kamaji is the Hosted Control Plane Manager for Kubernetes.

https://github.com/clastix/kamaji

🔥3

1.61K views · tutunak, 16:02

DevOps & SRE notes

Fast featureful friendly wifi terminal UI. 🛜✨

https://github.com/shazow/wifitui

👍3

1.54K views · tutunak, 11:53

DevOps & SRE notes

This article demystifies the structure of Kubernetes YAML files by breaking them down into three core components: metadata, spec, and status. It explains how users declare the desired state in the spec, while Kubernetes continuously works through its reconciliation loop to bring the actual status in line with that intent.
https://medium.com/@thisara.weerakoon2001/demystifying-kubernetes-yaml-ef9e92acf3df

Demystifying Kubernetes YAML

In the world of Kubernetes, YAML files are the bread and butter. They are the declarative way you tell Kubernetes what you want your…

👍3

1.61K views · tutunak, 16:01
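The spec/status split and the reconciliation loop can be modeled in a few lines. This is a toy in plain Python, not how any real controller is written: the object is a dict shaped like a Deployment, and moving one replica per iteration is a simplification:

```python
import copy

# A Deployment-like object as a plain dict: the user writes metadata and
# spec; status is reported by the system (values here are illustrative).
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "default"},
    "spec": {"replicas": 3},
    "status": {"replicas": 1},
}


def reconcile_once(obj: dict) -> dict:
    """One loop iteration: move observed status one step toward spec."""
    obj = copy.deepcopy(obj)  # never mutate the caller's object
    desired = obj["spec"]["replicas"]
    actual = obj["status"].get("replicas", 0)
    if actual < desired:
        actual += 1  # e.g. start one more pod
    elif actual > desired:
        actual -= 1  # e.g. terminate one pod
    obj["status"]["replicas"] = actual
    return obj


def reconcile(obj: dict, max_steps: int = 100) -> tuple[dict, int]:
    """Repeat until status matches spec; return the object and step count."""
    steps = 0
    while obj["status"].get("replicas", 0) != obj["spec"]["replicas"]:
        if steps >= max_steps:
            raise RuntimeError("did not converge")
        obj = reconcile_once(obj)
        steps += 1
    return obj, steps


if __name__ == "__main__":
    converged, steps = reconcile(deployment)
    print(converged["status"], steps)    # {'replicas': 3} 2
    converged["spec"]["replicas"] = 1    # user edits the desired state
    scaled_down, steps = reconcile(converged)
    print(scaled_down["status"], steps)  # {'replicas': 1} 2
```

The user only ever edits `spec`; the loop alone changes `status`, which is the division of responsibility the article describes.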

DevOps & SRE notes

This engineering publication from DoubleVerify presents a case study on synchronizing database schema updates across multiple projects and environments. The team built a solution around a shared, standalone schema-migrations repository and Helm pre-install hooks in Kubernetes to automate and coordinate the process.
https://medium.com/doubleverify-engineering/a-case-study-in-synchronizing-database-schema-updates-between-projects-and-environments-a69a3cc38985

A Case Study in Synchronizing Database Schema Updates between Projects and Environments

Written By: Chaim Leichman

👍3❤2

1.66K views · tutunak, 09:00
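A job that runs migrations before every deploy has to be safe to run repeatedly. The sketch below is not DoubleVerify's code; it uses Python's built-in `sqlite3` with hypothetical table names to show that property: it applies only pending, versioned migrations and records each one in the same transaction as the schema change, so re-running it once the schema is current does nothing:

```python
import sqlite3

# Hypothetical versioned migrations, as they might live in a shared
# schema repository (real projects usually keep them as SQL files).
MIGRATIONS = [
    (1, "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE accounts ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_accounts_email ON accounts (email)"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list[int]:
    """Apply pending migrations in version order; re-running is a no-op."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations "
        "(version INTEGER PRIMARY KEY)"
    )
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue
        # SQLite DDL is transactional, so the schema change and its
        # version record commit or roll back together.
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(conn))  # [1, 2, 3]
    print(migrate(conn))  # []
```

Concurrent runs from several projects would additionally need a lock (for example, a database advisory lock), which this sketch omits.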

DevOps & SRE notes

eBPF based cloud-native load-balancer for Kubernetes|Edge|Telco|IoT|XaaS.

https://github.com/loxilb-io/loxilb

👍2🔥1

1.65K views · tutunak, 16:03