DevOps & SRE notes

Checkmate is an open-source, self-hosted tool designed to track and monitor server hardware, uptime, response times, and incidents in real-time with beautiful visualizations.

https://github.com/bluewave-labs/Checkmate

1.48K views · tutunak, 08:04

AWS just released their postmortem (linked below) for the October DynamoDB outage. It's thorough, technically detailed, and explains exactly what broke and how they'll "prevent" it from happening again. But this PR-approved, sanitized narrative tells us only what happened to the technology, nothing else.

https://aws.amazon.com/message/101925/

1.36K views · tutunak, 09:57

Marc Christian P. Gregorio offers a practical walkthrough of automating centralized NAT Gateways in AWS across multiple VPCs and regions using Terraform. The solution aims to reduce costs and simplify network management for large-scale deployments.
https://medium.com/@marcchristianp.gregorio/automating-centralized-nat-gateways-in-aws-vpcs-and-region-with-terraform-69a6f90d60da

Medium

Automating Centralized NAT Gateways in AWS VPCs and Region with Terraform

When managing a large-scale AWS environment with multiple accounts, deploying multiple NAT gateways across various VPCs can become very…

1.4K views · tutunak, 15:01

Elliot Graebert proposes an impact-based leveling system for engineering organizations as an alternative to traditional career ladders. The article discusses how focusing on impact can foster a more motivated and effective engineering culture.
https://medium.com/@elliotgraebert/an-impact-based-level-system-for-engineering-organizations-2e0f9bee20e6

Medium

An impact-based level system for engineering organizations

Defining L1-L6 for individual contributors and leads

1.51K views · tutunak, 08:01

https://github.com/cozystack/cozypkg

Cozy wrapper around Helm and Flux CD for local development

1.36K views · tutunak, 15:05

A human-friendly alternative to netstat for socket and port monitoring on Linux and macOS.

https://github.com/theopfr/somo

1.33K views · tutunak, 08:05

This article from JP Gouin provides a deep dive into implementing GitOps at scale, with a specific focus on the cluster bootstrapping process. It covers the challenges and solutions for managing numerous Kubernetes clusters efficiently and declaratively.
https://medium.com/@jp-gouin/gitops-at-scale-clusters-bootstrapping-f36695d4340d

Medium

GitOps at scale — Clusters bootstrapping

Explore one approach to help infrastructure team managing their multiple environments, variants and all required applications

1.4K views · tutunak, 15:02

This edition of the Scalable Thread newsletter breaks down effective strategies for handling sudden and unexpected bursts of traffic to your systems. It explores architectural patterns and techniques to ensure reliability and prevent service degradation during traffic spikes.
https://newsletter.scalablethread.com/p/how-to-handle-sudden-bursts-of-traffic

Scalablethread

How to Handle Sudden Bursts of Traffic or "Thundering Herd Problem"?

Techniques to Avoid Potential Failures Caused by Sudden Traffic Spikes
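
One widely used way to soften a thundering herd is to have clients retry with exponential backoff plus random jitter, so that callers which failed at the same moment do not all come back at once. The sketch below is illustrative only and is not taken from the linked newsletter; the function name `backoff_delays` and its `base`, `cap`, and `rng` parameters are hypothetical choices made for this example.

```python
import random


def backoff_delays(attempts, base=0.1, cap=10.0, rng=None):
    """Return one randomized delay (in seconds) per retry attempt.

    Uses "full jitter": each delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so clients that fail together
    spread their retries out instead of retrying in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The upper bound doubles each attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    # Two simulated clients with different seeds get different schedules.
    print(backoff_delays(5, rng=random.Random(1)))
    print(backoff_delays(5, rng=random.Random(2)))
```

A caller would sleep for each delay in turn between retries. Jitter addresses only the client side of the problem; server-side measures such as rate limiting or request coalescing are separate techniques.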

1.37K views · tutunak, 09:01

Enable dynamic and seamless Kubernetes multi-cluster topologies

https://github.com/liqotech/liqo

1.38K views · tutunak, 16:03

Terraform configuration for my entire Mikrotik-powered home network.

https://github.com/mirceanton/mikrotik-terraform/

1.39K views · tutunak, 09:02

The Grab Engineering team shares how they migrated high-volume, real-time streaming traffic from one service to another with zero data loss and no duplication. The blog post describes how they split a backend service's stream read and write functionality into two separate services without disrupting processing.
https://engineering.grab.com/seamless-migration

Grab Tech

How we seamlessly migrated high volume real-time streaming traffic from one service to another with zero data loss and duplication

In the world of high-volume data processing, migrating services without disruption is a formidable challenge. At Grab, we recently undertook this task by splitting one of our backend service's stream read and write functionalities into two separate services.…

1.37K views · tutunak, 16:01

This write-up from Prezi Engineering explains how multi-AZ deployments can lead to surprisingly high data transfer costs. It documents their journey of migrating from a costly self-hosted Prometheus setup to a more efficient monitoring solution to save on their cloud budget.
https://engineering.prezi.com/how-using-availability-zones-can-eat-up-your-budget-our-journey-from-prometheus-to-be8a816f7efe

Medium

How using Availability Zones can eat up your budget — our journey from Prometheus to…

1.43K views · tutunak, 09:02

kubectl plugin to list allocations (cpu, memory, gpu,... X utilization, requested, limit, allocatable,...)

https://github.com/davidB/kubectl-view-allocations

1.39K views · tutunak, 16:02

Identity-Aware Tunneled Reverse Proxy Server with Dashboard UI

https://github.com/fosrl/pangolin

1.78K views · tutunak, 09:04

Yasin Taha Erol provides a practical guide to migrating from Kubernetes' native Horizontal Pod Autoscaler (HPA) to KEDA. The article highlights the benefits of KEDA's event-driven scaling and walks through the steps for a smooth transition.
https://yasintahaerol.medium.com/migrating-hpa-to-keda-13e946ee29ee

Medium

Migrating Hpa To Keda

1.37K views · tutunak, 16:01

This tutorial offers an interesting approach to container image distribution by using S3 as a private container registry. The author demonstrates how to set up and use an S3 bucket for storing and pulling images, providing a simple alternative to dedicated registry services.
https://ochagavia.nl/blog/using-s3-as-a-container-registry/

Adolfo Ochagavía

Using S3 as a container registry

For the last four months I’ve been developing a custom container image builder, collaborating with Outerbounds. The technical details of the builder itself might be the topic of a future article, but there’s something surprising I wanted to share already:…

1.41K views · tutunak, 09:00

PostgreSQL monitoring and backups (with UI and self hosted)

https://github.com/RostislavDugin/postgresus

2K views · tutunak, 16:00

This blog post from Datadog offers a deep dive into managing storage for etcd, the key-value store at the heart of Kubernetes. It explains the causes of database growth and provides strategies for monitoring, defragmenting, and purging old data to maintain a healthy cluster.
https://www.datadoghq.com/blog/managing-etcd-storage/

Datadog

How to support a growing Kubernetes cluster with a small etcd | Datadog

Discover essential strategies for efficiently managing etcd storage in your Kubernetes clusters.

1.31K views · tutunak, 09:03

In this story from the Betterstack newsletter, learn how Dropbox saved millions of dollars by building its own load balancer. The piece delves into the technical decisions and engineering effort behind the cost reduction.
https://newsletter.betterstack.com/p/how-dropbox-saved-millions-of-dollars

Betterstack

How Dropbox Saved Millions of Dollars by Building a Load Balancer

Dropbox saved resources by creating a superior version of a tool everyone uses

1.23K views · tutunak, 16:00

This extensive handbook serves as a go-to resource for troubleshooting common and complex issues within Kubernetes. It's packed with practical advice, commands, and methodologies to help engineers diagnose and resolve problems in their clusters.
https://itnext.io/the-kubernetes-troubleshooting-handbook-7596a1fdf2ff

Medium

The Kubernetes Troubleshooting Handbook

Debugging Tips, Tools, and Techniques

1.26K views · tutunak, 09:04

This guide from Techielass provides a step-by-step walkthrough of building a CI/CD pipeline for Terraform using GitHub Actions, with Azure as the target platform. It demonstrates how to automate infrastructure deployments safely and efficiently, incorporating best practices like planning and approval steps.
https://www.techielass.com/terraform-with-github-actions-ci-cd-pipeline/

Techielass - A blog by Sarah Lean

Terraform with GitHub Actions CI/CD Pipeline

By using Terraform with GitHub Actions, IT professionals can automate and streamline the deployment of resources across Azure environments in a consistent and reliable way.

This guide will walk you through setting up Terraform in GitHub Actions, from configuring…

1.28K views · tutunak, 16:03
