DevOps & SRE notes

Forwarded from Make. Build. Break. Reflect.

#aws

1.46K views · tutunak, 16:57

DevOps & SRE notes

A tool for exploring each layer in a docker image

https://github.com/wagoodman/dive

1.41K views · tutunak, 08:00

DevOps & SRE notes

Author dotdc presents Terraflow, a CI/CD orchestrator designed to scale Terraform operations effectively. This report outlines the creation of the tool and how it helps manage complex infrastructure deployments.
https://medium.com/@dotdc/creating-terraflow-a-ci-cd-orchestrator-to-scale-terraform-3965b3f8931f

Medium

Creating Terraflow, a CI/CD orchestrator to scale Terraform

1.35K views · tutunak, 15:01

DevOps & SRE notes

This analysis provides a deep dive into writing policies for Kubernetes clusters using OPA Gatekeeper. The Permify Tech Blog explains how to enforce custom rules and maintain security and compliance in a cloud-native environment.
https://medium.com/permify-tech-blog/opa-gatekeeper-how-to-write-policies-for-kubernetes-clusters-bb660666eb19

Medium

Opa Gatekeeper: How To Write Policies For Kubernetes Clusters

Learn how to leverage OPA Gatekeeper to write and enforce policies in Kubernetes clusters, ensuring security and efficient resource…

1.35K views · tutunak, 08:00

DevOps & SRE notes

Modern observability platform: 10x easier, 140x lower storage cost, high performance, petabyte scale. Open-source alternative to Elasticsearch/Splunk/Datadog for logs, metrics, traces, RUM, error tracking, and session replay.

https://github.com/openobserve/openobserve

1.38K views · tutunak, 15:01

DevOps & SRE notes

Checkmate is an open-source, self-hosted tool designed to track and monitor server hardware, uptime, response times, and incidents in real-time with beautiful visualizations

https://github.com/bluewave-labs/Checkmate

1.44K views · tutunak, 08:04

DevOps & SRE notes

AWS just released their postmortem (linked below) for the October DynamoDB outage. It's thorough, technically detailed, and explains exactly what broke and how they'll "prevent" it from happening again. But this PR-approved, sanitized narrative tells us only what happened to the technology, nothing else.

https://aws.amazon.com/message/101925/

1.33K views · tutunak, 09:57

DevOps & SRE notes

Marc Christian P. Gregorio offers a practical commentary on automating centralized NAT Gateways in AWS across multiple VPCs and regions using Terraform. The solution aims to optimize costs and simplify network management for large-scale deployments.
https://medium.com/@marcchristianp.gregorio/automating-centralized-nat-gateways-in-aws-vpcs-and-region-with-terraform-69a6f90d60da

Medium

Automating Centralized NAT Gateways in AWS VPCs and Region with Terraform

When managing a large-scale AWS environment with multiple accounts, deploying multiple NAT gateways across various VPCs can become very…

1.36K views · tutunak, 15:01

DevOps & SRE notes

Elliot Graebert proposes an impact-based leveling system for engineering organizations as an alternative to traditional career ladders. This treatise discusses how focusing on impact can foster a more motivated and effective engineering culture.
https://medium.com/@elliotgraebert/an-impact-based-level-system-for-engineering-organizations-2e0f9bee20e6

Medium

An impact-based level system for engineering organizations

Defining L1-L6 for individual contributors and leads

1.47K views · tutunak, 08:01

DevOps & SRE notes

https://github.com/cozystack/cozypkg

Cozy wrapper around Helm and Flux CD for local development

1.32K views · tutunak, 15:05

DevOps & SRE notes

A human-friendly alternative to netstat for socket and port monitoring on Linux and macOS.

https://github.com/theopfr/somo

1.27K views · tutunak, 08:05

DevOps & SRE notes

This article from JP Gouin provides a deep dive into implementing GitOps at scale, with a specific focus on the cluster bootstrapping process. It covers the challenges and solutions for managing numerous Kubernetes clusters efficiently and declaratively.
https://medium.com/@jp-gouin/gitops-at-scale-clusters-bootstrapping-f36695d4340d

Medium

GitOps at scale — Clusters bootstrapping

Explore one approach to help infrastructure team managing their multiple environments, variants and all required applications

1.34K views · tutunak, 15:02

DevOps & SRE notes

This edition of the Scalable Thread newsletter breaks down effective strategies for handling sudden and unexpected bursts of traffic to your systems. It explores architectural patterns and techniques to ensure reliability and prevent service degradation during traffic spikes.
https://newsletter.scalablethread.com/p/how-to-handle-sudden-bursts-of-traffic

Scalablethread

How to Handle Sudden Bursts of Traffic or "Thundering Herd Problem"?

Techniques to Avoid Potential Failures Caused by Sudden Traffic Spikes
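
One widely used technique for this kind of problem is capped exponential backoff with jitter on the client side, so that retries from many clients do not arrive in synchronized waves. The sketch below is only an illustration of that general idea and is not taken from the newsletter; the function name `backoff_delays` and its default parameters are hypothetical.

```python
import random


def backoff_delays(base=0.1, cap=10.0, attempts=5, rng=None):
    """Return `attempts` sleep durations (seconds) using capped
    exponential backoff with full jitter.

    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles on each attempt but never exceeds `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random delay between 0 and the ceiling,
        # which spreads retries from different clients over time.
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays()):
        print(f"retry {i + 1}: wait up to {delay:.3f}s before trying again")
```

A caller would sleep for each returned delay between retries. Randomizing the full interval trades a slightly longer average wait for a much flatter load on the recovering service.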

1.33K views · tutunak, 09:01

DevOps & SRE notes

Enable dynamic and seamless Kubernetes multi-cluster topologies

https://github.com/liqotech/liqo

1.33K views · tutunak, 16:03

DevOps & SRE notes

Terraform configuration for my entire Mikrotik-powered home network.

https://github.com/mirceanton/mikrotik-terraform/

1.34K views · tutunak, 09:02

DevOps & SRE notes

The Grab Engineering team shares how they migrated high-volume real-time streaming traffic from one service to another with zero data loss or duplication. This blog post details the planning, tooling, and validation steps involved in splitting a backend service's stream read and write functionality into two separate services.
https://engineering.grab.com/seamless-migration

Grab Tech

How we seamlessly migrated high volume real-time streaming traffic from one service to another with zero data loss and duplication

In the world of high-volume data processing, migrating services without disruption is a formidable challenge. At Grab, we recently undertook this task by splitting one of our backend service's stream read and write functionalities into two separate services.…

1.31K views · tutunak, 16:01

DevOps & SRE notes

This write-up from Prezi Engineering explains how multi-AZ deployments can lead to surprisingly high data transfer costs. It documents their journey of migrating from a costly self-hosted Prometheus setup to a more efficient monitoring solution to save on their cloud budget.
https://engineering.prezi.com/how-using-availability-zones-can-eat-up-your-budget-our-journey-from-prometheus-to-be8a816f7efe

Medium

How using Availability Zones can eat up your budget — our journey from Prometheus to…

1.31K views · tutunak, 09:02

DevOps & SRE notes

kubectl plugin to list allocations (cpu, memory, gpu,... X utilization, requested, limit, allocatable,...)

https://github.com/davidB/kubectl-view-allocations

1.33K views · tutunak, 16:02

DevOps & SRE notes

Identity-Aware Tunneled Reverse Proxy Server with Dashboard UI

https://github.com/fosrl/pangolin

1.71K views · tutunak, 09:04

DevOps & SRE notes

Author Yasin Taha Erol provides a practical guide on migrating from Kubernetes' native Horizontal Pod Autoscaler (HPA) to KEDA. The text highlights the benefits of KEDA's event-driven scaling and walks through the steps for a smooth transition.
https://yasintahaerol.medium.com/migrating-hpa-to-keda-13e946ee29ee

Medium

Migrating Hpa To Keda

1.32K views · tutunak, 16:01

DevOps & SRE notes

This tutorial offers an interesting approach to container image distribution by using S3 as a private container registry. The author demonstrates how to set up and use an S3 bucket for storing and pulling images, providing a simple alternative to dedicated registry services.
https://ochagavia.nl/blog/using-s3-as-a-container-registry/

Adolfo Ochagavía

Using S3 as a container registry

For the last four months I’ve been developing a custom container image builder, collaborating with Outerbounds. The technical details of the builder itself might be the topic of a future article, but there’s something surprising I wanted to share already:…

1.34K views · tutunak, 09:00
