Checkmate is an open-source, self-hosted tool designed to track and monitor server hardware, uptime, response times, and incidents in real time with beautiful visualizations.
https://github.com/bluewave-labs/Checkmate
GitHub
GitHub - bluewave-labs/Checkmate: Checkmate is an open-source, self-hosted tool designed to track and monitor server hardware,…
Checkmate is an open-source, self-hosted tool designed to track and monitor server hardware, uptime, response times, and incidents in real-time with beautiful visualizations. Don't be shy, ...
AWS just released their postmortem (linked below) for the October DynamoDB outage. It's thorough, technically detailed, and explains exactly what broke and how they'll "prevent" it from happening again. But this PR-approved, sanitized narrative tells us only what happened to the technology, nothing else.
https://aws.amazon.com/message/101925/
Marc Christian P. Gregorio walks through automating centralized NAT Gateways in AWS across multiple VPCs and regions using Terraform. The solution aims to reduce costs and simplify network management for large-scale deployments.
https://medium.com/@marcchristianp.gregorio/automating-centralized-nat-gateways-in-aws-vpcs-and-region-with-terraform-69a6f90d60da
Medium
Automating Centralized NAT Gateways in AWS VPCs and Region with Terraform
When managing a large-scale AWS environment with multiple accounts, deploying multiple NAT gateways across various VPCs can become very…
Elliot Graebert proposes an impact-based leveling system for engineering organizations as an alternative to traditional career ladders. The article discusses how focusing on impact can foster a more motivated and effective engineering culture.
https://medium.com/@elliotgraebert/an-impact-based-level-system-for-engineering-organizations-2e0f9bee20e6
Medium
An impact-based level system for engineering organizations
Defining L1-L6 for individual contributors and leads
A human-friendly alternative to netstat for socket and port monitoring on Linux and macOS.
https://github.com/theopfr/somo
GitHub
GitHub - theopfr/somo: A human-friendly alternative to netstat for socket and port monitoring on Linux and macOS.
A human-friendly alternative to netstat for socket and port monitoring on Linux and macOS. - theopfr/somo
This article from JP Gouin provides a deep dive into implementing GitOps at scale, with a specific focus on the cluster bootstrapping process. It covers the challenges and solutions for managing numerous Kubernetes clusters efficiently and declaratively.
https://medium.com/@jp-gouin/gitops-at-scale-clusters-bootstrapping-f36695d4340d
Medium
GitOps at scale - Clusters bootstrapping
Explore one approach to help infrastructure team managing their multiple environments, variants and all required applications
This edition of the Scalable Thread newsletter breaks down effective strategies for handling sudden and unexpected bursts of traffic to your systems. It explores architectural patterns and techniques to ensure reliability and prevent service degradation during traffic spikes.
https://newsletter.scalablethread.com/p/how-to-handle-sudden-bursts-of-traffic
Scalablethread
How to Handle Sudden Bursts of Traffic or "Thundering Herd Problem"?
Techniques to Avoid Potential Failures Caused by Sudden Traffic Spikes
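One commonly cited way to soften a thundering herd is for clients to retry with exponential backoff plus random jitter, so that failed requests do not all come back at the same instant. The sketch below is only an illustration of that general idea, not code from the newsletter; the function names (`backoff_delays`, `call_with_retries`) and the default `base` and `cap` values are assumptions chosen for the example.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=5.0, rng=None):
    """Return one randomized delay (in seconds) per retry, using 'full jitter'.

    base and cap are illustrative defaults, not recommendations.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        # Exponential ceiling: base * 2^attempt, clamped to cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling] so clients spread out.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


def call_with_retries(operation, attempts=5, sleep=time.sleep, rng=None):
    """Call operation() up to `attempts` times, sleeping a jittered delay between tries."""
    delays = backoff_delays(attempts - 1, rng=rng)
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt])
```

A caller would wrap the flaky request in a function and pass it in, for example `call_with_retries(fetch_profile)` where `fetch_profile` is a hypothetical zero-argument function. Jitter only spreads retries out over time; it does not replace server-side protections such as rate limiting or caching.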
Terraform configuration for my entire Mikrotik-powered home network.
https://github.com/mirceanton/mikrotik-terraform/
GitHub
GitHub - mirceanton/mikrotik-terraform: Terraform configuration for my entire Mikrotik-powered home network.
Terraform configuration for my entire Mikrotik-powered home network. - mirceanton/mikrotik-terraform
The Grab Engineering team shares how they migrated high-volume, real-time streaming traffic from one service to another with zero data loss and no duplication. The post details the planning, tooling, and validation steps behind splitting a backend service's stream read and write functionality into two separate services.
https://engineering.grab.com/seamless-migration
Grab Tech
How we seamlessly migrated high volume real-time streaming traffic from one service to another with zero data loss and duplication
In the world of high-volume data processing, migrating services without disruption is a formidable challenge. At Grab, we recently undertook this task by splitting one of our backend service's stream read and write functionalities into two separate services…
This write-up from Prezi Engineering explains how multi-AZ deployments can lead to surprisingly high data transfer costs. It documents their journey of migrating from a costly self-hosted Prometheus setup to a more efficient monitoring solution to save on their cloud budget.
https://engineering.prezi.com/how-using-availability-zones-can-eat-up-your-budget-our-journey-from-prometheus-to-be8a816f7efe
Medium
How using Availability Zones can eat up your budget - our journey from Prometheus to…
Intro
kubectl plugin to list allocations (cpu, memory, gpu,... X utilization, requested, limit, allocatable,...)
https://github.com/davidB/kubectl-view-allocations
GitHub
GitHub - davidB/kubectl-view-allocations: kubectl plugin to list allocations (cpu, memory, gpu,... X utilization, requested, limit…
kubectl plugin to list allocations (cpu, memory, gpu,... X utilization, requested, limit, allocatable,...) - davidB/kubectl-view-allocations
Yasin Taha Erol provides a practical guide to migrating from Kubernetes' native Horizontal Pod Autoscaler (HPA) to KEDA. The post highlights the benefits of KEDA's event-driven scaling and walks through the steps for a smooth transition.
https://yasintahaerol.medium.com/migrating-hpa-to-keda-13e946ee29ee
Medium
Migrating HPA to KEDA
Story
This tutorial offers an interesting approach to container image distribution by using S3 as a private container registry. The author demonstrates how to set up and use an S3 bucket for storing and pulling images, providing a simple alternative to dedicated registry services.
https://ochagavia.nl/blog/using-s3-as-a-container-registry/
Adolfo Ochagavía
Using S3 as a container registry
For the last four months I've been developing a custom container image builder, collaborating with Outerbounds. The technical details of the builder itself might be the topic of a future article, but there's something surprising I wanted to share already:…
PostgreSQL monitoring and backups (self-hosted, with a UI)
https://github.com/RostislavDugin/postgresus
GitHub
GitHub - RostislavDugin/postgresus: PostgreSQL backup tool
PostgreSQL backup tool. Contribute to RostislavDugin/postgresus development by creating an account on GitHub.
This technical report from Datadog offers a deep dive into managing storage for etcd, the key-value store at the heart of Kubernetes. It explains the causes of database growth and provides strategies for monitoring, defragmenting, and purging old data to maintain a healthy cluster.
https://www.datadoghq.com/blog/managing-etcd-storage/
Datadog
How to support a growing Kubernetes cluster with a small etcd | Datadog
Discover essential strategies for efficiently managing etcd storage in your Kubernetes clusters.
In this story from the Betterstack newsletter, learn how Dropbox saved millions of dollars by building its own load balancer. The piece delves into the technical decisions and engineering effort behind the cost reduction.
https://newsletter.betterstack.com/p/how-dropbox-saved-millions-of-dollars
Betterstack
How Dropbox Saved Millions of Dollars by Building a Load Balancer
Dropbox saved resources by creating a superior version of a tool everyone uses
This extensive handbook serves as a go-to resource for troubleshooting common and complex issues within Kubernetes. It's packed with practical advice, commands, and methodologies to help engineers diagnose and resolve problems in their clusters.
https://itnext.io/the-kubernetes-troubleshooting-handbook-7596a1fdf2ff
Medium
The Kubernetes Troubleshooting Handbook
Debugging Tips, Tools, and Techniques
This guide from Techielass (Sarah Lean) provides a step-by-step walkthrough of building a CI/CD pipeline for Terraform using GitHub Actions. It demonstrates how to automate infrastructure deployments to Azure safely and efficiently, incorporating practices such as planning and approval steps.
https://www.techielass.com/terraform-with-github-actions-ci-cd-pipeline/
Techielass - A blog by Sarah Lean
Terraform with GitHub Actions CI/CD Pipeline
By using Terraform with GitHub Actions, IT professionals can automate and streamline the deployment of resources across Azure environments in a consistent and reliable way.
This guide will walk you through setting up Terraform in GitHub Actions, from configuring…