PostgreSQL monitoring and backups (self-hosted, with a UI)
https://github.com/RostislavDugin/postgresus
GitHub
GitHub - RostislavDugin/postgresus: PostgreSQL backup tool
PostgreSQL backup tool. Contribute to RostislavDugin/postgresus development by creating an account on GitHub.
This technical report from Datadog offers a deep dive into managing storage for etcd, the key-value store at the heart of Kubernetes. It explains the causes of database growth and provides strategies for monitoring, defragmenting, and purging old data to maintain a healthy cluster.
https://www.datadoghq.com/blog/managing-etcd-storage/
Datadog
How to support a growing Kubernetes cluster with a small etcd | Datadog
Discover essential strategies for efficiently managing etcd storage in your Kubernetes clusters.
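As a rough illustration of the monitoring side of the etcd post, the sketch below estimates how much of the etcd database file a defragmentation could reclaim. It assumes the two gauges `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes` are available from the etcd `/metrics` endpoint. The sample values and the helper names are made up for the example and are not taken from the Datadog article.

```python
def parse_gauges(metrics_text):
    """Parse simple 'name value' lines from Prometheus text output (no labels)."""
    values = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values


def reclaimable_fraction(metrics_text):
    """Share of the on-disk database file that is allocated but not in use."""
    gauges = parse_gauges(metrics_text)
    total = gauges["etcd_mvcc_db_total_size_in_bytes"]
    in_use = gauges["etcd_mvcc_db_total_size_in_use_in_bytes"]
    return (total - in_use) / total if total else 0.0


# Hypothetical sample: 2 GB allocated on disk, 0.5 GB logically in use.
SAMPLE = """\
# TYPE etcd_mvcc_db_total_size_in_bytes gauge
etcd_mvcc_db_total_size_in_bytes 2.0e+09
# TYPE etcd_mvcc_db_total_size_in_use_in_bytes gauge
etcd_mvcc_db_total_size_in_use_in_bytes 5.0e+08
"""

if __name__ == "__main__":
    print(f"reclaimable by defrag: {reclaimable_fraction(SAMPLE):.0%}")
```

A high ratio suggests that compaction has already freed space and a defragmentation would shrink the file. The threshold worth alerting on depends on the cluster.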
In this story from the Betterstack newsletter, learn how Dropbox saved millions of dollars by building its own load balancer. The piece covers the technical decisions and engineering effort behind the cost reduction.
https://newsletter.betterstack.com/p/how-dropbox-saved-millions-of-dollars
Betterstack
How Dropbox Saved Millions of Dollars by Building a Load Balancer
Dropbox saved resources by creating a superior version of a tool everyone uses
This extensive handbook serves as a go-to resource for troubleshooting common and complex issues within Kubernetes. It's packed with practical advice, commands, and methodologies to help engineers diagnose and resolve problems in their clusters.
https://itnext.io/the-kubernetes-troubleshooting-handbook-7596a1fdf2ff
Medium
The Kubernetes Troubleshooting Handbook
Debugging Tips, Tools, and Techniques
This commentary by Techielass provides a step-by-step walkthrough of building a CI/CD pipeline for Terraform using GitHub Actions. It demonstrates how to automate infrastructure deployments safely and efficiently, incorporating best practices like planning and approval steps.
https://www.techielass.com/terraform-with-github-actions-ci-cd-pipeline/
Techielass - A blog by Sarah Lean
Terraform with GitHub Actions CI/CD Pipeline
By using Terraform with GitHub Actions, IT professionals can automate and streamline the deployment of resources across Azure environments in a consistent and reliable way.
This guide will walk you through setting up Terraform in GitHub Actions, from configuring…
Forwarded from DevOps & SRE notes (tutunak)
Looking for a hosting platform to practice with Linux, Kubernetes, etc.? Register using my referral link on DigitalOcean and get $200 in credit for 60 days. By registering through my referral link, you also support this Telegram channel.
Register
This post from Chainguard Unchained introduces the concept of audited least privilege as a critical security measure for the software supply chain. It explains how this principle helps verify that components only have the permissions they strictly need to function.
https://www.chainguard.dev/unchained/audited-least-privilege
www.chainguard.dev
Audited least privilege
Strengthen your software supply chain security with audited least privilege. Learn how Chainguard's approach minimizes risk and enhances trust.
In this piece, the author explores the interesting and often overlooked capabilities of gitRepo volumes in Kubernetes. The content details some fun experiments and practical applications for dynamically providing content to pods directly from a Git repository.
https://raesene.github.io/blog/2024/07/10/Fun-With-GitRepo-Volumes/
raesene.github.io
Fun With GitRepo Volumes
This opinionated report argues that Large Language Models (LLMs) are not the ultimate solution for complex socio-technical problems in the SRE and operations space. It cautions against over-reliance on AI, emphasizing the continued need for human expertise and critical thinking.
https://blog.relyabilit.ie/llms-wont-save-us/
RelyAbility Blog
LLMs won't save us
The AI wave is passing over us: what of genuine value will be left behind? asks Niall Murphy
As a long-time observer of the SRE/DevOps tooling market, I look at the tsunami of AI-powered and LLM-enabled currently engulfing our industry like most great wave…
Martin Atkins's post presents a technique for handling "ephemeral values" in Terraform: values that are needed during a plan but should not be stored in the state. The method helps manage dynamic or sensitive data that is only relevant for a single operation.
https://log.martinatkins.me/2024/05/22/terraform-ephemeral-values/
Development Log by Martin Atkins
Ephemeral Values in Terraform
A different approach to sensitive values in Terraform state.
A terminal-based LDAP server explorer built with Go and BubbleTea, providing an interactive interface for browsing LDAP directory trees, viewing records, and executing custom queries.
https://github.com/ericschmar/moribito
GitHub
GitHub - ericschmar/moribito
Contribute to ericschmar/moribito development by creating an account on GitHub.
This in-depth article by Henrik Gerdes benchmarks several low-level container runtimes used by Kubernetes. It compares runc, crun, gVisor, and youki in detail, focusing on performance and memory consumption.
https://henrikgerdes.me/blog/2024-07-kubernetes-cri-bench/
henrikgerdes.me
Benchmarking what actually drive our containers
Kubernetes success and versatility often overshadows the lower-level details of what actually drives our containers. I took a deeper took on how the default con…
This blog post explores the statistical pitfalls of using Mean Time to Resolve (MTTR) as a key incident metric. The author argues that because incident durations follow a power-law distribution, MTTR trends can be misleading.
https://surfingcomplexity.blog/2024/12/01/mttr-when-sample-means-and-power-laws-combine-trouble-follows/
Surfing Complexity
MTTR: When sample means and power laws combine, trouble follows
Think back on all of the availability-impacting incidents that have occurred in your organization over some decent-sized period, maybe a year or more. Is the majority of the overall availability im…
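To make the MTTR argument concrete, here is a minimal simulation sketch. It assumes incident durations follow a Pareto distribution, with an illustrative shape parameter `alpha=1.2` and a 10-minute minimum. Both values, and the function names, are assumptions made for this example, not numbers from the post. Every simulated reporting period is drawn from the same distribution, so any spread between the periods' MTTR values is sampling noise and not a real trend.

```python
import random
import statistics


def simulate_mttr(n_incidents, alpha=1.2, min_minutes=10.0, seed=0):
    """Sample mean of n_incidents Pareto-distributed durations (in minutes)."""
    rng = random.Random(seed)
    # paretovariate returns values >= 1, so every duration is >= min_minutes.
    durations = [min_minutes * rng.paretovariate(alpha) for _ in range(n_incidents)]
    return statistics.mean(durations)


def mttr_spread(n_periods=200, n_incidents=50, alpha=1.2):
    """MTTR for many periods that all share the SAME underlying distribution."""
    means = [simulate_mttr(n_incidents, alpha, seed=s) for s in range(n_periods)]
    return min(means), statistics.median(means), max(means)


if __name__ == "__main__":
    low, mid, high = mttr_spread()
    print(f"MTTR over identical periods: min={low:.0f} median={mid:.0f} max={high:.0f} min")
```

With a heavy tail, a single long incident can dominate a period's mean. That is why the gap between the minimum and maximum MTTR can be large even though nothing about the underlying process has changed.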
kubectl-validate is a SIG-CLI subproject to support the local validation of resources for native Kubernetes types and CRDs.
https://github.com/kubernetes-sigs/kubectl-validate
GitHub
GitHub - kubernetes-sigs/kubectl-validate
Contribute to kubernetes-sigs/kubectl-validate development by creating an account on GitHub.
This write-up from incident.io introduces the "Incident Maturity Model," a framework for evaluating and improving an organization's incident management processes. The model outlines three stages: Centralized, Distributed, and Democratized, offering a roadmap for growth.
https://incident.io/blog/the-incident-maturity-model
incident.io
The Incident Maturity Model | Blog
Incidents are inevitable; how you handle them matters. The Incident Maturity Model shows how to level up from basic response to company-wide resilience, with actionable steps backed by real data. Where does your team stand?
Lawrence Jones provides an analysis of the challenges and incentives surrounding company status pages. The text delves into why transparency can be difficult for businesses, especially when SLAs and financial penalties are involved.
https://blog.lawrencejones.dev/status-pages/
blog.lawrencejones.dev
Uptime, status pages, and transparency calculus
From the evergreen AWS status page to hardcoded 100% uptime, no one fully trusts a status page anymore. But why is this? Companies often start with good intentions, aiming for full transparency. So why do so many change along the way: what pressures people…