DevOps&SRE Library

How Kubernetes Runs Containers: A Practical Deep Dive

Taking a deep dive into how Kubernetes runs containers as Linux processes

https://blog.esc.sh/kubernetes-containers-linux-processes

How Ahrefs Saved US$400M in 3 Years by NOT Going to the Cloud

https://tech.ahrefs.com/how-ahrefs-saved-us-400m-in-3-years-by-not-going-to-the-cloud-8939dd930af8

tigrisfs

We're proud to announce the immediate availability of tigrisfs, the native filesystem interface for Tigris. It lets you mount Tigris buckets on your laptops, desktops, and servers so you can use the data in your buckets as if it were local, bridging the gap between the cloud and your machine.

https://www.tigrisdata.com/blog/tigrisfs

octelium

Octelium is a free and open source, self-hosted, unified platform for zero trust resource access that is primarily meant to be a modern alternative to remote access VPNs and similar tools.

https://github.com/octelium/octelium

Breaking up a monolith: How we’re unwinding a shared database at scale

https://www.datadoghq.com/blog/engineering/unwinding-shared-database

Taming Complexity: HelloFresh’s Playbook for Managing Large-Scale Change

P1: https://engineering.hellofresh.com/taming-complexity-hellofreshs-playbook-for-managing-large-scale-programs-part-1-3-cdf06c5a6ed9

P2: https://engineering.hellofresh.com/taming-complexity-hellofreshs-playbook-for-managing-large-scale-change-part-2-3-516dc3961e26

P3: https://engineering.hellofresh.com/taming-complexity-hellofreshs-playbook-for-managing-large-scale-change-part-3-3-ec0fd8bc6cd9

Kubernetes List API performance and reliability

At my current employer, we use Kubernetes to run hundreds of thousands of bare metal servers, spread over hundreds of Kubernetes clusters. We use Kubernetes beyond officially supported/tested scale limits by running more than 5,000 nodes and over a hundred thousand pods in a single cluster. In these large-scale setups, expensive “list” calls on the Kubernetes API are the Achilles' heel of control plane reliability and scalability. In this article, I’ll explain which list call patterns pose the most risk, and how recent and upcoming Kubernetes versions are improving list API performance.

https://ahmet.im/blog/kubernetes-list-performance

opencode

AI coding agent, built for the terminal.

https://github.com/sst/opencode

ktea

ktea is a tool designed to simplify and accelerate interactions with Kafka clusters.

https://github.com/jonas-grgt/ktea

GitOps: View from a security perspective

https://medium.com/@TechInternals/gitops-view-from-a-security-perspective-a120795b2f17

"Best practices" aren't always best for you

https://thefridaydeploy.substack.com/p/best-practices-arent-always-best

SLA vs SLO

Demystifying the most common misconception in Service Level jargon

https://blog.alexewerlof.com/p/sla-vs-slo

tfautomv

Generate Terraform moved blocks automatically for painless refactoring

https://github.com/busser/tfautomv

When SIGTERM Does Nothing: A Postgres Mystery

The ClickPipes team encountered a bug with logical replication slot creation on Postgres read replicas: a query that usually took a few seconds was instead running for hours and couldn’t be terminated by any of the usual methods in Postgres, causing customer frustration and risking the stability of production databases. In this blog post, I’ll walk through how I investigated the problem and ultimately discovered it was due to a Postgres bug. We’ll also share how we fixed it and our experience working with the Postgres community.

https://clickhouse.com/blog/sigterm-postgres-mystery

Mastering Postgres Replication Slots: Preventing WAL Bloat and Other Production Issues

https://www.morling.dev/blog/mastering-postgres-replication-slots

Life Altering Postgresql Patterns

There is a set of things you can do when working with a Postgres database that I have found make my life and my coworkers' lives much more pleasant. Each one is small by itself, but in aggregate they have a noticeable effect.

https://mccue.dev/pages/3-11-25-life-altering-postgresql-patterns

Don't Do This

A short list of common mistakes.

https://wiki.postgresql.org/wiki/Don%27t_Do_This

Fix a top cause of slow queries in PostgreSQL (no slow query log needed)

https://render.com/blog/postgresql-top-cause-slow-queries

Postgres query plan visualization tools

https://www.pgmustard.com/blog/postgres-query-plan-visualization-tools

OpenAI: Scaling PostgreSQL to the Next Level

At the PGConf.dev 2025 Global Developer Conference, Bohan Zhang from OpenAI shared OpenAI’s best practices with PostgreSQL, offering a glimpse into the database usage of one of the most prominent unicorn companies.

https://www.pixelstech.net/article/1747708863-openai%3a-scaling-postgresql-to-the-next-level
