DevOps&SRE Library
A library of articles on DevOps and SRE.

Ads: @ostinostin
Content: @mxssl

RKN (Roskomnadzor): https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
How Kubernetes Runs Containers: A Practical Deep Dive

Taking a deep dive into how Kubernetes runs containers as Linux processes


https://blog.esc.sh/kubernetes-containers-linux-processes
tigrisfs

We're proud to announce the immediate availability of tigrisfs, the native filesystem interface for Tigris. This lets you mount Tigris buckets on your laptops, desktops, and servers so you can use data in your buckets as if it were local. This bridges the gap between the cloud and your machine.


https://www.tigrisdata.com/blog/tigrisfs
octelium

Octelium is a free and open source, self-hosted, unified platform for zero trust resource access that is primarily meant to be a modern alternative to remote access VPNs and similar tools.


https://github.com/octelium/octelium
Breaking up a monolith: How we’re unwinding a shared database at scale

https://www.datadoghq.com/blog/engineering/unwinding-shared-database
Kubernetes List API performance and reliability

At my current employer, we use Kubernetes to run hundreds of thousands of bare metal servers, spread over hundreds of Kubernetes clusters. We use Kubernetes beyond officially supported/tested scale limits by running more than 5,000 nodes and over a hundred thousand pods in a single cluster. In these large-scale setups, expensive “list” calls on the Kubernetes API are the Achilles' heel of control plane reliability and scalability. In this article, I’ll explain which list call patterns pose the most risk, and how recent and upcoming Kubernetes versions are improving list API performance.


https://ahmet.im/blog/kubernetes-list-performance
opencode

AI coding agent, built for the terminal.


https://github.com/sst/opencode
ktea

ktea is a tool designed to simplify and accelerate interactions with Kafka clusters.


https://github.com/jonas-grgt/ktea
SLA vs SLO

Demystifying the most common misconception in Service Level jargon


https://blog.alexewerlof.com/p/sla-vs-slo
tfautomv

Generate Terraform moved blocks automatically for painless refactoring


https://github.com/busser/tfautomv
When SIGTERM Does Nothing: A Postgres Mystery

The ClickPipes team encountered a bug with logical replication slot creation on Postgres read replicas: a query that usually took a few seconds was running for hours and couldn’t be terminated by any of the usual methods in Postgres, causing customer frustration and risking the stability of production databases. In this blog post, I’ll walk through how I investigated the problem and ultimately discovered it was due to a Postgres bug. We’ll also share how we fixed it and our experience working with the Postgres community.


https://clickhouse.com/blog/sigterm-postgres-mystery
Mastering Postgres Replication Slots: Preventing WAL Bloat and Other Production Issues

https://www.morling.dev/blog/mastering-postgres-replication-slots
Life Altering Postgresql Patterns

There is a set of things you can do when working with a Postgres database that I have found make my and my coworkers' lives much more pleasant. Each one is small by itself, but in aggregate they have a noticeable effect.


https://mccue.dev/pages/3-11-25-life-altering-postgresql-patterns
Don't Do This

A short list of common mistakes.


https://wiki.postgresql.org/wiki/Don%27t_Do_This
Fix a top cause of slow queries in PostgreSQL (no slow query log needed)

https://render.com/blog/postgresql-top-cause-slow-queries
OpenAI: Scaling PostgreSQL to the Next Level

At the PGConf.dev 2025 Global Developer Conference, Bohan Zhang from OpenAI shared OpenAI’s best practices with PostgreSQL, offering a glimpse into the database usage of one of the most prominent unicorn companies.


https://www.pixelstech.net/article/1747708863-openai%3a-scaling-postgresql-to-the-next-level