octelium
https://github.com/octelium/octelium
Octelium is a free and open source, self-hosted, unified platform for zero trust resource access that is primarily meant to be a modern alternative to remote access VPNs and similar tools.
Breaking up a monolith: How we’re unwinding a shared database at scale
https://www.datadoghq.com/blog/engineering/unwinding-shared-database
Taming Complexity: HelloFresh’s Playbook for Managing Large-Scale Change
P1: https://engineering.hellofresh.com/taming-complexity-hellofreshs-playbook-for-managing-large-scale-programs-part-1-3-cdf06c5a6ed9
P2: https://engineering.hellofresh.com/taming-complexity-hellofreshs-playbook-for-managing-large-scale-change-part-2-3-516dc3961e26
P3: https://engineering.hellofresh.com/taming-complexity-hellofreshs-playbook-for-managing-large-scale-change-part-3-3-ec0fd8bc6cd9
Kubernetes List API performance and reliability
https://ahmet.im/blog/kubernetes-list-performance
At my current employer, we use Kubernetes to run hundreds of thousands of bare-metal servers spread over hundreds of Kubernetes clusters. We operate beyond the officially supported and tested scale limits, running more than 5,000 nodes and over a hundred thousand pods in a single cluster. In these large-scale setups, expensive “list” calls on the Kubernetes API are the Achilles’ heel of control plane reliability and scalability. In this article, I’ll explain which list call patterns pose the most risk and how recent and upcoming Kubernetes versions are improving list API performance.
ktea
https://github.com/jonas-grgt/ktea
ktea is a tool designed to simplify and accelerate interactions with Kafka clusters.
GitOps: View from a security perspective
https://medium.com/@TechInternals/gitops-view-from-a-security-perspective-a120795b2f17
"Best practices" aren't always best for you
https://thefridaydeploy.substack.com/p/best-practices-arent-always-best
SLA vs SLO
https://blog.alexewerlof.com/p/sla-vs-slo
Demystifying the most common misconception in Service Level jargon
tfautomv
https://github.com/busser/tfautomv
Generate Terraform moved blocks automatically for painless refactoring
When SIGTERM Does Nothing: A Postgres Mystery
https://clickhouse.com/blog/sigterm-postgres-mystery
The ClickPipes team had encountered a bug with logical replication slot creation on Postgres read replicas. Specifically, a query that usually took a few seconds was running for hours and couldn’t be terminated by any of the usual methods in Postgres, causing customer frustration and risking the stability of production databases. In this blog post, I’ll walk through how I investigated the problem and ultimately discovered it was due to a Postgres bug. We’ll also share how we fixed it and our experience working with the Postgres community.
Mastering Postgres Replication Slots: Preventing WAL Bloat and Other Production Issues
https://www.morling.dev/blog/mastering-postgres-replication-slots
Life Altering Postgresql Patterns
https://mccue.dev/pages/3-11-25-life-altering-postgresql-patterns
There is a set of things you can do when working with a Postgres database that I have found made my life and my coworkers' lives much more pleasant. Each one is small by itself, but in aggregate they have a noticeable effect.
Fix a top cause of slow queries in PostgreSQL (no slow query log needed)
https://render.com/blog/postgresql-top-cause-slow-queries
Postgres query plan visualization tools
https://www.pgmustard.com/blog/postgres-query-plan-visualization-tools
OpenAI: Scaling PostgreSQL to the Next Level
https://www.pixelstech.net/article/1747708863-openai%3a-scaling-postgresql-to-the-next-level
At the PGConf.dev 2025 Global Developer Conference, Bohan Zhang from OpenAI shared OpenAI’s best practices with PostgreSQL, offering a glimpse into the database usage of one of the most prominent unicorn companies.
Seventh-generation server hardware at Dropbox: our most efficient and capable architecture yet
https://dropbox.tech/infrastructure/seventh-generation-server-hardware
Fourteen years ago, Dropbox took its first steps toward building its own hardware infrastructure—and as our product and user base has grown, so has our infrastructure. What started with just a handful of servers has evolved into one of the largest custom-built storage systems in the world. We've scaled from a few dozen machines to tens of thousands of servers with millions of drives.
That evolution didn’t happen by accident. It took years of iteration, close collaboration with suppliers, and a product-first mindset that treated infrastructure as a strategic advantage. Now we’re excited to share what’s next: the launch of our seventh-generation hardware platform, now featuring Crush, Dexter, and Sonic for our traditional compute, database, and storage workloads, and our newest GPU tiers, Gumby and Godzilla. To make this leap possible, we dramatically increased storage bandwidth, effectively doubled our available rack power, and introduced a next-gen storage chassis designed to even further minimize vibration and heat.
This generation represents our most efficient, capable, and scalable architecture yet—and it’ll help us as we continue to build and scale helpful AI products like Dropbox Dash. Below, we’ll walk you through how we designed the latest version of our server hardware as well as key lessons we’ll carry into generations to come.
Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure
https://huggingface.co/blog/infrastructure-alerting
The Infrastructure team at Hugging Face is excited to share a behind-the-scenes look at the inner workings of Hugging Face's production infrastructure, which we’ve had the privilege of helping to build and maintain. Our team's dedication to designing and implementing a robust monitoring and alerting system has been instrumental in ensuring the stability and scalability of our platforms. We’re constantly reminded of the impact that our alerts have on our ability to identify and respond to potential issues before they become major incidents.
In this blog post, we’ll dive into the details of three mighty alerts that play their unique role in supporting our production infrastructure, and explore how they've helped us maintain the high level of performance and uptime that our community relies on.
rustfs
https://github.com/rustfs/rustfs
RustFS is a high-performance distributed object storage system built in Rust, one of the most popular languages worldwide. Like MinIO, it offers simplicity, S3 compatibility, an open-source codebase, and support for data lakes, AI, and big data workloads. It is released under the Apache license, which the project considers more permissive and user-friendly than the licenses of other storage systems. Because it is built on Rust, RustFS aims to provide faster and safer distributed features for high-performance object storage.