DevOps&SRE Library

Durable Workflows Beyond Vercel: Version-Safe Orchestration for Kubernetes

Workflow DevKit lets you write durable, long-running workflows directly in your Next.js and Node.js apps. You define steps with the "use step" directive, and the SDK handles persistence, retries, and replay automatically. Workflows survive server restarts, can sleep for days, and resume exactly where they left off.

On Vercel, all of this works out of the box — the platform handles deployment versioning and queue routing behind the scenes. But what happens when you deploy to your own Kubernetes cluster? Version mismatch. And it’s subtle enough to corrupt data before you notice.

We built Platformatic World to fix this. It’s a drop-in World implementation that brings the same deployment safety to any Kubernetes cluster. Every workflow run is pinned to the code version that created it. Queue messages are routed to the correct versioned pods. Old versions stay alive until all their in-flight runs are complete.
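The three guarantees above (runs pinned to a version, messages routed by that version, old versions kept until drained) can be sketched as a toy in-memory model. This is only an illustration of the idea, not Platformatic World's implementation or API: the class `VersionPinnedRouter` and all of its method names are hypothetical.

```python
from collections import defaultdict


class VersionPinnedRouter:
    """Toy model of version-pinned workflow routing (hypothetical names)."""

    def __init__(self):
        self.run_version = {}              # run_id -> version that created the run
        self.in_flight = defaultdict(set)  # version -> ids of unfinished runs

    def start_run(self, run_id, version):
        # Pin the run to the code version that created it.
        self.run_version[run_id] = version
        self.in_flight[version].add(run_id)

    def route(self, run_id):
        # Every queue message for a run goes to the pods of its pinned version.
        return self.run_version[run_id]

    def complete_run(self, run_id):
        version = self.run_version[run_id]
        self.in_flight[version].discard(run_id)

    def can_retire(self, version):
        # An old version may be shut down only when it has no in-flight runs.
        return not self.in_flight.get(version)


router = VersionPinnedRouter()
router.start_run("run-1", "v1")   # created before the new deploy
router.start_run("run-2", "v2")   # created after the new deploy
print(router.route("run-1"))      # still handled by v1 pods
print(router.can_retire("v1"))    # v1 has an unfinished run
router.complete_run("run-1")
print(router.can_retire("v1"))    # v1 is drained
```

A real system would keep this mapping in durable storage rather than in process memory, so that the pinning itself survives restarts.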

https://blog.platformatic.dev/durable-workflows-kubernetes-version-safe

Designing for Failure with CloudNativePG

This post focuses on three areas that separate a demo from production systems: backups, recovery and connection pooling.

https://dylanmarkdacosta.medium.com/designing-for-failure-with-cloudnativepg-2c3987605a39

Building a Production-Grade HA Kubernetes Cluster on a Homelab with $0 in Cloud Costs

How I turned four Proxmox nodes, some enterprise surplus drives, and an afternoon into a fully automated HA k3s cluster with Rancher, Traefik, and Ansible — all running on hardware that draws less power than a gaming PC.

https://thiago-marsal.medium.com/homelab-k3s-ha-cluster-a-complete-architecture-guide-6a60005b6e99

SlimFaas

SlimFaas is a lightweight, plug-and-play Function-as-a-Service (FaaS) platform for Kubernetes (and Docker-Compose / Podman-Compose).

https://github.com/SlimPlanet/SlimFaas

What do SREs and fishermen have in common? Is "GitOps = reality" a myth? Does Chaos Engineering just create more chaos?..

Sounds like the kind of questions that pop up out of nowhere right before bed on a weeknight 👀

And, by the way, we have answers to all three! Not here, though, but in the podcast «В SREду на кухне», hosted by experienced engineers from Avito. They talk through pain points, invite outside guests and colleagues, and share extra insights, related articles, and meetup announcements in their channel.

We recommend subscribing and saving a couple of episodes for later 🧠

agentgram

A single front door for all your AI agents and MCPs

https://github.com/dfradehubs/agentgram

The Problem with AI-Generated Post-Incident Reviews

The real learning comes from analyzing the incident while writing the document, not reading it; the document at the end is the residue of the learning.

https://greatcircle.com/blog/2026/05/05/problem-with-ai-generated-post-incident-reviews

You Shipped It Fast. But Did You Ship It Right?

AI tools have genuinely changed how fast teams can produce code, but they haven't changed how fast a codebase can safely absorb that code.

https://stackoverflow.blog/2026/05/12/you-shipped-it-fast-but-did-you-ship-it-right

On benchmarking

Benchmarking is hard. There are many ways to do it wrong and few to do it right.

But zooming out from any single system or harness, there are broad principles that should be applied to all benchmarking. Using these correctly makes it difficult to produce biased results.

Am I the world's best benchmarker? Certainly not. I invented the language balls, after all. But correctness and precision are important parts of PlanetScale's culture. We've spent considerable time learning the art of benchmarking, and are here to share best practices.

Here, we're focusing primarily on benchmarking databases, but these principles apply to many domains.

https://planetscale.com/blog/on-benchmarking

Humans aren't fast enough for 4 9's

When thinking about Service Level Objectives (SLOs) and contractual Service Level Agreements (SLAs) for availability, I always like to put the percentages into concrete numbers.
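As a rough illustration of turning availability percentages into concrete numbers, the sketch below converts a target into an allowed-downtime budget. The function name `downtime_budget_minutes` and the 30-day and 365-day windows are assumptions made for this example; they are not taken from the linked post.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_percent, period_days):
    """Minutes of allowed downtime for a given availability target and window."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * MINUTES_PER_DAY


# Compare two, three, and four nines over a 30-day and a 365-day window.
for target in (99.0, 99.9, 99.99):
    per_month = downtime_budget_minutes(target, 30)
    per_year = downtime_budget_minutes(target, 365)
    print(f"{target}%: {per_month:.1f} min per 30 days, {per_year:.1f} min per year")
```

At four nines the budget works out to roughly 4.3 minutes per 30 days, or about 52.6 minutes per year, which leaves very little room for a human to be paged, respond, and fix anything.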

https://incident.io/blog/humans-arent-fast-enough-for-4-nines

Why reviewing AI-generated code is devilishly hard

When working on code with GenAI assistance you need a better understanding of the system than when working without.

https://www.spinellis.gr/blog/20260523

Why Teamwork Makes (Or Breaks) Your Incident Response

High-severity incidents expose how a team really works together, usually within the first ten minutes.

https://uptimelabs.io/articles/teamwork-incident-response

Say the Thing You Want

You’re in a 1:1 with your manager, and things are going just fine. You talk about the project and that other thing. Toward the end, she asks: “Anything else?”

And there is something else. You want to lead that new initiative. Or move to a different team. Or you’ve been thinking about what stands in the way of your promotion. The thought is right there, sitting in the back of your throat. You’re going to say it, and then… “Nope, all good.”

You get out of the call feeling a specific kind of regret. You rationalize it somehow and then tell yourself you’ll bring it up next time (you won’t).

https://terriblesoftware.org/2026/04/01/say-the-thing-you-want

mq is a command-line tool that processes Markdown using a syntax similar to jq.

It's written in Rust, allowing you to easily slice, filter, map, and transform structured data.

https://github.com/harehare/mq

“Good Taste” Is Just Experience

“In the age of AI, taste is the ultimate differentiator.”

https://terriblesoftware.org/2026/03/27/good-taste-is-just-experience

slumber

Slumber is a TUI (terminal user interface) HTTP client. Define, execute, and share configurable HTTP requests.

https://github.com/LucasPickering/slumber

markitdown

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.

https://github.com/microsoft/markitdown

cate

An infinite canvas for your code, terminals, browsers, docs, and AI agents.

https://github.com/0-AI-UG/cate

paneru

Paneru is a macOS window manager that arranges windows on an infinite strip, extending to the right. A core principle is that opening a new window will never cause existing windows to resize, so your layout stays stable.

https://github.com/karinushka/paneru

How do you handle backups in the cloud the right way?

On June 25, we invite everyone who works with clouds to a free webinar from MWS Cloud Platform.

⚫ We'll bust myths and go over the best current approaches and tools.

⚫ We'll discuss integration into processes, consistency, point-in-time recovery, and security, and talk about the advantages of native cloud tools.

⚫ We'll run a demo in MWS Cloud Platform and answer your questions.

Register so you don't miss it!

⏰ June 25 at 14:00 (Moscow time)

✅ Register

opensre

The open-source framework for AI SRE agents, and the training and evaluation environment they need to improve. Connect the 60+ tools you already run, define your own workflows, and investigate incidents on your own infrastructure.

https://github.com/Tracer-Cloud/opensre
