CatOps – Telegram

CatOps

5.09K subscribers

94 photos

5 videos

19 files

2.57K links

DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions are our own.

We do not post ads including event announcements. Please, do not bother us with such requests!


At least Cloudflare is fast at sharing their postmortems.

https://blog.cloudflare.com/5-december-2025-outage/

A curious thing is this:

>>>
Customers that have their web assets served by our older FL1 proxy AND had the Cloudflare Managed Ruleset deployed were impacted. All requests for websites in this state returned an HTTP 500 error, with the small exception of some test endpoints such as /cdn-cgi/trace.
<<<

IIRC, in the previous incident on Nov 18, only the customers on the newer proxy version were impacted. So, one could say that Cloudflare had a single time-distributed total outage.

Another important thing:

>>>
Before the end of next week we will publish a detailed breakdown of all the resiliency projects underway, including the ones listed above. While that work is underway, we are locking down all changes to our network in order to ensure we have better mitigation and rollback systems before we begin again.
<<<

Honestly, looking forward to seeing the write-up. I can only imagine how stressed their team is after taking down a big chunk of the Internet twice in less than 30 days.

#cloudflare #postmortem

The Cloudflare Blog

Cloudflare outage on December 5, 2025

Cloudflare experienced a significant traffic outage on December 5, 2025, starting approximately at 8:47 UTC. The incident lasted approximately 25 minutes before resolution. We are sorry for the impact that it caused to our customers and the Internet. The…

👍5🔥2

1.72K views10:53

This isn't a technical article, but still an important one, I would say. This one is about the importance of making your work visible.

Shadow work in engineering teams.

For better or worse, in many companies the promotion cycle is a popularity contest, so you need to act accordingly.

This article is aimed at managers, but you may find it useful as an individual contributor as well.

#culture

newsletter.manager.dev

Shadow work in engineering teams

And the price your team pays for it

❤13👍1

1.36K views10:53

Our current fundraiser

Support Ukraine 🇺🇦

Here's an article on using DRY and KISS principles when working with Terraform. In my opinion, this is one of those articles that has a good idea behind it, but lacks a bit in delivery.

KISS vs DRY in Infrastructure as Code: Why Simple Often Beats Clever.

The main takeaway is, as usual: use your own judgment when creating abstractions for your infra code. This also applies to all your code.

I do generally agree on the tooling part. This is what Adam Jacob called "a 200% knowledge problem": when you add an abstraction (a wrapper), you need to understand not only your code and the underlying technologies, but also each layer of your abstractions. So do not add wrappers unless you have to.

However, this article also touches on an important point: you may feel like it's time to introduce an abstraction, but in reality, it's not.

#terraform #iac

KISS vs DRY in Infrastructure as Code: Why Simple Often Beats Clever

The Scale Gap Problem

👍14

1.51K views16:56


A new issue of the CatOps digest is here!

https://newsletter.catops.dev/p/catops-digest-2025-12-12

#digest #newsletter

newsletter.catops.dev

CatOps Digest 2025-12-12

What was on CatOps in the last couple of weeks...

❤4🔥2

1.52K views17:29


For today’s Donations Monday, let’s help Serhii Sternenko with his initiatives:

- Rusoriz - a standing Monobank jar. The goal is to buy 300 FPV drones daily.
- Fundraiser for the interceptor drones

#donations #Monday

❤5👍1

1.27K views19:59

Cloudflare shares how they use Terraform in production.

Their setup is quite standard: Terraform, Atlantis, Conftest (OPA). One interesting thing is that they use an in-house tool called tfstate-butler to work around the lack of encryption for Terraform state. However, they do not disclose the details of this tool.

Another catchy quote:

>>>
...we do this at a global scale — where a single misconfiguration can propagate across our edge in seconds and lead to unintended consequences.
<<<

Yeah... We know, Cloudflare, we know...

#terraform #iac

The Cloudflare Blog

Shifting left at enterprise scale: how we manage Cloudflare with Infrastructure as Code

Cloudflare has shifted to Infrastructure as Code and policy enforcement to manage internal Cloudflare accounts. This new architecture uses Terraform, custom tooling, and Open Policy Agent to enforce security baselines and increase engineering velocity.

👍13😁1🤔1

1.36K views09:39


GitHub Actions will charge $0.002 per minute for self-hosted runners starting from the 1st of March 2026.

Obviously, you would still pay whatever you pay for your self-hosted infrastructure itself.

GitHub Actions will remain free for public repositories. For now.
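
To put that rate in perspective, here is a back-of-the-envelope estimate in Python. Only the $0.002 per minute figure comes from the announcement. The job counts and durations are made-up inputs, and the sketch assumes every minute is billed, ignoring any included minutes or plan allowances.

```python
from decimal import Decimal

# Rate quoted in the announcement: USD per self-hosted runner minute.
RATE_PER_MINUTE = Decimal("0.002")


def monthly_platform_fee(jobs_per_day: int, minutes_per_job: int, days: int = 30) -> Decimal:
    """Estimate the monthly GitHub fee for self-hosted runner minutes.

    All inputs are hypothetical. The cost of the runner infrastructure
    itself is not included and comes on top of this number.
    """
    total_minutes = jobs_per_day * minutes_per_job * days
    return RATE_PER_MINUTE * total_minutes


if __name__ == "__main__":
    # Hypothetical team: 200 jobs a day, 10 minutes each.
    print(monthly_platform_fee(jobs_per_day=200, minutes_per_job=10))
```

The fee scales linearly with runner minutes, so shortening jobs (caching, fewer redundant runs) lowers it in direct proportion.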

#cicd #gha #microsoft

GitHub Resources

Pricing changes for GitHub Actions

GitHub Actions pricing update: Discover lower runner rates (up to 39% off) following a major re-architecture for faster, more reliable CI/CD.

😐23😁6😭5🤬2👍1

1.25K views08:30


On a positive note: Docker opens access to their hardened images (DHI) to everyone, not just their enterprise customers.

DHI use a distroless runtime and include an SBOM.

Here you can browse the whole catalog of DHI. Docker asked me to log in, though, but I'm definitely not an enterprise customer :D

#docker #security

Hardened Images for Everyone | Docker

Security for everyone. Docker Hardened Images are now free to use, share, and build on with no licensing surprises.

🔥8👍5

2.23K views16:09

A Safer Container Ecosystem with Docker: Free Docker Hardened Images


Forwarded from oleg_log (Oleg)

Good one. Have literally the same feedback. Cool tech but mostly useless.

https://johnjames.blog/posts/graphql-the-enterprise-honeymoon-is-over

GraphQL: the enterprise honeymoon is over

A production-tested take on GraphQL in enterprise systems, why the honeymoon phase fades, and when its complexity outweighs the benefits.

👍5👏2🤝1

1.34K views06:51

Cold-Restart Resilience is an article on what can go wrong when a system recovers from a total outage. The cases it covers, with some tips on how to handle them:

- Circular bootstrap dependencies
- Using in-memory storage as databases
- Failures when trying to create a quorum
- Failures to fetch a remote dynamic config
- Stale data in leaderless systems

It doesn't mention cascading errors, but those are kinda famous already.
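
The first case in the list, circular bootstrap dependencies, is something you can check before an outage happens. Below is a minimal sketch using Python's standard `graphlib` module. It is not taken from the article. The service names and the `cold_start_order` helper are hypothetical, and it assumes you can write your startup dependencies down as a simple mapping.

```python
from graphlib import CycleError, TopologicalSorter


def cold_start_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which services can start from a total outage.

    `dependencies` maps each service to the services that must already be
    running before it can start. Raises RuntimeError if there is a cycle,
    i.e. no service in the loop could ever start first.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of args holds the nodes that form the cycle.
        cycle = err.args[1]
        raise RuntimeError(
            "circular bootstrap dependency: " + " -> ".join(cycle)
        ) from err


if __name__ == "__main__":
    # Hypothetical services: this graph can be cold-started.
    print(cold_start_order({"api": {"db", "config"}, "db": {"config"}, "config": set()}))

    # Hypothetical services: neither can start without the other.
    try:
        cold_start_order({"config": {"discovery"}, "discovery": {"config"}})
    except RuntimeError as exc:
        print(exc)
```

A check like this only catches dependencies you have declared. Implicit ones, such as a service that needs DNS or a secrets store to boot, still have to be found by actually testing a cold start.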

#sre #reliability #systems

Cold-Restart Resilience

Because ‘It Starts’ Doesn’t Mean ‘It Works’

👍6

1.57K views13:47


For today’s Donations Monday, I would like to ask you to help a friend of mine get a car for the Zaporizhzhia front lines.

https://send.monobank.ua/jar/5mSFtTYUFt

This is a personal request, so you can be sure that this fundraiser is legit.

#donations #Ukraine

❤1

1.37K views13:57

Monzo - a British neobank - reveals their system that grants engineers temporary elevated access.

tl;dr: They are using AWS Nitro Enclaves for this.

During my time at N26, we also had a system that served the same purpose, although it was designed differently.

#security

Securing admin access to Monzo's platform

Monzo runs on a shared platform of infrastructure that hosts our microservices. In this post, we’ll discuss how we broker access to our infrastructure credentials with a system that is resistant to attacks even from the team that maintains it.

🔥4

1.74K views10:14


The last digest of this year is here!

https://newsletter.catops.dev/p/catops-digest-2025-12-27

With this digest out, I'm taking some holidays. So, there will be no new posts here until the end of the year (it's not like there were many posts in the last couple of days, lol).

Also, I would really appreciate it if you could share your thoughts about the newsletter in general. Unlike with the Telegram channel, I haven't really found a good fit for it. You can share your thoughts in the comments on Substack, in our chat (in Ukrainian), or via info@catops.dev

🎄🎄🎄 Happy holidays! 🎄🎄🎄

newsletter.catops.dev

CatOps Digest 2025-12-27

The last digest of this year...

🔥3❤1🤔1

1.58K views17:11


I'm back!

It always feels nice to start a new year from scratch. Unfortunately, that's often not the case, and we have to finish what was left over.

Today's fundraiser is one of those things: let's help a friend of mine to raise funds for a pickup truck for the Zaporizhzhia front lines:

https://send.monobank.ua/jar/5mSFtTYUFt

#donations #Ukraine

❤4

1.11K views10:43

Starting a new year with a postmortem, eh?

There was a prolonged incident with Kafka at Honeycomb last month. Here you can find a preliminary postmortem for this incident.

"Preliminary" means that there is no root cause analysis yet, but there's already the timeline and the remediation steps.

#postmortem

status.honeycomb.io

Querying and Ingest issues in EU

Honeycomb's Status Page - Querying and Ingest issues in EU.

👍2🔥1

971 views10:51


Our job is all about tradeoffs.

This article describes tradeoffs of database indices.

#databases

blog.algomaster.io

The Hidden Cost of Database Indexes

“Just add an index”. This is the most common advice when a query runs slow.

👀4❤1

884 views09:25


I think this could be a good Friday read: "When Change Outruns Us" is a tale about sustained progress.

The main point of this article is that smart companies do not push for "constant change for the sake of change", but rather adopt a more cyclic pace, where periods of intense work are followed by more relaxed times.

This article is particularly interesting to me because I've just finished listening to "Slow Productivity" by Cal Newport. One of the principles outlined in that book is that one should work at a natural pace, and a constant run is no one's natural pace. Another observation in the book is that, starting in the second half of the 20th century, managers began to use "busyness" as a proxy for work: if you look busy, you must be doing work, even if in reality there are zero outcomes.

Many tech companies like to claim that they are "outcomes-oriented" or "value impact", but in my experience, "busyness" is still the proxy for work, especially once a company grows beyond the size at which everyone naturally knows everyone else and what they are doing.

#culture #mgmt

When Change Outruns Us

Why growth depends on absorption and recovery

👍3❤1

666 views09:21
