CatOps
DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions are our own.

We do not post ads including event announcements. Please, do not bother us with such requests!
​​For today's Donations Monday, I'd like to ask you to donate to the administrative needs of the "Come Back Alive" foundation.

It takes tremendous effort to run a foundation like this, and although they could, they do not take money for operational needs from regular donations. Thus, it's important to help them cover those needs as well!

https://savelife.in.ua/en/donate-en/#donate-fund-card-once

#donations #Ukraine
👍10
A postmortem from Cloudflare for yesterday’s outage is now available.

tl;dr:
>>>
The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems' permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.
<<<

Another interesting thing:
>>>
Unrelated to this incident, we were and are currently migrating our customer traffic to a new version of our proxy service, internally known as FL2. Both versions were affected by the issue, although the impact observed was different.
Customers deployed on the new FL2 proxy engine, observed HTTP 5xx errors. Customers on our old proxy engine, known as FL, did not see errors, but bot scores were not generated correctly, resulting in all traffic receiving a bot score of zero. Customers that had rules deployed to block bots would have seen large numbers of false positives. Customers who were not using our bot score in their rules did not see any impact.
<<<

So, if you were not affected yesterday, you know why now.
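
To see why a bot score of zero turns into false positives on the old proxy, here is a minimal sketch. It assumes a hypothetical customer rule that blocks any request scoring below 30 (lower meaning "more likely a bot"); the threshold, the function name, and the sample scores are all made up for illustration and are not taken from the postmortem.

```python
# Hypothetical customer rule: block anything that looks like a bot.
# The threshold of 30 is an assumption for this example.
BLOCK_THRESHOLD = 30


def should_block(bot_score: int, threshold: int = BLOCK_THRESHOLD) -> bool:
    """Return True when the score is below the threshold (lower = more bot-like)."""
    return bot_score < threshold


# Made-up scores for four requests while scoring works normally.
healthy_scores = [95, 88, 12, 70]

# During the incident, the old proxy assigned a score of zero to all traffic.
degraded_scores = [0] * len(healthy_scores)

blocked_healthy = sum(should_block(s) for s in healthy_scores)
blocked_degraded = sum(should_block(s) for s in degraded_scores)

print(f"blocked with normal scoring: {blocked_healthy} of {len(healthy_scores)}")
print(f"blocked with zeroed scoring: {blocked_degraded} of {len(degraded_scores)}")
```

With a rule like this, every request gets blocked once the scores are zeroed. Without such a rule, the zeroed score is simply never consulted, which matches the "did not see any impact" part of the quote.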

#postmortem #cloudflare
🤔12👌1
Always Be Ready to Leave (Even If You Never Do) is not about keeping your CV up to date or socializing with recruiters, as the title may suggest. It’s a short article on work habits that keep you more efficient and, probably, happier at work, even if those same habits would eventually make it easier for you to quit, should you choose to.

#culture
15🔥2
​​For today’s Donations Monday, I would like to remind you about the foundation that we’ve been partnering with for DevOps Days Ukraine for years now.

UA Responders. Their specialization is medical equipment and such.

#donations #Ukraine
4
Do you have the "What went well" section in your postmortems?

Here's an argument for having one, with an explanation of why it is important.

tl;dr: Because while each incident is different, there is a set of skills and behaviors that allow one to improvise under pressure to mitigate an incident. These skills and behaviors can be taught as well, and your "What went well" section is also for that.

#sre #incidents
🔥5👍2
For today’s Donations Monday, let’s help the foundation “Тихо” raise money for FPV and Vampire drones.

https://send.monobank.ua/jar/WaFbzLzNK

This fundraiser was shared by a close friend of mine, so I trust it.

#donations #Ukraine
3
The bot I used for years to make posts into this channel has finally died. So, it seems like I won't be able to make neat buttons anymore :\

Yet, I have a couple of time-sensitive things for y'all:

- Cybersecurity books bundle by Packt
- Hacking book bundle by No Starch Press

Another time-sensitive topic: our friends at DOU are running their winter salary survey. More participants mean more accurate results, so jump in!

https://dou.ua/goto/rJks

#security #dou
3🎉2🤔1
​​Ok, the bot is online again!

Yesterday, I watched a video from KubeCon NA by Denys Vasyliev (in Ukrainian), and at some point they were discussing the dusk of open source, because the major players shifted their focus towards monetization and proprietary solutions.

And just today, I learned that MinIO (S3-compatible storage) has been moved into "maintenance" mode.

Here's a discussion on Reddit about the alternatives.

#open_source #minio
🤬31
I don't know at what point we can all collectively agree that front-end frameworks have gone too far in their complexity.

Yet, here you are with the Cloudflare preliminary postmortem:

>>>
A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components. We will share more information as we have it today.
<<<

https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

#cloudflare #postmortem
7🔥1
At least Cloudflare is fast in sharing their postmortems.

https://blog.cloudflare.com/5-december-2025-outage/

A curious thing is this:

>>>
Customers that have their web assets served by our older FL1 proxy AND had the Cloudflare Managed Ruleset deployed were impacted. All requests for websites in this state returned an HTTP 500 error, with the small exception of some test endpoints such as /cdn-cgi/trace.
<<<

IIRC, in the previous incident on Nov 18, only the customers on the newer proxy version were impacted. So, one could say that Cloudflare had a single time-distributed total outage.

Another important thing:

>>>
Before the end of next week we will publish a detailed breakdown of all the resiliency projects underway, including the ones listed above. While that work is underway, we are locking down all changes to our network in order to ensure we have better mitigation and rollback systems before we begin again.
<<<

Honestly, looking forward to seeing the write-up. I can only imagine how stressed their team is after taking down a big chunk of the Internet twice in less than 30 days.


#cloudflare #postmortem
👍5🔥2
This isn't a technical article, but still an important one, I would say. This one is about the importance of making your work visible.

Shadow work in engineering teams.

For better or worse, in many companies the promotion cycle is a popularity contest, so you need to act accordingly.

This article is aimed at managers, but you may find it useful as an individual contributor as well.

#culture
13👍1
Here's an article on using DRY and KISS principles when working with Terraform. In my opinion, this is one of those articles that has a good idea behind it, but lacks a bit in delivery.

KISS vs DRY in Infrastructure as Code: Why Simple Often Beats Clever.

The main takeaway is, as usual: use your own judgment when creating abstractions for your infra code. This also applies to all your code.

I do generally agree on the tooling part. This is what Adam Jacob called "A 200% knowledge problem": when adding an abstraction (a wrapper), you need to understand not only your code and the underlying technologies, but also each layer of your abstractions. Thus, do not add wrappers unless you have to.

However, this article also touches on an important point: you may feel like it's time to introduce an abstraction, but in reality, it's not.

#terraform #iac
👍14
​​For today’s Donations Monday, let’s help Serhii Sternenko with his initiatives:

- Rusoriz - a standing Monobank jar. The goal is to buy 300 FPV drones daily.
- Fundraiser for the interceptor drones

#donations #Monday
5👍1
Cloudflare shares how they use Terraform in production.

Their setup is quite standard: Terraform, Atlantis, Conftest (OPA). One interesting thing is that they use an in-house tool called tfstate-butler to work around the lack of encryption of Terraform states. However, they do not disclose the details of this tool.
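
For those who haven't used policy checks on Terraform plans: the idea is to inspect the machine-readable plan before it gets applied. Conftest policies are written in Rego, so the snippet below is not what Cloudflare runs. It is only a rough Python sketch of the same kind of check, assuming the JSON produced by `terraform show -json` with its `resource_changes` list. The function name and the sample resources are hypothetical.

```python
import json


def find_destructive_changes(plan: dict) -> list[str]:
    """Return addresses of resources whose planned actions include a delete."""
    flagged = []
    for resource_change in plan.get("resource_changes", []):
        actions = resource_change.get("change", {}).get("actions", [])
        # A replacement shows up as both "delete" and "create", so it is flagged too.
        if "delete" in actions:
            flagged.append(resource_change["address"])
    return flagged


# A trimmed-down, made-up plan with only the fields this check reads.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["delete"]}},
    {"address": "aws_iam_role.ci", "change": {"actions": ["update"]}},
    {"address": "aws_instance.web", "change": {"actions": ["create"]}}
  ]
}
""")

violations = find_destructive_changes(sample_plan)
for address in violations:
    print(f"policy violation: {address} would be destroyed")
```

In a pipeline, a non-empty result would typically fail the check and stop the apply until someone reviews the plan.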

Another catchy quote:

>>>
...we do this at a global scale — where a single misconfiguration can propagate across our edge in seconds and lead to unintended consequences.
<<<

Yeah... We know, Cloudflare, we know...

#terraform #iac
👍13😁1🤔1
GitHub Actions will charge $0.002 per minute for self-hosted runners starting from the 1st of March 2026.

Obviously, you would still pay whatever you pay for your self-hosted infrastructure itself.
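
To get a feeling for the numbers, here is a back-of-the-envelope sketch. It only multiplies total runner minutes by the $0.002 rate from the announcement. The workload figures are made up, and it assumes no free allowance and no per-job rounding, which may not match the actual billing rules.

```python
RATE_PER_MINUTE_USD = 0.002  # rate from the announcement


def monthly_platform_fee(jobs_per_day: int, minutes_per_job: float, days: int = 30) -> float:
    """Estimate the monthly charge in USD for self-hosted runner minutes.

    Assumes every minute is billed at the flat rate, with no free
    allowance and no rounding of individual jobs.
    """
    total_minutes = jobs_per_day * minutes_per_job * days
    return round(total_minutes * RATE_PER_MINUTE_USD, 2)


# Hypothetical workload: 200 jobs per day, 10 minutes each.
fee = monthly_platform_fee(jobs_per_day=200, minutes_per_job=10)
print(f"estimated monthly fee: ${fee:.2f}")
```

For this made-up workload, that is 60,000 minutes per month, or $120 on top of the infrastructure bill.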

GitHub Actions will remain free for public repositories. For now.

#cicd #gha #microsoft
😐23😁6😭5🤬2👍1
On a positive note: Docker opens access to their hardened images (DHI) to everyone, not just their enterprise customers.

DHI uses a distroless runtime and includes an SBOM.

Here you can browse the whole catalog of DHI. Docker asked me to log in, though I'm definitely not an enterprise customer :D

#docker #security
🔥8👍5