CatOps
DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions are our own.

We do not post ads including event announcements. Please, do not bother us with such requests!
For today's Donations Monday, let's help the foundation “Вихо” raise money for FPV and Vampire drones.

https://send.monobank.ua/jar/WaFbzLzNK

This fundraiser was shared by a close friend of mine, so I trust it.

#donations #Ukraine
The bot I used for years to make posts into this channel has finally died. So, it seems like I won't be able to make neat buttons anymore :\

Yet, I have a couple of time-sensitive things for y'all:

- Cybersecurity books bundle by Packt
- Hacking book bundle by No Starch Press

Another time-sensitive topic: our friends at DOU are running their winter salary survey. More participants mean more accurate results, so jump in!

https://dou.ua/goto/rJks

#security #dou
Ok, the bot is online again!

Yesterday, I watched a video from KubeCon NA by Denys Vasyliev (in Ukrainian), and at some point they discussed the twilight of open source, as the major players have shifted their focus towards monetization and proprietary solutions.

And just today, I learned that Minio (S3-compatible storage) has been moved into "maintenance" mode.

Here's a discussion on Reddit about the alternatives.

#open_source #minio
🀬3❀1
I don't know at what point we can all collectively agree that front-end frameworks have gone too far in their complexity.

Yet here we are, with Cloudflare's preliminary postmortem:

>>>
A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components. We will share more information as we have it today.
<<<

https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

#cloudflare #postmortem
At least Cloudflare is fast at sharing their postmortems.

https://blog.cloudflare.com/5-december-2025-outage/

A curious thing is this:

>>>
Customers that have their web assets served by our older FL1 proxy AND had the Cloudflare Managed Ruleset deployed were impacted. All requests for websites in this state returned an HTTP 500 error, with the small exception of some test endpoints such as /cdn-cgi/trace.
<<<

IIRC, in the previous incident on Nov 18, only the customers on the newer proxy version were impacted. So, one could say that Cloudflare had a single time-distributed total outage.

Another important thing:

>>>
Before the end of next week we will publish a detailed breakdown of all the resiliency projects underway, including the ones listed above. While that work is underway, we are locking down all changes to our network in order to ensure we have better mitigation and rollback systems before we begin again.
<<<

Honestly, looking forward to seeing the write-up. I can only imagine how stressed their team is after taking down a big chunk of the Internet twice in less than 30 days.


#cloudflare #postmortem
This isn't a technical article, but still an important one, I would say. This one is about the importance of making your work visible.

Shadow work in engineering teams.

For better or worse, in many companies the promotion cycle is a popularity contest, so you need to act accordingly.

This article is aimed at the managers, but you may find it useful as an individual contributor as well.

#culture
Here's an article on using DRY and KISS principles when working with Terraform. In my opinion, this is one of those articles that has a good idea behind it, but lacks a bit in delivery.

KISS vs DRY in Infrastructure as Code: Why Simple Often Beats Clever.

The main takeaway is, as usual: use your own judgment when creating abstractions for your infra code. This also applies to all your code.

I do generally agree on the tooling part. This is what Adam Jacob called "A 200% knowledge problem": when adding an abstraction (a wrapper), you need to understand not only your code and the underlying technologies, but also each layer of your abstractions. Thus, do not add wrappers unless you have to.

However, this article also touches on an important point: you may feel like it's time to introduce an abstraction, but in reality, it's not.

#terraform #iac
For today's Donations Monday, let's help Serhii Sternenko with his initiatives:

- Rusoriz - a standing Monobank jar. The goal is to buy 300 FPV drones daily.
- Fundraiser for the interceptor drones

#donations #Monday
Cloudflare shares how they use Terraform in production.

Their setup is quite standard: Terraform, Atlantis, Conftest (OPA). One interesting thing is that they use an in-house tool called tfstate-butler to work around the lack of encryption of Terraform state. However, they do not disclose the details of this tool.
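To give a feel for what a policy gate in such a pipeline does, here is a minimal sketch. Real Conftest policies are written in Rego, and the article does not show Cloudflare's actual rules, so this is only a Python analogue. It assumes the plan was exported with `terraform show -json`, which produces a `resource_changes` list; the protected resource type and the function name are hypothetical choices for illustration.

```python
# Sketch of a plan policy check. PROTECTED_TYPES and violations() are
# illustrative assumptions, not anything Cloudflare has published.
PROTECTED_TYPES = {"cloudflare_zone"}


def violations(plan: dict) -> list[str]:
    """Return one message per planned delete of a protected resource type."""
    messages = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions and rc.get("type") in PROTECTED_TYPES:
            messages.append(f"{rc['address']}: deleting {rc['type']} is not allowed")
    return messages


if __name__ == "__main__":
    sample_plan = {
        "resource_changes": [
            {"address": "cloudflare_zone.main", "type": "cloudflare_zone",
             "change": {"actions": ["delete"]}},
            {"address": "cloudflare_record.www", "type": "cloudflare_record",
             "change": {"actions": ["update"]}},
        ]
    }
    for message in violations(sample_plan):
        print(message)
```

In a pipeline like the one described, a non-empty result would typically fail the check before Atlantis is allowed to apply the plan.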

Another catchy quote:

>>>
...we do this at a global scale - where a single misconfiguration can propagate across our edge in seconds and lead to unintended consequences.
<<<

Yeah... We know, Cloudflare, we know...

#terraform #iac
GitHub Actions will charge $0.002 per minute for self-hosted runners starting from the 1st of March 2026.

Obviously, you would still pay whatever you pay for your self-hosted infrastructure itself.
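For a rough idea of what this means in practice, here is a back-of-the-envelope sketch. It assumes a flat fee of $0.002 for every runner minute, as quoted above, and ignores any per-job rounding or included quotas; the workload numbers are made up.

```python
# Assumption: a flat per-minute fee, as quoted in the post.
RATE_PER_MINUTE_USD = 0.002


def monthly_platform_fee(jobs_per_day: int, avg_minutes_per_job: float, days: int = 30) -> float:
    """Estimated fee in USD; your own infrastructure costs come on top."""
    total_minutes = jobs_per_day * avg_minutes_per_job * days
    return round(total_minutes * RATE_PER_MINUTE_USD, 2)


if __name__ == "__main__":
    # Hypothetical workload: 500 jobs a day, 8 minutes each.
    print(monthly_platform_fee(jobs_per_day=500, avg_minutes_per_job=8))
```

With these made-up numbers, that is 120,000 runner minutes a month.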

GitHub Actions will remain free for public repositories. For now.

#cicd #gha #microsoft
😐23😁6😭5🀬2πŸ‘1
On the positive note: Docker opens access to their hardened images (DHI) to everyone, not just their enterprise customers.

DHI uses a distroless runtime and includes SBOM.

Here you can browse the whole catalog of DHI. Docker asked me to login, though, but I'm definitely not an enterprise customer :D

#docker #security
Cold-Restart Resilience is an article on what can go wrong when a system recovers from a total outage. It covers the following cases, with some tips on how to handle them:

- Circular bootstrap dependencies
- Using in-memory storage as databases
- Failures when trying to create a quorum
- Failures to fetch a remote dynamic config
- Stale data in leaderless systems
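The first case is easy to check for ahead of time. Below is a minimal sketch, not taken from the article, that uses Python's standard `graphlib` to compute a cold-start order and to flag circular bootstrap dependencies. The service names and the `cold_start_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def cold_start_order(deps: dict[str, set[str]]) -> list[str]:
    """Map each service to the services it needs first; return a safe start order.

    Raises ValueError that names the cycle if the services cannot be
    bootstrapped from a cold state.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # graphlib puts the offending cycle in the second exception argument.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"circular bootstrap dependency: {cycle}") from err


if __name__ == "__main__":
    healthy = {"api": {"config-service"}, "config-service": {"database"}, "database": set()}
    print(cold_start_order(healthy))

    broken = {"dns": {"service-discovery"}, "service-discovery": {"dns"}}
    try:
        cold_start_order(broken)
    except ValueError as exc:
        print(exc)
```

Running a check like this in CI against a declared dependency map is one way to catch such a loop before an outage does.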

It doesn't mention cascading errors, but those are kinda famous already.

#sre #reliability #systems
For today's Donations Monday, I would like to ask you to help a friend of mine get a car for the Zaporizhzhia front lines.

https://send.monobank.ua/jar/5mSFtTYUFt

This is a personal request, so you can be sure that this fundraiser is legit.

#donations #Ukraine
The last digest of this year is here!

https://newsletter.catops.dev/p/catops-digest-2025-12-27

With this digest out, I'm taking some holidays. So, there will be no new posts here until the end of the year (it's not like there were many posts in the last couple of days, lol).

Also, I would really appreciate it if you could share your thoughts about the newsletter in general. Unlike with the Telegram channel, I haven't really found a good fit for it yet. You can share your thoughts in the comments on Substack, in our chat (in Ukrainian), or via info@catops.dev

🎄🎄🎄 Happy holidays! 🎄🎄🎄
I'm back!

It always feels nice to start a new year from scratch. Unfortunately, that's often not the case, and we have to finish things left over from the previous one.

Today's fundraiser is one of those things: let's help a friend of mine raise funds for a pickup truck for the Zaporizhzhia front lines:

https://send.monobank.ua/jar/5mSFtTYUFt

#donations #Ukraine
Starting a new year with a postmortem, eh?

There was a prolonged incident with Kafka at Honeycomb last month. Here you can find a preliminary postmortem for this incident.

"Preliminary" means that there is no root cause analysis yet, but there's already the timeline and the remediation steps.

#postmortem