Since AWS has an outage, some of you have unplanned time off anyway. So, it's a good time to make a donation to a noble cause!
A friend of mine runs a supporting jar for INSCIENCE, which has partnered with the Come Back Alive foundation to raise money to combat enemy UAVs.
https://send.monobank.ua/jar/fKfgmjgw1
Her goal is 20k UAH, so we can easily achieve it!
P.S. Apparently, Monobank is hosted on AWS, since their web page did not refresh after I sent the donation.
P.P.S. Despite N26 also being on AWS, they processed the transaction just fine.
#donations #Ukraine
👍9
Speaking of AWS:
Oct 20 2:01 AM PDT We have identified a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region. Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. We are working on multiple parallel paths to accelerate recovery. This issue also affects other AWS Services in the US-EAST-1 Region. Global services or features that rely on US-EAST-1 endpoints such as IAM updates and DynamoDB Global tables may also be experiencing issues. During this time, customers may be unable to create or update Support Cases. We recommend customers continue to retry any failed requests. We will continue to provide updates as we have more information to share, or by 2:45 AM.
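A side note on "continue to retry any failed requests": blind retries in a tight loop tend to make things worse for a service that is already struggling. Below is a minimal sketch of retrying with exponential backoff and full jitter. The function name, parameters, and defaults are my own illustration, not anything from the AWS SDKs, and real code should catch only the errors that are actually retryable.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds or `max_attempts` is exhausted.

    The wait before the next attempt is drawn uniformly from
    [0, min(max_delay, base_delay * 2**attempt)] ("full jitter"),
    so many clients retrying at once do not hit the service in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # illustration only: narrow this down in real code
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use you would leave the default.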
#aws #outage
😭20
Helm graduated into a "boring technology" a long time ago. So, the news about the upcoming v4 major version is probably not so exciting, unless you are a Helm plugin developer.
Anyway, here's a sneak peek of upcoming changes. They promise that:
>>>
*Bottom Line*: Helm v4.0 is a major architectural upgrade focused on better Kubernetes integration, enhanced plugin capabilities, and improved developer experience while maintaining chart compatibility.
<<<
So, if you only care about your charts, those should continue to work.
#helm #kubernetes
GitHub
helm/helm-v4-changelog-summary.md at v4-changelog · scottrigby/helm
The Kubernetes Package Manager. Contribute to scottrigby/helm development by creating an account on GitHub.
❤5👍5🤮1
I don't know if you are involved in promotion cycles at your company or participate in hiring, but I still want to share this article with you - On Hiring: Promote Stars, Not Strangers.
It is targeted towards managers, but you can easily apply the core idea of this article to any role or position in a company.
#culture
Kellblog
On Hiring: Promote Stars, Not Strangers
“Well, he’s never been a sales development rep (SDR) manager before, but he has been an SDR for 3 years at another company. The chance to be a manager is why he’d come here.” — Famous Last Words I can’t … Continue reading →
👍2
A glimpse into the alerting infrastructure of Hugging Face - a repository for ML models and datasets.
This article has a bit of a "they made me write this for promotion" vibe, but it's still interesting to see what technologies other people use, even if they don't dig deep into any of them.
#observability
huggingface.co
Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
❤3🤗1
Reddit engineers made a long post about their DevEx survey, which they shared... on Reddit.
This is a nice read, if you'd like to learn how different companies evaluate developer productivity and satisfaction.
A few interesting points there:
- They don't use any specialized SaaS tools for this survey - only Typeform and Looker.
- The survey is quite long, but the adoption seems to be good.
- They include various topics in the survey, not just pure DX metrics; they even changed the name of the survey to reflect that.
- Teams can use the results on their own to calibrate their decisions.
#dx #reddit
Reddit
From the RedditEng community on Reddit
Explore this post and more from the RedditEng community
❤4
OneUptime has published an update two years after moving off AWS.
It's an interesting read, and the tl;dr is that they do not regret their decision at all, although they admit that they still use the cloud for dev environments, cold storage, etc.
Here are a few points that I find interesting:
- "Our workload is 24/7 steady" / "We still recommend staying put if your usage pattern is spiky or seasonal" - right-sizing is one of the major advantages of clouds that is often overlooked.
- "We still recommend staying put if you lean heavily on managed services" - this is another important point. Managed services add a lot of value to the clouds. It does seem a bit dumb to use AWS just as an expensive datacenter. On the other hand, if you want to be able to do multi-cloud, hybrid-cloud, etc., you have to make a deliberate decision to stay as decoupled from cloud offerings as possible. It's a deliberate strategy that trades immediate value for flexibility.
- "Ceph stack in production" - I'm sure Ceph has evolved a lot over the years, but I still remember the words of a colleague of mine from a long time ago: "We didn't lose the data, we just cannot retrieve it". Back then, we decided to keep on-premise installations with ephemeral disks and ship all the data that had to be preserved to AWS (it's not like there was a lot of data to preserve there, though).
- "so we added Anycast ingress via BGP with our transit provider to cut traffic shifting to sub-minute" and "We PXE boot with Tinkerbell" - ask about Anycast in an interview, and the same day you will get an angry post on Reddit about unreasonable questions and lowballing candidates, lol. Or maybe it's just my pessimism speaking.
Anyway, use your best judgment before making any drastic moves. BTW, this advice is generally applicable and is not limited to cloud vs. DC discussions.
#aws #bare_metal
OneUptime | One Complete Observability platform.
AWS to Bare Metal Two Years Later: Answering Your Toughest Questions About Leaving AWS
Two years after our AWS-to-bare-metal migration, we revisit the numbers, share what changed, and address the biggest questions from Hacker News and Reddit.
👍8❤3👎1
For today’s Donations Monday, I’d like to share with you a fundraiser for FPV drones from DeepState - the collective behind the near-real-time battlefield maps.
https://send.monobank.ua/jar/9AtiB8esqu
#donations #Ukraine
❤2
More follow-ups on the AWS outage (the Azure outage didn't generate as much press).
Lorin Hochstein analyzes the postmortem from the complexity point of view and comes to quite interesting conclusions that you can absolutely apply to your own incidents and postmortems as well.
The tl;dr is that incidents (especially bigger ones) are often unique. So, when reasoning about preventive measures, you need not only to prevent similar incidents, but also to get prepared to handle incidents in general, because the next incident may not be the same as the present one.
#reliability #sre #aws
Surfing Complexity
Quick thoughts on the recent AWS outage
AWS recently posted a public write-up of the us-east-1 incident that hit them this past Monday. Here are a couple of quick thoughts on it. Reliability → Automation → Complexity → New failure modes …
❤5👍1
A book bundle that I wanted to post a couple of days ago, but forgot. So, here it is:
Linux for Professionals by Apress.
#books #linux
Humble Bundle
Humble Tech Book Bundle: Linux for Professionals by Apress/Springer
Unlock essential resources for Linux—get a professional edge on the competition with a little help from the experts at Apress & Springer!
🤔4🔥3👍1
An article by Charity Majors on why thinking of Observability in pillars is limiting.
I recall a similar article from the past about how Facebook does their observability. It’s somewhere here on the channel.
The core idea is to treat all the signals as universal wide events that would allow one to preserve all the context and not hop between different tools.
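To make "wide events" a bit more tangible, here is a minimal sketch: instead of separate log lines, metrics, and spans, a request handler builds up one context-rich record and emits it once at the end. The handler, the field names, and the values are all made up for illustration; the article does not prescribe this exact shape.

```python
import json
import time


def handle_request(user_id, path, emit=print):
    """Handle one (fake) request and emit exactly one wide event for it."""
    event = {"service": "checkout", "path": path, "user_id": user_id}
    start = time.monotonic()
    try:
        # Real work would happen here; we only attach context as we go.
        event["cart_items"] = 3
        event["db_queries"] = 2
        event["status"] = 200
    except Exception as exc:
        event["status"] = 500
        event["error"] = repr(exc)
        raise
    finally:
        # One record per request, carrying all the context collected above.
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        emit(json.dumps(event))


handle_request("u-42", "/checkout")
```

Because every field lives on the same record, you can slice by any of them later (user, path, status, number of DB queries) without joining data from different tools.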
#observability
charity.wtf
How many pillars of observability can you fit on the head of a pin?
My day started off with an innocent question, from an innocent soul. “Hey Charity, is profiling a pillar?” I hadn’t even had my coffee yet. “Someone was just telling me that profiling is the fourth…
👍8🤯1
For today's Donations Monday, I'd like to share with you a fundraiser for the Optic Dragons unit - a specialized FPV drone assault unit of the 92nd Separate Assault Brigade.
They're raising funds for optical fiber drones, spare parts for converting drones to fiber optics, and supporting combat vehicles of pilot crews. The unit has been redeployed to the Pokrovsk direction where the situation is intense and they need more drone reels for optical drones.
Direct donation link:
https://send.monobank.ua/jar/7D7whfQHfF
Card number: 4441 1111 2291 2961
#donations #Ukraine
❤4👍1
An interesting lab from AWS with an overengineered solution for right-sizing Kubernetes workloads.
Should you implement it this way? I don't know. But maybe, you want to play with GitOps, AWS Bedrock and all that stuff.
Also, it's funny how they say in the beginning that having VPA and Goldilocks inside a cluster is an overhead and additional management burden and then propose to create a cluster in GHA runtime and use generative AI to address that.
#aws #kubernetes
Amazon
Kubernetes right-sizing with metrics-driven GitOps automation | Amazon Web Services
In this post, we introduce an automated, GitOps-driven approach to resource optimization in Amazon EKS using AWS services such as Amazon Managed Service for Prometheus and Amazon Bedrock. The solution helps optimize Kubernetes resource allocation through…
❤2😁1🤔1
For people nostalgic for on-premise setups, Dropbox revealed their new-generation hardware setup and the challenges they face storing exabytes of data.
#on_prem
dropbox.tech
Seventh-generation server hardware at Dropbox: our most efficient and capable architecture yet
This generation represents our most efficient, capable, and scalable architecture yet—and it’ll help us as we continue to build AI products like Dropbox Dash.
🤔5😁1
Press F to pay respects.
>>> Ingress NGINX Retirement: Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.
Announcement page.
#kubernetes #nginx
GitHub
GitHub - kubernetes/ingress-nginx: Ingress NGINX Controller for Kubernetes
Ingress NGINX Controller for Kubernetes. Contribute to kubernetes/ingress-nginx development by creating an account on GitHub.
🫡33😱6❤1
A new issue of the CatOps Digest is here:
https://newsletter.catops.dev/p/catops-digest-2025-11-14
#digest #newsletter
newsletter.catops.dev
CatOps Digest 2025-11-14
What was on CatOps in the last couple of weeks...
🔥2🤔1🤨1
For today's Donations Monday, I'd like to ask you to donate to the administrative needs of the "Come Back Alive" foundation.
It takes tremendous effort to run a foundation like this, and even though they could, they do not take money for operational needs from regular donations. Thus, it's important to help them cover those needs as well!
https://savelife.in.ua/en/donate-en/#donate-fund-card-once
#donations #Ukraine
👍10
We don't know why Cloudflare is down - their status page is not as detailed as AWS's.
However, you can still check out some books on Humble Bundle:
- Data engineering & data science by O'Reilly.
- Software architecture by Pearson.
#books #bundle
Humble Bundle
Humble Tech Book Bundle: Data Engineering & Science by O'Reilly
Become an expert on data science and software engineering for this library of ebooks from O’Reilly! All purchases support Code for America!
❤3
A postmortem from Cloudflare for yesterday’s outage is now available.
tl;dr:
>>>
The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems' permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.
<<<
Another interesting thing:
>>>
Unrelated to this incident, we were and are currently migrating our customer traffic to a new version of our proxy service, internally known as FL2. Both versions were affected by the issue, although the impact observed was different.
Customers deployed on the new FL2 proxy engine, observed HTTP 5xx errors. Customers on our old proxy engine, known as FL, did not see errors, but bot scores were not generated correctly, resulting in all traffic receiving a bot score of zero. Customers that had rules deployed to block bots would have seen large numbers of false positives. Customers who were not using our bot score in their rules did not see any impact.
<<<
So, if you were not affected yesterday, you know why now.
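The general lesson from the feature-file part is to treat generated config like untrusted input and validate it before rolling it out everywhere. Here is a rough sketch of that idea. It is not how Cloudflare's pipeline works or what they changed afterwards; the function name, the checks, and the limit are assumptions for illustration only.

```python
def choose_feature_file(candidate, last_known_good, max_entries):
    """Return the feature list that is safe to propagate.

    A freshly generated `candidate` is rejected if it contains duplicate
    entries or exceeds `max_entries` (a hypothetical limit); in that case
    the previous known-good list keeps being served.
    """
    has_duplicates = len(set(candidate)) != len(candidate)
    too_large = len(candidate) > max_entries
    if has_duplicates or too_large:
        return last_known_good
    return candidate


good = ["feat_a", "feat_b", "feat_c"]
doubled = good + good  # every entry appears twice, so the size doubles
print(choose_feature_file(doubled, last_known_good=good, max_entries=5))
```

Falling back to the last known-good version keeps the consumers running while someone investigates why the generator produced something odd.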
#postmortem #cloudflare
The Cloudflare Blog
Cloudflare outage on November 18, 2025
Cloudflare suffered a service outage on November 18, 2025. The outage was triggered by a bug in generation logic for a Bot Management feature file causing many Cloudflare services to be affected.
🤔12👌1
It's been a while since we had simple how-to articles here. So, here you are:
How to enable the JMX port on Jenkins.
It's short and actionable, and you would be surprised to learn how many people still use Jenkins to this day.
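For context, remote JMX on a JVM is usually switched on with a handful of standard system properties. The small helper below just assembles them into a flag list. The port, the hostname, and the helper itself are placeholders of mine, and how the flags get passed to your Jenkins process (service file, container env, etc.) depends on the installation, so check the article for that. Authentication and SSL are disabled here purely for brevity; don't do that on a reachable network.

```python
def jmx_remote_flags(port, hostname):
    """Build the JVM system properties that enable remote JMX."""
    return [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        # Pin the RMI port to the same value so only one port must be opened.
        f"-Dcom.sun.management.jmxremote.rmi.port={port}",
        # Insecure: acceptable only for a quick local debugging session.
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
        f"-Djava.rmi.server.hostname={hostname}",
    ]


# Placeholder values; prints one space-separated string of JVM options.
print(" ".join(jmx_remote_flags(9010, "jenkins.example.internal")))
```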
#ci #java #debug
Medium
Jenkins JVM monitoring with JMX remote
If you ever encountered Jenkins misbehaving, running out of memory or just curious too see whats happening inside, Enabling remote JMX…
❤6😁4