DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

Roskomnadzor (RKN) registration: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
kamal

From bare metal to cloud VMs, deploy web apps anywhere with zero downtime. Kamal has the dynamic reverse-proxy Traefik hold requests while a new app container is started and the old one is stopped. Works seamlessly across multiple hosts, using SSHKit to execute commands. Originally built for Rails apps, Kamal will work with any type of web app that can be containerized with Docker.
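
The hold-and-swap idea described above can be sketched in a few lines. This is a toy in-memory model, not how Kamal or Traefik is implemented: the `HoldingProxy` class and its method names are hypothetical, and a backend is just a Python callable standing in for an app container.

```python
from collections import deque


class HoldingProxy:
    """Toy reverse proxy that buffers requests while the backend is swapped."""

    def __init__(self, backend):
        self._backend = backend      # callable: request -> response
        self._held = deque()         # requests that arrived during a swap
        self._swapping = False

    def handle(self, request):
        # During a swap, hold the request instead of failing it.
        if self._swapping:
            self._held.append(request)
            return None              # response is produced later, in finish_swap
        return self._backend(request)

    def begin_swap(self):
        # Called when the new container starts booting.
        self._swapping = True

    def finish_swap(self, new_backend):
        # Point at the new backend, then replay held requests in arrival order.
        self._backend = new_backend
        self._swapping = False
        responses = [self._backend(request) for request in self._held]
        self._held.clear()
        return responses


proxy = HoldingProxy(lambda request: f"v1:{request}")
first = proxy.handle("a")            # served by the old version
proxy.begin_swap()
held = proxy.handle("b")             # held, not dropped
replayed = proxy.finish_swap(lambda request: f"v2:{request}")
after = proxy.handle("c")            # served by the new version
```

In this model no request fails during the swap; a held request just waits longer, which is the sense in which the deploy is "zero downtime".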


https://github.com/basecamp/kamal
5 SRE Predictions For 2024

1️⃣ Tougher Job Market for SREs

With many companies looking to cut costs due to worsening economic conditions, dedicated SRE roles may be seen as expendable, so SRE headcount and budgets could be reduced. Many organizations are transitioning to an Amazon-like model, where SWEs "do it all". Infrastructure management, operational hardening, incident tracking and being on call are becoming part of the job, so reliability engineers will slowly be pushed out or will have to transition into development. We can already see these trends in the 2023 layoffs, including at SRE-minded companies like Google.

This combination of factors means the SRE job market will likely tighten considerably in 2024. Openings will be harder to find and competition will be steeper. SREs will need to clearly demonstrate their value to stay relevant.

2️⃣ Rise of the Hybrid Cloud

The economic realities of running workloads on major public clouds like AWS, GCP and Azure will lead companies to look for alternatives. The costs of using public cloud infrastructure and services have been climbing, eating into budgets. As companies look to reduce spending, running applications on public clouds may no longer make economic sense. We'll see a migration back towards private data centers, colocation facilities, and on-prem infrastructure. SREs skilled in on-prem operations, bare metal provisioning, etc. will be in higher demand.

3️⃣ Kubernetes Will Continue Its Dominance

While the benefits and operational costs of Kubernetes have been questioned a lot recently, it has become the clear leader as the orchestration platform of choice for containerized workloads. Engineers and companies are heavily invested in Kubernetes workflows and tools, both in the cloud and on-prem. As companies look to invest further in the efficiency of infrastructure and application management, SREs will need strong Kubernetes expertise.

4️⃣ Increased major outages due to AI-written code
(and fewer SREs)

While automated code generation promises improved developer productivity, it also poses new reliability challenges. As code generation by AI systems increases, companies may end up with insufficiently supervised software. With fewer SREs around to establish robust testing and deployment practices, outages caused by bugs in AI-generated code could become more frequent. Companies will be caught off guard by disruptions caused by their overreliance on AI. Quick mitigation of these outages will be problematic as well, since issues in AI-written code are fundamentally harder to fix.

5️⃣ Platform Engineering Matures

In 2024, unifying infrastructure, applications, data, and services under common APIs and self-service platforms will accelerate.

These platforms will provide standardized building blocks and streamlined workflows so engineering teams can quickly build, connect and deploy applications without wasting time on infrastructure complexities. Platforms will handle provisioning, networking, monitoring, access controls, and other operational aspects behind the scenes.

With job opportunities for traditional SRE roles declining, many SREs will look to transition into platform engineering positions. The broad technical skills required by platform roles align well with strengths many SREs already have. However, to successfully land a platform engineering role, you will need to skill up on software development as well. Programming and coding will become mandatory for those looking to get into platform engineering.


https://www.codereliant.io/5-sre-predictions-for-2024
Best practices to prevent alert fatigue

As your environment changes, new trends can quickly make your existing monitoring less accurate. At the same time, building alerts after every new incident can turn a straightforward strategy into a convoluted one. Treating monitoring as either a one-time effort or a purely reactive one can result in alert fatigue. Alert fatigue occurs when monitoring systems generate an excessive number of alerts, or when alerts are irrelevant or unhelpful, leading to a diminished ability to see critical issues. Updating your alerts too infrequently or too often can cause false positive alarms and redundant alerts that overwhelm your team. A desensitized team won’t be able to detect issues early and will lose trust in their monitoring systems, which can disrupt production and negatively impact your business.


https://www.datadoghq.com/blog/best-practices-to-prevent-alert-fatigue
10 Strategies to Build and Manage Scalable Infrastructure

https://spacelift.io/blog/scalable-infrastructure
pgxman

npm for PostgreSQL


https://pgxman.com
better-commits

A CLI for creating better commits following the conventional commits specification
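
For context, the Conventional Commits header has the shape `type(scope)!: description`, where the scope and the `!` breaking-change marker are optional. Below is a minimal sketch of a header check, not part of better-commits itself. The function name `parse_commit_header` and the `ALLOWED_TYPES` list are assumptions made for illustration; the specification defines `feat` and `fix` and permits other types.

```python
import re

# Assumed type list (a commonly used set); the spec defines feat and fix and permits others.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# type, optional (scope), optional "!", then ": " and a non-empty description.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_commit_header(header):
    """Return the parsed parts of a commit header, or None if it does not conform."""
    match = HEADER_RE.match(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }


parsed = parse_commit_header("feat(api)!: drop v1 endpoints")
rejected = parse_commit_header("updated stuff")
```

A check like this could run in a `commit-msg` hook, while an interactive CLI avoids the problem earlier by building the header from prompts.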


https://github.com/Everduin94/better-commits
5 tips to efficiently manage AWS security groups using Terraform

Discover 5 proven strategies for scalable and stress-free security rule group management on AWS using Terraform.


https://blog.avangards.io/5-tips-to-efficiently-manage-aws-security-groups-using-terraform
aws2tf

aws2tf - automates the importing of existing AWS resources into Terraform and outputs the Terraform HCL code.


https://github.com/aws-samples/aws2tf
The Case for Kubernetes Resource Limits: Predictability vs. Efficiency

https://kubernetes.io/blog/2023/11/16/the-case-for-kubernetes-resource-limits
3 Common Mistakes with PromQL and Kubernetes Metrics

- Mistake #1: Duplicate Series
- Mistake #2: Grouping/Sum Mistakes
- Mistake #3: Unexpected Cardinality
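
To make the three items above concrete, here is a small Python model of how duplicate series inflate a `sum by (...)` and how cardinality is counted. It is a generic illustration, not the linked article's examples: the sample data, the label values, and the helper names `sum_by`, `max_by` and `cardinality` are all hypothetical.

```python
from collections import defaultdict


def group_key(labels, by):
    # A label that is missing counts as an empty string, as in PromQL.
    return tuple((name, labels.get(name, "")) for name in by)


def sum_by(samples, by):
    """Mimic the grouping of PromQL's `sum by (...)` on an instant vector."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[group_key(labels, by)] += value
    return dict(totals)


def max_by(samples, by):
    """Mimic `max by (...)`, used here to collapse duplicate series first."""
    best = {}
    for labels, value in samples:
        key = group_key(labels, by)
        best[key] = max(value, best.get(key, value))
    return [(dict(key), value) for key, value in best.items()]


def cardinality(samples):
    """Number of distinct label sets, i.e. the number of series."""
    return len({tuple(sorted(labels.items())) for labels, _ in samples})


# Hypothetical data: the same pod fact is exported by two scrape targets.
samples = [
    ({"pod": "web-1", "namespace": "shop", "instance": "ksm-a"}, 2.0),
    ({"pod": "web-1", "namespace": "shop", "instance": "ksm-b"}, 2.0),
    ({"pod": "web-2", "namespace": "shop", "instance": "ksm-a"}, 1.0),
]

naive = sum_by(samples, ["namespace"])
deduped = sum_by(max_by(samples, ["namespace", "pod"]), ["namespace"])
series_count = cardinality(samples)
```

Here the naive sum reports 5.0 for the namespace although the two pods only account for 3.0. Collapsing duplicates per pod before summing gives 3.0, and the extra `instance` label is what raises the series count from 2 to 3.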


https://home.robusta.dev/blog/3-common-mistakes-with-promql-and-kubernetes-metrics
Different Kinds of Managed Kubernetes

Explore a new World with Kubernetes


https://itnext.io/different-kinds-of-managed-kubernetes-c6c9c0ea1e06
Saving Millions of Dollars by Bin-Packing ClickHouse Pods in AWS EKS

https://clickhouse.com/blog/packing-kubernetes-pods-more-efficiently-saving-money
The good, the bad and the ugly of templating YAML in Kubernetes

In this blog post, I’d like to argue that templated YAML has a bad reputation in the Kubernetes community for the wrong reasons, and that it actually is not as evil as one might believe, even with the bad experiences that all of us have likely had in the past.


https://levelup.gitconnected.com/the-good-the-bad-and-the-ugly-of-templating-yaml-in-kubernetes-82fc5ce43fec