DevOps&SRE Library

kamal

From bare metal to cloud VMs, deploy web apps anywhere with zero downtime. Kamal has the dynamic reverse-proxy Traefik hold requests while a new app container is started and the old one is stopped. Works seamlessly across multiple hosts, using SSHKit to execute commands. Originally built for Rails apps, Kamal will work with any type of web app that can be containerized with Docker.
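
As a rough illustration of the "hold requests during the swap" idea described above, here is a minimal single-process sketch. It is not Kamal's or Traefik's code; `HoldingProxy`, `make_app`, and `simulate_deploy` are hypothetical names invented for this example, and a real proxy would do this concurrently over the network.

```python
from collections import deque
from typing import Callable, Deque, List

# A "backend" here is just a function from request to response (assumption).
Backend = Callable[[str], str]


class HoldingProxy:
    """Toy reverse proxy that queues requests while the backend is swapped."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self.paused = False
        self.held: Deque[str] = deque()
        self.responses: List[str] = []

    def handle(self, request: str) -> None:
        # While paused, hold the request instead of failing it.
        if self.paused:
            self.held.append(request)
        else:
            self.responses.append(self.backend(request))

    def swap(self, new_backend: Backend) -> None:
        # Point at the new backend, then replay held requests in arrival order.
        self.backend = new_backend
        self.paused = False
        while self.held:
            self.responses.append(self.backend(self.held.popleft()))


def make_app(version: str) -> Backend:
    # Hypothetical app container: tags each response with its version.
    return lambda request: f"{version}:{request}"


def simulate_deploy() -> List[str]:
    proxy = HoldingProxy(make_app("v1"))
    proxy.handle("GET /a")        # served by the old container
    proxy.paused = True           # deploy starts: hold incoming traffic
    proxy.handle("GET /b")        # held
    proxy.handle("GET /c")        # held
    proxy.swap(make_app("v2"))    # new container is up, old one is stopped
    proxy.handle("GET /d")        # served directly by the new container
    return proxy.responses
```

In this sketch no request is dropped: the two requests that arrive mid-deploy are answered by `v2` once the swap completes.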

https://github.com/basecamp/kamal

Exploring Open Source Alternatives to Terraform Enterprise / Cloud

https://medium.com/terrakube/exploring-open-source-alternatives-to-terraform-enterprise-cloud-73acf158a6e4

Building ML Infrastructure with Terraform

https://medium.com/@alexgidiotis_96550/building-ml-infrastructure-with-terraform-520b80874e8b

5 SRE Predictions For 2024

1️⃣ Tougher Job Market for SREs

With many companies looking to cut costs due to worsening economic conditions, dedicated SRE roles may be seen as expendable, so SRE headcount and budgets could be reduced. Many organizations are transitioning to an Amazon-like model, where SWEs "do it all". Infrastructure management, operational hardening, incident tracking and being on call are becoming part of the SWE job, so reliability engineers will slowly be pushed out or will have to transition into development. We can already see these trends among colleagues laid off in 2023, including at SRE-minded companies like Google.

This combination of factors means the SRE job market will likely tighten considerably in 2024. Openings will be harder to find and competition will be steeper. SREs will need to clearly demonstrate their value to stay relevant.

2️⃣ Rise of the Hybrid Cloud

The economic realities of running workloads on major public clouds like AWS, GCP and Azure will lead companies to look for alternatives. The costs of using public cloud infrastructure and services have been climbing, eating into budgets. As companies look to reduce spending, running applications on public clouds may no longer make economic sense. We'll see a migration back towards private data centers, colocation facilities, and on-prem infrastructure. SREs skilled in on-prem operations, bare metal provisioning, etc. will be in higher demand.

3️⃣ Kubernetes Will Continue Its Dominance

While Kubernetes' benefits and operational costs have been questioned a lot recently, it has become the clear leader as the orchestration platform of choice for containerized workloads. Engineers and companies are heavily invested in Kubernetes workflows and tools, both in the cloud and on-prem. As companies look to further invest in the efficiency of infrastructure and application management, SREs will need strong Kubernetes expertise.

4️⃣ Increased Major Outages Due to AI-Written Code (and Fewer SREs)

While automated code generation promises improved developer productivity, it also poses new reliability challenges. As code generation by AI systems increases, companies may end up with insufficiently supervised software. With fewer SREs around to establish robust testing and deployment practices, outages caused by bugs in AI-generated code could become more frequent. Companies will be caught off guard by disruptions caused by their overreliance on AI. Quick mitigation of these outages will be problematic as well, because code that no engineer wrote or fully reviewed is fundamentally harder to debug and fix.

5️⃣ Platform Engineering Matures

In 2024, unifying infrastructure, applications, data, and services under common APIs and self-service platforms will accelerate.

These platforms will provide standardized building blocks and streamlined workflows so engineering teams can quickly build, connect and deploy applications without wasting time on infrastructure complexities. Platforms will handle provisioning, networking, monitoring, access controls, and other operational aspects behind the scenes.

With job opportunities for traditional SRE roles declining, many SREs will look to transition into platform engineering positions. The broad technical skills required by platform roles align well with strengths many SREs already have. However, to successfully land a platform engineering role, you will need to skill up on software development as well. Programming and coding will become mandatory for those looking to get into platform engineering.

https://www.codereliant.io/5-sre-predictions-for-2024

Creating an EKS Cluster Using CDKTF

https://medium.com/@stevosjt88/creating-an-eks-cluster-using-cdktf-ed6cf28599c9

Best practices to prevent alert fatigue

As your environment changes, new trends can quickly make your existing monitoring less accurate. At the same time, building alerts after every new incident can turn a straightforward strategy into a convoluted one. Treating monitoring as either a one-time effort or a purely reactive one can result in alert fatigue. Alert fatigue occurs when monitoring systems generate an excessive number of alerts, or when alerts are irrelevant or unhelpful, leading to a diminished ability to see critical issues. Updating your alerts too infrequently or too often can cause false positives and redundant alerts that overwhelm your team. A desensitized team won’t be able to detect issues early and will lose trust in its monitoring systems, which can disrupt production and negatively impact your business.
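
One way to make "irrelevant or unhelpful alerts" measurable is to review past firings and flag rules that rarely lead to action. The sketch below is an assumption-laden illustration, not something taken from the linked article: the `(rule, was_actionable)` event shape, the thresholds, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One reviewed firing: (rule name, did someone need to act on it?).
AlertEvent = Tuple[str, bool]


def noisy_rules(
    events: Iterable[AlertEvent],
    max_noise_ratio: float = 0.5,
    min_firings: int = 3,
) -> Dict[str, float]:
    """Return rules whose share of non-actionable firings exceeds the threshold."""
    fired: Counter = Counter()
    noise: Counter = Counter()
    for rule, actionable in events:
        fired[rule] += 1
        if not actionable:
            noise[rule] += 1
    return {
        rule: noise[rule] / count
        for rule, count in fired.items()
        # Skip rules with too few firings to judge fairly.
        if count >= min_firings and noise[rule] / count > max_noise_ratio
    }


# Made-up review data for illustration only.
SAMPLE_EVENTS = [
    ("cpu_spike", False), ("cpu_spike", False), ("cpu_spike", False), ("cpu_spike", True),
    ("disk_full", True), ("disk_full", True), ("disk_full", False),
    ("pod_restart", False),
]
```

With the sample data, `noisy_rules(SAMPLE_EVENTS)` returns `{"cpu_spike": 0.75}`: `disk_full` is mostly actionable and `pod_restart` has fired too few times to judge.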

https://www.datadoghq.com/blog/best-practices-to-prevent-alert-fatigue

10 Strategies to Build and Manage Scalable Infrastructure

https://spacelift.io/blog/scalable-infrastructure

pgxman

npm for PostgreSQL

https://pgxman.com

better-commits

A CLI for creating better commits following the conventional commits specification
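
For readers unfamiliar with the conventional commits header shape, `<type>[optional scope][optional !]: <description>`, here is a minimal checker. It is only a sketch of the header rule, not how better-commits works internally; `HEADER_RE` and `parse_header` are hypothetical names, and the body and footer rules of the spec are not covered.

```python
import re
from typing import Dict, Optional, Union

# Header shape: <type>[optional scope][optional !]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>[A-Za-z]+)"              # e.g. feat, fix, docs
    r"(?:\((?P<scope>[^()\r\n]+)\))?"    # optional scope in parentheses
    r"(?P<breaking>!)?"                  # optional breaking-change marker
    r": (?P<description>\S.*)$"          # colon, space, non-empty description
)


def parse_header(line: str) -> Optional[Dict[str, Union[str, bool, None]]]:
    """Parse the first line of a commit message; return None if it does not conform."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }
```

For example, `parse_header("feat(api)!: drop v1 endpoints")` yields type `feat`, scope `api`, and `breaking` set to `True`, while `parse_header("updated stuff")` returns `None`.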

https://github.com/Everduin94/better-commits

Provision EKS Cluster with ArgoCD by Terraform

https://yukccy.medium.com/provision-eks-cluster-with-argocd-by-terraform-4ba07a891463

https://github.com/yukccy/terraform-argocd-on-eks

10 steps to building Terragrunt orchestrator

https://nordcloud.com/tech-community/10-steps-to-building-terragrunt-orchestrator

5 tips to efficiently manage AWS security groups using Terraform

Discover 5 proven strategies for scalable and stress-free security group rule management on AWS using Terraform.

https://blog.avangards.io/5-tips-to-efficiently-manage-aws-security-groups-using-terraform

aws2tf

Automates the import of existing AWS resources into Terraform and outputs the corresponding Terraform HCL code.

https://github.com/aws-samples/aws2tf

An overview of Cloudflare's logging pipeline

https://blog.cloudflare.com/an-overview-of-cloudflares-logging-pipeline

The Case for Kubernetes Resource Limits: Predictability vs. Efficiency

https://kubernetes.io/blog/2023/11/16/the-case-for-kubernetes-resource-limits

3 Common Mistakes with PromQL and Kubernetes Metrics

- Mistake #1: Duplicate Series
- Mistake #2: Grouping/Sum Mistakes
- Mistake #3: Unexpected Cardinality
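
To see why duplicate series and careless sums go together, the following sketch imitates label-based aggregation in plain Python. It is an illustration under assumptions, not code or data from the linked article: the label names, values, and the helpers `sum_by` and `max_by` are hypothetical stand-ins for PromQL's `sum by (...)` and `max by (...)`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A series is a label set plus its current sample value.
Series = Tuple[Dict[str, str], float]
GroupKey = Tuple[str, ...]


def sum_by(series: Sequence[Series], keys: Sequence[str]) -> Dict[GroupKey, float]:
    """Rough analogue of PromQL `sum by (keys) (...)`."""
    totals: Dict[GroupKey, float] = defaultdict(float)
    for labels, value in series:
        totals[tuple(labels.get(k, "") for k in keys)] += value
    return dict(totals)


def max_by(series: Sequence[Series], keys: Sequence[str]) -> List[Series]:
    """Rough analogue of `max by (keys) (...)`: collapses duplicates per group."""
    best: Dict[GroupKey, float] = {}
    for labels, value in series:
        group = tuple(labels.get(k, "") for k in keys)
        best[group] = max(value, best.get(group, float("-inf")))
    return [(dict(zip(keys, group)), value) for group, value in best.items()]


# Hypothetical data: pod "api" is reported by two exporter instances.
SAMPLES: List[Series] = [
    ({"namespace": "shop", "pod": "api", "instance": "exporter-0"}, 512.0),
    ({"namespace": "shop", "pod": "api", "instance": "exporter-1"}, 512.0),
    ({"namespace": "shop", "pod": "worker", "instance": "exporter-0"}, 256.0),
]

# Naive sum double-counts the duplicated "api" series.
naive = sum_by(SAMPLES, ["namespace"])
# Collapse duplicates per pod first, then sum, similar in spirit to
# `sum by (namespace) (max by (namespace, pod) (some_metric))`.
deduped = sum_by(max_by(SAMPLES, ["namespace", "pod"]), ["namespace"])
```

Here `naive` reports 1280.0 for the `shop` namespace, while `deduped` reports 768.0, the figure you get when each pod is counted once.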

https://home.robusta.dev/blog/3-common-mistakes-with-promql-and-kubernetes-metrics

Different Kinds of Managed Kubernetes

Explore a new world with Kubernetes

https://itnext.io/different-kinds-of-managed-kubernetes-c6c9c0ea1e06

Helm’s --atomic Option for Rollback Leaves You in the Dark

https://medium.com/@akashjoffical08/helms-atomic-option-for-rollback-leaves-you-in-the-dark-73841d8a5842

Saving Millions of Dollars by Bin-Packing ClickHouse Pods in AWS EKS

https://clickhouse.com/blog/packing-kubernetes-pods-more-efficiently-saving-money

The good, the bad and the ugly of templating YAML in Kubernetes

In this blog post, I’d like to argue that templated YAML has a bad reputation in the Kubernetes community for the wrong reasons, and that it is actually not as evil as one might believe, even with the bad experiences that all of us have likely had in the past.

https://levelup.gitconnected.com/the-good-the-bad-and-the-ugly-of-templating-yaml-in-kubernetes-82fc5ce43fec
