kamal
https://github.com/basecamp/kamal
From bare metal to cloud VMs, deploy web apps anywhere with zero downtime. Kamal has the dynamic reverse-proxy Traefik hold requests while a new app container is started and the old one is stopped. Works seamlessly across multiple hosts, using SSHKit to execute commands. Originally built for Rails apps, Kamal will work with any type of web app that can be containerized with Docker.
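To make the "hold requests while containers are swapped" idea concrete, here is a toy sketch in Python. It is not how Kamal or Traefik is implemented; the `HoldingProxy` class, the `old_app`/`new_app` handlers and the `boot_seconds` delay are all hypothetical names invented for illustration. It only shows the general pattern: close a gate, swap the backend, reopen the gate, and let waiting requests proceed against the new version.

```python
import asyncio


class HoldingProxy:
    """Toy stand-in for a reverse proxy that pauses traffic during a swap.

    Hypothetical illustration only; not Kamal's or Traefik's actual code.
    """

    def __init__(self, backend):
        self._backend = backend
        self._open = asyncio.Event()
        self._open.set()  # gate starts open: requests flow through

    async def handle(self, request):
        # Requests wait here while a swap is in progress instead of failing.
        await self._open.wait()
        return await self._backend(request)

    async def swap(self, new_backend, boot_seconds=0.05):
        self._open.clear()                 # start holding new requests
        await asyncio.sleep(boot_seconds)  # stand-in for booting the new container
        self._backend = new_backend        # the old backend is dropped here
        self._open.set()                   # release the held requests


async def old_app(request):
    return f"v1:{request}"


async def new_app(request):
    return f"v2:{request}"


async def demo():
    proxy = HoldingProxy(old_app)
    before = await proxy.handle("a")
    swap = asyncio.create_task(proxy.swap(new_app))
    await asyncio.sleep(0)            # let the swap task start and close the gate
    during = await proxy.handle("b")  # held, then answered by the new app
    await swap
    return before, during
```

Running `asyncio.run(demo())` returns one response from each version; the request issued mid-swap is delayed rather than dropped.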
Exploring Open Source Alternatives to Terraform Enterprise / Cloud
https://medium.com/terrakube/exploring-open-source-alternatives-to-terraform-enterprise-cloud-73acf158a6e4
Building ML Infrastructure with Terraform
https://medium.com/@alexgidiotis_96550/building-ml-infrastructure-with-terraform-520b80874e8b
5 SRE Predictions For 2024
https://www.codereliant.io/5-sre-predictions-for-2024
1️⃣ Tougher Job Market for SREs
With many companies looking to cut costs as economic conditions worsen, dedicated SRE roles may be seen as expendable, so SRE headcount and budgets could be reduced. Many organizations are transitioning to an Amazon-like model, where SWEs "do it all". Infrastructure management, operational hardening, incident tracking and being on call are becoming part of the developer's job, so reliability engineers will slowly be pushed out or will have to transition into development. We can already see these trends in the colleagues laid off in 2023, including at SRE-minded companies like Google.
This combination of factors means the SRE job market will likely tighten considerably in 2024. Openings will be harder to find and competition will be steeper. SREs will need to clearly demonstrate their value to stay relevant.
2️⃣ Rise of the Hybrid Cloud
The economic realities of running workloads on major public clouds like AWS, GCP and Azure will lead companies to look for alternatives. The costs of using public cloud infrastructure and services have been climbing, eating into budgets. As companies look to reduce spending, running applications on public clouds may no longer make economic sense. We'll see a migration back towards private data centers, colocation facilities, and on-prem infrastructure. SREs skilled in on-prem operations, bare metal provisioning, etc. will be in higher demand.
3️⃣ Kubernetes Will Continue Its Dominance
While the benefits and operational costs of Kubernetes have been questioned a lot recently, it has become the clear leader as the orchestration platform of choice for containerized workloads. Engineers and companies are heavily invested in Kubernetes workflows and tools, both in the cloud and on-prem. As companies look to invest further in the efficiency of infrastructure and application management, SREs will need strong Kubernetes expertise.
4️⃣ Increased major outages due to AI-written code (and fewer SREs)
While automated code generation promises improved developer productivity, it also poses new reliability challenges. As code generation by AI systems increases, companies may end up with insufficiently supervised software. With fewer SREs around to establish robust testing and deployment practices, outages caused by bugs in AI-generated code could become more frequent. Companies will be caught off guard by disruptions caused by their overreliance on AI. Quick mitigation will be difficult as well, because code that nobody on the team wrote is fundamentally harder to debug and fix.
5️⃣ Platform Engineering Matures
In 2024, unifying infrastructure, applications, data, and services under common APIs and self-service platforms will accelerate.
These platforms will provide standardized building blocks and streamlined workflows so engineering teams can quickly build, connect and deploy applications without wasting time on infrastructure complexities. Platforms will handle provisioning, networking, monitoring, access controls, and other operational aspects behind the scenes.
With job opportunities for traditional SRE roles declining, many SREs will look to transition into platform engineering positions. The broad technical skills required by platform roles align well with strengths many SREs already have. However, to successfully land a platform engineering role, you will need to skill up on software development as well. Programming and coding will become mandatory for those looking to get into platform engineering.
Creating an EKS Cluster Using CDKTF
https://medium.com/@stevosjt88/creating-an-eks-cluster-using-cdktf-ed6cf28599c9
Best practices to prevent alert fatigue
https://www.datadoghq.com/blog/best-practices-to-prevent-alert-fatigue
As your environment changes, new trends can quickly make your existing monitoring less accurate. At the same time, building alerts after every new incident can turn a straightforward strategy into a convoluted one. Treating monitoring as either a one-time effort or a purely reactive one can result in alert fatigue. Alert fatigue occurs when monitoring systems generate an excessive number of alerts, or when alerts are irrelevant or unhelpful, leading to a diminished ability to see critical issues. Updating your alerts too infrequently or too often can cause false positive alarms and redundant alerts that overwhelm your team. A desensitized team won’t be able to detect issues early and will lose trust in their monitoring systems, which can disrupt production and negatively impact your business.
10 Strategies to Build and Manage Scalable Infrastructure
https://spacelift.io/blog/scalable-infrastructure
better-commits
https://github.com/Everduin94/better-commits
A CLI for creating better commits following the conventional commits specification
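For readers unfamiliar with the format, the header line of a conventional commit has the shape `type(scope)!: description`, where the scope and the `!` breaking-change marker are optional. The sketch below is a minimal Python check of that header shape only. It is not part of better-commits, the `parse_header` helper is a hypothetical name, and the pattern is deliberately simplified (it ignores the commit body and footers).

```python
import re

# Simplified pattern for the header line only: type, optional (scope),
# optional "!" for a breaking change, then ": " and a description.
HEADER = re.compile(
    r"^(?P<type>[a-zA-Z]+)"
    r"(?:\((?P<scope>[^()\r\n]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_header(line):
    """Return the parts of a conventional commit header, or None if it does not match."""
    match = HEADER.match(line)
    if match is None:
        return None
    return {
        "type": match["type"],
        "scope": match["scope"],  # None when no scope is given
        "breaking": match["breaking"] is not None,
        "description": match["description"],
    }
```

For example, `parse_header("feat(api)!: add pagination")` yields a `feat` commit scoped to `api` and flagged as breaking, while `parse_header("fixed stuff")` returns `None`.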
Provision EKS Cluster with ArgoCD by Terraform
https://yukccy.medium.com/provision-eks-cluster-with-argocd-by-terraform-4ba07a891463
https://github.com/yukccy/terraform-argocd-on-eks
10 steps to building Terragrunt orchestrator
https://nordcloud.com/tech-community/10-steps-to-building-terragrunt-orchestrator
5 tips to efficiently manage AWS security groups using Terraform
https://blog.avangards.io/5-tips-to-efficiently-manage-aws-security-groups-using-terraform
Discover 5 proven strategies for scalable and stress-free security rule group management on AWS using Terraform.
aws2tf
https://github.com/aws-samples/aws2tf
aws2tf - automates the importing of existing AWS resources into Terraform and outputs the Terraform HCL code.
An overview of Cloudflare's logging pipeline
https://blog.cloudflare.com/an-overview-of-cloudflares-logging-pipeline
The Case for Kubernetes Resource Limits: Predictability vs. Efficiency
https://kubernetes.io/blog/2023/11/16/the-case-for-kubernetes-resource-limits
3 Common Mistakes with PromQL and Kubernetes Metrics
https://home.robusta.dev/blog/3-common-mistakes-with-promql-and-kubernetes-metrics
- Mistake #1: Duplicate Series
- Mistake #2: Grouping/Sum Mistakes
- Mistake #3: Unexpected Cardinality
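The linked post has the specifics; as a generic illustration of why duplicate series and careless sums go together, the Python sketch below models a handful of samples as (labels, value) pairs. The label names, values and both helper functions are hypothetical. It assumes the same pod is reported twice under different `job` labels, so a plain sum double-counts it, while collapsing to one value per pod first gives the intended total. In PromQL terms this is roughly the difference between `sum(metric)` and `sum(max by (namespace, pod) (metric))`.

```python
def naive_sum(samples):
    """Add up every series as-is, duplicates included."""
    return sum(value for _, value in samples)


def sum_after_dedup(samples, identity=("namespace", "pod")):
    """Keep one value (the max) per identity, then sum those values."""
    best = {}
    for labels, value in samples:
        key = tuple(labels.get(name) for name in identity)
        best[key] = max(best.get(key, value), value)
    return sum(best.values())


# Hypothetical data: pod web-1 shows up twice because two scrape jobs report it.
samples = [
    ({"namespace": "shop", "pod": "web-1", "job": "scraper-a"}, 2.0),
    ({"namespace": "shop", "pod": "web-1", "job": "scraper-b"}, 2.0),
    ({"namespace": "shop", "pod": "web-2", "job": "scraper-a"}, 1.0),
]
```

With this data, `naive_sum(samples)` gives 5.0 and `sum_after_dedup(samples)` gives 3.0; only the second matches the two pods that actually exist.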
Different Kinds of Managed Kubernetes
https://itnext.io/different-kinds-of-managed-kubernetes-c6c9c0ea1e06
Explore a new World with Kubernetes
Helm’s --atomic Option for Rollback Leaves You in the Dark
https://medium.com/@akashjoffical08/helms-atomic-option-for-rollback-leaves-you-in-the-dark-73841d8a5842
Saving Millions of Dollars by Bin-Packing ClickHouse Pods in AWS EKS
https://clickhouse.com/blog/packing-kubernetes-pods-more-efficiently-saving-money
The good, the bad and the ugly of templating YAML in Kubernetes
https://levelup.gitconnected.com/the-good-the-bad-and-the-ugly-of-templating-yaml-in-kubernetes-82fc5ce43fec
In this blog post, I’d like to argue that templated YAML has a bad reputation in the Kubernetes community for the wrong reasons, and that it is actually not as evil as one might believe, even given the bad experiences that all of us have likely had in the past.