SRE Engagement Models
- Consulting
- Embedded
- Infra Team
https://certomodo.substack.com/p/sre-engagement-models
CloudFront and Terraform Essentials: How to Optimize Content Delivery
We are going to describe how CloudFront can be integrated with API Gateway to provide lower latency. We will also go through the attributes of the CloudFront resources in Terraform, including the ones needed to create the distribution and configure origins and behaviors.
https://medium.com/@xpiotrkleban/cloudfront-and-terraform-essentials-how-to-optimize-content-delivery-27c84e8aef04
Best practices for monitoring static web applications
https://www.datadoghq.com/blog/static-web-application-monitoring-best-practices
latency: a primer
hi! this article is aimed at folks who are interested in performance analysis or operations of software, and want to understand the impact on user experience. the examples will be centered around web applications and web services, but can be applied in other contexts as well.
https://igor.io/latency
Principles of Reliable Software Design
Reliable software design is a discipline that involves a careful balance of numerous principles, each of which is intended to ensure the development of high-quality software that meets the needs of users and stakeholders.
https://www.codereliant.io/principles-of-reliable-software-design-part-1
Failover
What is it? How does it work? When to use it and when not to use it?
https://blog.alexewerlof.com/p/failover
Solving challenges caused by Out Of Memory (OOM) Killer in Linux
Learn how out-of-memory events created challenges for our team, and how we solved them.
https://redpanda.com/blog/solve-out-of-memory-killer-events
acme-dns
A simplified DNS server with a RESTful HTTP API to provide a simple way to automate ACME DNS challenges.
https://github.com/joohoi/acme-dns
Building and operating a pretty big storage system called S3
Today, I am publishing a guest post from Andy Warfield, VP and distinguished engineer over at S3. I asked him to write this based on the keynote address he gave at USENIX FAST '23 that covers three distinct perspectives on scale that come along with building and operating a storage system the size of S3.
https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
Bridging the gap between IaC and Schema Management
When we started building Atlas a couple of years ago, we noticed that there was a substantial gap between what was then considered state-of-the-art in managing database schemas and the recent strides from Infrastructure-as-Code (IaC) to managing cloud infrastructure.
In this post, we review that gap and show how Atlas, along with its Terraform provider, can bridge the two domains.
https://atlasgo.io/blog/2023/07/19/bridging-the-gap-between-iac-and-schema-management
A misadventure with Terraform Sets & PagerDuty Schedules
How Terraform's setunion() disregards ordering.
https://tratnayake.dev/a-misadventure-with-terraform-sets-pagerduty-schedules
Stop using IAM User Credentials with Terraform Cloud
I recently started using Terraform Cloud but discovered that the getting started tutorial, which describes how to integrate it with Amazon Web Services (AWS), suggested using IAM user credentials. This is not ideal, as these credentials are long-lived and can lead to security issues.
https://www.wolfe.id.au/2023/07/17/stop-using-iam-user-credentials-with-terraform-cloud
Secure Your AWS Environments with Terraform, Vault, and Veeam
https://julia.hashnode.dev/secure-your-aws-environments-with-terraform-vault-and-veeam
Supporting Teams with Different Maturity Levels
https://medium.com/@hans.knechtions/supporting-teams-with-different-maturity-levels-c43f5b5080eb
sre-checklist
A checklist for anyone practicing Site Reliability Engineering.
https://github.com/bregman-arie/sre-checklist
Why bother with SLI and SLO?
Is there really any value in setting service level indicators and objectives?
https://blog.alexewerlof.com/p/why-bother-with-sli-and-slo
Traffic Jams in the Cloud: Are Overloads Sabotaging Your Application's Reliability?
https://blog.fluxninja.com/blog/traffic-jams-in-the-cloud-unveiling-the-true-enemy-of-reliability