Adopt Open ID Connect (OIDC) in Terraform for secure multi-account CI/CD to AWS
https://hedrange.com/2023/10/07/adopt-open-id-connect-oidc-in-terraform-for-secure-multi-account-ci-cd-to-aws
Sofia’s Observability Odyssey: The Do’s and Don’ts for Effective Observability
https://medium.com/@letathenasleep/alerting-the-dos-and-don-ts-for-effective-observability-139db9fb49d1
System Design 101
Explain complex systems using visuals and simple terms. Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.
https://github.com/ByteByteGoHq/system-design-101
Vulnerability Management at Lyft: Enforcing the Cascade - Part 1
Over the past 2 years, we’ve built a comprehensive vulnerability management program at Lyft. This blog post will focus on the systems we’ve built to address OS and OS-package level vulnerabilities in a timely manner across hundreds of services run on Kubernetes.
https://eng.lyft.com/vulnerability-management-at-lyft-enforcing-the-cascade-part-1-234d1561b994
krakend-ce
KrakenD Community Edition: High-performance, stateless, declarative API Gateway written in Go.
https://github.com/krakend/krakend-ce
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot.
https://github.com/TabbyML/tabby
Setting up your first EKS cluster on AWS: some practical tips
https://medium.com/@benjamin.christmann_12432/setting-up-your-first-eks-cluster-on-aws-some-practical-tips-60400963c588
A Guide to Kubernetes Application Resource Tuning
p1: https://medium.com/@vvsevel/a-guide-to-kubernetes-application-resource-tuning-part-1-bf0ba04db10
p2: https://medium.com/@vvsevel/a-guide-to-kubernetes-application-resource-tuning-part-2-1d287479b52b
p3: https://medium.com/@vvsevel/a-guide-to-kubernetes-application-resource-tuning-part-3-40f7f6510c93
AKS Networking Deep Dive: Kubenet vs Azure-CNI vs Azure-CNI (overlay)
https://inder-devops.medium.com/aks-networking-deep-dive-kubenet-vs-azure-cni-vs-azure-cni-overlay-a51709171ce9
Kubernetes Services: ClusterIP, Nodeport and LoadBalancer
https://sysdig.com/blog/kubernetes-services-clusterip-nodeport-loadbalancer
Lessons Learned from Twenty Years of Site Reliability Engineering
Or, eleven things we have learned as Site Reliability Engineers at Google
https://sre.google/resources/practices-and-processes/twenty-years-of-sre-lessons-learned
1. The riskiness of a mitigation should scale with the severity of the outage
2. Recovery mechanisms should be fully tested before an emergency
3. Canary all changes
4. Have a "Big Red Button"
5. Unit tests alone are not enough - integration testing is also needed
6. COMMUNICATION CHANNELS! AND BACKUP CHANNELS!! AND BACKUPS FOR THOSE BACKUP CHANNELS!!!
7. Intentionally degrade performance modes
8. Test for Disaster resilience
9. Automate your mitigations
10. Reduce the time between rollouts, to decrease the likelihood of the rollout going wrong
11. A single global hardware version is a single point of failure
How DoorDash Migrated from StatsD to Prometheus
https://doordash.engineering/2023/08/01/how-doordash-migrated-from-statsd-to-prometheus
How to use Terraform test
Terraform v1.6.0 introduces a test framework named “Terraform test”. Here’s how to use it.
https://blog.captaincy.io/how-to-use-terraform-test
Terraform project structure with reusable modules
https://erudinsky.com/2023/10/20/structuring-terraform-projects
cluster.dev
Cluster.dev is an open-source tool designed to manage cloud native infrastructures with simple declarative manifests - infrastructure templates. The infrastructure templates could be based on Terraform modules, Kubernetes manifests, Shell scripts, Helm charts, Kustomize and ArgoCD/Flux applications, OPA policies, etc. Cluster.dev sticks those components together so that you could deploy, test and distribute a whole set of components with pinned versions.
https://github.com/shalb/cluster.dev
Prometheus and its storage: Architecture, challenges, and solutions
This two-article series is about monitoring. Part One covers accumulating a multitude of different metrics in a single place, handling permissions for different aspects of those metrics, and storing large amounts of data. In Part Two, we then focus on choosing monitoring systems based on the brief example of a fictional company’s “journey” in struggling with continually expanding its monitoring system and growing its infrastructure.
https://blog.palark.com/prometheus-architecture-tsdb
What is a Memory Leak?
Memory leaks are a common and frustrating problem in software development. These issues arise when a program fails to free up memory that is no longer being used, leading to a gradual loss of available memory over time.
https://www.codereliant.io/what-is-a-memory-leak
Rescue Struggling Pods from Scratch
https://www.honeycomb.io/blog/rescue-struggling-pods-from-scratch