DevOps & SRE notes
Helpful articles and tools for DevOps & SRE

For paid consultation (RU/EN), contact: @tutunak


All the ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
Ably's post explains their "four pillars" engineering principle, which is designed to ensure their systems have no ceiling on scale. This philosophy guides their architecture to handle massive, unpredictable, and complex realtime workloads.
https://ably.com/blog/ablys-four-pillars-no-scale-ceiling
KubeBuddy - A PowerShell tool for monitoring and managing Kubernetes clusters. Perform health checks, resource usage insights, and configuration audits with ease. Supports AKS best practices, snapshot-based monitoring, and security checks tailored for Kubernetes environments. Available on the PowerShell Gallery.

https://github.com/KubeDeckio/KubeBuddy
This piece from Airbnb Engineering details their journey of building a centralized user signals platform. It explores the motivations, challenges, and architectural decisions behind creating a system to capture user interactions at scale.
https://medium.com/airbnb-engineering/building-a-user-signals-platform-at-airbnb-b236078ec82b
In this article, the Games24x7 Tech team shares its experience migrating Node.js services to Kubernetes. The team discusses the strategies and tools it used to make the transition efficient and keep downtime minimal.
https://medium.com/@Games24x7Tech/how-we-seamlessly-transitioned-our-node-services-to-k8s-7e2e6067daa0
dotdc presents Terraflow, a CI/CD orchestrator designed to scale Terraform operations. The article outlines how the tool was created and how it helps manage complex infrastructure deployments.
https://medium.com/@dotdc/creating-terraflow-a-ci-cd-orchestrator-to-scale-terraform-3965b3f8931f
This Permify Tech Blog post takes a deep dive into writing policies for Kubernetes clusters with OPA Gatekeeper. It explains how to enforce custom rules and maintain security and compliance in a cloud-native environment.
https://medium.com/permify-tech-blog/opa-gatekeeper-how-to-write-policies-for-kubernetes-clusters-bb660666eb19
AWS just released its postmortem (link below) for the October DynamoDB outage. It's thorough, technically detailed, and explains exactly what broke and how they'll "prevent" it from happening again. But this PR-approved, sanitized narrative tells us only what happened to the technology, nothing else.

https://aws.amazon.com/message/101925/
Marc Christian P. Gregorio offers a practical guide to automating centralized NAT Gateways in AWS across multiple VPCs and regions using Terraform. The solution aims to optimize costs and simplify network management for large-scale deployments.
https://medium.com/@marcchristianp.gregorio/automating-centralized-nat-gateways-in-aws-vpcs-and-region-with-terraform-69a6f90d60da
Elliot Graebert proposes an impact-based leveling system for engineering organizations as an alternative to traditional career ladders. The article discusses how focusing on impact can foster a more motivated and effective engineering culture.
https://medium.com/@elliotgraebert/an-impact-based-level-system-for-engineering-organizations-2e0f9bee20e6
This article from JP Gouin provides a deep dive into implementing GitOps at scale, with a specific focus on the cluster bootstrapping process. It covers the challenges and solutions for managing numerous Kubernetes clusters efficiently and declaratively.
https://medium.com/@jp-gouin/gitops-at-scale-clusters-bootstrapping-f36695d4340d
This edition of the Scalable Thread newsletter breaks down effective strategies for handling sudden and unexpected bursts of traffic to your systems. It explores architectural patterns and techniques to ensure reliability and prevent service degradation during traffic spikes.
https://newsletter.scalablethread.com/p/how-to-handle-sudden-bursts-of-traffic