DevOps&SRE Library

wal-listener

A service that helps implement an event-driven architecture.

To keep data consistent across the system, it relies on transactional messaging: events are published in the same transaction as the domain model change.

The service lets you subscribe to changes in a PostgreSQL database using its logical decoding capability and publish them to a NATS Streaming server.

https://github.com/ihippik/wal-listener
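
The transactional-messaging idea in the wal-listener description can be sketched roughly as follows. This is not wal-listener's own code: it uses Python's built-in sqlite3 instead of PostgreSQL, and the `orders` and `outbox` tables and the `place_order` helper are hypothetical names chosen for illustration. The only point is that the domain change and the event record commit or roll back together; with wal-listener, the change would instead be picked up from PostgreSQL through logical decoding.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical domain table and outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, order_id: int, amount: int, fail: bool = False) -> None:
    """Write the domain row and its event in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
        event = json.dumps({"type": "order_placed", "id": order_id, "amount": amount})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))
        if fail:
            # Simulated crash before commit: neither row should survive.
            raise RuntimeError("simulated failure before commit")


def counts(conn: sqlite3.Connection) -> tuple:
    """Return (number of orders, number of outbox events)."""
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
    return orders, events
```

Calling `place_order(conn, 2, 50, fail=True)` raises, and afterwards `counts(conn)` is unchanged, so a consumer never sees an event without its matching domain row.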

The state of Kubernetes jobs in 2023 Q4

Kubernetes job market trends for Q4 2023

https://kube.careers/state-of-kubernetes-jobs-2023-q4

42 things I learned from building a production database

https://maheshba.bitbucket.io/blog/2021/10/19/42Things.html

12 Factor CLI Apps

At Heroku, we’ve come up with a methodology called the 12 factor app. It’s a set of principles designed to make great web applications that are easy to maintain. In that spirit, here are 12 CLI factors to keep in mind when building your next CLI application. Following these principles will offer a CLI UX that users will love.

https://medium.com/@jdxcode/12-factor-cli-apps-dd3c227a0e46

Viacheslav Biriukov - SRE deep dive into Linux Page Cache

In this series of articles, I would like to talk about the Linux Page Cache. I believe that knowing the theory and tools behind it is essential for every SRE. This understanding helps both in routine everyday DevOps-like tasks and in emergency debugging and firefighting.

https://biriukov.dev/docs/page-cache/0-linux-page-cache-for-sre

Loki's new TSDB Index

https://lokidex.com/posts/tsdb

uptrace

Open source APM: OpenTelemetry traces, metrics, and logs

https://github.com/uptrace/uptrace

kubernetes-image-puller

Kubernetes Image Puller is used for caching images on a cluster. It creates a DaemonSet downloading and running the relevant container images on each node.

https://github.com/che-incubator/kubernetes-image-puller

Why Distributed Systems Fail?

Distributed systems are tricky: it's easy to make wrong assumptions that lead to problems down the road. Back in the 90s, computer scientist L. Peter Deutsch identified several common misconceptions, or "fallacies," that trip up engineers working on distributed systems. Surprisingly, these fallacies are still relevant today:

1. The Network is Reliable: It's risky to assume networks are 100% reliable. Networks can and do fail in various ways.
2. Latency is Zero: While we might wish our networks had no latency, that's simply not physically possible - even light takes time to travel distances. Ignoring the inevitable delay in data transmission can lead to unrealistic expectations of system performance.
3. Bandwidth is Infinite: This overlooks the physical and practical limitations on data transfer rates.
4. The Network is Secure: No wonder security is a growing industry. Assuming inherent security can lead to vulnerabilities and oversights in protective measures.
5. Topology Doesn't Change: This neglects the dynamic nature of network configurations.
6. There is One Administrator: A simplification that fails to consider the complexity of managing distributed systems.
7. Transport Cost is Zero: Overlooking the resources required for data movement.
8. The Network is Homogeneous: Ignoring the diversity in network systems and standards.

These fallacies, if not recognized and addressed, can lead to design flaws, performance issues, and security vulnerabilities in distributed systems. In the following sections, we will break down each of these misconceptions, exploring their implications and how to mitigate the risks they pose in real-world applications.

P1: https://www.codereliant.io/why-distributed-systems-fail-1

P2: https://www.codereliant.io/why-distributed-systems-fail-2
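
As one illustration of mitigating the first fallacy ("The Network is Reliable"), here is a minimal sketch of retrying a failed call with capped exponential backoff and random jitter. It is not taken from the linked articles; the `call_with_retries` helper and its parameters are hypothetical, and `operation` stands in for any network call that raises `ConnectionError` or `TimeoutError` on failure.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.05, max_delay=1.0, sleep=time.sleep):
    """Run `operation`, retrying on ConnectionError/TimeoutError.

    The wait before each retry is drawn uniformly from [0, cap], where the cap
    doubles per attempt up to `max_delay` (exponential backoff with full jitter).
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting. Retries are appropriate only for operations that are safe to repeat, such as idempotent reads.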

Using LocalStack and GitHub Actions to Test Terraform AWS Deployments

https://medium.com/@robbiedouglas/using-localstack-and-github-actions-to-test-terraform-aws-deployments-0a119dcff7c2

Terragrunt root selector: automatically select the best root directory based on changed files

https://medium.com/@bill.nz/terragrunt-root-selector-automatically-select-the-best-root-directory-base-on-file-changed-8f0b4147a8a3

mcfly

McFly replaces your default ctrl-r shell history search with an intelligent search engine that takes into account your working directory and the context of recently executed commands. McFly's suggestions are prioritized in real time with a small neural network.

https://github.com/cantino/mcfly

Terraform As A Service: Google Infrastructure Manager

https://medium.com/google-cloud/terraform-as-a-service-google-infrastructure-manager-409c2c9e71d5

crd-to-sample-yaml

Generate a sample YAML file from a CRD definition.

https://github.com/Skarlso/crd-to-sample-yaml

LLM Inference Performance Engineering: Best Practices

https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices

Switching Build Systems, Seamlessly

https://engineering.atspotify.com/2023/10/switching-build-systems-seamlessly

radius

Radius is a cloud-native application platform that enables developers and the platform engineers that support them to collaborate on delivering and managing cloud-native applications that follow organizational best practices for cost, operations and security, by default. Radius is an open-source project that supports deploying applications across private cloud, Microsoft Azure, and Amazon Web Services, with more cloud providers to come.

https://github.com/radius-project/radius

selectel-billing-exporter

Prometheus exporter for retrieving account billing information from the Selectel hosting provider

https://github.com/mxssl/selectel-billing-exporter

The Scary Thing About Automating Deploys

Most of Slack runs on a monolithic service simply called “The Webapp”. It’s big – hundreds of developers create hundreds of changes every week.

Deploying at this scale is a unique challenge. When people talk about continuous deployment, they’re often thinking about deploying to systems as soon as changes are ready. They talk about microservices and 2-pizza teams (~8 people). But what does continuous deployment mean when you’re looking at 150 changes on a normal day? That’s a lot of pizzas…

https://slack.engineering/the-scary-thing-about-automating-deploys

API load testing: A beginner's guide

An API load test generally starts with small loads on isolated components. As your testing matures, your strategy can expand to test the API more completely. You’ll test your API with more requests, longer durations, and on a wider test scope, from isolated components to complete end-to-end workflows.

https://grafana.com/blog/2024/01/30/api-load-testing
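
The "start small, then widen" progression described in the load-testing post can be sketched as a series of stages with increasing request counts and concurrency. This is a rough illustration using only the Python standard library, not tied to any particular load-testing tool; `run_stage`, `ramp`, and `fake_endpoint` are hypothetical names, and `fake_endpoint` stands in for a real request to the API under test.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def run_stage(call, total_requests, concurrency):
    """Issue `total_requests` calls using `concurrency` worker threads.

    Returns the latency of each call in seconds.
    """
    def timed(_):
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(total_requests)))


def ramp(call, stages):
    """Run (total_requests, concurrency) stages in order, smallest load first."""
    report = []
    for total_requests, concurrency in stages:
        latencies = sorted(run_stage(call, total_requests, concurrency))
        # Nearest-rank 95th percentile of the observed latencies.
        p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
        report.append({"requests": total_requests, "concurrency": concurrency, "p95_s": p95})
    return report


def fake_endpoint():
    # Stand-in for a real HTTP request; replace with your own client call.
    time.sleep(0.001)
```

For example, `ramp(fake_endpoint, [(5, 1), (20, 4)])` runs a small single-worker stage followed by a larger four-worker stage and reports the 95th-percentile latency of each.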
