DevOps&SRE Library

Terragrunt root selector: automatically select the best root directory based on changed files

https://medium.com/@bill.nz/terragrunt-root-selector-automatically-select-the-best-root-directory-base-on-file-changed-8f0b4147a8a3

mcfly

McFly replaces your default ctrl-r shell history search with an intelligent search engine that takes into account your working directory and the context of recently executed commands. McFly's suggestions are prioritized in real time with a small neural network.

https://github.com/cantino/mcfly

Terraform As A Service: Google Infrastructure Manager

https://medium.com/google-cloud/terraform-as-a-service-google-infrastructure-manager-409c2c9e71d5

crd-to-sample-yaml

Generate a sample YAML file from a CRD definition.

https://github.com/Skarlso/crd-to-sample-yaml

LLM Inference Performance Engineering: Best Practices

https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices

Switching Build Systems, Seamlessly

https://engineering.atspotify.com/2023/10/switching-build-systems-seamlessly

radius

Radius is a cloud-native application platform that enables developers and the platform engineers that support them to collaborate on delivering and managing cloud-native applications that follow organizational best practices for cost, operations and security, by default. Radius is an open-source project that supports deploying applications across private cloud, Microsoft Azure, and Amazon Web Services, with more cloud providers to come.

https://github.com/radius-project/radius

selectel-billing-exporter

Prometheus exporter for retrieving account billing information from the Selectel hosting provider.

https://github.com/mxssl/selectel-billing-exporter

The Scary Thing About Automating Deploys

Most of Slack runs on a monolithic service simply called “The Webapp”. It’s big – hundreds of developers create hundreds of changes every week.

Deploying at this scale is a unique challenge. When people talk about continuous deployment, they’re often thinking about deploying to systems as soon as changes are ready. They talk about microservices and 2-pizza teams (~8 people). But what does continuous deployment mean when you’re looking at 150 changes on a normal day? That’s a lot of pizzas…

https://slack.engineering/the-scary-thing-about-automating-deploys

API load testing: A beginner's guide

An API load test generally starts with small loads on isolated components. As your testing matures, your strategy can expand to how to test the API more completely. You’ll test your API with more requests, longer durations, and on a wider test scope — from isolated components to complete end-to-end workflows.

https://grafana.com/blog/2024/01/30/api-load-testing
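
The progression described above (small loads on an isolated component first, then more requests and more concurrency) can be sketched roughly as follows. This is only an illustration, not the approach from the linked guide: `handler` is a hypothetical in-process stand-in for the component under test, and the stage sizes are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handler(payload: int) -> int:
    # Hypothetical stand-in for the isolated API component under test.
    return payload * 2


def run_stage(workers: int, requests: int) -> list[float]:
    """Send `requests` calls using `workers` concurrent callers.

    Returns the latency of each call in seconds.
    """
    def timed_call(i: int) -> float:
        start = time.perf_counter()
        handler(i)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_call, range(requests)))


# Each stage raises concurrency and request count; the numbers are illustrative.
stages = [(1, 10), (5, 50), (20, 200)]
results = {}
for workers, requests in stages:
    latencies = run_stage(workers, requests)
    results[(workers, requests)] = latencies
    print(f"{workers} workers, {requests} requests: "
          f"max latency {max(latencies) * 1000:.3f} ms")
```

In a real test the stand-in would be replaced by calls to the actual endpoint, and later stages would widen the scope from one component to complete workflows.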

Continuous Integration

Continuous Integration is a software development practice where each member of a team merges their changes into a codebase together with their colleagues' changes at least daily. Each of these integrations is verified by an automated build (including test) to detect integration errors as quickly as possible. Teams find that this approach reduces the risk of delivery delays, reduces the effort of integration, and enables practices that foster a healthy codebase for rapid enhancement with new features.

https://martinfowler.com/articles/continuousIntegration.html

prodzilla

Prodzilla is a modern synthetic monitoring tool built in Rust. It's focused on surfacing whether existing behaviour in production is as expected in a human-readable format, so that stakeholders, or even customers, can contribute to system verification.

https://github.com/prodzilla/prodzilla

(Almost) Every infrastructure decision I endorse or regret after 4 years running infrastructure at a startup

I’ve led infrastructure at a startup for the past 4 years that has had to scale quickly. From the beginning I made some core decisions that the company has had to stick to, for better or worse, these past four years. This post will list some of the major decisions made and if I endorse them for your startup, or if I regret them and advise you to pick something else.

https://cep.dev/posts/every-infrastructure-decision-i-endorse-or-regret-after-4-years-running-infrastructure-at-a-startup

Infrastructure Pipeline

https://medium.com/@tusharmurudkar/devops-infrastructure-pipeline-beab47e7b876

How to have multiple Terraform deployments with the same GitHub Action

https://medium.com/@robbiedouglas/how-to-have-multiple-terraform-deployments-with-the-same-github-action-043f082f76e2

Fallback

What is it? How does it work? When to use it and when not to use it?

https://blog.alexewerlof.com/p/fallback
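
As a rough illustration of the general idea behind a fallback (not taken from the linked post), the sketch below tries a primary operation and returns a degraded answer when it fails. The names `with_fallback`, `fetch_live_price` and `cached_price` are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Return primary() if it succeeds, otherwise the result of fallback()."""
    try:
        return primary()
    except Exception:
        # Broad catch for illustration only; real code would usually catch
        # the specific errors that the primary dependency can raise.
        return fallback()


def fetch_live_price() -> float:
    # Hypothetical primary dependency that is currently failing.
    raise ConnectionError("pricing service unavailable")


def cached_price() -> float:
    # Hypothetical degraded answer, for example the last known good value.
    return 9.99


price = with_fallback(fetch_live_price, cached_price)
print(price)
```

One trade-off worth noting: a fallback hides the failure from the caller, so the primary error would normally be logged or counted somewhere so that the degradation stays visible.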

How to reduce expenses on monitoring: Swapping in VictoriaMetrics for Prometheus

Monitoring can get expensive due to the huge quantities of data that need to be processed. In this blog post, you’ll learn the best ways to store and process monitoring metrics to reduce your costs, and how VictoriaMetrics can help.

This blog post will only cover open-source solutions. VictoriaMetrics is proudly open source. You’ll get the most out of this blog post if you are familiar with Prometheus, Thanos, Mimir or VictoriaMetrics.

https://victoriametrics.com/blog/reducing-costs-p1/index.html

Reliability Engineering Mindset