DevOps&SRE Library

kine

Run Kubernetes on MySQL, Postgres, sqlite, dqlite, not etcd.

linstor-server

High-performance software-defined block storage for containers, cloud, and virtualisation. Fully integrated with Docker, Kubernetes, OpenStack, Proxmox, etc.

https://github.com/LINBIT/linstor-server

RedisInsight

RedisInsight is a visual tool that provides capabilities to design, develop and optimize your Redis application. Query, analyse and interact with your Redis data.

https://github.com/RedisInsight/RedisInsight

ScratchDB

ScratchDB is an open-source alternative to BigQuery, Redshift, and Snowflake. It runs on ClickHouse.

https://github.com/scratchdata/ScratchDB

Lessons learned from writing a Terraform Provider

https://medium.com/@abagayev/lessons-learned-from-writing-a-terraform-provider-62412b79a997

terraform-provider-namecheap

A Terraform Provider for Namecheap domain DNS configuration.

https://github.com/namecheap/terraform-provider-namecheap

Argo Workflows - Proven Patterns from Production

Argo Workflows provides an excellent platform for infrastructure automation, and has replaced Jenkins as my go-to tool for running scheduled or event-driven automation tasks.

In growing my experience with Argo Workflows, I’ve killed clusters, broken workflows and generally made a mess of things. I’ve also built a lot of workflows that needed refactoring as they became difficult to maintain.

This blog post aims to share some of the lessons I’ve learned, and some of the patterns I’ve developed, to help you avoid the same mistakes I’ve made.

https://hodgkins.io/argo-workflow-proven-patterns-from-production

Top 10 common Dockerfile linting issues

We've added the ability to lint Dockerfiles on demand in Depot. This post covers the top 10 most common Dockerfile linting issues we've seen flowing through Depot.

https://depot.dev/blog/dockerfile-linting-issues

Scaling Elasticsearch by Cleaning the Cluster State

We often get questions like:

- How much data can I put in an Elasticsearch cluster?
- How many nodes can an Elasticsearch cluster have?
- What’s the biggest cluster that you’ve seen?

And while the 14-year-old in me is proud to say that we’ve done 24/7 support for clusters of 1000+ nodes holding many PB of data, I am quick to add that:

1. It doesn’t mean it’s a good idea to have clusters that big.
2. Such generic questions deserve more nuanced answers, which is exactly what this blog post provides. It applies to OpenSearch as well as Elasticsearch and, for the most part, to Solr (where the cluster state is stored in ZooKeeper).

https://sematext.com/blog/elasticsearch-scaling-cluster-state

Learning From Google SRE Team (part-1)

In this blog post, we aim to expand on the first 5 lessons shared by Google's Site Reliability Engineering team, offering a closer look at practical implementation examples.

https://www.codereliant.io/20-sre-lessons-from-google-part1

SRE Interview Prep Plan (Week 3)

This week, we're taking another significant step forward as we get into the critical stack of monitoring and alerting. Now, it's time to equip yourself with the knowledge and tools needed to keep an eye on systems, analyze performance, and respond quickly to any issues that may come up.

https://www.codereliant.io/sre-interview-prep-plan-week-3

tailspin

A log file highlighter

https://github.com/bensadeh/tailspin

The costs of microservices

The microservices architecture adds more moving parts to the overall system, and this doesn’t come for free. The cost of fully embracing microservices is only worth paying if it can be amortized across dozens of development teams.

https://robertovitillo.com/costs-of-microservices

Retries, Backoff and Jitter

In distributed systems, failures and latency issues are inevitable. Services can fail due to overloaded servers, network issues, bugs, and various other factors. As engineers building distributed systems, we need strategies to make our services robust and resilient in the face of such failures. One useful technique is using retries.

https://www.codereliant.io/retries-backoff-jitter
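
The technique named in the title can be sketched in a few lines. The following is a minimal illustration, not code from the linked post: the function names `backoff_delay` and `retry`, the default delays, and the choice of "full jitter" (a uniformly random wait up to the exponential cap) are all assumptions made for this example.

```python
import random
import time


def backoff_delay(attempt, base_delay=0.1, max_delay=5.0):
    """Upper bound (seconds) on the wait after failed attempt `attempt` (0-based).

    Doubles with every attempt and is capped at max_delay.
    """
    return min(max_delay, base_delay * (2 ** attempt))


def retry(func, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is exhausted.

    Between tries, waits a random time in [0, backoff_delay(attempt)]
    ("full jitter") so that many clients failing at the same moment
    do not all retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Hypothetical simplification: every exception is treated as
            # retryable. Real code should catch only transient errors.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = backoff_delay(attempt, base_delay, max_delay)
            sleep(random.uniform(0, cap))
```

A call such as `retry(lambda: fetch_status(), max_attempts=4)` would wrap a hypothetical `fetch_status` function. Retrying is only safe when the wrapped operation is idempotent, and the `sleep` parameter exists so the waiting can be stubbed out in tests.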

Prometheus and centralized storage: When you need it, how it works, and what Mimir is

https://blog.palark.com/prometheus-centralized-storage-mimir

A guide to post-mortem meetings and how we run them at incident.io

https://incident.io/hubs/post-mortem/a-guide-to-post-mortem-meetings

A Comprehensive Guide to Testing in Terraform: Keep your tests, validations, checks, and policies in order

This post discusses testing and validation for infrastructure-as-code (IaC) with HashiCorp Terraform. The insights and ideas presented here can surely be extended to IaC in general.

https://mattias.engineer/posts/terraform-testing-and-validation

Elevating CloudWatch Logs: Smart Alerts with Chatbot, SNS, and Lambda

https://medium.com/@louis-fiori/cloudwatch-logs-enhanced-alerts-a50ea08d0845

From AI to sustainability, why our latest data centers use 400G networking

To meet the bandwidth requirements of new and future AI workloads, and to stay committed to our sustainability goals, the Dropbox networking team recently designed and launched our first data center architecture using highly efficient, cutting-edge 400 gigabit per second (400G) Ethernet technology.

https://dropbox.tech/infrastructure/from-ai-to-sustainability-why-our-latest-data-centers-use-400g-networking

gitness

Gitness is an open source development platform packed with the power of code hosting and automated DevOps pipelines.

https://github.com/harness/gitness
