DevOps&SRE Library

The NVIDIA device plugin for Kubernetes is a DaemonSet that allows you to automatically:

- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU-enabled containers in your Kubernetes cluster.

https://github.com/NVIDIA/k8s-device-plugin
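
As a rough illustration of the last point, a container asks for GPUs through the `nvidia.com/gpu` extended resource once the plugin is running. The sketch below only builds such a Pod manifest as JSON, which `kubectl apply -f` can read. The helper `gpu_pod_manifest`, the pod name, and the image are hypothetical placeholders, not taken from the project.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest whose single container requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under `limits`.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = gpu_pod_manifest("gpu-demo", "example.com/cuda-app:latest")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` should schedule the pod only on a node that advertises at least one free GPU.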

kluctl

Kluctl is the missing glue that puts together your (and any third-party) deployments into one large declarative Kubernetes deployment, while making it fully manageable (deploy, diff, prune, delete, ...) via one unified command line interface.

https://github.com/kluctl/kluctl

k8e

Kubernetes Easy Engine (k8e) 🚀 is a lightweight, scalable enterprise-grade Kubernetes distribution that allows users to manage, protect and obtain out-of-the-box Kubernetes clusters in a unified manner. It is suitable for enterprise environments.

https://github.com/xiaods/k8e

zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.

https://github.com/zed-industries/zed

heynote

Heynote is a dedicated scratchpad for developers. It functions as a large persistent text buffer where you can write down anything you like. Works great for that Slack message you don't want to accidentally send, a JSON response from an API you're working with, notes from a meeting, your daily to-do list, etc.

https://github.com/heyman/heynote

The Bun Shell

The Bun Shell is a new experimental embedded language and interpreter in Bun that allows you to run cross-platform shell scripts in JavaScript & TypeScript.

https://bun.sh/blog/the-bun-shell

wal-listener

A service that helps implement an event-driven architecture.

To maintain the consistency of data in the system, we will use transactional messaging: publishing events in a single transaction with a domain model change.

The service allows you to subscribe to changes in a PostgreSQL database using its logical decoding capability and publish them to a NATS Streaming server.

https://github.com/ihippik/wal-listener
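
The atomicity idea behind transactional messaging can be sketched in a few lines. This is a minimal illustration, assuming SQLite as a stand-in database and hypothetical `orders` and `events` tables. It does not use wal-listener, PostgreSQL logical decoding, or NATS; it only shows that the domain change and its event either both commit or both roll back.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical domain table and event table."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def place_order(conn: sqlite3.Connection, order_id: int, fail: bool = False) -> None:
    """Write the order and its event in one transaction."""
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "placed")
        )
        if fail:
            # Simulated crash between the domain change and the event write.
            raise RuntimeError("simulated failure")
        conn.execute(
            "INSERT INTO events (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    place_order(conn, 1)
    try:
        place_order(conn, 2, fail=True)
    except RuntimeError:
        pass
    # Order 2 was rolled back together with its missing event.
    print(conn.execute("SELECT id, status FROM orders").fetchall())
    print(conn.execute("SELECT topic, payload FROM events").fetchall())
```

A separate relay would then read committed changes and publish them; in the service described above, that role is played by logical decoding of the PostgreSQL write-ahead log.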

The state of Kubernetes jobs in 2023 Q4

Kubernetes job market trends for Q4 2023

https://kube.careers/state-of-kubernetes-jobs-2023-q4

42 things I learned from building a production database

https://maheshba.bitbucket.io/blog/2021/10/19/42Things.html

12 Factor CLI Apps

At Heroku, we’ve come up with a methodology called the 12 factor app. It’s a set of principles designed to make great web applications that are easy to maintain. In that spirit, here are 12 CLI factors to keep in mind when building your next CLI application. Following these principles will produce a CLI UX that users will love.

https://medium.com/@jdxcode/12-factor-cli-apps-dd3c227a0e46

Viacheslav Biriukov - SRE deep dive into Linux Page Cache

In this series of articles, I would like to talk about Linux Page Cache. I believe that knowledge of the theory and tools covered here is essential for every SRE. This understanding can help both in routine DevOps-like tasks and in emergency debugging and firefighting.

https://biriukov.dev/docs/page-cache/0-linux-page-cache-for-sre

Loki's new TSDB Index

https://lokidex.com/posts/tsdb

uptrace

Open source APM: OpenTelemetry traces, metrics, and logs

https://github.com/uptrace/uptrace

kubernetes-image-puller

Kubernetes Image Puller is used for caching images on a cluster. It creates a DaemonSet downloading and running the relevant container images on each node.

https://github.com/che-incubator/kubernetes-image-puller

Why Distributed Systems Fail?

Distributed systems are tricky - it's easy to make wrong assumptions that lead to problems down the road. Back in the 90s, computer scientist L. Peter Deutsch identified several common misconceptions, or "fallacies," that trip up engineers working on distributed systems. Surprisingly, these fallacies are still relevant today:

1. The Network is Reliable: It's risky to assume networks are 100% reliable. Networks can and do fail in various ways.
2. Latency is Zero: While we might wish our networks had no latency, that's simply not physically possible - even light takes time to travel distances. Ignoring the inevitable delay in data transmission can lead to unrealistic expectations of system performance.
3. Bandwidth is Infinite: This overlooks the physical and practical limitations on data transfer rates.
4. The Network is Secure: No wonder security is a growing industry. Assuming inherent security can lead to vulnerabilities and oversights in protective measures.
5. Topology Doesn't Change: This neglects the dynamic nature of network configurations.
6. There is One Administrator: A simplification that fails to consider the complexity of managing distributed systems.
7. Transport Cost is Zero: Overlooking the resources required for data movement.
8. The Network is Homogeneous: Ignoring the diversity in network systems and standards.

These fallacies, if not recognized and addressed, can lead to design flaws, performance issues, and security vulnerabilities in distributed systems. In the following sections, we will break down each of these misconceptions, exploring their implications and how to mitigate the risks they pose in real-world applications.

P1: https://www.codereliant.io/why-distributed-systems-fail-1

P2: https://www.codereliant.io/why-distributed-systems-fail-2
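
One common mitigation for the first fallacy is to treat every remote call as something that may fail and retry it with backoff. The sketch below is a minimal, generic example and is not taken from the linked articles. The helper `call_with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call operation(), retrying transient network errors.

    Uses exponential backoff with full jitter so that many clients
    retrying at once do not hit the server in lockstep.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to mimic a transient network fault.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("connection reset")
        return "ok"

    print(call_with_retries(flaky), "after", state["calls"], "calls")
```

Retries like this are only safe for idempotent operations, and the attempt cap keeps a persistent outage from turning into an unbounded retry storm.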

Using LocalStack and GitHub Actions to Test Terraform AWS Deployments

https://medium.com/@robbiedouglas/using-localstack-and-github-actions-to-test-terraform-aws-deployments-0a119dcff7c2
