DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

Roskomnadzor (RKN) registration: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
kubeskoop

KubeSkoop is a Kubernetes network diagnostic tool for different CNI plug-ins and IaaS providers. It automatically constructs the network traffic graph of a Pod in the Kubernetes cluster and uses eBPF to monitor and analyze the kernel's critical path, with the aim of resolving most Kubernetes cluster network problems.


https://github.com/alibaba/kubeskoop
kubezoo

KubeZoo is a lightweight gateway service that leverages the existing namespace model and adds multi-tenancy capability to existing Kubernetes. KubeZoo provides view-level isolation among tenants by capturing and transforming the requests and responses.


https://github.com/kubewharf/kubezoo
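
To make "capturing and transforming the requests and responses" a little more concrete, here is a minimal sketch of one way a gateway could give each tenant its own view of namespaces. This is not KubeZoo's actual code or API: the `Tenant` type, the `-` separator, and both function names are hypothetical, and the prefixing scheme itself is an assumption made for illustration.

```python
from dataclasses import dataclass

SEP = "-"  # assumed separator between tenant ID and namespace name


@dataclass(frozen=True)
class Tenant:
    """Hypothetical tenant record; not a KubeZoo type."""
    tenant_id: str

    def __post_init__(self):
        # Assumption: IDs must not contain the separator, so one tenant's
        # prefix can never match another tenant's namespaces.
        if not self.tenant_id or SEP in self.tenant_id:
            raise ValueError("tenant_id must be non-empty and must not contain '-'")


def to_upstream(tenant: Tenant, namespace: str) -> str:
    """Rewrite a namespace from a tenant request into the cluster-wide name."""
    return f"{tenant.tenant_id}{SEP}{namespace}"


def to_tenant_view(tenant: Tenant, upstream_namespaces: list[str]) -> list[str]:
    """Filter a cluster-wide namespace list down to one tenant and strip the prefix."""
    prefix = f"{tenant.tenant_id}{SEP}"
    return [ns[len(prefix):] for ns in upstream_namespaces if ns.startswith(prefix)]


if __name__ == "__main__":
    t1 = Tenant("t1")
    print(to_upstream(t1, "default"))
    print(to_tenant_view(t1, ["t1-default", "t2-default", "t1-prod", "kube-system"]))
```

In this sketch the tenant only ever sees unprefixed names, while the cluster stores prefixed ones, so two tenants can both have a `default` namespace without seeing each other's objects.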
kargo

Kargo is a next-generation continuous delivery and application lifecycle orchestration platform for Kubernetes. It builds upon GitOps principles and integrates with existing technologies, like Argo CD, to streamline and automate the progressive rollout of changes across the many stages of an application's lifecycle.


https://github.com/akuity/kargo
wireguard-ui

A web user interface to manage your WireGuard setup.


https://github.com/ngoduykhanh/wireguard-ui
Troubleshooting Missing Kubernetes Logs in Elasticsearch

https://povilasv.me/troubleshooting-missing-kubernetes-logs-in-elasticsearch
Optimizing Kubernetes scalability and cost-efficiency with Karpenter

In this post, you’ll learn the rationale and approach taken by Miro’s Compute team to enhance Kubernetes cluster scaling and efficiency. The team adopted groupless node pools with Karpenter, which helped reduce compute costs in non-production clusters by up to 60% while raising resource usage efficiency in production to as much as 95%.


https://medium.com/miro-engineering/optimizing-kubernetes-scalability-and-cost-efficiency-with-karpenter-356153fcf546
Handling Pods When Nodes Fail

In addition to the basic Pod types, Kubernetes offers a variety of higher-level workload types, such as Deployment, DaemonSet, and StatefulSet. These higher-level controllers allow you to provide services with multiple replicas of your Pods, making it easier to achieve a high availability architecture.

However, when Kubernetes nodes experience failures such as crashes, network disruptions, or system failures, what happens to the Pods running on those nodes?

From a high availability perspective, some might think that having multiple replicas of an application ensures that the service remains unaffected by node failures. However, in certain cases where the application belongs to a StatefulSet, horizontal scaling isn’t an option. In such scenarios, it becomes necessary to quickly reschedule the related Pods to maintain service availability in the event of node failures.


https://hwchiu.medium.com/handling-pods-when-nodes-fail-4daae20213b
Kubernetes v1.27: Safeguarding Pod with MemoryThrottlingFactor

https://faun.pub/kubernetes-v1-27-safeguarding-pod-with-memorythrottlingfactor-cfbccde10de
dcgm-exporter

This repository contains the DCGM-Exporter project. It provides a GPU metrics exporter for Prometheus that leverages NVIDIA DCGM.


https://github.com/NVIDIA/dcgm-exporter