DevOps & SRE notes

This article describes how ClickHouse cut costs for the Kubernetes clusters on AWS EKS that power ClickHouse Cloud. The main problem was underutilized EC2 instances, which the team found by analyzing EKS node utilization. By changing the Kubernetes scheduler's scoring policy from 'LeastAllocated' to 'MostAllocated', they increased cluster utilization and reduced the number of EC2 nodes needed. The change also involved setting up a custom scheduler and deliberately handling system utility workloads. The result was a considerable reduction in infrastructure costs without compromising performance or reliability for customers.
https://clickhouse.com/blog/packing-kubernetes-pods-more-efficiently-saving-money
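
To illustrate the difference between the two scoring policies named above, here is a minimal Python sketch. It is a simplified model and not the scheduler's actual code: it scores on CPU requests only, and the `Node` class, the node names, and the `pick_node` helper are hypothetical. The real scheduler combines several resources and plugins.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity_m: int   # allocatable CPU in millicores
    cpu_requested_m: int  # CPU already requested by scheduled pods


def least_allocated_score(node: Node, pod_cpu_m: int) -> int:
    """Favor nodes with the most free CPU after placement (spreads pods out)."""
    requested = node.cpu_requested_m + pod_cpu_m
    return (node.cpu_capacity_m - requested) * 100 // node.cpu_capacity_m


def most_allocated_score(node: Node, pod_cpu_m: int) -> int:
    """Favor nodes with the least free CPU after placement (bin-packing)."""
    requested = node.cpu_requested_m + pod_cpu_m
    return requested * 100 // node.cpu_capacity_m


def pick_node(nodes, pod_cpu_m, score_fn) -> str:
    """Filter out nodes where the pod does not fit, then take the top score."""
    fitting = [n for n in nodes if n.cpu_requested_m + pod_cpu_m <= n.cpu_capacity_m]
    return max(fitting, key=lambda n: score_fn(n, pod_cpu_m)).name


# Hypothetical cluster: one nearly full node and one nearly empty node.
nodes = [Node("busy", 4000, 3000), Node("idle", 4000, 500)]
spread_choice = pick_node(nodes, 500, least_allocated_score)
packed_choice = pick_node(nodes, 500, most_allocated_score)
```

In this model the spreading policy sends the new pod to the nearly empty node, while the packing policy fills the busy node first, which is what lets empty nodes be scaled away.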

ClickHouse

Saving Millions of Dollars by Bin-Packing ClickHouse Pods in AWS EKS

Read about how changing the pod scheduling in our Kubernetes clusters, powering ClickHouse Cloud, saved millions of dollars.

👍5

1.61K views · tutunak, 13:02

DevOps & SRE notes

Interesting article about CI/CD observability by Grafana Labs
https://grafana.com/blog/2023/11/20/ci-cd-observability-via-opentelemetry-at-grafana-labs/

Grafana Labs

What is CI/CD observability? | Grafana Labs

Learn why observability is critical to CI/CD and how we’re addressing it internally at Grafana Labs, and get a sneak peek at our vision for something that could democratize CI/CD insights for Grafana users and beyond.

👍4

1.61K views · tutunak, 06:02

DevOps & SRE notes

Helm Charts as Code

https://github.com/Praqma/helmsman

GitHub

GitHub - mkubaczyk/helmsman: Helm Charts as Code

Helm Charts as Code. Contribute to mkubaczyk/helmsman development by creating an account on GitHub.

👍3❤1😱1👌1

1.72K views · tutunak, 13:02

DevOps & SRE notes

Kubevious CLI - Prevent Kubernetes disasters at the early stages
https://github.com/kubevious/cli

👍3🔥2

1.62K views · tutunak, 06:02

DevOps & SRE notes

You can now store container images on a separate volume in k8s
https://kubernetes.io/blog/2024/01/23/kubernetes-separate-image-filesystem/

Kubernetes

Image Filesystem: Configuring Kubernetes to store containers on a separate filesystem

A common issue in running/operating Kubernetes clusters is running out of disk space. When the node is provisioned, you should aim to have a good amount of storage space for your container images and running containers. The container runtime usually writes…

👍5👌1

1.57K views · tutunak, 13:03

DevOps & SRE notes

https://explainextended.com/2023/12/31/happy-new-year-15/

👍4

1.63K views · tutunak, 06:02

DevOps & SRE notes

Kubernetes powered PaaS that runs in your own cloud.
https://github.com/porter-dev/porter

👍4

1.51K views · tutunak, 13:02

DevOps & SRE notes

Kubernetes native tool for mocking and testing API and micro-services. Microcks is a Cloud Native Computing Foundation sandbox project 🚀
https://github.com/microcks/microcks

GitHub

GitHub - microcks/microcks: The open source, cloud native tool for API Mocking and Testing. Microcks is a Cloud Native Computing…

The open source, cloud native tool for API Mocking and Testing. Microcks is a Cloud Native Computing Foundation incubating project 🚀 - microcks/microcks

👍5

1.61K views · tutunak, 06:03

DevOps & SRE notes

A guide to NVIDIA's GPU Operator: how to get NVIDIA cards running on Kubernetes
https://lmyslinski.com/posts/gpu-operator-guide/

Lmyslinski

A guide to NVIDIA's GPU Operator

How to get Nvidia cards running on K8s

👍5

1.55K views · Dr. Mort, 13:00

DevOps & SRE notes

A short article on how to use k6 for stress tests
https://grafana.com/blog/2024/01/30/stress-testing/

Grafana Labs

Stress testing: A beginner's guide | Grafana Labs

A basic guide to stress testing and how to create a stress test in Grafana k6

👍3

1.43K views · tutunak, 06:03

DevOps & SRE notes

kube2iam provides different AWS IAM roles for pods running on Kubernetes

https://github.com/jtblin/kube2iam

👍5

1.46K views · tutunak, 10:12

DevOps & SRE notes

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://github.com/fluxcd/flagger

👍5🔥2👏2

1.64K views · tutunak, 13:02

DevOps & SRE notes

2024 Kubernetes Cost Benchmark Report.pdf

5.3 MB

An interesting statistic is how heavily resources are overprovisioned in Kubernetes.

👍4🔥4👏4❤1

1.55K views · tutunak, 06:02

DevOps & SRE notes

This article explores Kubernetes resource limits, detailing strategies for balancing efficiency with predictability and how limits affect performance, planning, and Quality of Service (QoS) classes.
https://kubernetes.io/blog/2023/11/16/the-case-for-kubernetes-resource-limits/
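
As a rough illustration of how requests and limits map to the QoS classes mentioned above, here is a minimal Python sketch of the classification rules. It is a simplified model under stated assumptions: the `qos_class` function and the dict layout are hypothetical, quantities are compared as plain strings (so they are assumed to be written in the same notation), and only `cpu` and `memory` are considered.

```python
def qos_class(containers):
    """Classify a pod from its containers' resource settings.

    `containers` is a list of dicts, each with optional "requests" and
    "limits" dicts keyed by "cpu" and "memory" (hypothetical layout).
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"   # nothing set on any container
    return "Guaranteed" if guaranteed else "Burstable"


# Hypothetical pods for illustration.
pinned = [{"requests": {"cpu": "500m", "memory": "256Mi"},
           "limits": {"cpu": "500m", "memory": "256Mi"}}]
limits_only = [{"limits": {"cpu": "1", "memory": "1Gi"}}]
requests_only = [{"requests": {"cpu": "100m"}}]
unset = [{}]
```

The sketch shows why limits matter for predictability: only pods whose every container has CPU and memory limits equal to its requests land in the Guaranteed class.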

Kubernetes

The Case for Kubernetes Resource Limits: Predictability vs. Efficiency

There’s been quite a lot of posts suggesting that not using Kubernetes resource limits might be a fairly useful thing (for example, For the Love of God, Stop Using CPU Limits on Kubernetes or Kubernetes: Make your services faster by removing CPU limits ).…

👍6

1.52K views · Dr. Mort, 13:00

DevOps & SRE notes

In this article, you'll learn how to avoid three common mistakes with PromQL and Kubernetes metrics
https://home.robusta.dev/blog/3-common-mistakes-with-promql-and-kubernetes-metrics

home.robusta.dev

3 Common Mistakes with PromQL and Kubernetes Metrics | Robusta

Millions of developers write PromQL queries and build custom Grafana dashboards for Kubernetes. And everyone uses the same underlying metrics from node-exporter, kubelet, and kube-state-metrics. Unfortunately, there are some little-known pitfalls that many…

👍5

1.5K views · Dr. Mort, 06:00

DevOps & SRE notes

Easily check your clusters for use of deprecated APIs
https://github.com/doitintl/kube-no-trouble

👍5

1.48K views · Dr. Mort, 13:01

DevOps & SRE notes

Scheduled snapshots for Kubernetes persistent volumes
https://github.com/backube/snapscheduler

👍4

1.57K views · Dr. Mort, 06:00

DevOps & SRE notes

In this article, you'll learn how to maintain uninterrupted pod operation while utilizing Karpenter for node scaling.

https://rtfm.co.ua/en/kubernetes-ensuring-high-availability-for-pods/

RTFM: Linux, DevOps, and system administration | DevOps-engineering, and system administration. Cases from practice.

Kubernetes: ensuring High Availability for Pods

Setting up High Availability for Kubernetes Pods with Deployment replicas, Pod Topology Spread Constraints, PodDisruptionBudget and annotations for Karpenter
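
To give a feel for how the Pod Topology Spread Constraints mentioned above limit placement, here is a minimal Python sketch of the `maxSkew` check. It is a simplified model, not the scheduler's implementation: the `allowed_domains` function and the zone names are hypothetical, and it assumes every domain is eligible and already appears in the input.

```python
def allowed_domains(pods_per_domain, max_skew):
    """Return the domains where one more matching pod keeps skew within max_skew.

    Skew for a domain is modeled as (pods in that domain, including the new
    pod) minus the smallest current pod count across all domains.
    """
    global_min = min(pods_per_domain.values())
    return sorted(
        domain
        for domain, count in pods_per_domain.items()
        if count + 1 - global_min <= max_skew
    )


# Hypothetical zones: zone-a already runs two replicas, zone-b runs one.
uneven = allowed_domains({"zone-a": 2, "zone-b": 1}, max_skew=1)
even = allowed_domains({"zone-a": 1, "zone-b": 1}, max_skew=1)
```

With an uneven spread, only the emptier zone accepts the next replica, so losing a single zone or node takes down fewer pods at once.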

👍7🔥2💯2💩1

1.48K views · Dr. Mort, 13:00
