The article describes how the ClickHouse team cut costs for its Kubernetes-based ClickHouse clusters on AWS EKS by analyzing and improving EKS node utilization. The main problem was underutilized EC2 instances. By changing the Kubernetes scheduler's scoring policy from 'LeastAllocated' to 'MostAllocated', they increased cluster utilization and reduced the number of EC2 nodes needed. The approach also involved setting up a custom scheduler and deliberately handling system utility workloads. The result was a considerable reduction in infrastructure costs without compromising performance or reliability for customers.
https://clickhouse.com/blog/packing-kubernetes-pods-more-efficiently-saving-money
ClickHouse
Saving Millions of Dollars by Bin-Packing ClickHouse Pods in AWS EKS
Read about how changing the pod scheduling in our Kubernetes clusters, powering ClickHouse Cloud, saved millions of dollars.
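To make the difference between the two scoring policies concrete, here is a minimal Python sketch. It is not the Kubernetes scheduler's code: the `Node` class, the `score` and `pick_node` helpers, and the equal weighting of CPU and memory are assumptions made for illustration. It only shows the general idea that 'LeastAllocated' prefers the emptiest node that fits the pod, while 'MostAllocated' prefers the fullest one.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node model; all fields are illustrative."""
    name: str
    cpu_capacity_m: int    # allocatable CPU in millicores
    mem_capacity_mi: int   # allocatable memory in MiB
    cpu_requested_m: int   # CPU already requested by scheduled pods
    mem_requested_mi: int  # memory already requested by scheduled pods


def score(node, pod_cpu_m, pod_mem_mi, strategy):
    """Score a node from 0 to 100 for a pod, or return None if the pod does not fit."""
    cpu = node.cpu_requested_m + pod_cpu_m
    mem = node.mem_requested_mi + pod_mem_mi
    if cpu > node.cpu_capacity_m or mem > node.mem_capacity_mi:
        return None
    # Assumption: CPU and memory are weighted equally.
    mean_requested = (cpu / node.cpu_capacity_m + mem / node.mem_capacity_mi) / 2
    if strategy == "MostAllocated":
        return 100 * mean_requested        # fuller nodes score higher
    if strategy == "LeastAllocated":
        return 100 * (1 - mean_requested)  # emptier nodes score higher
    raise ValueError(f"unknown strategy: {strategy}")


def pick_node(nodes, pod_cpu_m, pod_mem_mi, strategy):
    """Return the name of the highest-scoring node that fits, or None."""
    feasible = []
    for node in nodes:
        s = score(node, pod_cpu_m, pod_mem_mi, strategy)
        if s is not None:
            feasible.append((s, node.name))
    return max(feasible)[1] if feasible else None


nodes = [
    Node("busy", 4000, 16384, 3000, 12288),
    Node("idle", 4000, 16384, 500, 2048),
]
print(pick_node(nodes, 500, 2048, "LeastAllocated"))
print(pick_node(nodes, 500, 2048, "MostAllocated"))
```

With 'MostAllocated', new pods keep landing on the already busy node until it is full, which leaves the idle node empty and lets it be removed. That is the bin-packing effect the article describes.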
Interesting article about CI/CD observability by Grafana Labs
https://grafana.com/blog/2023/11/20/ci-cd-observability-via-opentelemetry-at-grafana-labs/
Grafana Labs
What is CI/CD observability? | Grafana Labs
Learn why observability is critical to CI/CD and how we're addressing it internally at Grafana Labs, and get a sneak peek at our vision for something that could democratize CI/CD insights for Grafana users and beyond.
Kubevious CLI - Prevent Kubernetes disasters at the early stages
https://github.com/kubevious/cli
Kubernetes can now store container images on a separate volume
https://kubernetes.io/blog/2024/01/23/kubernetes-separate-image-filesystem/
Kubernetes
Image Filesystem: Configuring Kubernetes to store containers on a separate filesystem
A common issue in running/operating Kubernetes clusters is running out of disk space. When the node is provisioned, you should aim to have a good amount of storage space for your container images and running containers. The container runtime usually writes…
Kubernetes powered PaaS that runs in your own cloud.
https://github.com/porter-dev/porter
Kubernetes-native tool for mocking and testing APIs and microservices. Microcks is a Cloud Native Computing Foundation sandbox project.
https://github.com/microcks/microcks
GitHub
GitHub - microcks/microcks: The open source, cloud native tool for API Mocking and Testing. Microcks is a Cloud Native Computing…
The open source, cloud native tool for API Mocking and Testing. Microcks is a Cloud Native Computing Foundation incubating project - microcks/microcks
A guide to NVIDIA's GPU Operator: how to get NVIDIA cards running on Kubernetes
https://lmyslinski.com/posts/gpu-operator-guide/
Lmyslinski
A guide to NVIDIA's GPU Operator
How to get Nvidia cards running on K8s
A short article on how to use k6 for stress tests
https://grafana.com/blog/2024/01/30/stress-testing/
Grafana Labs
Stress testing: A beginner's guide | Grafana Labs
A basic guide to stress testing and how to create a stress test in Grafana k6
kube2iam provides different AWS IAM roles for pods running on Kubernetes
https://github.com/jtblin/kube2iam
Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://github.com/fluxcd/flagger
2024 Kubernetes Cost Benchmark Report.pdf
5.3 MB
An interesting statistic in the report is how heavily resources are overprovisioned in Kubernetes.
This article explores Kubernetes resource limits, detailing strategies for balancing efficiency with predictability and how limits affect performance, planning, and Quality of Service (QoS) classes.
https://kubernetes.io/blog/2023/11/16/the-case-for-kubernetes-resource-limits/
Kubernetes
The Case for Kubernetes Resource Limits: Predictability vs. Efficiency
There's been quite a lot of posts suggesting that not using Kubernetes resource limits might be a fairly useful thing (for example, For the Love of God, Stop Using CPU Limits on Kubernetes or Kubernetes: Make your services faster by removing CPU limits).…
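As a rough illustration of how requests and limits map to QoS classes, here is a simplified Python sketch. The `qos_class` function and the dict layout are hypothetical, not part of any Kubernetes client library, and quantities are compared as plain strings (so "500m" and "0.5" would not count as equal here). It follows the documented rules: a pod is Guaranteed when every container has CPU and memory limits with requests equal to them, BestEffort when no container sets any CPU or memory request or limit, and Burstable otherwise.

```python
QOS_RESOURCES = ("cpu", "memory")  # only these resources affect the QoS class


def qos_class(containers):
    """Classify a pod from a list of container dicts.

    Each container is a dict with optional 'requests' and 'limits' dicts,
    for example {"requests": {"cpu": "250m"}, "limits": {"cpu": "500m"}}.
    This layout is an assumption made for the sketch.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in QOS_RESOURCES:
            limit = limits.get(resource)
            # An unset request defaults to the limit.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pod = [{"requests": {"cpu": "250m", "memory": "1Gi"},
        "limits": {"cpu": "500m", "memory": "1Gi"}}]
print(qos_class(pod))
```

The example pod is Burstable because its CPU request is lower than its CPU limit. Setting both to the same value would move it to Guaranteed, which is the predictability side of the trade-off the article discusses.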
In this article, you'll learn how to avoid three common mistakes with PromQL and Kubernetes metrics
https://home.robusta.dev/blog/3-common-mistakes-with-promql-and-kubernetes-metrics
home.robusta.dev
3 Common Mistakes with PromQL and Kubernetes Metrics | Robusta
Millions of developers write PromQL queries and build custom Grafana dashboards for Kubernetes. And everyone uses the same underlying metrics from node-exporter, kubelet, and kube-state-metrics. Unfortunately, there are some little-known pitfalls that many…
Easily check your clusters for use of deprecated APIs
https://github.com/doitintl/kube-no-trouble
Scheduled snapshots for Kubernetes persistent volumes
https://github.com/backube/snapscheduler
In this article, you'll learn how to maintain uninterrupted pod operation while utilizing Karpenter for node scaling.
https://rtfm.co.ua/en/kubernetes-ensuring-high-availability-for-pods/
RTFM: Linux, DevOps, and system administration | DevOps-engineering, and system administration. Cases from practice.
Kubernetes: ensuring High Availability for Pods
Setting up High Availability for Kubernetes Pods with Deployment replicas, Pod Topology Spread Constraints, PodDisruptionBudget and annotations for Karpenter