Sysadmin Tools 🇺🇦
Sysadmin/DevOps tools, news, and other interesting things from the modern IT world.
Feed https://t.me/s/sysadmin_tools
A post looking at the role of an SRE team in adopting observability tooling. In my experience, a lot of this depends on what people actually do on the ground rather than on their job titles.

https://rootly.io/blog/the-role-of-sres-in-observability

#sre #observability
The practical guide to incident management

https://incident.io/guide/

#sre #guide #oncall #incident
Kubernetes monitoring: why it is difficult and how to improve it

https://youtu.be/R9oV6DE0K10

#kubernetes #k8s #monitoring #observability #sre #victoriametrics
botkube

An app that helps you monitor your Kubernetes cluster, debug critical deployments, and get recommendations for standard practices

https://github.com/kubeshop/botkube

#kubernetes #devops #chatops #chatbot #sre #monitoring
How ilert Can Help Enhance Your Monitoring With Its VictoriaMetrics Integration

https://victoriametrics.com/blog/using-victoriametrics-and-ilert

#monitoring #sre #observability #ilert
Runbook automation platform with deep observability integrations for SRE & On-Call Teams

https://github.com/DrDroidLab/playbooks

#sre #monitoring #logs #metrics #alerts #traces #observability
🚀 Join Mathias Palmersheim, Solution Engineer, at conf42.com! 🎙

🛠 How to Monitor your Monitoring 🖥

📟 If your monitoring system crashes in the middle of the night, does your team get alerted?

💡 Hopefully, yes. If not, this talk will provide simple, cost-effective solutions to get you started with #victoriaMetrics!

🔧 And even if you already have #monitoring for your monitoring, Mathias will share expert tips to help you improve your current setup.

🗓 October 17th – Online

https://www.conf42.com/Incident_Management_2024_Mathias_Palmersheim_27_monitoring_monitoring_how

#monitoring #devops #cloud #sre
Versus Incident

An open-source incident management system with multi-channel alerting capabilities

https://github.com/VersusControl/versus-incident

#alerts #monitoring #sre #oncall
Preq

preq is the community-driven problem detector for Common Reliability Enumerations (CREs)

https://github.com/prequel-dev/preq

#monitoring #reliability #sre