Kubernetes Monitoring Best Practices

Implement effective monitoring and observability in your Kubernetes clusters

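Effective monitoring is crucial for maintaining healthy Kubernetes clusters. This guide covers metrics collection with Prometheus, alerting with Alertmanager, log aggregation with Fluentd and Elasticsearch, and dashboards in Grafana, with an example manifest for each.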

Prerequisites

  • Basic understanding of Kubernetes
  • Access to a Kubernetes cluster
  • kubectl CLI tool installed
  • Familiarity with monitoring concepts

Project Structure

.
├── monitoring/
│   ├── prometheus/        # Prometheus configurations
│   ├── grafana/          # Grafana dashboards
│   ├── alerts/           # Alert configurations
│   └── logging/          # Logging configurations
└── metrics/
    ├── custom-metrics/   # Custom metric definitions
    └── dashboards/       # Dashboard templates

Prometheus Setup

1. Prometheus Operator

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
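
The serviceAccountName field above refers to a service account that has to exist and be allowed to discover scrape targets. The following is a minimal sketch of that RBAC, modeled on the Prometheus Operator getting-started manifests. The default namespace is an assumption, so adjust it to wherever the Prometheus resource is deployed.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus            # must match spec.serviceAccountName
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
# Read-only access to the objects Prometheus uses for service discovery
- apiGroups: [""]
  resources: [nodes, nodes/metrics, services, endpoints, pods]
  verbs: [get, list, watch]
- apiGroups: [""]
  resources: [configmaps]
  verbs: [get]
- nonResourceURLs: ["/metrics"]
  verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default          # assumption: Prometheus runs in "default"
```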

2. Service Monitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
  labels:
    team: frontend   # must match serviceMonitorSelector on the Prometheus resource
spec:
  selector:
    matchLabels:
      app: web-app
  endpoints:
  - port: metrics
    interval: 15s
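
A ServiceMonitor selects Services by label, not pods, and its endpoint port refers to a named port on the Service. As a sketch of a matching Service, assuming the application pods carry the label app: web-app and expose metrics on port 8080 (the Service name and the port number are both hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app              # hypothetical name
  labels:
    app: web-app             # matched by the ServiceMonitor's selector
spec:
  selector:
    app: web-app             # assumption: pods carry this label
  ports:
  - name: metrics            # referenced by "port: metrics" in the ServiceMonitor
    port: 8080               # hypothetical port number
    targetPort: 8080
```

If the port name on the Service does not match the endpoint's port value, the target does not appear in Prometheus.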

Alert Configuration

1. Alertmanager

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
spec:
  replicas: 3
  configSecret: alertmanager-config
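
The configSecret field names a Secret that holds the Alertmanager configuration under the key alertmanager.yaml. A minimal sketch of such a Secret follows. The receiver names, the grouping intervals, and the webhook URL are illustrative assumptions, not values from this guide.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config        # must match spec.configSecret
stringData:
  alertmanager.yaml: |
    route:
      receiver: default            # fallback receiver for unmatched alerts
      group_by: [alertname, namespace]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
      # Send anything labelled severity=critical to the on-call receiver
      - matchers:
        - severity = "critical"
        receiver: oncall
    receivers:
    - name: default                # no integrations: alerts are only visible in the UI
    - name: oncall
      webhook_configs:
      - url: http://alert-webhook.example.internal/hook   # hypothetical endpoint
```

Prometheus also has to be pointed at this Alertmanager through spec.alerting.alertmanagers on the Prometheus resource. The operator exposes the instances through a Service named alertmanager-operated with a port named web.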

2. Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
spec:
  groups:
  - name: kubernetes
    rules:
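    - alert: HighCPUUsage
      # container_cpu_usage_seconds_total is a cumulative counter, so compare
      # its per-second rate (CPU cores in use) rather than the raw value.
      expr: sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        description: Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has used more than 0.9 CPU cores for 5 minutes
    - alert: PodCrashLooping
      # The restart counter only ever grows, so alert on recent restarts.
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
      for: 15m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} restarted more than 5 times in the last hour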

Logging Setup

1. Fluentd Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
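      <parse>
        # json matches Docker's json-file log driver; nodes running containerd
        # or CRI-O write CRI-format lines and need a CRI parser instead
        @type json
      </parse>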
    </source>
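
The source block only reads the logs, so a match block is still needed to ship them somewhere. The sketch below forwards to the Elasticsearch cluster defined in the next section. It assumes the fluent-plugin-elasticsearch output plugin is installed in the Fluentd image, that the elastic user's password is injected as the environment variable ELASTIC_PASSWORD, and that the cluster CA certificate is mounted at /fluentd/certs/ca.crt. The variable name and the mount path are both hypothetical.

```yaml
# Fragment: append to the fluent.conf key of the fluentd-config ConfigMap
data:
  fluent.conf: |
    <match kubernetes.**>
      @type elasticsearch
      # ECK exposes a cluster named "logging" through the Service logging-es-http
      host logging-es-http
      port 9200
      scheme https
      user elastic
      # Hypothetical env var, populated from the ECK-generated user Secret
      password "#{ENV['ELASTIC_PASSWORD']}"
      # Hypothetical mount path for the cluster's CA certificate
      ca_file /fluentd/certs/ca.crt
      # Write to daily indices named kubernetes-YYYY.MM.DD
      logstash_format true
      logstash_prefix kubernetes
    </match>
```

Daily indices make retention simple, since old data can be removed by deleting whole indices.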

2. Elasticsearch Configuration

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false

Metrics Collection

1. Custom Metrics

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics
data:
  metrics.yaml: |
    metrics:
      - name: http_requests_total
        type: counter
        help: Total number of HTTP requests
      - name: http_request_duration_seconds
        type: histogram
        help: HTTP request duration in seconds
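
Once an application exports these two metrics, they are usually queried as a rate and a latency quantile rather than read directly. The sketch below expresses both as recording rules. It assumes the histogram is exported with the standard _bucket series; the rule and resource names are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: http-recording-rules       # hypothetical name
spec:
  groups:
  - name: http
    rules:
    # Requests per second per job, averaged over 5 minutes
    - record: job:http_requests:rate5m
      expr: sum by (job) (rate(http_requests_total[5m]))
    # 95th percentile request latency per job over 5 minutes
    - record: job:http_request_duration_seconds:p95_5m
      expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

The le label must be kept in the aggregation, because histogram_quantile needs the bucket boundaries to compute the quantile.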

2. Node Exporter

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.8.2  # pin a tag (example version) instead of relying on latest
        ports:
        - containerPort: 9100
          name: metrics

Dashboard Configuration

1. Grafana Dashboard

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: kubernetes-monitoring
spec:
  json: |
    {
      "title": "Kubernetes Monitoring",
      "panels": [
        {
          "title": "CPU Usage",
          "type": "graph",
          "datasource": "Prometheus"
        },
        {
          "title": "Memory Usage",
          "type": "graph",
          "datasource": "Prometheus"
        }
      ]
    }
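
The panels above have titles but no queries, so they would render empty. As a sketch, the same dashboard with one PromQL target per panel could look like this. It assumes cAdvisor container metrics are being scraped and that the data source is named Prometheus. The grid positions are arbitrary.

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: kubernetes-monitoring
spec:
  json: |
    {
      "title": "Kubernetes Monitoring",
      "panels": [
        {
          "title": "CPU Usage",
          "type": "graph",
          "datasource": "Prometheus",
          "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
          "targets": [
            {
              "expr": "sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=\"\"}[5m]))",
              "legendFormat": "{{namespace}}"
            }
          ]
        },
        {
          "title": "Memory Usage",
          "type": "graph",
          "datasource": "Prometheus",
          "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 },
          "targets": [
            {
              "expr": "sum by (namespace) (container_memory_working_set_bytes{container!=\"\"})",
              "legendFormat": "{{namespace}}"
            }
          ]
        }
      ]
    }
```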

Best Practices Checklist

  1. ✅ Set up Prometheus monitoring
  2. ✅ Configure Alertmanager
  3. ✅ Implement logging solution
  4. ✅ Create custom metrics
  5. ✅ Set up dashboards
  6. ✅ Monitor cluster health
  7. ✅ Configure alerts
  8. ✅ Implement log aggregation
  9. ✅ Set up visualization
  10. ✅ Review monitoring configuration regularly

Monitoring Components

Core Metrics

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network traffic
  • Pod status

Application Metrics

  • Request latency
  • Error rates
  • Throughput
  • Custom business metrics
  • Dependency health

Infrastructure Metrics

  • Node health
  • Cluster capacity
  • Resource utilization
  • Network performance
  • Storage metrics

Alert Priorities

Critical Alerts

  • Node failures
  • Pod crashes
  • Service outages
  • Resource exhaustion
  • Security incidents

Warning Alerts

  • High resource usage
  • Slow response times
  • Backup failures
  • Certificate expiration
  • Storage warnings
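
The two priority levels map onto the severity label of an alert rule. The sketch below shows one rule per level, assuming kube-state-metrics and node exporter are both being scraped. The thresholds, durations, and the resource name are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: priority-examples          # hypothetical name
spec:
  groups:
  - name: priorities
    rules:
    # Critical: a node has not been Ready for 5 minutes
    - alert: NodeNotReady
      expr: kube_node_status_condition{condition="Ready",status="true"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        description: Node {{ $labels.node }} has not been Ready for 5 minutes
    # Warning: less than 10% of a filesystem is still available
    - alert: NodeFilesystemAlmostFull
      expr: node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
      for: 15m
      labels:
        severity: warning
      annotations:
        description: Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} has less than 10% space available
```

Alertmanager can then route on the severity label, for example paging on critical and sending warnings to a chat channel.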

Best Practices by Component

Prometheus

  • Use service discovery
  • Configure retention
  • Set up federation
  • Implement recording rules
  • Regular backups
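
Retention and persistent storage are both set on the Prometheus resource itself. As a sketch of the relevant fields added to the resource from the Prometheus Setup section, where the retention period and sizes are hypothetical values to tune for your cluster:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  retention: 15d                 # drop samples older than 15 days
  retentionSize: 40GB            # or when the TSDB grows past this size
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 50Gi        # keep some headroom above retentionSize
```

Without a storage section the data lives in an emptyDir volume and is lost when the pod is rescheduled.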

Grafana

  • Use templating
  • Organize dashboards
  • Set up authentication
  • Configure alerting
  • Regular updates

Logging

  • Structured logging
  • Log rotation
  • Index management
  • Search optimization
  • Retention policies

Conclusion

Implementing these practices gives you visibility into cluster and application behavior and shortens the time it takes to diagnose problems. Review alert thresholds, dashboards, and retention settings regularly as workloads change, because monitoring configuration drifts out of date along with the systems it watches.

Additional Resources