Kubernetes Monitoring Best Practices

Effective monitoring is crucial for maintaining healthy Kubernetes clusters. This guide covers essential monitoring and observability practices.

Prerequisites

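Basic understanding of Kubernetes
Access to a Kubernetes cluster
kubectl CLI tool installed
Familiarity with monitoring concepts
Prometheus Operator, Elastic Cloud on Kubernetes (ECK) and Grafana Operator (v4, which serves the integreatly.org/v1alpha1 API) installed, since the manifests below use their custom resources
kube-state-metrics deployed, since the alert rules use its kube_* metrics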

Project Structure

.
├── monitoring/
│   ├── prometheus/       # Prometheus configurations
│   ├── grafana/          # Grafana dashboards
│   ├── alerts/           # Alert configurations
│   └── logging/          # Logging configurations
└── metrics/
    ├── custom-metrics/   # Custom metric definitions
    └── dashboards/       # Dashboard templates

Prometheus Setup

1. Prometheus Operator

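apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  # Scrape only ServiceMonitors labelled team=frontend
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  # An empty selector loads every PrometheusRule in this namespace;
  # leaving ruleSelector unset loads none
  ruleSelector: {}
  # Forward alerts to Alertmanager through the alertmanager-operated
  # Service that the operator creates
  alerting:
    alertmanagers:
    - namespace: monitoring   # assumed namespace of the Alertmanager
      name: alertmanager-operated
      port: web
  resources:
    requests:
      memory: 400Mi
  # Keep the TSDB admin API (snapshots, series deletion) disabled
  enableAdminAPI: false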
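The `serviceAccountName: prometheus` field assumes a ServiceAccount that is allowed to discover scrape targets. A minimal sketch of that RBAC, modelled on the Prometheus Operator getting-started guide and assuming everything runs in a `monitoring` namespace:

```yaml
# Assumes the Prometheus resource is deployed in the "monitoring" namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
# Read access to the objects Prometheus uses for service discovery
- apiGroups: [""]
  resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
# Allow scraping /metrics endpoints that are not tied to an API resource
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
```

A ClusterRole is used because Prometheus typically discovers targets across namespaces; if it only ever scrapes its own namespace, a namespaced Role would be enough.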

2. Service Monitor

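apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
  labels:
    team: frontend   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: web-app   # selects Services carrying this label
  endpoints:
  - port: metrics    # name of the Service port that serves /metrics
    interval: 15s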
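The ServiceMonitor does not scrape pods directly: it finds Services labelled `app: web-app` and scrapes their port named `metrics` (at the path `/metrics` by default). A matching Service might look like the sketch below; the port number 8080 and the pod label are assumptions about the application.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app
  labels:
    app: web-app        # matched by the ServiceMonitor's spec.selector
spec:
  selector:
    app: web-app        # assumed label on the application's pods
  ports:
  - name: metrics       # referenced by name in the ServiceMonitor endpoint
    port: 8080          # assumed port where the app serves /metrics
    targetPort: 8080
```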

Alert Configuration

1. Alert Manager

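apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
spec:
  # Replicas form a cluster and deduplicate notifications between them
  replicas: 3
  # Secret holding the Alertmanager configuration under the key alertmanager.yaml
  configSecret: alertmanager-config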
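`configSecret` names a Secret from which the operator reads Alertmanager's configuration, under the key `alertmanager.yaml`. A sketch that routes `severity: critical` alerts to a separate receiver; the receiver names and webhook URLs are hypothetical placeholders, and the grouping intervals are illustrative.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
type: Opaque
stringData:
  alertmanager.yaml: |
    route:
      receiver: default            # fallback when no sub-route matches
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
      - receiver: on-call          # hypothetical receiver for critical alerts
        matchers:
        - severity="critical"
    receivers:
    - name: default
      webhook_configs:
      - url: https://alerts.example.com/default   # placeholder URL
    - name: on-call
      webhook_configs:
      - url: https://alerts.example.com/on-call   # placeholder URL
```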

2. Alert Rules

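apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
spec:
  groups:
  - name: kubernetes
    rules:
    # CPU usage above 90% of the container's CPU limit. The raw metric is a
    # cumulative counter of CPU seconds, so rate() converts it to cores first.
    # Containers without a CPU limit produce no series here.
    - alert: HighCPUUsage
      expr: |
        sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
          / sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="cpu"})
          > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        description: Container {{ $labels.container }} in pod {{ $labels.pod }} is using more than 90% of its CPU limit
    # The restart count is also cumulative, so increase() looks only at
    # restarts within the last 15 minutes
    - alert: PodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[15m]) > 5
      for: 15m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.container }} in pod {{ $labels.pod }} restarted more than 5 times in 15 minutes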
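A companion rule for memory can compare each container's working-set memory to its memory limit. A sketch, assuming kube-state-metrics provides `kube_pod_container_resource_limits`; the rule name and the 90% threshold are illustrative, not values from a standard rule set. Like any PrometheusRule, it is only loaded if the Prometheus resource's `ruleSelector` matches it.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-memory-alerts     # hypothetical name
spec:
  groups:
  - name: kubernetes-memory
    rules:
    # Working-set memory above 90% of the container's memory limit.
    # A container that exceeds its limit is OOM-killed, so this gives
    # warning before the limit is reached. No limit means no series.
    - alert: HighMemoryUsage
      expr: |
        sum by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
          / sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
          > 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        description: Container {{ $labels.container }} in pod {{ $labels.pod }} is using more than 90% of its memory limit
```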

Logging Setup

1. Fluentd Configuration

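apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        # json assumes Docker's json-file log format; containerd and CRI-O
        # write CRI-formatted lines, which need a CRI parser instead
        @type json
      </parse>
    </source>
    # Without a <match> section the tailed records are never forwarded.
    # Host, CA file and credentials assume the ECK cluster "logging" below;
    # the CA path and ELASTIC_PASSWORD variable must be provided by the pod.
    <match kubernetes.**>
      @type elasticsearch
      host logging-es-http
      port 9200
      scheme https
      ca_file /etc/es-certs/ca.crt
      user elastic
      password "#{ENV['ELASTIC_PASSWORD']}"
      logstash_format true
    </match>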
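The ConfigMap only takes effect once a Fluentd DaemonSet mounts it together with the node's log directory. A minimal sketch, assuming the `fluent/fluentd-kubernetes-daemonset` image (its Elasticsearch variants bundle the Elasticsearch output plugin) and the Secrets that ECK generates for the `logging` cluster defined in the next section. The env var name `ELASTIC_PASSWORD` and the `/etc/es-certs` mount path are arbitrary choices for an Elasticsearch output, not names the image requires.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        # Pick the variant matching your Elasticsearch version and pin an exact tag
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        # Password of the "elastic" user, from the Secret ECK creates
        - name: ELASTIC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: logging-es-elastic-user
              key: elastic
        volumeMounts:
        - name: varlog           # /var/log/containers links into /var/log/pods
          mountPath: /var/log
        - name: config           # replaces the image's default fluent.conf
          mountPath: /fluentd/etc/fluent.conf
          subPath: fluent.conf
        - name: es-certs         # CA certificate for the ECK HTTPS endpoint
          mountPath: /etc/es-certs
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: config
        configMap:
          name: fluentd-config
      - name: es-certs
        secret:
          secretName: logging-es-http-certs-public
```

On nodes that still use Docker's json-file driver, the files under /var/log/containers point into /var/lib/docker/containers, which would need a second hostPath mount.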

2. Elasticsearch Configuration

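apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
    config:
      # Lets Elasticsearch start without raising vm.max_map_count on the
      # nodes, at some cost in performance
      node.store.allow_mmap: false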
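Without `volumeClaimTemplates`, ECK gives each Elasticsearch node a small default volume (1Gi). A sketch that sizes storage explicitly; the 50Gi size and the `standard` StorageClass are assumptions to replace with values for your cluster. ECK expects the data claim to be named `elasticsearch-data`.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data   # claim name ECK mounts as the data volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi          # assumed; size from daily log volume and retention
        storageClassName: standard # assumed StorageClass name
```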

Metrics Collection

1. Custom Metrics

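# Nothing in this setup reads this ConfigMap; it only documents the metric
# names and types. The application must expose the metrics itself (usually
# through a Prometheus client library) so the ServiceMonitor can scrape them.
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics
data:
  metrics.yaml: |
    metrics:
      - name: http_requests_total
        type: counter
        help: Total number of HTTP requests
      - name: http_request_duration_seconds
        type: histogram
        help: HTTP request duration in seconds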
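Once the application exposes these metrics, recording rules can precompute the queries that dashboards and alerts use most. A sketch, assuming the histogram is exported with the standard `_bucket` series and that per-`job` aggregation is what you want; the rule names roughly follow the `level:metric:operations` convention from the Prometheus documentation.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: http-recording-rules        # hypothetical name
spec:
  groups:
  - name: http.recording
    rules:
    # Requests per second per job, averaged over 5 minutes
    - record: job:http_requests:rate5m
      expr: sum by (job) (rate(http_requests_total[5m]))
    # 95th percentile request latency per job, estimated from histogram buckets
    - record: job:http_request_duration_seconds:p95_5m
      expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```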

2. Node Exporter

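apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      # Share the host's network and PID namespaces so metrics describe
      # the node rather than the container
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        # Pin a release tag instead of the implicit :latest (v1.8.2 as an example)
        image: prom/node-exporter:v1.8.2
        args:
        - --path.rootfs=/host    # read host filesystems from the mount below
        ports:
        - containerPort: 9100
          name: metrics
        volumeMounts:
        - name: root
          mountPath: /host
          readOnly: true
          mountPropagation: HostToContainer
      volumes:
      - name: root
        hostPath:
          path: /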
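The DaemonSet alone is not scraped: the Prometheus resource above discovers targets through ServiceMonitors, which select Services. A sketch that exposes the exporter through a headless Service and adds a ServiceMonitor for it. It carries `team: frontend` only because that is the label the example `serviceMonitorSelector` matches; adjust both to your own labelling scheme.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  labels:
    app: node-exporter
spec:
  clusterIP: None          # headless: one endpoint per DaemonSet pod
  selector:
    app: node-exporter
  ports:
  - name: metrics
    port: 9100
    targetPort: metrics    # the named containerPort in the DaemonSet
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  labels:
    team: frontend         # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: node-exporter
  endpoints:
  - port: metrics
    interval: 30s
    # Use the node name instead of IP:port as the instance label
    relabelings:
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      targetLabel: instance
```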

Dashboard Configuration

1. Grafana Dashboard

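apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: kubernetes-monitoring
spec:
  json: |
    {
      "title": "Kubernetes Monitoring",
      "panels": [
        {
          "title": "CPU Usage",
          "type": "timeseries",
          "datasource": "Prometheus",
          "targets": [
            {
              "refId": "A",
              "expr": "sum by (pod) (rate(container_cpu_usage_seconds_total{container!=\"\"}[5m]))",
              "legendFormat": "{{pod}}"
            }
          ]
        },
        {
          "title": "Memory Usage",
          "type": "timeseries",
          "datasource": "Prometheus",
          "targets": [
            {
              "refId": "A",
              "expr": "sum by (pod) (container_memory_working_set_bytes{container!=\"\"})",
              "legendFormat": "{{pod}}"
            }
          ]
        }
      ]
    }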
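The panels refer to a data source named `Prometheus`. With the same `integreatly.org/v1alpha1` API (Grafana Operator v4), that data source can itself be declared as a resource. This sketch points at the `prometheus-operated` Service, which the Prometheus Operator creates for Prometheus instances, and assumes Grafana runs in the same namespace.

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus
spec:
  name: prometheus.yaml          # file name the operator uses for this data source
  datasources:
  - name: Prometheus             # must match the "datasource" value in the panels
    type: prometheus
    access: proxy
    url: http://prometheus-operated:9090   # assumes a shared namespace
    isDefault: true
```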

Best Practices Checklist

✅ Set up Prometheus monitoring
✅ Configure Alertmanager
✅ Implement a logging solution
✅ Create custom metrics
✅ Set up dashboards
✅ Monitor cluster health
✅ Configure alerts
✅ Implement log aggregation
✅ Set up visualization
✅ Review monitoring regularly

Monitoring Components

Core Metrics

CPU usage
Memory usage
Disk I/O
Network traffic
Pod status

Application Metrics

Request latency
Error rates
Throughput
Custom business metrics
Dependency health

Infrastructure Metrics

Node health
Cluster capacity
Resource utilization
Network performance
Storage metrics
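Cluster capacity is often tracked as the share of allocatable resources already requested by pods. A sketch using kube-state-metrics series (`kube_pod_container_resource_requests`, `kube_node_status_allocatable`); the rule names are hypothetical. The requests series also covers pods that are pending or completed but still present in the API, so treat the ratios as an approximation.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-capacity-rules       # hypothetical name
spec:
  groups:
  - name: cluster.capacity
    rules:
    # Share of allocatable CPU requested by pods
    - record: cluster:cpu_requests:ratio
      expr: sum(kube_pod_container_resource_requests{resource="cpu"}) / sum(kube_node_status_allocatable{resource="cpu"})
    # Share of allocatable memory requested by pods
    - record: cluster:memory_requests:ratio
      expr: sum(kube_pod_container_resource_requests{resource="memory"}) / sum(kube_node_status_allocatable{resource="memory"})
```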

Alert Priorities

Critical Alerts

Node failures
Pod crashes
Service outages
Resource exhaustion
Security incidents

Warning Alerts

High resource usage
Slow response times
Backup failures
Certificate expiration
Storage warnings
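One way to encode these priorities is a `severity` label on each rule, as the alert rules earlier in this guide do, so that Alertmanager can route each tier differently. A sketch with one rule per tier, assuming kube-state-metrics is installed and kubelet volume metrics are being scraped; thresholds and durations are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: priority-alerts              # hypothetical name
spec:
  groups:
  - name: priorities
    rules:
    # Critical: the node's Ready condition has not been "true" for 5 minutes
    - alert: NodeNotReady
      expr: kube_node_status_condition{condition="Ready", status="true"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        description: Node {{ $labels.node }} has not been Ready for 5 minutes
    # Warning: a persistent volume has less than 10% free space
    - alert: PersistentVolumeFillingUp
      expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.10
      for: 15m
      labels:
        severity: warning
      annotations:
        description: Volume for claim {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is over 90% full
```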

Best Practices by Component

Prometheus

Use service discovery
Configure retention
Set up federation
Implement recording rules
Back up data regularly
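For retention, the Prometheus Operator exposes both time- and size-based limits and persistent storage directly on the Prometheus resource. A sketch of the relevant `spec` fields, to merge into the Prometheus resource shown earlier; the 15-day, 45GB and 50Gi values are assumptions. Without a `storage` section, Prometheus keeps its data in an emptyDir volume, which is lost when the pod is rescheduled.

```yaml
# Fields to merge into the Prometheus resource's spec
spec:
  retention: 15d          # delete samples older than 15 days
  retentionSize: 45GB     # also cap on-disk size; keep below the volume size
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```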

Grafana

Use templating
Organize dashboards
Set up authentication
Configure alerting
Keep Grafana up to date
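Templating lets one dashboard serve every namespace instead of keeping a copy per namespace. A sketch of a dashboard with a `namespace` variable populated by `label_values()` over the kube-state-metrics series `kube_pod_info`, using the same `integreatly.org/v1alpha1` resource as the dashboard above; the dashboard name is hypothetical.

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: namespace-overview           # hypothetical name
spec:
  json: |
    {
      "title": "Namespace Overview",
      "templating": {
        "list": [
          {
            "name": "namespace",
            "type": "query",
            "datasource": "Prometheus",
            "query": "label_values(kube_pod_info, namespace)",
            "refresh": 1
          }
        ]
      },
      "panels": [
        {
          "title": "CPU Usage by Pod",
          "type": "timeseries",
          "datasource": "Prometheus",
          "targets": [
            {
              "refId": "A",
              "expr": "sum by (pod) (rate(container_cpu_usage_seconds_total{namespace=\"$namespace\", container!=\"\"}[5m]))"
            }
          ]
        }
      ]
    }
```

`"refresh": 1` reloads the variable's values each time the dashboard is opened.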

Logging

Structured logging
Log rotation
Index management
Search optimization
Retention policies
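Rotation of container log files on each node is handled by the kubelet, not by Fluentd. A sketch of the two KubeletConfiguration fields that control it, which apply to CRI runtimes such as containerd; the values are illustrative (the kubelet defaults are 10Mi and 5 files), and how the file reaches the kubelet depends on how your nodes are provisioned.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate a container's log file once it reaches 50 MiB
containerLogMaxSize: 50Mi
# Keep at most 3 log files (current plus rotated) per container
containerLogMaxFiles: 3
```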

Conclusion

Implementing these practices gives you visibility into your Kubernetes clusters and helps you find and resolve problems quickly. Review and update the monitoring configuration regularly so that metrics, alerts and dashboards keep pace with changes to the cluster and its workloads.
