Prometheus Operator in Kubernetes

Manage Prometheus deployments using the operator pattern in Kubernetes

The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. This guide covers installation, configuration, and best practices.

Installation

1. Using Helm

# Add prometheus-community repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the kube-prometheus-stack chart (which includes the operator) as release "prometheus"
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

2. Using Manifests

# Clone the repository
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus

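# Create the namespace and CRDs
kubectl create -f manifests/setup

# Wait until the CRDs are established before creating resources that use them
kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring

# Deploy the stack
kubectl create -f manifests/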
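
After either installation method, it can be useful to confirm that the CRDs used in the rest of this guide are registered. The helper below is a minimal sketch, not part of the operator tooling: the function name missing_operator_crds and the list of CRDs it checks are assumptions chosen for illustration. It relies only on kubectl get crd -o name.

```shell
# Hypothetical helper: print each expected monitoring.coreos.com CRD
# that is NOT registered in the cluster (prints nothing if all are present).
missing_operator_crds() {
  installed="$(kubectl get crd -o name)"
  for kind in prometheuses alertmanagers servicemonitors podmonitors prometheusrules; do
    case "$installed" in
      *"/${kind}.monitoring.coreos.com"*) ;;        # CRD is registered
      *) echo "${kind}.monitoring.coreos.com" ;;    # CRD is missing
    esac
  done
}
```

Running missing_operator_crds against a healthy installation should print nothing.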

Custom Resource Definitions (CRDs)

1. Prometheus CRD

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false

2. ServiceMonitor CRD

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example
  endpoints:
  - port: web
    interval: 15s
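
A ServiceMonitor only produces scrape targets when its selector matches at least one Service, and when that Service exposes a port with the referenced name (web above). Without a namespaceSelector, a ServiceMonitor looks only in its own namespace. As a quick check, the sketch below lists the Services that a label selector would match; the function name and the default namespace are assumptions for illustration.

```shell
# Hypothetical helper: list the Services matching a label selector.
# $1 = label selector (e.g. app=example), $2 = namespace (defaults to monitoring)
services_matching() {
  kubectl get services --namespace "${2:-monitoring}" --selector "$1" --output name
}
```

For the example above, services_matching app=example should return at least one Service; empty output means the ServiceMonitor has nothing to select.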

3. PodMonitor CRD

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pods
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example
  podMetricsEndpoints:
  - port: metrics
    interval: 30s

Advanced Configuration

1. Alertmanager Configuration

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 3
  # Selects AlertmanagerConfig objects (routing and receivers), not PrometheusRule objects
  alertmanagerConfigSelector:
    matchLabels:
      role: alert-rules

2. PrometheusRule CRD

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    role: alert-rules
spec:
  groups:
  - name: example
    rules:
    - alert: HighRequestLatency
      expr: http_request_duration_seconds > 5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: High request latency on {{ $labels.instance }}
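
According to the operator's API documentation, a PrometheusRule is evaluated only if a Prometheus resource selects it through spec.ruleSelector, and the Prometheus example earlier in this guide does not set that field. One way to add it is a merge patch, sketched below under the assumption that the Prometheus resource is named prometheus in the monitoring namespace and that rules carry the role: alert-rules label as above. Both function names are hypothetical.

```shell
# Hypothetical helper: build the merge patch that sets spec.ruleSelector.
# $1 = label key, $2 = label value
rule_selector_patch() {
  printf '{"spec":{"ruleSelector":{"matchLabels":{"%s":"%s"}}}}' "$1" "$2"
}

# Apply the patch to the Prometheus resource from the earlier example.
select_alert_rules() {
  kubectl patch prometheus prometheus --namespace monitoring \
    --type merge --patch "$(rule_selector_patch role alert-rules)"
}
```

Calling select_alert_rules once is enough; the operator then regenerates the rule configuration for the selected PrometheusRule objects.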

Storage Configuration

1. Persistent Storage

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        resources:
          requests:
            storage: 100Gi

2. Retention Configuration

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  retention: 30d
  retentionSize: 10GB
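
When choosing retention settings and volume size, the Prometheus storage documentation gives a rough capacity formula: needed disk space is about retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies that formula; the function name and the example ingestion rate are assumptions, so substitute the rate measured on your own server.

```shell
# Hypothetical helper: estimate on-disk bytes for a retention window.
# $1 = retention in days, $2 = ingested samples per second, $3 = bytes per sample
estimate_disk_bytes() {
  echo $(( $1 * 86400 * $2 * $3 ))
}

# Example: 30 days at an assumed 10000 samples/s and 2 bytes per sample
estimate_disk_bytes 30 10000 2   # prints 51840000000 (about 52 GB)
```

Under these assumed numbers the estimate is well above the 10GB retentionSize shown above, so the size limit, not the 30d time limit, would be the one that triggers data removal first.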

High Availability

1. Multi-Replica Setup

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: prometheus
            operator: In
            values:
            - prometheus
        topologyKey: kubernetes.io/hostname

2. Alertmanager HA

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
spec:
  replicas: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: alertmanager
            operator: In
            values:
            - alertmanager
        topologyKey: kubernetes.io/hostname
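
To confirm that the anti-affinity rules took effect, you can check that no two replicas share a node. The following is a minimal sketch: shared_nodes and replica_nodes_shared are hypothetical helpers, and the monitoring namespace is assumed. The first reads "<pod> <node>" pairs and prints any node that appears more than once; unscheduled pods, which kubectl shows with a node of <none>, are ignored.

```shell
# Hypothetical helper: read "<pod> <node>" lines on stdin and print every
# node that hosts more than one of the listed pods.
shared_nodes() {
  awk '$2 != "<none>" { print $2 }' | sort | uniq -d
}

# Hypothetical wrapper: feed it the pods selected by a label,
# e.g. prometheus=prometheus or alertmanager=alertmanager.
replica_nodes_shared() {
  kubectl get pods --namespace monitoring --selector "$1" --no-headers \
    --output custom-columns=POD:.metadata.name,NODE:.spec.nodeName | shared_nodes
}
```

Empty output from replica_nodes_shared prometheus=prometheus means every scheduled replica runs on a distinct node.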

Security Configuration

1. RBAC Setup

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
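
A ClusterRole grants nothing until it is bound to a subject. Assuming the prometheus service account lives in the monitoring namespace (as in the Prometheus example above), a binding and a permission check could look like the sketch below. The function names are hypothetical; kubectl create clusterrolebinding and kubectl auth can-i are standard kubectl commands.

```shell
# Hypothetical helper: user name Kubernetes assigns to a service account.
# $1 = namespace, $2 = service account name
service_account_user() {
  echo "system:serviceaccount:$1:$2"
}

# Bind the ClusterRole to the service account, then verify one permission.
bind_and_check() {
  kubectl create clusterrolebinding prometheus \
    --clusterrole=prometheus --serviceaccount=monitoring:prometheus
  kubectl auth can-i list pods --all-namespaces \
    --as="$(service_account_user monitoring prometheus)"
}
```

If the binding is in place, the final command answers yes.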

2. TLS Configuration

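apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  # Each Secret listed here is mounted at /etc/prometheus/secrets/<secret-name>/
  secrets:
  - etcd-certs
---
# tlsConfig is set per scrape endpoint, for example in the spec of the
# ServiceMonitor that scrapes etcd (fragment; the port name is illustrative)
spec:
  endpoints:
  - port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/tls.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/tls.key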

Monitoring the Operator

1. Operator Metrics

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-operator
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: prometheus-operator  # adjust to the labels on your operator's Service
  endpoints:
  - port: http

2. Grafana Dashboard

{
  "dashboard": {
    "panels": [
      {
        "title": "Operator Reconciliation",
        "targets": [
          {
            "expr": "rate(prometheus_operator_reconcile_operations_total[5m])"
          }
        ]
      },
      {
        "title": "CRD Management",
        "targets": [
          {
            "expr": "prometheus_operator_managed_resources"
          }
        ]
      }
    ]
  }
}

Best Practices

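  1. Resource Management

    • Set resource requests and limits on the Prometheus and Alertmanager pods
    • Configure retention policies (retention and retentionSize)
    • Use persistent storage so data survives pod restarts

  2. Monitoring Strategy

    • Use ServiceMonitors and PodMonitors for service discovery
    • Define alerting rules as PrometheusRule resources
    • Precompute expensive queries with recording rules

  3. Security

    • Grant Prometheus only the read permissions it needs through RBAC
    • Use TLS for scrape endpoints that support it
    • Restrict traffic to the monitoring namespace with network policies

  4. High Availability

    • Deploy multiple replicas of Prometheus and Alertmanager
    • Use pod anti-affinity to spread replicas across nodes
    • Give each replica its own persistent volume through volumeClaimTemplate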

Troubleshooting

Common Issues

  1. CRD Issues
# Check that the operator's CRDs are registered
kubectl get crd | grep monitoring.coreos.com

# Inspect the Prometheus CRD itself
kubectl describe crd prometheuses.monitoring.coreos.com

  2. Operator Issues
# Check operator logs (adjust the label selector to match your installation)
kubectl logs -l app=prometheus-operator -n monitoring

# Verify operator status
kubectl get pods -l app=prometheus-operator -n monitoring

  3. Resource Issues
# Check resource usage (requires the Metrics API, e.g. metrics-server)
kubectl top pods -n monitoring

# Verify PVC status
kubectl get pvc -n monitoring
