Prometheus Operator in Kubernetes
Manage Prometheus deployments using the operator pattern in Kubernetes
The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. This guide covers installation, configuration, and best practices.
Installation
1. Using Helm
# Add the prometheus-community chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack, which bundles the Prometheus Operator with
# Prometheus, Alertmanager, Grafana and exporters
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
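By default the chart's Prometheus only selects ServiceMonitors, PodMonitors and PrometheusRules that carry the Helm release label (release: prometheus for the release name used above), so resources without that label are ignored. A sketch of a values file that makes those selectors match every such resource instead. It assumes the value names used by recent chart versions, so confirm them with helm show values prometheus-community/kube-prometheus-stack before relying on it:

```yaml
# values.yaml (hypothetical file name) for kube-prometheus-stack
prometheus:
  prometheusSpec:
    # false: use the chart's empty default selectors, which match all
    # objects, instead of requiring the release: <release-name> label
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
```

Pass the file at install time by adding -f values.yaml to the helm install command.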
2. Using Manifests
# Clone the repository
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
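# Create the namespace and CRDs (server-side apply is used because some CRDs
# are too large for the annotation written by client-side apply)
kubectl apply --server-side -f manifests/setup
# Wait until the CRDs are established before creating objects that use them
kubectl wait --for condition=Established --all CustomResourceDefinition
# Deploy the stack
kubectl apply -f manifests/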
Custom Resource Definitions (CRDs)
1. Prometheus CRD
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
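When serviceMonitorNamespaceSelector is left unset, Prometheus only discovers ServiceMonitors in its own namespace (monitoring here). A sketch of the same resource extended to pick up matching ServiceMonitors from every namespace, assuming that cluster-wide discovery is what you want; an empty selector matches all namespaces:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  # {} matches every namespace; leaving the field unset limits discovery
  # to the Prometheus object's own namespace
  serviceMonitorNamespaceSelector: {}
```

Cluster-wide discovery also needs cluster-wide read access for the Prometheus service account, which the ClusterRole in the Security Configuration section provides once it is bound.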
2. ServiceMonitor CRD
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example
  endpoints:
  - port: web
    interval: 15s
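The ServiceMonitor matches Services by label and refers to the port by its name, not its number. A sketch of a Service it would select; the port number 8080 and the app: example label on the backing pods are illustrative assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app
  namespace: monitoring   # without a namespaceSelector, the ServiceMonitor looks in its own namespace
  labels:
    app: example          # matched by the ServiceMonitor's spec.selector
spec:
  selector:
    app: example          # pods backing this Service (assumed label)
  ports:
  - name: web             # matched by the ServiceMonitor's endpoints[].port
    port: 8080
    targetPort: 8080
```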
3. PodMonitor CRD
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pods
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example
  podMetricsEndpoints:
  - port: metrics
    interval: 30s
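The Prometheus resource in the first example sets only serviceMonitorSelector, and an unset podMonitorSelector selects no PodMonitors. A sketch that adds one, assuming you also give the PodMonitor a team: frontend label under metadata.labels:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  # Without this field no PodMonitors are loaded; assumes the PodMonitor
  # carries the label team: frontend
  podMonitorSelector:
    matchLabels:
      team: frontend
```

Note that port: metrics in a PodMonitor refers to a named container port on the selected pods, so the pod spec must declare a port with that name.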
Advanced Configuration
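1. Alertmanager Configuration
The alertmanagerConfigSelector picks up AlertmanagerConfig resources (routes and receivers) by label. It does not load alerting rules; PrometheusRule objects are selected by a Prometheus resource's ruleSelector.
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 3
  alertmanagerConfigSelector:
    matchLabels:
      alertmanagerConfig: example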
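Routing and receivers can be supplied as an AlertmanagerConfig resource whose labels match the Alertmanager's alertmanagerConfigSelector. A minimal sketch, assuming the selector label alertmanagerConfig: example and a hypothetical in-cluster webhook receiver; AlertmanagerConfig is served under monitoring.coreos.com/v1alpha1 (newer operator releases also offer v1beta1):

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
  namespace: monitoring
  labels:
    alertmanagerConfig: example   # must match alertmanagerConfigSelector
spec:
  route:
    groupBy: ["alertname"]
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: webhook
  receivers:
  - name: webhook
    webhookConfigs:
    - url: "http://example-webhook.monitoring.svc:8080/"   # hypothetical endpoint
```

By default the operator scopes such a route to alerts whose namespace label equals the AlertmanagerConfig's namespace, so alerts from other namespaces will not reach this receiver.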
2. PrometheusRule CRD
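apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    role: alert-rules
spec:
  groups:
  - name: example
    rules:
    - alert: HighRequestLatency
      # Assumes http_request_duration_seconds is a histogram; fires when the
      # per-instance p99 latency over 5 minutes stays above 5 seconds
      expr: histogram_quantile(0.99, sum by (le, instance) (rate(http_request_duration_seconds_bucket[5m]))) > 5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High request latency on {{ $labels.instance }}"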
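A PrometheusRule only takes effect once a Prometheus resource selects it through ruleSelector (plus ruleNamespaceSelector for rules in other namespaces). A sketch that loads rules labelled role: alert-rules and forwards alerts to the Alertmanager defined earlier. It assumes the alertmanager-operated Service that the operator creates for Alertmanager pods, with its port named web:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  # Load PrometheusRule objects labelled role: alert-rules
  ruleSelector:
    matchLabels:
      role: alert-rules
  # Send firing alerts to the Alertmanager pods behind the
  # operator-managed alertmanager-operated Service
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-operated
      port: web
```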
Storage Configuration
1. Persistent Storage
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        resources:
          requests:
            storage: 100Gi
2. Retention Configuration
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  retention: 30d
  retentionSize: 10GB
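Prometheus removes the oldest data as soon as either limit is reached. When combining retention with persistent storage, a common approach is to keep retentionSize comfortably below the volume size so the disk does not fill before the size limit applies. A sketch; the 80GB and 100Gi figures are illustrative assumptions, not sizing advice:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  retention: 30d
  # Size limit kept below the 100Gi volume to leave headroom for
  # compaction and filesystem overhead
  retentionSize: 80GB
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        resources:
          requests:
            storage: 100Gi
```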
High Availability
1. Multi-Replica Setup
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: prometheus
            operator: In
            values:
            - prometheus
        topologyKey: kubernetes.io/hostname
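With requiredDuringSchedulingIgnoredDuringExecution, a replica stays Pending when there are fewer schedulable nodes than replicas. Where that matters more than strict spreading, a preferred rule is an alternative; a sketch using the same pod label:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  affinity:
    podAntiAffinity:
      # Prefer, but do not require, one replica per node
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - prometheus
          topologyKey: kubernetes.io/hostname
```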
2. Alertmanager HA
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
spec:
  replicas: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: alertmanager
            operator: In
            values:
            - alertmanager
        topologyKey: kubernetes.io/hostname
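Anti-affinity spreads the pods across nodes, but voluntary disruptions such as node drains can still evict several replicas at once. A PodDisruptionBudget is one way to limit that. A sketch, assuming the Alertmanager runs in the monitoring namespace and its pods carry the alertmanager: alertmanager label used in the selector above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  # Keep at least 2 of the 3 replicas running during voluntary disruptions
  minAvailable: 2
  selector:
    matchLabels:
      alertmanager: alertmanager
```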
Security Configuration
1. RBAC Setup
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
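The ClusterRole grants nothing until it is bound to the service account named in the Prometheus spec (serviceAccountName: prometheus). A sketch of the ServiceAccount and ClusterRoleBinding, assuming Prometheus runs in the monitoring namespace:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
```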
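2. TLS Configuration
The Prometheus resource has no top-level tlsConfig field. Instead, mount the certificate Secret with spec.secrets; each listed Secret is mounted at /etc/prometheus/secrets/<secret-name>/ in the Prometheus container.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  secrets:
  - etcd-certs
Then reference the mounted files in the tlsConfig of the scrape endpoint that needs them, for example a ServiceMonitor for etcd. The kube-system namespace, the k8s-app: etcd label and the metrics port name are assumptions about how etcd is exposed in your cluster.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd
spec:
  namespaceSelector:
    matchNames: [kube-system]
  selector:
    matchLabels:
      k8s-app: etcd
  endpoints:
  - port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/tls.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/tls.key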
Monitoring the Operator
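1. Operator Metrics
The kube-prometheus manifests and the kube-prometheus-stack chart typically create a ServiceMonitor for the operator already. If you need your own, match the operator Service's label and port name, which vary by install method (check them with kubectl describe svc -n monitoring).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-operator
  namespace: monitoring
spec:
  selector:
    matchLabels:
      # Typical label and port name; adjust to match your operator Service
      app.kubernetes.io/name: prometheus-operator
  endpoints:
  - port: http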
2. Grafana Dashboard
{
  "dashboard": {
    "panels": [
      {
        "title": "Operator Reconciliation",
        "targets": [
          {
            "expr": "rate(prometheus_operator_reconcile_operations_total[5m])"
          }
        ]
      },
      {
        "title": "CRD Management",
        "targets": [
          {
            "expr": "prometheus_operator_managed_resources"
          }
        ]
      }
    ]
  }
}
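The same operator metrics can also drive an alert. A sketch of a rule that fires when more than 10% of a controller's reconciliations fail over 5 minutes. The 10% threshold, the rule names and the reuse of the role: alert-rules label (which only matters if your Prometheus ruleSelector matches it) are assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-operator-rules
  namespace: monitoring
  labels:
    role: alert-rules
spec:
  groups:
  - name: prometheus-operator
    rules:
    - alert: PrometheusOperatorReconcileErrors
      # Fraction of failed reconciliations per controller over 5 minutes
      expr: |
        sum by (controller) (rate(prometheus_operator_reconcile_errors_total[5m]))
          /
        sum by (controller) (rate(prometheus_operator_reconcile_operations_total[5m]))
          > 0.1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Controller {{ $labels.controller }} is failing to reconcile"
```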
Best Practices
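- Resource Management
  - Set resource requests and limits based on observed Prometheus memory and CPU usage
  - Configure retention policies (retention and/or retentionSize)
  - Use persistent storage
- Monitoring Strategy
  - Use ServiceMonitors and PodMonitors for target discovery
  - Define alerting rules with a for duration so brief spikes do not fire alerts
  - Configure recording rules for expensive or frequently used queries
- Security
  - Grant Prometheus only the RBAC permissions it needs
  - Use TLS for scrape targets that support it
  - Restrict traffic to and from the monitoring namespace with NetworkPolicies
- High Availability
  - Deploy multiple replicas
  - Use pod anti-affinity
  - Give each replica its own persistent volume (a volumeClaimTemplate creates one PVC per pod)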
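A recording rule evaluates an expression on a schedule and stores the result as a new series, so dashboards and alerts can query the precomputed result instead of the raw data. A sketch, assuming http_request_duration_seconds is a histogram; the recorded series name follows the level:metric:operations convention but is illustrative, and the role: alert-rules label only matters if your Prometheus ruleSelector matches it:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-recording-rules
  namespace: monitoring
  labels:
    role: alert-rules
spec:
  groups:
  - name: example-recording
    interval: 1m
    rules:
    # Per-instance p99 latency, recomputed every minute
    - record: instance:http_request_duration_seconds:p99_5m
      expr: histogram_quantile(0.99, sum by (le, instance) (rate(http_request_duration_seconds_bucket[5m])))
```

Dashboards can then query instance:http_request_duration_seconds:p99_5m directly instead of re-evaluating the histogram_quantile expression on every refresh.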
Troubleshooting
Common Issues
- CRD Issues
# Check CRD status
kubectl get crd | grep monitoring.coreos.com
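# Verify the CRD definition
kubectl describe crd prometheuses.monitoring.coreos.com
# List the Prometheus objects the operator should be managing
kubectl get prometheuses -n monitoring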
- Operator Issues
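# Check operator logs (pod labels differ between install methods; list them
# with: kubectl get pods -n monitoring --show-labels)
kubectl logs -l app.kubernetes.io/name=prometheus-operator -n monitoring
# Verify operator status
kubectl get pods -l app.kubernetes.io/name=prometheus-operator -n monitoring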
- Resource Issues
# Check resource usage (requires metrics-server in the cluster)
kubectl top pods -n monitoring
# Verify PVC status
kubectl get pvc -n monitoring