Prometheus Service Discovery in Kubernetes

Implement dynamic service discovery in Prometheus for automatic target detection

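Prometheus service discovery finds scrape targets automatically as pods and services in your Kubernetes cluster come and go, so you do not have to maintain static target lists. This guide covers the built-in Kubernetes discovery roles, the Prometheus Operator's ServiceMonitor CRD, relabeling, target filtering, file-based and DNS-based discovery, and how to monitor and troubleshoot discovery itself.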

Service Discovery Methods

1. Kubernetes Service Discovery

Pod Discovery

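scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use prometheus.io/path as the metrics path when it is set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use prometheus.io/port as the scrape port when it is set
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__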

Service Discovery

scrape_configs:
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
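
The service role yields one target per Service port, addressed through the Service's DNS name, so scrapes are load-balanced across the backing pods. To scrape each pod behind an annotated Service individually, the endpoints role is the usual choice. A minimal sketch, reusing the same prometheus.io/scrape annotation convention; the job name and the `service` target label are illustrative choices, not part of the original configuration:

```yaml
scrape_configs:
  - job_name: 'kubernetes-service-endpoints'   # illustrative job name
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints whose Service carries prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Copy the Service name onto every scraped series (label name is an assumption)
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
```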

2. ServiceMonitor CRD

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example
  endpoints:
  - port: web
    interval: 15s
    path: /metrics
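
A ServiceMonitor is a Prometheus Operator resource: it selects Services by label and scrapes the Service port with the given name, so the Service must expose a port named web. A sketch of a Service this ServiceMonitor would match; the Service name, the pod selector, and port 8080 are assumptions for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app        # assumed name
  labels:
    app: example           # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: example           # assumed pod label
  ports:
    - name: web            # must equal endpoints[].port in the ServiceMonitor
      port: 8080           # assumed port number
      targetPort: 8080
```

If the port name on the Service and the ServiceMonitor differ, no target is generated and nothing is reported as an error, which makes this a common source of "missing target" problems.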

Implementation Patterns

1. Auto-Discovery with Annotations

apiVersion: v1
kind: Pod
metadata:
  name: example-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  containers:
  - name: example-app
    image: example/app:v1
    ports:
    - containerPort: 8080

2. Label-Based Discovery

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
spec:
  selector:
    matchLabels:
      monitoring: enabled
  namespaceSelector:
    matchNames:
      - default
      - prod
  endpoints:
  - port: http
    interval: 30s
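
With the Prometheus Operator, a ServiceMonitor only takes effect when a Prometheus resource selects it. A sketch of that selection, assuming ServiceMonitors are labeled team: frontend as in the example-app manifest earlier; the resource name and service account name are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main                      # hypothetical name
spec:
  serviceAccountName: prometheus  # hypothetical; the account needs read access to the discovered resources
  # Pick up ServiceMonitors carrying this label
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  # Empty selector: look for ServiceMonitors in all namespaces
  serviceMonitorNamespaceSelector: {}
```

The app-monitor manifest above carries no labels of its own, so under this selector it would need team: frontend added to its metadata before Prometheus picks it up.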

Advanced Configurations

1. Multi-Target Discovery

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: multi-target
spec:
  endpoints:
  - port: http-metrics
    interval: 15s
  - port: additional-metrics
    interval: 30s
    path: /extra-metrics
  selector:
    matchLabels:
      app: multi-metric-app

2. Namespace Discovery

scrape_configs:
  - job_name: 'kubernetes-namespaces'
    kubernetes_sd_configs:
      # There is no 'namespace' role; limit a supported role (pod here)
      # to the namespaces that should be monitored
      - role: pod
        namespaces:
          names:
            - default
            - prod

Relabeling Configuration

1. Basic Relabeling

relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    target_label: application
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace

2. Advanced Relabeling

relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app, __meta_kubernetes_pod_label_version]
    separator: /
    target_label: app_version
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    regex: '([0-9]+)'
    replacement: '$1'
    target_label: port
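
When every pod label should become a target label, listing them one by one does not scale. The labelmap action copies all discovered labels whose names match a regex. A sketch, with the caveat that this can also pull in labels you do not want on your series; the dropped label name is only an example:

```yaml
relabel_configs:
  # Copy every Kubernetes pod label, e.g. __meta_kubernetes_pod_label_app -> app
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  # Remove a copied label that should not end up on the series (illustrative name)
  - action: labeldrop
    regex: pod_template_hash
```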

Filtering and Target Selection

1. Label-Based Filtering

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: filtered-monitor
spec:
  selector:
    matchExpressions:
      - key: environment
        operator: In
        values: [production, staging]
  endpoints:
  - port: metrics

2. Namespace Filtering

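apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: namespace-monitor
spec:
  # namespaceSelector accepts only 'any' or 'matchNames', not matchExpressions
  namespaceSelector:
    matchNames:
      - prod
      - staging
  # An empty selector matches every Service in the selected namespaces
  selector: {}
  endpoints:
  - port: metrics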

Custom Service Discovery

1. File-Based Discovery

scrape_configs:
  - job_name: 'file-discovery'
    file_sd_configs:
      - files:
        - '/etc/prometheus/file_sd/*.yaml'
    relabel_configs:
      - source_labels: [environment]
        target_label: env
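
Each file matched by the glob holds a list of target groups. A sketch of one such file; the file name, addresses, and label values are made up, and environment is the label that the relabel rule above copies into env:

```yaml
# /etc/prometheus/file_sd/node-exporters.yaml (hypothetical file name)
- targets:
    - '10.0.0.5:9100'
    - '10.0.0.6:9100'
  labels:
    environment: production
- targets:
    - '10.0.1.7:9100'
  labels:
    environment: staging
```

Prometheus watches these files and picks up changes without a restart, so an external tool can rewrite them whenever the target list changes.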

2. DNS-Based Discovery

scrape_configs:
  - job_name: 'dns-discovery'
    dns_sd_configs:
      - names:
        - 'service.consul.local'
        type: 'A'
        port: 9100
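
A records carry no port, which is why port is set explicitly above. SRV records include the port, so the same kind of job can be written without it. A sketch, assuming a hypothetical SRV record name and an illustrative job name:

```yaml
scrape_configs:
  - job_name: 'dns-discovery-srv'                  # illustrative job name
    dns_sd_configs:
      - names:
          - '_metrics._tcp.service.consul.local'   # hypothetical SRV record
        type: 'SRV'              # SRV is the default record type
        refresh_interval: 30s    # how often the names are re-resolved
```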

Monitoring Service Discovery

1. Service Discovery Metrics

# Monitor discovered targets
sum(prometheus_sd_discovered_targets)

# Monitor scrape pool synchronization
rate(prometheus_target_sync_length_seconds_sum[5m])
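
Queries like these are most useful when they feed alerts. A sketch of a rule file built on Prometheus's own discovery metrics; the group name, alert names, thresholds, and durations are placeholders, and the name="scrape" and config label matchers assume the labels Prometheus attaches to prometheus_sd_discovered_targets in recent versions:

```yaml
groups:
  - name: service-discovery        # illustrative group name
    rules:
      - alert: ServiceDiscoveryConfigFailed
        # Non-zero when a service discovery configuration could not be loaded
        expr: prometheus_sd_failed_configs > 0
        for: 5m
        labels:
          severity: warning
      - alert: NoTargetsDiscovered
        # Fires when a scrape job currently has no discovered targets
        expr: sum(prometheus_sd_discovered_targets{name="scrape"}) by (config) == 0
        for: 10m
        labels:
          severity: warning
```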

2. Target Status Dashboard

{
  "dashboard": {
    "panels": [
      {
        "title": "Active Targets",
        "targets": [
          {
            "expr": "sum(up) by (job)"
          }
        ]
      },
      {
        "title": "Scrape Duration",
        "targets": [
          {
            "expr": "avg(scrape_duration_seconds) by (job)"
          }
        ]
      }
    ]
  }
}

Best Practices

  1. Labeling Strategy

    • Use consistent label naming
    • Avoid high cardinality labels
    • Document label meanings
  2. Performance Optimization

    • Set appropriate scrape intervals
    • Use efficient relabeling
    • Monitor scrape duration
  3. Security

    • Use RBAC for service discovery
    • Secure endpoint access
    • Monitor unauthorized access attempts
  4. Maintenance

    • Regular configuration review
    • Monitor discovery errors
    • Update service selectors
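
The RBAC point under Security can be made concrete: Kubernetes service discovery needs read-only access to the resources behind each role. A sketch of a ClusterRole and binding; the role name, service account name, and monitoring namespace are assumptions, and the resource list can be trimmed to the roles actually in use:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-discovery        # hypothetical name
rules:
  # Read-only access for the node, service, endpoints, and pod roles
  - apiGroups: [""]
    resources: [nodes, services, endpoints, pods]
    verbs: [get, list, watch]
  # Needed only for the endpointslice role
  - apiGroups: [discovery.k8s.io]
    resources: [endpointslices]
    verbs: [get, list, watch]
  # Needed only for the ingress role
  - apiGroups: [networking.k8s.io]
    resources: [ingresses]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-discovery        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus                # assumed service account
    namespace: monitoring           # assumed namespace
```

When discovery is limited to a few namespaces, a Role and RoleBinding per namespace grants less than a cluster-wide binding.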

Troubleshooting

Common Issues

  1. Target Not Found
# Check service monitor
kubectl get servicemonitor

# Verify labels
kubectl get pods --show-labels
  2. Scrape Failures
# Find jobs whose targets returned no samples
sum(scrape_samples_scraped) by (job) == 0

# Check scrape duration
avg(scrape_duration_seconds) by (job)
  3. Configuration Issues
# Validate configuration
promtool check config prometheus.yml

# Check Prometheus logs
kubectl logs -l app=prometheus
