Prometheus Blackbox Monitoring in Kubernetes

Monitor external endpoints and services with Prometheus Blackbox Exporter

The Prometheus Blackbox Exporter allows you to monitor network endpoints using HTTP, HTTPS, DNS, TCP, and ICMP probes. This guide covers setup, configuration, and best practices.

Installation

1. Using Helm

# Add prometheus-community repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install blackbox exporter
helm install blackbox-exporter prometheus-community/prometheus-blackbox-exporter \
  --namespace monitoring \
  --create-namespace
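
Probe modules and resource limits can be supplied at install time through a values file instead of being patched in afterwards. The following is a minimal sketch, assuming the chart exposes config.modules and resources keys (confirm for your chart version with helm show values prometheus-community/prometheus-blackbox-exporter). The file name and the resource figures are illustrative.

```yaml
# values.yaml (hypothetical file name)
# Assumed chart keys: config.modules and resources; verify with `helm show values`.
config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        method: GET
        preferred_ip_protocol: ip4

# Illustrative figures; size these from observed usage.
resources:
  requests:
    cpu: 50m
    memory: 32Mi
  limits:
    memory: 64Mi
```

Pass the file to the install command with -f values.yaml.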

2. Using Manifests

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      containers:
      - name: blackbox-exporter
        image: prom/blackbox-exporter:v0.24.0
        args:
        - --config.file=/config/blackbox.yml
        ports:
        - containerPort: 9115
        volumeMounts:
        - name: config
          mountPath: /config
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter-config
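
The Deployment alone is not reachable by Prometheus under a stable address. A Service in front of it provides one. This is a sketch that assumes the Service is named blackbox-exporter and that its port is named http; both names are choices made here, not requirements of the exporter.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: blackbox-exporter   # assumed name; reuse it wherever the exporter address is needed
  namespace: monitoring
  labels:
    app: blackbox-exporter
spec:
  selector:
    app: blackbox-exporter  # matches the pod labels set by the Deployment
  ports:
  - name: http              # assumed port name
    port: 9115
    targetPort: 9115
```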

Configuration

1. Basic Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-exporter-config
  namespace: monitoring
data:
  blackbox.yml: |
    modules:
      http_2xx:
        prober: http
        timeout: 5s
        http:
          valid_status_codes: []  # empty list accepts any 2xx status
          method: GET
      
      http_post_2xx:
        prober: http
        timeout: 5s
        http:
          method: POST
          valid_status_codes: [200,201,202]
      
      tcp_connect:
        prober: tcp
        timeout: 5s
      
      icmp:
        prober: icmp
        timeout: 5s
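
The icmp module needs permission to open ICMP sockets. On Linux the exporter's documentation lists three options: running as root, holding CAP_NET_RAW, or belonging to a group within net.ipv4.ping_group_range. Below is a sketch of the capability route, written as a fragment to merge into the exporter container spec. Whether it is sufficient for a non-root user depends on the image and the container runtime, so treat it as a starting point.

```yaml
# Fragment for spec.template.spec.containers[0] of the exporter Deployment
securityContext:
  capabilities:
    add: ["NET_RAW"]   # allows the process to open raw ICMP sockets
```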

2. Advanced HTTP Probes

modules:
  http_advanced:
    prober: http
    timeout: 5s
    http:
      method: GET
      headers:
        Authorization: "Bearer secret-token"
      fail_if_ssl: false
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
      valid_status_codes: [200,201,202,203,204]
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      fail_if_body_matches_regexp:
        - "maintenance"
      fail_if_body_not_matches_regexp:
        - "ok"
      fail_if_header_matches:
        - header: Content-Type
          regexp: "text/html"
      fail_if_header_not_matches:
        - header: Access-Control-Allow-Origin
          regexp: "\\*"
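
A token written directly into the module ends up readable in the ConfigMap. One alternative is to have the exporter read it from a file. The sketch below assumes a Secret is mounted into the exporter pod at /secrets; the module name and the path are hypothetical.

```yaml
modules:
  http_token_from_file:            # hypothetical module name
    prober: http
    timeout: 5s
    http:
      method: GET
      fail_if_not_ssl: true
      # Assumed path: a Secret mounted into the exporter pod at /secrets
      bearer_token_file: /secrets/token
```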

Service Discovery

1. Static Targets

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: blackbox
  namespace: monitoring
spec:
  interval: 30s
  module: http_2xx
  prober:
    # Address of the exporter Service; adjust to match your installation
    url: blackbox-exporter.monitoring.svc:9115
    path: /probe
  targets:
    staticConfig:
      # Each entry is probed separately and becomes the instance label
      static:
      - example.com
      - api.example.com
      labels:
        module: http_2xx

2. Dynamic Targets

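# Plain Prometheus scrape job (for example, supplied through additionalScrapeConfigs).
# The __meta_kubernetes_service_* labels are only populated by
# kubernetes_sd_configs with role: service.
- job_name: blackbox-dynamic
  scrape_interval: 30s
  metrics_path: /probe
  params:
    module: ['http_2xx']
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Keep only Services annotated with prometheus.io/probe: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    regex: "true"
    action: keep
  # Probe target = Service address plus the optional probe_path annotation
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_probe_path]
    regex: (.+);(.*)
    replacement: "${1}${2}"
    target_label: __param_target
  # Optional per-Service module override
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe_module]
    regex: (.+)
    target_label: __param_module
  - source_labels: [__param_target]
    target_label: instance
  # Send the scrape itself to the exporter (assumed Service address)
  - target_label: __address__
    replacement: blackbox-exporter.monitoring.svc:9115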
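
The relabeling rules above read annotations on each Service. Prometheus replaces unsupported characters in annotation names with underscores, so prometheus.io/probe surfaces as __meta_kubernetes_service_annotation_prometheus_io_probe. Below is a sketch of a Service that opts in; the Service name, namespace, selector, and ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api                          # hypothetical Service
  namespace: default
  annotations:
    prometheus.io/probe: "true"         # opt in to probing; must be a string
    prometheus.io/probe_module: http_2xx
spec:
  selector:
    app: my-api
  ports:
  - name: http
    port: 80
    targetPort: 8080
```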

Monitoring Different Protocols

1. HTTPS Monitoring

modules:
  https_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      fail_if_not_ssl: true
      tls_config:
        ca_file: /etc/ssl/certs/ca-certificates.crt

2. TCP Monitoring

modules:
  tcp_tls:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: false
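
A TCP probe can also check what the server says after the connection opens, which catches a port that accepts connections while the service behind it is unhealthy. The sketch below is modeled on the SSH banner check in the exporter's example configuration and uses the query_response option of the tcp prober.

```yaml
modules:
  ssh_banner:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
      # Succeed only if the server sends a matching banner line
      - expect: "^SSH-2.0-"
```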

3. DNS Monitoring

modules:
  dns_example:
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "A"
      valid_rcodes:
      - NOERROR
      validate_answer_rrs:
        fail_if_matches_regexp:
        - ".*127\\.0\\.0\\.1"

Alerting Configuration

1. Basic Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-alerts
  namespace: monitoring
spec:
  groups:
  - name: blackbox
    rules:
    - alert: BlackboxProbeFailure
      expr: probe_success == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Blackbox probe failed for {{ $labels.instance }}"
        description: "Probe failed\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

2. Advanced Alerts

groups:
- name: blackbox-advanced
  rules:
  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[5m]) > 1
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Slow probe for {{ $labels.instance }}"
  
  - alert: BlackboxSSLCertExpiry
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "SSL certificate expiring soon for {{ $labels.instance }}"
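
HTTP probes also export probe_http_status_code, which allows an alert on the status code itself instead of on overall probe success. Below is a sketch in the same rule-group format; the group name, alert name, and thresholds are illustrative.

```yaml
groups:
- name: blackbox-http
  rules:
  - alert: BlackboxHttpStatusCode      # hypothetical alert name
    # Fires for anything outside the 2xx-3xx range
    expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Unexpected HTTP status {{ $value }} for {{ $labels.instance }}"
```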

Dashboard Configuration

1. Grafana Dashboard

{
  "dashboard": {
    "panels": [
      {
        "title": "Probe Success Rate",
        "targets": [
          {
            "expr": "avg_over_time(probe_success[5m])",
            "legendFormat": "{{ instance }}"
          }
        ]
      },
      {
        "title": "HTTP Response Time",
        "targets": [
          {
            "expr": "probe_duration_seconds",
            "legendFormat": "{{ instance }}"
          }
        ]
      },
      {
        "title": "SSL Certificate Expiry",
        "targets": [
          {
            "expr": "(probe_ssl_earliest_cert_expiry - time()) / 86400",
            "legendFormat": "{{ instance }}"
          }
        ]
      }
    ]
  }
}

Best Practices

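  1. Probe Configuration

    • Set module timeouts below the Prometheus scrape timeout so the probe finishes before the scrape is abandoned
    • Define explicit success criteria (status codes, body or header matches) instead of relying on connectivity alone
    • Keep TLS verification enabled; reserve insecure_skip_verify: true for testing
  2. Resource Management

    • Set CPU and memory requests and limits on the exporter container
    • Choose scrape intervals that balance detection speed against load on the probed targets
    • Monitor the exporter itself by scraping its /metrics endpoint
  3. Security

    • Use TLS where possible
    • Keep credentials in Secrets rather than in the ConfigMap
    • Restrict access to the exporter's /probe endpoint, since it can be used to reach internal addresses
  4. Monitoring Strategy

    • Define clear SLOs for each probed endpoint
    • Alert on sustained failures (use a for: duration) rather than on a single failed probe
    • Match probe frequency to how quickly an outage must be detected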
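
An SLO becomes actionable once it is expressed as a rule. The sketch below records 30-day availability per target and alerts when it drops below a target; the rule names and the 99.9% figure are assumptions chosen for illustration.

```yaml
groups:
- name: blackbox-slo
  rules:
  # Fraction of successful probes per target over the last 30 days
  - record: instance:probe_success:avg_over_time_30d   # hypothetical rule name
    expr: avg_over_time(probe_success[30d])
  - alert: BlackboxAvailabilityBelowSLO                # hypothetical alert name
    expr: instance:probe_success:avg_over_time_30d < 0.999   # assumed 99.9% target
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "30-day availability for {{ $labels.instance }} is below 99.9%"
```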

Troubleshooting

Common Issues

  1. Probe Failures
# Check probe result (1 = success, 0 = failure)
probe_success

# Check probe duration
probe_duration_seconds

# Check certificate expiry (Unix timestamp of the earliest expiring certificate)
probe_ssl_earliest_cert_expiry

  2. Configuration Issues
# Verify config
kubectl get configmap blackbox-exporter-config -n monitoring -o yaml

# Check exporter logs
kubectl logs -l app=blackbox-exporter -n monitoring

  3. Network Issues
# Check probe latency, averaged over 5 minutes (the metric is a gauge, so rate() does not apply)
avg_over_time(probe_duration_seconds[5m])

# Check DNS resolution time
probe_dns_lookup_time_seconds
