Prometheus Blackbox Monitoring in Kubernetes
Monitor external endpoints and services with Prometheus Blackbox Exporter
The Prometheus Blackbox Exporter probes network endpoints from the outside over HTTP, HTTPS, DNS, TCP, and ICMP, and exposes the results as Prometheus metrics. This guide covers installation, probe configuration, target discovery, alerting, and best practices.
Installation
1. Using Helm
# Add prometheus-community repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install blackbox exporter
helm install blackbox-exporter prometheus-community/prometheus-blackbox-exporter \
  --namespace monitoring \
  --create-namespace
2. Using Manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      containers:
        - name: blackbox-exporter
          image: prom/blackbox-exporter:v0.24.0
          args:
            - --config.file=/config/blackbox.yml
          ports:
            - containerPort: 9115
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          configMap:
            name: blackbox-exporter-config
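The Deployment on its own does not make the exporter reachable inside the cluster, and the ServiceMonitor examples in this guide refer to a Service port named http. A minimal sketch of such a Service, assuming the pod label app: blackbox-exporter and container port 9115 from the Deployment; the Service name and its label are choices made here for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: blackbox-exporter        # assumed name; adjust to your conventions
  namespace: monitoring
  labels:
    app: blackbox-exporter       # label a ServiceMonitor selector can match
spec:
  selector:
    app: blackbox-exporter       # pod label from the Deployment
  ports:
    - name: http                 # port name referenced by `port: http`
      port: 9115
      targetPort: 9115
```

With this in place, the exporter would be reachable in-cluster at blackbox-exporter.monitoring.svc:9115.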
Configuration
1. Basic Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-exporter-config
  namespace: monitoring
data:
  blackbox.yml: |
    modules:
      http_2xx:
        prober: http
        timeout: 5s
        http:
          valid_status_codes: [200]
          method: GET
      http_post_2xx:
        prober: http
        timeout: 5s
        http:
          method: POST
          valid_status_codes: [200,201,202]
      tcp_connect:
        prober: tcp
        timeout: 5s
      icmp:
        prober: icmp
        timeout: 5s
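The icmp module needs permission to send ICMP packets, which on Linux usually means the NET_RAW capability. Many container runtimes grant it by default, but a hardened cluster may drop it. A hedged sketch of how the container from the Deployment could request it; whether this is needed, or even permitted, depends on your runtime and security policy:

```yaml
# Fragment of the Deployment's pod template (spec.template.spec)
containers:
  - name: blackbox-exporter
    image: prom/blackbox-exporter:v0.24.0
    securityContext:
      capabilities:
        # Assumption: only required where NET_RAW is dropped by default
        add: ["NET_RAW"]
```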
2. Advanced HTTP Probes
modules:
  http_advanced:
    prober: http
    timeout: 5s
    http:
      method: GET
      headers:
        Authorization: "Bearer secret-token"
      fail_if_ssl: false
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
      valid_status_codes: [200,201,202,203,204]
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      fail_if_body_matches_regexp:
        - "maintenance"
      fail_if_body_not_matches_regexp:
        - "ok"
      fail_if_header_matches:
        - header: Content-Type
          regexp: "text/html"
      fail_if_header_not_matches:
        - header: Access-Control-Allow-Origin
          regexp: "\\*"
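Putting a bearer token directly in the module means it ends up in the ConfigMap in plain text. One possible alternative is to have the exporter read the token from a file, for example a mounted Secret. A minimal sketch, assuming the bearer_token_file option of the exporter's HTTP client settings; the module name and the mount path are hypothetical:

```yaml
modules:
  http_token_from_file:            # hypothetical module name
    prober: http
    timeout: 5s
    http:
      method: GET
      # Assumption: a Secret is mounted into the exporter pod at this path
      bearer_token_file: /secrets/api-token
      fail_if_not_ssl: true
```

The Deployment would also need a matching volume and volumeMount for the Secret.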
Service Discovery
1. Static Targets
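apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blackbox
  namespace: monitoring
spec:
  # Must match the labels on the blackbox exporter Service
  selector:
    matchLabels:
      app: blackbox-exporter
  endpoints:
    # One endpoint per target: the exporter probes only the first
    # `target` value in a request
    - interval: 30s
      port: http
      path: /probe
      params:
        module: ['http_2xx']
        target: ['example.com']
      relabelings:
        - sourceLabels: [__param_target]
          targetLabel: instance
        - sourceLabels: [__param_module]
          targetLabel: module
    - interval: 30s
      port: http
      path: /probe
      params:
        module: ['http_2xx']
        target: ['api.example.com']
      relabelings:
        - sourceLabels: [__param_target]
          targetLabel: instance
        - sourceLabels: [__param_module]
          targetLabel: module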
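With the Prometheus Operator, the Probe resource is an alternative for a fixed list of targets, since it takes the list directly and points each probe at the exporter. A minimal sketch, assuming the exporter is reachable in-cluster at blackbox-exporter.monitoring.svc:9115 (an assumed Service name and port) and that your Prometheus resource selects Probe objects in this namespace:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: blackbox-static          # hypothetical name
  namespace: monitoring
spec:
  interval: 30s
  module: http_2xx               # module defined in blackbox.yml
  prober:
    # Assumption: in-cluster address of the blackbox exporter
    url: blackbox-exporter.monitoring.svc:9115
  targets:
    staticConfig:
      static:
        - https://example.com
        - https://api.example.com
```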
2. Dynamic Targets
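apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blackbox-dynamic
  namespace: monitoring
spec:
  # Discover Services in all namespaces; the keep rule below filters them
  namespaceSelector:
    any: true
  selector: {}
  endpoints:
    - interval: 30s
      port: http   # only Services with a port named "http" are discovered
      path: /probe
      params:
        module: ['http_2xx']
      relabelings:
        # Keep only Services annotated with prometheus.io/probe: "true"
        - sourceLabels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          regex: "true"
          action: keep
        # Probe target = discovered address followed by the annotated path
        - sourceLabels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_probe_path]
          regex: (.+);(.*)
          replacement: $1$2
          targetLabel: __param_target
        # Override the default module only when the annotation is set
        - sourceLabels: [__meta_kubernetes_service_annotation_prometheus_io_probe_module]
          regex: (.+)
          targetLabel: __param_module
        - sourceLabels: [__param_target]
          targetLabel: instance
        # Send the scrape itself to the exporter (assumed Service address)
        - targetLabel: __address__
          replacement: blackbox-exporter.monitoring.svc:9115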
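The relabeling rules above read three annotations from each discovered Service: prometheus.io/probe, prometheus.io/probe_path, and prometheus.io/probe_module. A sketch of a Service that opts in to probing; the Service name, namespace, selector, port, and path are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api                             # hypothetical Service
  namespace: default
  annotations:
    prometheus.io/probe: "true"            # matched by the keep rule
    prometheus.io/probe_path: /healthz     # hypothetical health endpoint
    prometheus.io/probe_module: http_2xx   # module from blackbox.yml
spec:
  selector:
    app: my-api
  ports:
    - name: http
      port: 8080
```

Annotation values must be strings, so "true" is quoted.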
Monitoring Different Protocols
1. HTTPS Monitoring
modules:
  https_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      fail_if_not_ssl: true
      tls_config:
        ca_file: /etc/ssl/certs/ca-certificates.crt
2. TCP Monitoring
modules:
  tcp_tls:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: false
3. DNS Monitoring
modules:
  dns_example:
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR
      validate_answer_rrs:
        # Fail if any answer record resolves to the loopback address
        fail_if_matches_regexp:
          - ".*127\\.0\\.0\\.1"
Alerting Configuration
1. Basic Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-alerts
  namespace: monitoring
spec:
  groups:
    - name: blackbox
      rules:
        - alert: BlackboxProbeFailure
          expr: probe_success == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Blackbox probe failed for {{ $labels.instance }}"
            description: "Probe failed\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2. Advanced Alerts
groups:
  - name: blackbox-advanced
    rules:
      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[5m]) > 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Slow probe for {{ $labels.instance }}"
      - alert: BlackboxSSLCertExpiry
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon for {{ $labels.instance }}"
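The alerts above cover failures, latency, and certificate expiry. HTTP probes also export probe_http_status_code, which can catch an endpoint that answers but returns an unexpected status. A hedged sketch of one more rule group; the group name, alert name, and thresholds are illustrative, not part of the original rule set:

```yaml
groups:
  - name: blackbox-http            # hypothetical group name
    rules:
      - alert: BlackboxHttpStatusUnexpected
        # Fires on anything outside the 2xx/3xx range, including 0 (no response)
        expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Unexpected HTTP status {{ $value }} for {{ $labels.instance }}"
```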
Dashboard Configuration
1. Grafana Dashboard
{
  "dashboard": {
    "panels": [
      {
        "title": "Probe Success Rate",
        "targets": [
          {
            "expr": "probe_success",
            "legendFormat": "{{ instance }}"
          }
        ]
      },
      {
        "title": "HTTP Response Time",
        "targets": [
          {
            "expr": "probe_duration_seconds",
            "legendFormat": "{{ instance }}"
          }
        ]
      },
      {
        "title": "SSL Certificate Expiry",
        "targets": [
          {
            "expr": "(probe_ssl_earliest_cert_expiry - time()) / 86400",
            "legendFormat": "{{ instance }}"
          }
        ]
      }
    ]
  }
}
Best Practices
- Probe Configuration
  - Set appropriate timeouts
  - Use specific success criteria
  - Implement proper TLS validation
- Resource Management
  - Set resource limits
  - Configure appropriate intervals
  - Monitor exporter performance
- Security
  - Use TLS where possible
  - Implement proper RBAC
  - Secure sensitive endpoints
- Monitoring Strategy
  - Define clear SLOs
  - Implement meaningful alerts
  - Use appropriate probe frequency
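As an illustration of the resource-management advice, the exporter container could declare requests and limits. This is a sketch only: the numbers are placeholders, not measured values, and should be tuned to your probe volume:

```yaml
# Fragment of the Deployment's pod template (spec.template.spec)
containers:
  - name: blackbox-exporter
    image: prom/blackbox-exporter:v0.24.0
    resources:
      requests:
        cpu: 50m          # illustrative values; size from observed usage
        memory: 32Mi
      limits:
        cpu: 200m
        memory: 128Mi
```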
Troubleshooting
Common Issues
- Probe Failures
# Check probe success (1 = success, 0 = failure)
probe_success
# Check probe duration
probe_duration_seconds
# Check SSL issues
probe_ssl_earliest_cert_expiry
- Configuration Issues
# Verify config
kubectl get configmap blackbox-exporter-config -n monitoring -o yaml
# Check exporter logs
kubectl logs -l app=blackbox-exporter -n monitoring
- Network Issues
# Check average probe latency (probe_duration_seconds is a gauge, so use avg_over_time rather than rate)
avg_over_time(probe_duration_seconds[5m])
# Check DNS resolution
probe_dns_lookup_time_seconds