Thanos Integration with Prometheus in Kubernetes
Scale Prometheus with Thanos for a global view and long-term storage
Thanos extends Prometheus with long-term retention in object storage, high availability, and a global query view across multiple Prometheus instances. This guide covers setting up and configuring Thanos in Kubernetes.
Architecture Components
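- Thanos Sidecar: Runs next to each Prometheus, uploads its TSDB blocks to object storage, and serves recent data to Thanos Query
- Thanos Store (Store Gateway): Serves historical metrics from object storage
- Thanos Query: Provides a global query view across Prometheus instances and deduplicates HA replicas
- Thanos Compactor: Compacts and downsamples blocks in object storage and applies retention
- Thanos Ruler: Evaluates recording and alerting rules against Thanos Query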
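Thanos Query reaches sidecars and store gateways through the same gRPC StoreAPI, and each endpoint advertises the time range it holds. The Python sketch below models how Query can skip endpoints whose range does not overlap a query. The endpoint names and time windows are hypothetical, and the real Query also filters endpoints by their external labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreEndpoint:
    """A StoreAPI endpoint and the time range (Unix ms) it advertises."""
    name: str
    min_time: int
    max_time: int


def select_stores(stores, query_start, query_end):
    """Keep only endpoints whose advertised range overlaps [query_start, query_end]."""
    return [s for s in stores if s.min_time <= query_end and s.max_time >= query_start]


HOUR_MS = 3_600_000
now = 1_700_000_000_000  # hypothetical current time in Unix ms

stores = [
    # Sidecar: recent data still on the Prometheus local disk (hypothetical 6h window)
    StoreEndpoint("sidecar", now - 6 * HOUR_MS, now),
    # Store gateway: blocks already uploaded to object storage (hypothetical 1y history)
    StoreEndpoint("store-gateway", now - 365 * 24 * HOUR_MS, now - 2 * HOUR_MS),
]

print([s.name for s in select_stores(stores, now - HOUR_MS, now)])
print([s.name for s in select_stores(stores, now - 7 * 24 * HOUR_MS, now)])
```

In this model a one-hour query touches only the sidecar, while a one-week query also reaches the store gateway.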
Installation
1. Using Helm
# Add bitnami repo
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Install Thanos
helm install thanos bitnami/thanos \
--namespace monitoring \
--create-namespace \
--values thanos-values.yaml
2. Basic Configuration
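# thanos-values.yaml
# Object storage configuration (the contents of objstore.yml).
# The keys are AWS's documented example placeholders; never commit real credentials
# in a values file (use a Secret or an IAM role instead).
objstoreConfig: |-
  type: s3
  config:
    bucket: thanos-metrics
    endpoint: s3.amazonaws.com
    access_key: AKIAIOSFODNN7EXAMPLE
    secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    region: us-east-1
query:
  enabled: true
  replicaCount: 2
# The Bitnami chart calls the Store component "storegateway"
storegateway:
  enabled: true
  replicaCount: 2
compactor:
  enabled: true
  retentionResolutionRaw: 30d
  retentionResolution5m: 90d
  retentionResolution1h: 1y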
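The retention values above are durations such as 30d or 1y. The following Python sketch parses that format, assuming the Prometheus unit set (ms, s, m, h, d, w, y, with 1y = 365d), and applies a sanity check that is an assumption of this guide rather than a Thanos rule: each coarser resolution should be kept at least as long as the finer one, because downsampled data is what remains for long-range queries. Thanos's special value 0d (keep forever) is not handled.

```python
import re

# Unit sizes in seconds, following the Prometheus duration format (1y = 365d)
UNIT_SECONDS = {
    "ms": 0.001, "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 7 * 86400, "y": 365 * 86400,
}
# "ms" comes first in the alternation so "250ms" is read as milliseconds
TOKEN = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Parse a duration such as '30d' or '1h30m' into seconds."""
    pos, total = 0, 0.0
    for match in TOKEN.finditer(text):
        if match.start() != pos:  # reject junk between tokens
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def check_retention(raw, res_5m, res_1h):
    """Assumed convention: each coarser resolution is kept at least as long as
    the finer one. The special value 0d (keep forever) is not handled."""
    values = [parse_duration(v) for v in (raw, res_5m, res_1h)]
    return values == sorted(values)


print(check_retention("30d", "90d", "1y"))
```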
Prometheus Integration
1. Prometheus Configuration with Thanos Sidecar
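apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2
  thanos:
    # spec.thanos.image supersedes the deprecated baseImage field;
    # version tells the operator which Thanos release it is configuring
    image: quay.io/thanos/thanos:v0.31.0
    version: v0.31.0
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-config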
2. Object Storage Configuration
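apiVersion: v1
kind: Secret
metadata:
  name: thanos-objstore-config
  # Must be in the same namespace as the Prometheus resource that references it
  namespace: monitoring
type: Opaque
stringData:
  thanos.yaml: |
    type: s3
    config:
      bucket: thanos-metrics
      endpoint: s3.amazonaws.com
      # Placeholder example keys; prefer an IAM role or an externally managed Secret
      access_key: AKIAIOSFODNN7EXAMPLE
      secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      region: us-east-1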
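Rather than committing keys like the placeholders above, the Secret's thanos.yaml can be generated at deploy time. This Python sketch builds it from environment variables. The variable names (THANOS_BUCKET, THANOS_S3_ENDPOINT, AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) are assumptions, and the sketch relies on JSON also being valid YAML for a simple document like this one. When no static keys are supplied it omits them, assuming the S3 client can then obtain credentials from another source such as an IAM role.

```python
import json
import os


def render_objstore_config(env=os.environ):
    """Build thanos.yaml contents for the S3 provider from environment variables.

    The variable names are hypothetical; json.dumps output is used as YAML.
    """
    config = {
        "bucket": env.get("THANOS_BUCKET", "thanos-metrics"),
        "endpoint": env.get("THANOS_S3_ENDPOINT", "s3.amazonaws.com"),
        "region": env.get("AWS_REGION", "us-east-1"),
    }
    # Include static keys only when both are provided; otherwise leave them out
    # so the client can fall back to other credential sources.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        config["access_key"] = env["AWS_ACCESS_KEY_ID"]
        config["secret_key"] = env["AWS_SECRET_ACCESS_KEY"]
    return json.dumps({"type": "s3", "config": config}, indent=2)


print(render_objstore_config({"THANOS_BUCKET": "thanos-metrics"}))
```

The printed text could be saved to a file and loaded with `kubectl create secret generic thanos-objstore-config -n monitoring --from-file=thanos.yaml=<file>`.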
Component Setup
1. Thanos Query
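apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.31.0
          args:
            - query
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:9090
            # Merge series that differ only by the Prometheus Operator's replica label
            - --query.replica-label=prometheus_replica
            # Use downsampled data automatically for long time ranges
            - --query.auto-downsampling
            # dnssrv+ takes the port from the SRV record, which Kubernetes publishes
            # for Service ports named "grpc"
            - --store=dnssrv+_grpc._tcp.thanos-store-gateway
            - --store=dnssrv+_grpc._tcp.thanos-sidecar
          ports:
            - name: http
              containerPort: 9090
            - name: grpc
              containerPort: 10901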
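The --store flags above use the dnssrv+ prefix. According to the Thanos service discovery documentation, dns+ performs an A/AAAA lookup and needs an explicit port, dnssrv+ performs an SRV lookup followed by A/AAAA lookups and takes the port from the SRV record, and dnssrvnoa+ performs only the SRV lookup. The Python sketch below classifies addresses the same way and builds the SRV name Kubernetes DNS publishes for a named Service port; it is an illustration, not Thanos's parser.

```python
# Lookup behaviour per prefix, summarised from the Thanos service discovery docs
PREFIXES = {
    "dns+": "A/AAAA lookup; the address must include a port",
    "dnssrv+": "SRV lookup, then A/AAAA per record; port comes from the SRV record",
    "dnssrvnoa+": "SRV lookup only; port comes from the SRV record",
}


def parse_store_address(addr):
    """Split a --store value into (prefix, target, lookup description)."""
    for prefix, description in PREFIXES.items():
        if addr.startswith(prefix):
            target = addr[len(prefix):]
            if prefix == "dns+" and ":" not in target:
                raise ValueError(f"{addr!r}: dns+ addresses need host:port")
            return prefix, target, description
    # No prefix: a static host:port address used as-is
    return "", addr, "static address"


def k8s_srv_name(port_name, service, namespace, protocol="tcp", zone="cluster.local"):
    """SRV record name Kubernetes DNS publishes for a named Service port."""
    return f"_{port_name}._{protocol}.{service}.{namespace}.svc.{zone}"


for addr in [
    "dnssrv+_grpc._tcp.thanos-store-gateway",
    "dns+thanos-store-gateway:10901",
    "thanos-store-gateway:10901",
]:
    print(parse_store_address(addr))
print(k8s_srv_name("grpc", "thanos-store-gateway", "monitoring"))
```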
2. Thanos Store
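apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  serviceName: thanos-store-gateway
  replicas: 2
  selector:
    matchLabels:
      app: thanos-store-gateway
  template:
    metadata:
      labels:
        app: thanos-store-gateway
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.31.0
          args:
            - store
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --data-dir=/var/thanos/store
            # Path matches the thanos.yaml key of the thanos-objstore-config Secret
            - --objstore.config-file=/etc/thanos/thanos.yaml
          volumeMounts:
            - name: thanos-store-data
              mountPath: /var/thanos/store
            - name: thanos-objstore
              mountPath: /etc/thanos
              readOnly: true
      volumes:
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore-config
  volumeClaimTemplates:
    - metadata:
        name: thanos-store-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            # Placeholder size for the local cache (index headers)
            storage: 10Gi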
3. Thanos Compactor
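apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  # Only one compactor may run against a bucket at a time
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos-compactor
          image: quay.io/thanos/thanos:v0.31.0
          args:
            - compact
            # Keep running and wait for new work instead of exiting after one pass
            - --wait
            - --data-dir=/var/thanos/compact
            - --objstore.config-file=/etc/thanos/thanos.yaml
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=90d
            - --retention.resolution-1h=1y
          volumeMounts:
            - name: data
              mountPath: /var/thanos/compact
            - name: thanos-objstore
              mountPath: /etc/thanos
              readOnly: true
      volumes:
        # Scratch space; a PersistentVolumeClaim is preferable for large buckets
        - name: data
          emptyDir: {}
        - name: thanos-objstore
          secret:
            secretName: thanos-objstore-config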
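Downsampling replaces raw samples with per-window aggregates. Thanos keeps count, sum, min, and max for each 5m or 1h window, plus a counter-specific aggregate used by functions such as rate(). This Python sketch computes only the first four, over 5-minute windows aligned to the Unix epoch, for hypothetical 15-second samples; it is not the compactor's implementation.

```python
from collections import defaultdict


def downsample(samples, window_ms=5 * 60 * 1000):
    """Aggregate (timestamp_ms, value) samples into fixed windows.

    Returns {window_start_ms: {"count", "sum", "min", "max"}} in time order.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window
        buckets[ts - ts % window_ms].append(value)
    return {
        start: {
            "count": len(vals),
            "sum": sum(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for start, vals in sorted(buckets.items())
    }


# Hypothetical raw samples: one every 15s for 10 minutes
raw = [(i * 15_000, float(i)) for i in range(40)]
for start, agg in downsample(raw).items():
    print(start, agg)
```

Keeping count and sum, rather than a single average, lets queries such as avg_over_time be computed exactly from the downsampled windows (sum divided by count).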
Query and Visualization
1. Grafana Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
      - name: Thanos
        type: prometheus
        url: http://thanos-query:9090
        access: proxy
        isDefault: true
2. Example PromQL Queries
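# Deduplication is done by Thanos Query when it runs with --query.replica-label=prometheus_replica,
# so aggregate as usual; summing across HA replicas without it would double-count
sum(rate(http_requests_total[5m]))
# Long-term query; with --query.auto-downsampling, Query reads downsampled blocks for wide ranges
sum(rate(http_requests_total[1d])) by (job)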
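Thanos Query deduplicates by treating series whose labels match apart from the replica label (prometheus_replica is the Prometheus Operator default) as copies of one series and merging them; summing them instead counts every sample twice. The Python sketch below shows a deliberately simplified merge: it follows one replica and fills only gaps longer than a threshold from the others. Thanos's real algorithm is more involved (it uses a penalty to decide when to switch replicas), and the 30-second threshold here is an assumption.

```python
from collections import defaultdict

REPLICA_LABEL = "prometheus_replica"


def group_replicas(series):
    """Group (labels, samples) pairs that differ only by the replica label."""
    groups = defaultdict(list)
    for labels, samples in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != REPLICA_LABEL))
        groups[key].append((labels.get(REPLICA_LABEL, ""), samples))
    return groups


def merge_with_gap_fill(replicas, gap_ms):
    """Follow the first replica; take other replicas' samples only inside gaps."""
    replicas = sorted(replicas, key=lambda r: r[0])  # deterministic primary
    merged = list(replicas[0][1])
    for _, samples in replicas[1:]:
        timestamps = [ts for ts, _ in merged]
        gaps = [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > gap_ms]
        extra = [(ts, v) for ts, v in samples if any(a < ts < b for a, b in gaps)]
        merged = sorted(merged + extra)
    return merged


def deduplicate(series, gap_ms=30_000):
    return {key: merge_with_gap_fill(reps, gap_ms) for key, reps in group_replicas(series).items()}


# Hypothetical HA pair; prometheus-0 missed scrapes between 15s and 90s
series = [
    ({"job": "api", "prometheus_replica": "prometheus-0"}, [(0, 1.0), (15_000, 2.0), (90_000, 7.0)]),
    ({"job": "api", "prometheus_replica": "prometheus-1"}, [(5_000, 1.1), (20_000, 2.1), (50_000, 4.1), (65_000, 5.1)]),
]
result = deduplicate(series)
print(result)
```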
High Availability Setup
1. Multi-Cluster Configuration
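apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-eu
  namespace: monitoring
spec:
  replicas: 2
  # External labels identify this cluster's series in the global view;
  # the Thanos sidecar has no --cluster flag
  externalLabels:
    cluster: eu-west
  thanos:
    image: quay.io/thanos/thanos:v0.31.0
    version: v0.31.0
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-config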
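Thanos Query treats series that differ only in the replica label as copies of each other, so Prometheus instances in different clusters must differ in at least one other external label. This matters, for example, when every cluster deploys the same monitoring/prometheus resource. The Python sketch groups hypothetical pods by their external labels minus the replica label; each group should contain only the pods of a single HA pair.

```python
REPLICA_LABEL = "prometheus_replica"  # Prometheus Operator default


def group_by_external_labels(instances):
    """Group pods whose external labels match once the replica label is dropped.

    `instances` maps a pod identifier to its external labels. Pods in the same
    group look like replicas of one another to Thanos Query.
    """
    groups = {}
    for pod, labels in instances.items():
        key = tuple(sorted((k, v) for k, v in labels.items() if k != REPLICA_LABEL))
        groups.setdefault(key, []).append(pod)
    return groups


# Hypothetical pods: both clusters deploy the same monitoring/prometheus resource
pods = {
    "eu/prometheus-prometheus-0": {
        "prometheus": "monitoring/prometheus",
        "prometheus_replica": "prometheus-prometheus-0",
    },
    "us/prometheus-prometheus-0": {
        "prometheus": "monitoring/prometheus",
        "prometheus_replica": "prometheus-prometheus-0",
    },
}
print(len(group_by_external_labels(pods)))  # 1: the two clusters collapse together

# Adding a distinguishing cluster external label keeps them apart
cluster_names = {"eu": "eu-west", "us": "us-east"}
for pod, labels in pods.items():
    labels["cluster"] = cluster_names[pod.split("/")[0]]
print(len(group_by_external_labels(pods)))  # 2
```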
2. Cross-Region Query
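# Excerpt: only the relevant fields of the global thanos-query Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
spec:
  template:
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.31.0
          args:
            - query
            - --query.replica-label=prometheus_replica
            # Each name must resolve from the cluster running this Query,
            # for example through multi-cluster DNS or exposed gRPC endpoints
            - --store=dnssrv+_grpc._tcp.thanos-store-eu-west
            - --store=dnssrv+_grpc._tcp.thanos-store-us-east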
Monitoring and Alerting
1. Thanos Alerts
groups:
  - name: thanos-component-alerts
    rules:
      - alert: ThanosCompactHalted
        expr: thanos_compact_halted == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Thanos Compact has halted
      - alert: ThanosQueryHighDNSFailures
        expr: rate(thanos_query_store_apis_dns_failures_total[5m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Thanos Query is having DNS resolution issues
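Both alerts use `for`, which the Prometheus rule engine evaluates as a small state machine: an alert is pending while its expression holds and fires once it has held continuously for the `for` duration; any evaluation where it stops holding resets it. The Python model below assumes a single series and a hypothetical 60-second evaluation interval.

```python
def alert_states(evaluations, for_seconds):
    """Replay (timestamp_s, expr_is_true) evaluations; return the state after each.

    'pending' once the expression is true, 'firing' once it has stayed true
    for at least `for_seconds`, and 'inactive' whenever it is false.
    """
    states = []
    active_since = None
    for ts, is_true in evaluations:
        if not is_true:
            active_since = None  # any false evaluation resets the alert
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# ThanosCompactHalted with for: 5m, evaluated every 60s (hypothetical)
evals = [(0, False), (60, True), (120, True), (180, True), (240, True),
         (300, True), (360, True), (420, False)]
print(alert_states(evals, for_seconds=300))
```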
2. Performance Monitoring
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: thanos-components
spec:
  selector:
    matchLabels:
      app: thanos
  endpoints:
    - port: http
      interval: 15s
Best Practices
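- Storage Configuration
  - Set retention per resolution (raw, 5m, 1h) to match how far back queries need to look
  - Keep bucket lifecycle policies consistent with the compactor's retention settings
  - Monitor bucket size and growth
- Query Optimization
  - Keep query time ranges appropriate to the resolution you need
  - Enable deduplication with --query.replica-label
  - Monitor query latency and error rates
- High Availability
  - Run multiple replicas of Prometheus, Query, and Store (but a single Compactor per bucket)
  - Use a cross-region setup
  - Back up or replicate the object storage bucket
- Resource Management
  - Set resource requests and limits for every component
  - Monitor component health
  - Scale components based on their metrics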
Troubleshooting
Common Issues
- Store Issues
# Check object storage operation failures
rate(thanos_objstore_bucket_operation_failures_total[5m])
# Check series fetch duration (combine buckets across instances by le)
histogram_quantile(0.99, sum by (le) (rate(thanos_bucket_store_series_fetch_duration_seconds_bucket[5m])))
- Query Issues
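# Check query errors (HTTP 5xx from the query API; adjust the job matcher to your scrape config)
sum(rate(http_requests_total{job=~".*thanos-query.*", handler="query", code=~"5.."}[5m]))
# Check query duration
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket{job=~".*thanos-query.*", handler="query"}[5m])))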
- Compaction Issues
# Check compaction errors
rate(thanos_compact_group_compactions_failures_total[5m])
# Check compaction throughput (compactions completed per second)
rate(thanos_compact_group_compactions_total[5m])
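The p99 latency checks rely on `histogram_quantile`, which interpolates linearly inside the bucket that contains the requested rank. It works on one set of cumulative buckets, which is why per-instance buckets are usually combined with `sum by (le)` first. This Python sketch mirrors the classic-histogram logic for a single, already aggregated set of buckets; the bucket values are hypothetical, and it omits PromQL's correction for non-monotonic bucket counts.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) bucket pairs.

    Follows the linear interpolation of PromQL's classic histogram_quantile.
    The last bucket must have an upper bound of +Inf.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    observations = buckets[-1][1]
    if observations == 0:
        return math.nan
    rank = q * observations
    # Index of the first finite bucket whose cumulative count reaches the rank
    b = next(
        (i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank),
        len(buckets) - 1,
    )
    if b == len(buckets) - 1:
        # Rank lies in the +Inf bucket: report the highest finite upper bound
        return buckets[-2][0]
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    bucket_end, count = buckets[b]
    bucket_start = 0.0
    if b > 0:
        bucket_start, prev_count = buckets[b - 1]
        count -= prev_count
        rank -= prev_count
    if count == 0:
        return math.nan
    # Assume observations are spread evenly within the bucket
    return bucket_start + (bucket_end - bucket_start) * (rank / count)


# Hypothetical request-duration buckets (seconds) after sum by (le)
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
for q in (0.5, 0.7, 0.99):
    print(q, histogram_quantile(q, buckets))
```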