Prometheus Remote Storage in Kubernetes
Configure long-term storage solutions for Prometheus metrics in Kubernetes
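Prometheus keeps metrics on local disk, which is not designed for long-term retention. Remote storage ships those metrics to a durable backend for long-term retention and analysis. This guide covers three remote storage options (Thanos, Cortex, and VictoriaMetrics) and how to deploy them in Kubernetes.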
Remote Storage Options
1. Thanos
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  thanos:
    # `image` replaces the deprecated `baseImage` field
    image: quay.io/thanos/thanos:v0.31.0
    version: v0.31.0
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-config
2. Cortex
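# prometheus.yml: remote_write is a top-level key, not part of global
remote_write:
  - url: "http://cortex:9009/api/v1/push"
    remote_timeout: 30s
    write_relabel_configs:
      # Drop Go runtime metrics before they are sent
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop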
3. VictoriaMetrics
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: "http://victoria-metrics:8428/api/v1/write"
      queueConfig:
        capacity: 10000
        maxShards: 30
        minShards: 1
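The queue settings above bound how many samples Prometheus buffers per shard and how many parallel shards it may open. As a rough sketch, the other queueConfig fields of the Prometheus resource control batching and retry behaviour. The values below are illustrative assumptions, not recommendations from this guide.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: "http://victoria-metrics:8428/api/v1/write"
      queueConfig:
        capacity: 10000          # samples buffered per shard
        minShards: 1
        maxShards: 30
        maxSamplesPerSend: 2000  # assumed batch size per request
        batchSendDeadline: 5s    # send a partial batch after this long
        minBackoff: 30ms         # initial retry delay (assumed value)
        maxBackoff: 5s           # upper bound for the retry delay (assumed value)
```

Keeping capacity several times larger than maxSamplesPerSend gives each shard room to buffer while a batch is in flight.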
Implementation Methods
1. Thanos Setup
# thanos-objstore-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objstore-config
type: Opaque
stringData:
  thanos.yaml: |
    type: S3
    config:
      bucket: thanos-metrics
      endpoint: s3.amazonaws.com
      # Example credentials only; never commit real keys
      access_key: AKIAIOSFODNN7EXAMPLE
      secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      region: us-east-1
2. Cortex Configuration
# cortex-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-config
data:
  cortex.yaml: |
    distributor:
      shard_by_all_labels: true
      pool:
        health_check_ingesters: true
    ingester:
      lifecycler:
        ring:
          replication_factor: 3
      chunk_idle_period: 30m
      max_chunk_age: 2h
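3. VictoriaMetrics Setup
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: victoria-metrics
spec:
  serviceName: victoria-metrics
  replicas: 1
  selector:
    matchLabels:
      app: victoria-metrics
  template:
    metadata:
      labels:
        app: victoria-metrics
    spec:
      containers:
        - name: victoria-metrics
          # Pin an image tag in production instead of relying on latest
          image: victoriametrics/victoria-metrics
          args:
            - --storageDataPath=/storage
            - --retentionPeriod=1y
          volumeMounts:
            # Mount the persistent volume at the path given in --storageDataPath
            - name: storage
              mountPath: /storage
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi  # assumed size; adjust to your retention needs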
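The StatefulSet refers to a Service through serviceName, and the remote write URL used earlier points at http://victoria-metrics:8428. A minimal sketch of a matching headless Service follows. It assumes the single-node default port 8428 and the app: victoria-metrics pod label from the manifest above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: victoria-metrics
spec:
  clusterIP: None          # headless, the usual pairing for a StatefulSet serviceName
  selector:
    app: victoria-metrics  # must match the pod template labels
  ports:
    - name: http
      port: 8428           # port used in the remoteWrite URL
      targetPort: 8428
```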
Storage Configuration
1. S3 Storage
apiVersion: v1
kind: Secret
metadata:
  name: remote-storage-credentials
type: Opaque
stringData:
  config.yaml: |
    type: S3
    config:
      bucket: metrics-storage
      endpoint: s3.amazonaws.com
      region: us-east-1
      access_key: AKIAXXXXXXXXXXXXXXXX
      secret_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2. GCS Storage
apiVersion: v1
kind: Secret
metadata:
  name: gcs-credentials
type: Opaque
stringData:
  gcs.json: |
    {
      "type": "service_account",
      "project_id": "your-project",
      "private_key_id": "key-id",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "service-account@project.iam.gserviceaccount.com",
      "client_id": "client-id"
    }
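The Secret above only stores the service account key; a component still has to load it. One possible approach, sketched here as a pod spec fragment, is to mount the Secret as a file and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it. The container name, volume name, and mount path are assumptions made for illustration.

```yaml
# Fragment of a pod template (spec.template.spec), not a complete manifest
containers:
  - name: thanos-store-gateway           # hypothetical container name
    image: quay.io/thanos/thanos:v0.31.0
    env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /etc/gcs/gcs.json         # path of the mounted key file
    volumeMounts:
      - name: gcs-credentials
        mountPath: /etc/gcs              # assumed mount path
        readOnly: true
volumes:
  - name: gcs-credentials
    secret:
      secretName: gcs-credentials        # the Secret defined above
```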
Query Configuration
1. Query Frontend
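apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query-frontend
spec:
  replicas: 2
  # A Deployment requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: thanos-query-frontend
  template:
    metadata:
      labels:
        app: thanos-query-frontend
    spec:
      containers:
        - name: thanos-query-frontend
          image: quay.io/thanos/thanos:v0.31.0
          args:
            - query-frontend
            - --query-frontend.downstream-url=http://thanos-query:9090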
2. Query Configuration
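apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
spec:
  # A Deployment requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.31.0
          args:
            - query
            # --store is deprecated in favor of --endpoint in recent Thanos releases
            - --store=dnssrv+_grpc._tcp.thanos-store-gateway:10901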
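The query frontend reaches the querier at http://thanos-query:9090, while the querier listens for HTTP on port 10902 by default. A Service can bridge the two. This is a minimal sketch that assumes the querier pods carry an app: thanos-query label and that the default HTTP address is unchanged.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-query
spec:
  selector:
    app: thanos-query   # assumed pod label
  ports:
    - name: http
      port: 9090        # port used in --query-frontend.downstream-url
      targetPort: 10902 # Thanos default HTTP port
```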
Retention and Compaction
1. Retention Configuration
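apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
spec:
  # A StatefulSet requires serviceName and a selector matching the pod labels
  serviceName: thanos-compactor
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos-compactor
          image: quay.io/thanos/thanos:v0.31.0
          args:
            - compact
            # Keep raw samples for 30 days, 5m downsamples for 90 days, 1h downsamples for 1 year
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=90d
            - --retention.resolution-1h=1y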
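The compactor also needs to know which bucket to compact and where to keep its working files. The following pod spec fragment is a sketch of how the thanos-objstore-config Secret from the Thanos setup could be wired in. The mount paths, volume names, and the emptyDir scratch volume are assumptions.

```yaml
# Fragment of the compactor pod template (spec.template.spec)
containers:
  - name: thanos-compactor
    image: quay.io/thanos/thanos:v0.31.0
    args:
      - compact
      - --wait                                         # keep running instead of exiting after one pass
      - --data-dir=/var/thanos/compact                 # assumed scratch directory
      - --objstore.config-file=/etc/thanos/thanos.yaml # key from the Secret
      - --retention.resolution-raw=30d
      - --retention.resolution-5m=90d
      - --retention.resolution-1h=1y
    volumeMounts:
      - name: objstore-config
        mountPath: /etc/thanos
        readOnly: true
      - name: data
        mountPath: /var/thanos/compact
volumes:
  - name: objstore-config
    secret:
      secretName: thanos-objstore-config
  - name: data
    emptyDir: {}   # assumed; a persistent volume avoids re-downloading blocks after restarts
```

Only one compactor should run against a given bucket at a time, so this workload is usually kept at a single replica.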
2. Compaction Settings
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  retention: 6h
  retentionSize: 10GB
  tsdb:
    outOfOrderTimeWindow: 30m
Monitoring and Alerting
1. Remote Storage Alerts
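groups:
  - name: RemoteStorageAlerts
    rules:
      - alert: RemoteWriteErrors
        # Older Prometheus releases named this metric prometheus_remote_storage_failed_samples_total
        expr: rate(prometheus_remote_storage_samples_failed_total[5m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Remote write errors detected
      - alert: RemoteStorageQueueFull
        # Fires when remote write lags more than 120s behind the newest ingested sample
        expr: |
          (
            max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
          - ignoring(remote_name, url) group_right
            max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          )
          > 120
        for: 15m
        labels:
          severity: critical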
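With the Prometheus Operator, rule groups like the one above are usually delivered as a PrometheusRule resource rather than a raw rules file. The sketch below wraps one additional rule that fires when remote write wants more shards than maxShards allows. The resource name and the role label are assumptions; the label must match the ruleSelector of your Prometheus resource.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: remote-storage-alerts  # hypothetical name
  labels:
    role: alert-rules          # assumed; must match spec.ruleSelector
spec:
  groups:
    - name: RemoteStorageShards
      rules:
        - alert: RemoteWriteDesiredShards
          # Desired shard count exceeds the configured maximum
          expr: |
            max_over_time(prometheus_remote_storage_shards_desired[5m])
            >
            max_over_time(prometheus_remote_storage_shards_max[5m])
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Remote write needs more shards than maxShards permits
```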
2. Performance Monitoring
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: remote-storage-monitor
spec:
  selector:
    matchLabels:
      app: prometheus
  endpoints:
    - port: web
      interval: 30s
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: prometheus_remote_storage_.*
          action: keep
Best Practices
- Performance Optimization
  - Tune the remote write queue configuration (capacity and shard limits) to match write volume
  - Set retention policies per resolution
  - Monitor write performance
- High Availability
  - Deploy multiple replicas
  - Distribute replicas across zones
  - Back up stored metrics and configuration
- Cost Optimization
  - Configure retention periods no longer than you need
  - Use compression where possible
  - Monitor storage usage
- Security
  - Use encryption at rest
  - Restrict access to storage credentials and write endpoints
  - Run regular security audits
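As one illustration of the access control point, remote write endpoints can be protected with basic authentication, with the credentials read from a Kubernetes Secret. This is a sketch only: the Secret name remote-write-auth, its keys, and the HTTPS endpoint are hypothetical and would have to exist in your cluster.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: "https://victoria-metrics.example.internal/api/v1/write"  # hypothetical HTTPS endpoint
      basicAuth:
        username:
          name: remote-write-auth  # hypothetical Secret in the same namespace
          key: username
        password:
          name: remote-write-auth
          key: password
```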
Troubleshooting
Common Issues
- Write Failures
  # Monitor failed writes
  rate(prometheus_remote_storage_samples_failed_total[5m])
  # Check how many samples are waiting in the queue
  prometheus_remote_storage_samples_pending
- Performance Issues
  # Monitor average write latency per batch
  rate(prometheus_remote_storage_sent_batch_duration_seconds_sum[5m])
  /
  rate(prometheus_remote_storage_sent_batch_duration_seconds_count[5m])
- Storage Issues
  # Monitor series creation rate as a proxy for storage growth
  rate(prometheus_tsdb_head_series_created_total[1h])