Cortex Integration with Prometheus in Kubernetes

Cortex provides horizontally scalable, highly available, multi-tenant long-term storage for Prometheus. This guide covers installing Cortex on Kubernetes and connecting Prometheus to it.

Architecture Components

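Distributor: Receives remote-write requests from Prometheus, validates the samples, and forwards them to ingesters
Ingester: Holds recent samples in memory and periodically flushes them to long-term storage
Querier: Executes PromQL queries against ingesters and long-term storage
Query Frontend: Splits, caches, and queues queries before they reach the queriers
Store Gateway: Serves historical data from long-term storage
Compactor: Merges and deduplicates data in long-term storage
Ruler: Evaluates recording and alerting rules
Alertmanager: Handles alert notifications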

Installation

1. Using Helm

# Add Cortex helm repo
helm repo add cortex-helm https://cortexproject.github.io/cortex-helm-chart
helm repo update

# Install Cortex
helm install cortex cortex-helm/cortex \
  --namespace monitoring \
  --create-namespace \
  --values cortex-values.yaml

2. Basic Configuration

# cortex-values.yaml
ingester:
  replicas: 3
  persistence:
    enabled: true
    size: 100Gi

distributor:
  replicas: 3

querier:
  replicas: 2

ruler:
  enabled: true
  replicas: 2

store_gateway:
  enabled: true
  replicas: 2

compactor:
  enabled: true
  persistence:
    size: 100Gi

alertmanager:
  enabled: true
  replicas: 2

Prometheus Integration

1. Prometheus Configuration

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: http://cortex-distributor:9009/api/v1/push
      writeRelabelConfigs:
        - sourceLabels: [__name__]
          regex: 'up|prometheus_.*'
          action: keep
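
To see which series the `keep` rule above forwards to Cortex, the matching can be sketched in Python. Prometheus anchors relabel regexes at both ends, which `re.fullmatch` approximates here. The metric names in the demo loop are made up for illustration.

```python
import re

# Same pattern as the writeRelabelConfigs rule in the Prometheus resource.
KEEP_PATTERN = re.compile(r"up|prometheus_.*")


def is_forwarded(metric_name: str) -> bool:
    """Return True if a series with this __name__ survives the keep rule.

    Prometheus relabel regexes are fully anchored, so fullmatch is the
    closest standard-library equivalent.
    """
    return KEEP_PATTERN.fullmatch(metric_name) is not None


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    for name in ["up", "prometheus_http_requests_total",
                 "node_cpu_seconds_total", "uptime"]:
        print(f"{name}: {'kept' if is_forwarded(name) else 'dropped'}")
```

Because the regex is anchored, a name such as `uptime` is dropped even though it starts with `up`. Widen the pattern if application metrics should reach Cortex as well.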

2. Storage Configuration

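# Object storage settings belong under blocks_storage in the Cortex configuration.
# The credentials shown are placeholders; supply real values from a Kubernetes
# Secret rather than committing them to a file.
blocks_storage:
  backend: s3
  s3:
    bucket_name: cortex-storage
    endpoint: s3.amazonaws.com
    access_key_id: AKIAIOSFODNN7EXAMPLE
    secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    region: us-east-1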

Component Setup

1. Distributor Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-distributor-config
data:
  cortex.yaml: |
    distributor:
      ring:
        kvstore:
          store: consul
          consul:
            host: consul:8500
      shard_by_all_labels: true
      pool:
        health_check_ingesters: true
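
The effect of `shard_by_all_labels` can be illustrated with a small sketch. It assumes the distributor derives a ring token from a hash of the tenant and the series labels. `zlib.crc32` is only a stand-in for whatever hash Cortex actually uses, and the helper names are hypothetical.

```python
import zlib


def shard_key(tenant: str, labels: dict, shard_by_all_labels: bool) -> str:
    """Build the string that gets hashed to pick a position on the ring."""
    if shard_by_all_labels:
        # Every label contributes, so series of one metric get different keys.
        parts = [f"{name}={value}" for name, value in sorted(labels.items())]
    else:
        # Only the metric name contributes, so all its series share one key.
        parts = [f"__name__={labels['__name__']}"]
    return tenant + "|" + ",".join(parts)


def ring_token(tenant: str, labels: dict, shard_by_all_labels: bool) -> int:
    """Stand-in token: CRC32 of the shard key (not Cortex's real hash)."""
    return zlib.crc32(shard_key(tenant, labels, shard_by_all_labels).encode("utf-8"))


if __name__ == "__main__":
    series = [{"__name__": "http_requests_total", "pod": f"pod-{i}"} for i in range(8)]
    by_name = {ring_token("tenant1", s, False) for s in series}
    by_all = {ring_token("tenant1", s, True) for s in series}
    print("distinct tokens, metric name only:", len(by_name))
    print("distinct tokens, all labels:", len(by_all))
```

With sharding by metric name only, every series of a high-cardinality metric maps to the same token and therefore the same ingesters. Sharding by all labels spreads that load across the ring.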

2. Ingester Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ingester-config
data:
  cortex.yaml: |
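    ingester:
      lifecycler:
        ring:
          kvstore:
            store: consul
            # Same Consul instance as the distributor, so both share one ring.
            consul:
              host: consul:8500
          replication_factor: 3
      # Flush tuning. Option names vary between Cortex versions, so check them
      # against the configuration reference for the release you deploy.
      chunk_idle_period: 30m
      max_chunk_age: 2h
      chunk_target_size: 1536000
      max_transfer_retries: 10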
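
The interplay of the two time-based flush settings above can be sketched as follows. This assumes they behave as their names suggest: a chunk is flushed once it has received no samples for `chunk_idle_period`, or once it is older than `max_chunk_age`, whichever comes first. The function name is hypothetical and the size-based trigger is left out.

```python
from datetime import timedelta

# Values taken from the ingester configuration above.
CHUNK_IDLE_PERIOD = timedelta(minutes=30)
MAX_CHUNK_AGE = timedelta(hours=2)


def should_flush(age: timedelta, idle: timedelta) -> bool:
    """Decide whether a chunk is due for flushing.

    age:  time since the chunk was created
    idle: time since the chunk last received a sample
    """
    return idle >= CHUNK_IDLE_PERIOD or age >= MAX_CHUNK_AGE


if __name__ == "__main__":
    # An active 45-minute-old chunk stays in memory.
    print(should_flush(timedelta(minutes=45), timedelta(minutes=1)))
    # The same chunk is flushed once its series goes quiet for 30 minutes.
    print(should_flush(timedelta(minutes=45), timedelta(minutes=31)))
    # A busy chunk is still flushed when it reaches the maximum age.
    print(should_flush(timedelta(hours=2), timedelta(seconds=5)))
```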

3. Query Frontend Setup

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-query-frontend-config
data:
  cortex.yaml: |
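    # In the Cortex configuration file, query splitting and result caching
    # options live under the query_range block.
    query_range: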
      align_queries_with_step: true
      cache_results: true
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_bytes: 1073741824
      split_queries_by_interval: 24h
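
As a rough illustration of what `split_queries_by_interval: 24h` does, the sketch below cuts a query time range into pieces that break on 24-hour boundaries in UTC, so each piece can run and be cached independently. It is a simplified model under that assumption, and the exact boundary handling inside Cortex may differ.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_by_interval(start: datetime, end: datetime, interval: timedelta):
    """Split [start, end) into sub-ranges that break on multiples of `interval`
    counted from the Unix epoch. Both datetimes must be timezone-aware."""
    parts = []
    cursor = start
    while cursor < end:
        # First interval boundary strictly after the cursor.
        intervals_elapsed = (cursor - EPOCH) // interval
        next_boundary = EPOCH + (intervals_elapsed + 1) * interval
        piece_end = min(next_boundary, end)
        parts.append((cursor, piece_end))
        cursor = piece_end
    return parts


if __name__ == "__main__":
    pieces = split_by_interval(
        datetime(2024, 1, 1, 18, tzinfo=timezone.utc),
        datetime(2024, 1, 3, 6, tzinfo=timezone.utc),
        timedelta(hours=24),
    )
    for piece_start, piece_end in pieces:
        print(piece_start.isoformat(), "->", piece_end.isoformat())
```

A 36-hour query that starts at 18:00 yields three sub-queries here. Only the middle one covers a full day, which is the piece most likely to be served from the results cache on a repeat query.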

Multi-tenancy Configuration

1. Authentication Setup

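# With auth_enabled set, Cortex requires every request to carry the tenant ID
# in the X-Scope-OrgID header. Cortex does not verify credentials itself.
auth_enabled: true

# The auth block below is not part of the open-source Cortex configuration
# reference. Treat it as a placeholder for whatever authenticating gateway
# sits in front of Cortex, and keep the client secret in a Kubernetes Secret.
auth:
  type: enterprise
  enterprise:
    url: http://auth:9090/api/users
    client_id: cortex
    client_secret: secret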
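
When `auth_enabled` is true, Cortex expects each request to name its tenant in the `X-Scope-OrgID` header. The sketch below builds such a query request with the standard library. The base URL is the query frontend address used for the Grafana data source later in this guide, the helper name is hypothetical, and nothing is sent over the network.

```python
import urllib.parse
import urllib.request


def build_query_request(base_url: str, tenant_id: str, promql: str) -> urllib.request.Request:
    """Prepare an instant-query request scoped to one tenant."""
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url}/api/v1/query?{params}"
    # The tenant is selected by this header, not by anything in the query.
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant_id})


if __name__ == "__main__":
    request = build_query_request(
        "http://cortex-query-frontend:9009/prometheus", "tenant1", "up"
    )
    print(request.full_url)
    print(request.header_items())
    # To actually send it from inside the cluster:
    #   with urllib.request.urlopen(request, timeout=10) as response:
    #       print(response.read().decode())
```

In production the header is usually injected by an authenticating reverse proxy, so that clients cannot choose an arbitrary tenant ID.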

2. Tenant Configuration

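# Main Cortex configuration: where to find per-tenant overrides and how often to reload them
limits:
  per_tenant_override_config: /etc/cortex/overrides.yaml
  per_tenant_override_period: 10s

# Contents of /etc/cortex/overrides.yaml
overrides:
  tenant1:
    ingestion_rate: 10000
    ingestion_burst_size: 20000
    max_series_per_metric: 100000
    max_series_per_query: 100000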
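
The relationship between `ingestion_rate` and `ingestion_burst_size` is easiest to see as a token bucket: the bucket refills at the rate and holds at most the burst size. The class below is a minimal sketch of that idea using the values configured for `tenant1`. It is not Cortex's implementation, and it ignores how the limit is divided among distributors.

```python
class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.last = 0.0             # timestamp of the previous call, in seconds

    def allow(self, samples: int, now: float) -> bool:
        """Return True and consume tokens if `samples` fit in the bucket."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if samples <= self.tokens:
            self.tokens -= samples
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=10000, burst=20000)
    print(bucket.allow(20000, now=0.0))  # a full burst is accepted
    print(bucket.allow(1, now=0.0))      # the bucket is now empty
    print(bucket.allow(10000, now=1.0))  # one second refills 10000 tokens
    print(bucket.allow(10000, now=1.5))  # only 5000 tokens have refilled
```

A tenant can therefore push up to 20000 samples at once, but sustained throughput settles at 10000 samples per second.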

High Availability Setup

1. Replication Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ha-config
data:
  cortex.yaml: |
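    # The replication factor is a property of the ingester ring. Distributors
    # read it from the same ring settings, so it is set once here.
    ingester:
      lifecycler:
        ring:
          replication_factor: 3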
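
What a replication factor of 3 buys can be worked out from the quorum rule Cortex applies to writes: a sample is accepted once more than half of its replicas acknowledge it. The helper names below are hypothetical.

```python
def write_quorum(replication_factor: int) -> int:
    """Number of ingesters that must acknowledge a write: half plus one."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can be unavailable while writes still succeed."""
    return replication_factor - write_quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(f"replication_factor={rf}: quorum={write_quorum(rf)}, "
              f"tolerates {tolerated_failures(rf)} failed ingester(s)")
```

With a replication factor of 3, two acknowledgements are enough, so one ingester can restart or fail without interrupting ingestion.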

2. Zone-Aware Setup

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cortex-ingester
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: name
                operator: In
                values:
                - cortex-ingester
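            # kubernetes.io/hostname only keeps ingesters on separate nodes. For
            # zone-level spreading, use topology.kubernetes.io/zone instead.
            topologyKey: kubernetes.io/hostname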

Monitoring and Alerting

1. Component Monitoring

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cortex-components
spec:
  selector:
    matchLabels:
      app: cortex
  endpoints:
  - port: http-metrics
    path: /metrics
    interval: 15s

2. Alert Rules

groups:
- name: cortex.rules
  rules:
  - alert: CortexIngesterUnhealthy
    # 3 is the ingester replication factor configured earlier in this guide.
    expr: |
      min(cortex_ring_members{state="ACTIVE", name="ingester"}) without(instance)
      < 3
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: Cortex ingester unhealthy
  
  - alert: CortexRequestErrors
    expr: |
      100 * sum(rate(cortex_request_duration_seconds_count{status_code=~"5.."}[1m]))
      /
      sum(rate(cortex_request_duration_seconds_count[1m]))
      > 1
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: Cortex request errors
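
The arithmetic behind the `CortexRequestErrors` expression is simple enough to check by hand. The sketch below mirrors it with plain numbers in place of the two `rate(...)` sums. The function names are hypothetical, and the `for: 15m` hold period is not modeled.

```python
def error_percentage(error_rate: float, total_rate: float) -> float:
    """Share of requests answered with a 5xx status, as a percentage."""
    if total_rate == 0:
        # No traffic: PromQL would yield NaN here, which never passes "> 1".
        return 0.0
    return 100.0 * error_rate / total_rate


def exceeds_threshold(error_rate: float, total_rate: float, threshold: float = 1.0) -> bool:
    """Mirror of the alert condition: error percentage strictly above threshold."""
    return error_percentage(error_rate, total_rate) > threshold


if __name__ == "__main__":
    print(exceeds_threshold(error_rate=2.0, total_rate=100.0))  # 2% of requests fail
    print(exceeds_threshold(error_rate=0.5, total_rate=100.0))  # 0.5% of requests fail
```

The alert only fires after the condition has held continuously for 15 minutes, so short error spikes do not trigger it.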

Query and Visualization

1. Grafana Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
    - name: Cortex
      type: prometheus
      url: http://cortex-query-frontend:9009/prometheus
      access: proxy
      isDefault: true

2. Example PromQL Queries

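# Query within one tenant. Cortex selects the tenant from the X-Scope-OrgID
# request header, not from a label, so no tenant matcher is needed.
sum by(job) (rate(http_requests_total[5m]))

# Cross-tenant query (requires tenant federation to be enabled and a header
# such as X-Scope-OrgID: tenant1|tenant2). Series carry a __tenant_id__ label.
sum by(__tenant_id__, job) (rate(http_requests_total[5m]))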

Best Practices

Storage Management
- Configure appropriate retention
- Monitor storage usage
- Implement bucket lifecycle

Query Optimization
- Use query caching
- Configure query limits
- Monitor query performance

Resource Management
- Set appropriate limits
- Monitor component health
- Scale based on metrics

Security
- Enable authentication
- Configure RBAC
- Implement network policies

Troubleshooting

Common Issues

Ingestion Issues

# Check ingestion rate
rate(cortex_distributor_received_samples_total[5m])

# Check ingestion errors
rate(cortex_discarded_samples_total[5m])

Query Issues

# Check query latency
histogram_quantile(0.99, sum(rate(cortex_query_frontend_query_duration_seconds_bucket[5m])) by (le))

# Check query errors
rate(cortex_query_frontend_queries_failed_total[5m])
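
The latency query above relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below is a simplified version of that calculation for classic histograms. The bucket values in the demo are made up, and some edge cases handled by Prometheus are omitted.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
             ending with the (math.inf, total_count) bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == math.inf:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower = buckets[index - 1][0] if index > 0 else 0.0
    prev_count = buckets[index - 1][1] if index > 0 else 0
    if count == prev_count:
        return lower
    # Assume observations are spread evenly within the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


if __name__ == "__main__":
    # Hypothetical query-duration buckets in seconds.
    demo = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
    print(histogram_quantile(0.75, demo))
```

Because of the interpolation, the result is only as precise as the bucket layout. A reported p99 that sits exactly on a bucket boundary usually means the true value lies somewhere inside that bucket.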

Storage Issues

# Check chunk index write activity (the metric is a histogram, so rate its _count series)
rate(cortex_chunk_store_index_entries_per_chunk_count[5m])

# Check store errors
rate(cortex_storage_request_duration_seconds_count{status_code=~"5.."}[5m])
