Managing Cloud Monitoring with Terraform

Learn how to set up and manage Google Cloud Monitoring (formerly Stackdriver) using Terraform

In this guide, we’ll define Cloud Monitoring resources in Terraform: notification channels, an alert policy, an uptime check, a dashboard, and a custom metric descriptor.

Prerequisites

  • Google Cloud SDK installed and configured
  • Terraform installed (version 1.0.0 or later)
  • A GCP project with billing enabled

Project Structure

.
├── main.tf                   # Main Terraform configuration file
├── variables.tf              # Variable definitions
├── outputs.tf                # Output definitions
├── terraform.tfvars          # Variable values
└── modules/
    └── monitoring/
        ├── main.tf           # Cloud Monitoring specific configurations
        ├── variables.tf      # Module variables
        ├── alerts.tf         # Alert configurations
        └── outputs.tf        # Module outputs
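
The layout above places the monitoring resources in a local module, but the guide does not show how the root configuration calls it. A minimal sketch, assuming the module declares project_id and notification_channel_email as inputs and exposes an alert_policy_name output (both are assumptions about the module's interface):

```hcl
# Root main.tf: call the local monitoring module.
module "monitoring" {
  source = "./modules/monitoring"

  # Assumed module inputs; they must match modules/monitoring/variables.tf.
  project_id                 = var.project_id
  notification_channel_email = var.notification_channel_email
}

# Root outputs.tf: re-export a value the module is assumed to expose.
output "alert_policy_name" {
  value = module.monitoring.alert_policy_name
}
```

If you keep everything in the root module instead, the resources in the following sections can go directly into main.tf.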

Provider Configuration

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}
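
The resources in this guide need the Cloud Monitoring API to be enabled in the project. One way to manage that from the same configuration is sketched below; the resource label "monitoring" is arbitrary, and setting disable_on_destroy to false is a choice made here, not a requirement.

```hcl
# Enable the Cloud Monitoring API for the project.
resource "google_project_service" "monitoring" {
  project = var.project_id
  service = "monitoring.googleapis.com"

  # Leave the API enabled even if this resource is destroyed.
  disable_on_destroy = false
}
```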

Variables

variable "project_id" {
  description = "The ID of the GCP project"
  type        = string
}

variable "region" {
  description = "The region to deploy resources to"
  type        = string
  default     = "us-central1"
}

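variable "notification_channel_email" {
  description = "Email address for notifications"
  type        = string
}

# Referenced by the Slack notification channel below.
variable "slack_token" {
  description = "Slack OAuth token for the Slack notification channel"
  type        = string
  sensitive   = true
}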
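
The project structure lists a terraform.tfvars file for variable values. A minimal sketch with placeholder values (the project ID and email address are made up for illustration):

```hcl
# terraform.tfvars (placeholder values, replace with your own)
project_id                 = "my-gcp-project"
region                     = "us-central1"
notification_channel_email = "oncall@example.com"
```

Secrets such as the Slack token used later in this guide are usually better kept out of this file. Terraform reads any variable from an environment variable named TF_VAR_ followed by the variable name, for example TF_VAR_slack_token.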

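Metrics Scope Configuration

Cloud Monitoring workspaces are now called metrics scopes, and the Google provider has no google_monitoring_workspace resource. Each project gets its own metrics scope automatically, so there is nothing to create. To view another project's metrics from this project's scope, attach it with google_monitoring_monitored_project:

variable "monitored_project_id" {
  description = "ID of an additional project to monitor from this project's metrics scope"
  type        = string
}

resource "google_monitoring_monitored_project" "monitored" {
  metrics_scope = var.project_id
  name          = var.monitored_project_id
}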

Alert Policy Configuration

resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = "High CPU Usage Alert"
  combiner     = "OR"
  conditions {
    display_name = "CPU usage above threshold"
    condition_threshold {
      filter          = "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" AND resource.type=\"gce_instance\""
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8
      trigger {
        count = 1
      }
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]

  documentation {
    content   = "CPU usage is above 80%"
    mime_type = "text/markdown"
  }

  user_labels = {
    severity = "critical"
  }
}

Notification Channel Configuration

resource "google_monitoring_notification_channel" "email" {
  display_name = "Email Notification Channel"
  type         = "email"
  labels = {
    email_address = var.notification_channel_email
  }
}

resource "google_monitoring_notification_channel" "slack" {
  display_name = "Slack Notification Channel"
  type         = "slack"
  labels = {
    channel_name = "#monitoring-alerts"
  }
  sensitive_labels {
    auth_token = var.slack_token
  }
}

Uptime Check Configuration

resource "google_monitoring_uptime_check_config" "https" {
  display_name = "HTTPS Uptime Check"
  timeout      = "10s"

  http_check {
    path         = "/"
    port         = 443
    use_ssl      = true
    validate_ssl = true
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = var.project_id
      host       = "example.com"
    }
  }

  period = "300s"
}
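
An uptime check only records results; it does not notify anyone unless an alert policy watches it. The sketch below follows the common pattern of alerting on the check_passed metric. The resource label, the 1200s alignment period, and the "more than one failing location" threshold are assumptions you may want to tune.

```hcl
# Sketch: open an incident when the HTTPS uptime check fails from more than one location.
resource "google_monitoring_alert_policy" "uptime_failure" {
  display_name = "HTTPS Uptime Check Failure"
  combiner     = "OR"

  conditions {
    display_name = "Uptime check failing"
    condition_threshold {
      # Select the results of the uptime check defined above.
      filter          = "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"${google_monitoring_uptime_check_config.https.uptime_check_id}\" AND resource.type=\"uptime_url\""
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1

      aggregations {
        alignment_period   = "1200s"
        per_series_aligner = "ALIGN_NEXT_OLDER"
        # Count the checker locations whose latest result is a failure.
        cross_series_reducer = "REDUCE_COUNT_FALSE"
        group_by_fields      = ["resource.label.*"]
      }

      trigger {
        count = 1
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]
}
```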

Dashboard Configuration

resource "google_monitoring_dashboard" "dashboard" {
  dashboard_json = jsonencode({
    displayName = "Terraform Managed Dashboard"
    gridLayout = {
      widgets = [
        {
          title = "CPU Usage"
          xyChart = {
            dataSets = [{
              timeSeriesQuery = {
                timeSeriesFilter = {
                  filter = "metric.type=\"compute.googleapis.com/instance/cpu/utilization\""
                }
              }
            }]
          }
        },
        {
          title = "Memory Usage"
          xyChart = {
            dataSets = [{
              timeSeriesQuery = {
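                timeSeriesFilter = {
                  # Memory utilization is reported by the Ops Agent; Compute Engine has no
                  # built-in instance/memory/utilization metric.
                  filter = "metric.type=\"agent.googleapis.com/memory/percent_used\" AND resource.type=\"gce_instance\""
                }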
              }
            }]
          }
        }
      ]
    }
  })
}

Custom Metric Configuration

resource "google_monitoring_metric_descriptor" "custom" {
  description  = "Daily user transactions"
  display_name = "Daily Transactions"
  type         = "custom.googleapis.com/daily_transactions"
  metric_kind  = "GAUGE"
  value_type   = "INT64"
  unit         = "1"

  labels {
    key         = "environment"
    value_type  = "STRING"
    description = "Environment (production, staging)"
  }
}

Outputs

output "metrics_scope_name" {
  value       = "locations/global/metricsScopes/${var.project_id}"
  description = "The resource name of the project's metrics scope (formerly a workspace)"
}

output "alert_policy_name" {
  value       = google_monitoring_alert_policy.alert_policy.name
  description = "The name of the alert policy"
}

output "notification_channel_id" {
  value       = google_monitoring_notification_channel.email.name
  description = "The full resource name of the email notification channel"
}
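
The outputs above cover the alert policy and the email channel. If other tooling needs to reference the uptime check or the dashboard, similar outputs can be added. This is a sketch and the output names are arbitrary:

```hcl
output "uptime_check_id" {
  value       = google_monitoring_uptime_check_config.https.uptime_check_id
  description = "The ID of the HTTPS uptime check"
}

output "dashboard_id" {
  value       = google_monitoring_dashboard.dashboard.id
  description = "The Terraform resource ID of the dashboard"
}
```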

Best Practices

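  1. Monitoring Setup:

    • Define clear metrics
    • Set thresholds from observed baselines
    • Choose alignment periods and check intervals that match how often the metric is sampled
    • Use meaningful labels
  2. Alerting:

    • Avoid alert fatigue by alerting only on actionable conditions
    • Tune thresholds and durations so brief spikes do not page anyone
    • Route alerts to channels that match their severity
    • Document each alert in the policy’s documentation block
  3. Dashboard Design:

    • Group related metrics
    • Use appropriate visualizations
    • Keep it simple
    • Update dashboards as services change
  4. Cost Optimization:

    • Monitor API usage
    • Clean up unused resources
    • Optimize retention
    • Review monitoring costs regularly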
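
One way to make thresholds easier to tune per environment is to move them out of the resource and into a validated variable. A minimal sketch, assuming a new variable named cpu_threshold (it is not part of the configuration above):

```hcl
# Hypothetical variable that replaces the hard-coded 0.8 in the CPU alert policy.
variable "cpu_threshold" {
  description = "CPU utilization fraction (0 to 1) above which the alert fires"
  type        = number
  default     = 0.8

  validation {
    condition     = var.cpu_threshold > 0 && var.cpu_threshold <= 1
    error_message = "The cpu_threshold value must be a fraction between 0 and 1, for example 0.8."
  }
}
```

The alert policy would then set threshold_value = var.cpu_threshold, and terraform plan rejects out-of-range values before anything is changed.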

Common Operations

Creating Resources

terraform init
terraform plan
terraform apply

Testing Alerts

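# Simulate high CPU usage on a monitored Compute Engine instance.
# Requires the stress utility; set --cpu to the instance's vCPU count.
stress --cpu 8 --timeout 300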

Best Practices and Tips

  1. Metric Management:

    • Define clear SLOs
    • Use metrics that reflect those SLOs
    • Review metrics regularly
    • Document changes
  2. Alert Management:

    • Avoid noise
    • Define escalation paths
    • Test alerts regularly
    • Document response procedures
  3. Operations:

    • Monitor costs
    • Track usage
    • Schedule regular maintenance
    • Keep documentation up to date

Conclusion

You’ve learned how to set up and manage Google Cloud Monitoring using Terraform. This setup provides:

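  • An uptime check and a CPU utilization alert policy
  • Alert delivery through email and Slack notification channels
  • A custom metric descriptor and a dashboard
  • A starting point for applying the best practices above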