Managing Cloud Monitoring with Terraform
Learn how to set up and manage Google Cloud Monitoring (formerly Stackdriver) using Terraform
This guide walks through a Terraform configuration for Google Cloud Monitoring, covering notification channels, alert policies, uptime checks, dashboards, and a custom metric.
Prerequisites
- Google Cloud SDK installed and configured
- Terraform installed (version 1.0.0 or later)
- A GCP project with billing enabled
Project Structure
.
├── main.tf            # Main Terraform configuration file
├── variables.tf       # Variable definitions
├── outputs.tf         # Output definitions
├── terraform.tfvars   # Variable values
└── modules/
    └── monitoring/
        ├── main.tf        # Cloud Monitoring specific configurations
        ├── variables.tf   # Module variables
        ├── alerts.tf      # Alert configurations
        └── outputs.tf     # Module outputs
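The layout above implies that the root `main.tf` instantiates the monitoring module, although the guide does not show that call. A minimal sketch, assuming the module is named `monitoring` (a hypothetical name) and that `modules/monitoring/variables.tf` declares the two inputs passed here:

```hcl
# Root main.tf: wire the monitoring module into the root configuration.
# Assumption: the module declares project_id and notification_channel_email.
module "monitoring" {
  source = "./modules/monitoring"

  project_id                 = var.project_id
  notification_channel_email = var.notification_channel_email
}
```

If you keep every resource in the root module instead, this block is unnecessary.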
Provider Configuration
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}
Variables
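variable "project_id" {
  description = "The ID of the GCP project"
  type        = string
}

variable "region" {
  description = "The region to deploy resources to"
  type        = string
  default     = "us-central1"
}

variable "notification_channel_email" {
  description = "Email address for notifications"
  type        = string
}

# Referenced by the Slack notification channel as var.slack_token.
variable "slack_token" {
  description = "Auth token for the Slack notification channel"
  type        = string
  sensitive   = true
}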
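The project structure lists a `terraform.tfvars` file that supplies values for these variables. A sketch of what it might contain; every value is a placeholder, not taken from the original setup:

```hcl
# terraform.tfvars (all values are placeholders)
project_id                 = "my-gcp-project"
region                     = "us-central1"
notification_channel_email = "oncall@example.com"

# Keep secrets such as the Slack token out of this file. Terraform reads
# any variable from an environment variable named TF_VAR_<name>, for
# example TF_VAR_slack_token.
```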
Monitoring Workspace Configuration
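# Note: the hashicorp/google provider documentation does not list a
# google_monitoring_workspace resource. A project acts as its own metrics
# scope by default, and google_monitoring_monitored_project adds other
# projects to that scope. Run `terraform validate` before relying on this block.
resource "google_monitoring_workspace" "workspace" {
  workspace_id = "terraform-workspace"
  display_name = "Terraform Managed Workspace"
  description  = "Monitoring workspace managed by Terraform"
}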
Alert Policy Configuration
resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = "High CPU Usage Alert"
  combiner     = "OR"

  conditions {
    display_name = "CPU usage above threshold"

    condition_threshold {
      filter          = "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" AND resource.type=\"gce_instance\""
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8

      trigger {
        count = 1
      }

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]

  documentation {
    content   = "CPU usage is above 80%"
    mime_type = "text/markdown"
  }

  user_labels = {
    severity = "critical"
  }
}
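The policy hard-codes the 0.8 threshold. One way to make it tunable per environment is an input variable with a validation rule. This is a sketch; the variable name `cpu_threshold` is an assumption and does not appear in the original configuration:

```hcl
# Hypothetical variable: lets each environment choose its own CPU threshold.
variable "cpu_threshold" {
  description = "CPU utilization (as a fraction from 0 to 1) above which the alert fires"
  type        = number
  default     = 0.8

  validation {
    condition     = var.cpu_threshold > 0 && var.cpu_threshold <= 1
    error_message = "The cpu_threshold value must be a fraction between 0 and 1, for example 0.8 for 80%."
  }
}
```

Inside `condition_threshold`, the literal would then become `threshold_value = var.cpu_threshold`.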
Notification Channel Configuration
resource "google_monitoring_notification_channel" "email" {
  display_name = "Email Notification Channel"
  type         = "email"

  labels = {
    email_address = var.notification_channel_email
  }
}

resource "google_monitoring_notification_channel" "slack" {
  display_name = "Slack Notification Channel"
  type         = "slack"

  labels = {
    channel_name = "#monitoring-alerts"
  }

  sensitive_labels {
    auth_token = var.slack_token
  }
}
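The alert policy in this guide notifies only the email channel. To send the same alerts to Slack as well, the channel names can be collected once and reused. A small sketch; the local name `alert_channels` is hypothetical:

```hcl
locals {
  # Every channel that should receive alerts from this configuration.
  alert_channels = [
    google_monitoring_notification_channel.email.name,
    google_monitoring_notification_channel.slack.name,
  ]
}
```

An alert policy can then set `notification_channels = local.alert_channels` instead of listing channels inline.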
Uptime Check Configuration
resource "google_monitoring_uptime_check_config" "https" {
  display_name = "HTTPS Uptime Check"
  timeout      = "10s"

  http_check {
    path         = "/"
    port         = 443
    use_ssl      = true
    validate_ssl = true
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = var.project_id
      host       = "example.com"
    }
  }

  period = "300s"
}
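An uptime check on its own only records results; it notifies nobody until an alert policy watches it. The following sketch alerts when the check fails from more than one location, following the pattern commonly used with the `check_passed` metric. The resource name `uptime_failure`, the durations, and the threshold are assumptions to adjust:

```hcl
resource "google_monitoring_alert_policy" "uptime_failure" {
  display_name = "HTTPS Uptime Check Failing"
  combiner     = "OR"

  conditions {
    display_name = "Uptime check failing from more than one location"

    condition_threshold {
      # Select the pass/fail series that belongs to the check defined above.
      filter = format(
        "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"%s\" AND resource.type=\"uptime_url\"",
        google_monitoring_uptime_check_config.https.uptime_check_id
      )
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1

      aggregations {
        alignment_period     = "1200s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_COUNT_FALSE" # count failing locations
        group_by_fields      = ["resource.label.*"]
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]
}
```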
Dashboard Configuration
resource "google_monitoring_dashboard" "dashboard" {
  dashboard_json = jsonencode({
    displayName = "Terraform Managed Dashboard"
    gridLayout = {
      widgets = [
        {
          title = "CPU Usage"
          xyChart = {
            dataSets = [{
              timeSeriesQuery = {
                timeSeriesFilter = {
                  filter = "metric.type=\"compute.googleapis.com/instance/cpu/utilization\""
                }
              }
            }]
          }
        },
        # Compute Engine does not report memory utilization natively; this
        # filter assumes the Ops Agent is installed on the monitored VMs.
        {
          title = "Memory Usage"
          xyChart = {
            dataSets = [{
              timeSeriesQuery = {
                timeSeriesFilter = {
                  filter = "metric.type=\"agent.googleapis.com/memory/percent_used\""
                }
              }
            }]
          }
        }
      ]
    }
  })
}
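Large dashboards become hard to read inside `jsonencode`. An alternative sketch keeps the definition in a standalone JSON file and loads it with Terraform's `file` function. The path `dashboards/cpu.json` and the resource name `from_file` are hypothetical:

```hcl
resource "google_monitoring_dashboard" "from_file" {
  # Assumption: dashboards/cpu.json exists next to this .tf file and holds
  # a dashboard definition in the Cloud Monitoring dashboard JSON format.
  dashboard_json = file("${path.module}/dashboards/cpu.json")
}
```

This keeps dashboard changes reviewable as plain JSON diffs, at the cost of losing Terraform variable interpolation unless `templatefile` is used instead.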
Custom Metric Configuration
resource "google_monitoring_metric_descriptor" "custom" {
  description  = "Daily user transactions"
  display_name = "Daily Transactions"
  type         = "custom.googleapis.com/daily_transactions"
  metric_kind  = "GAUGE"
  value_type   = "INT64"
  unit         = "1"

  labels {
    key         = "environment"
    value_type  = "STRING"
    description = "Environment (production, staging)"
  }
}
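A custom metric is most useful once something acts on it. The sketch below alerts when production transactions drop under a floor. The resource name, the floor of 100, the one-hour window, and `resource.type="global"` are all assumptions; use the monitored resource type your application actually writes against:

```hcl
resource "google_monitoring_alert_policy" "low_transactions" {
  display_name = "Low Daily Transactions (production)"
  combiner     = "OR"

  conditions {
    display_name = "Daily transactions below expected floor"

    condition_threshold {
      # resource.type="global" is an assumption, not taken from the guide.
      filter = format(
        "metric.type=\"%s\" AND resource.type=\"global\" AND metric.label.environment=\"production\"",
        google_monitoring_metric_descriptor.custom.type
      )
      duration        = "3600s"
      comparison      = "COMPARISON_LT"
      threshold_value = 100 # hypothetical floor

      aggregations {
        alignment_period   = "3600s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]
}
```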
Outputs
output "workspace_name" {
  value       = google_monitoring_workspace.workspace.name
  description = "The name of the monitoring workspace"
}

output "alert_policy_name" {
  value       = google_monitoring_alert_policy.alert_policy.name
  description = "The name of the alert policy"
}

output "notification_channel_id" {
  value       = google_monitoring_notification_channel.email.name
  description = "The ID of the notification channel"
}
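If these outputs live in `modules/monitoring/outputs.tf`, the root module has to re-export them before `terraform output` will show them. A sketch, assuming the module is instantiated as `module "monitoring"` (a hypothetical name):

```hcl
# Root outputs.tf: surface selected module outputs at the top level.
output "alert_policy_name" {
  value       = module.monitoring.alert_policy_name
  description = "The name of the alert policy created by the monitoring module"
}

output "notification_channel_id" {
  value       = module.monitoring.notification_channel_id
  description = "The ID of the email notification channel"
}
```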
Best Practices
- Monitoring Setup:
  - Define clear metrics
  - Set appropriate thresholds
  - Configure proper intervals
  - Use meaningful labels
- Alerting:
  - Avoid alert fatigue
  - Set proper thresholds
  - Use appropriate channels
  - Document alerts
- Dashboard Design:
  - Group related metrics
  - Use appropriate visualizations
  - Keep it simple
  - Update dashboards regularly
- Cost Optimization:
  - Monitor API usage
  - Clean up unused resources
  - Optimize retention
  - Review regularly
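Several of the alerting points above (appropriate thresholds, meaningful labels, less noise) can be expressed with `for_each`, which creates one policy per severity tier from a single definition. A sketch under stated assumptions: the tier names, percentages, and durations are illustrative, not recommendations from the original guide:

```hcl
locals {
  # Hypothetical severity tiers; tune the numbers for your workload.
  cpu_alert_tiers = {
    warning  = { threshold_pct = 70, duration = "600s" }
    critical = { threshold_pct = 90, duration = "300s" }
  }
}

resource "google_monitoring_alert_policy" "cpu_tiered" {
  for_each = local.cpu_alert_tiers

  display_name = "High CPU Usage (${each.key})"
  combiner     = "OR"

  conditions {
    display_name = "CPU usage above ${each.value.threshold_pct}%"

    condition_threshold {
      filter          = "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" AND resource.type=\"gce_instance\""
      duration        = each.value.duration
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.threshold_pct / 100 # utilization is a fraction

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]

  user_labels = {
    severity = each.key
  }
}
```

A longer duration on the warning tier means brief spikes do not page anyone, while the critical tier reacts faster to sustained load.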
Common Operations
Creating Resources
terraform init
terraform plan
terraform apply
Testing Alerts
# Simulate high CPU usage on a monitored VM for 300 seconds (requires the stress utility)
stress --cpu 8 --timeout 300
Best Practices and Tips
- Metric Management:
  - Define clear SLOs
  - Use appropriate metrics
  - Review regularly
  - Document changes
- Alert Management:
  - Avoid noise
  - Define escalations
  - Test regularly
  - Document procedures
- Operations:
  - Monitor costs
  - Track usage
  - Perform regular maintenance
  - Update documentation
Conclusion
You’ve learned how to set up and manage Google Cloud Monitoring using Terraform. This setup provides:
- Comprehensive monitoring
- Automated alerting
- Custom metrics and dashboards
- Best practices implementation