Managing Google Kubernetes Engine (GKE) with Terraform

Managing Google Kubernetes Engine (GKE) with Terraform is a great choice for infrastructure-as-code (IaC) enthusiasts. Terraform lets you define and manage your GKE clusters, node pools, and related resources declaratively. Here’s a basic guide to get you started.


Prerequisites

  • Google Cloud SDK installed and configured
  • Terraform installed (version 1.0.0 or later)
  • Basic understanding of Kubernetes concepts
  • A GCP project with billing enabled
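Before starting, you can quickly verify that the required tools are on your PATH with a small shell check (a minimal sketch; it only reports presence, not versions):

```shell
# Report whether each required CLI tool is installed.
checked=0
for tool in gcloud terraform kubectl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found ($(command -v "$tool"))"
  else
    echo "$tool: NOT FOUND - install it before continuing"
  fi
  checked=$((checked + 1))
done
echo "Checked $checked tools."
```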

Project Structure

gke-terraform
|-- README.md
|-- environments
|   |-- dev
|   |   |-- README.md
|   |   |-- main.tf
|   |   |-- providers.tf
|   |   |-- variables.tf
|   |   `-- versions.tf
|   `-- stage
`-- modules
    `-- gke
        |-- README.md
        |-- main.tf
        |-- outputs.tf
        `-- variables.tf
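The layout above can be scaffolded in one go (a convenience sketch; adjust the environment names to your needs):

```shell
# Create the directory layout for the dev environment and the gke module.
mkdir -p gke-terraform/environments/dev gke-terraform/environments/stage gke-terraform/modules/gke
touch gke-terraform/README.md
touch gke-terraform/environments/dev/README.md
touch gke-terraform/environments/dev/main.tf
touch gke-terraform/environments/dev/providers.tf
touch gke-terraform/environments/dev/variables.tf
touch gke-terraform/environments/dev/versions.tf
touch gke-terraform/modules/gke/README.md
touch gke-terraform/modules/gke/main.tf
touch gke-terraform/modules/gke/outputs.tf
touch gke-terraform/modules/gke/variables.tf
```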

Step 1: Initialize Terraform and Configure the Google Provider

Provider Configuration

terraform {
  required_version = ">= 1.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 4.0"
    }
  }
}

provider "google" {
  project = var.project_id  # The GCP project ID
  region  = var.region     # The region for resource deployment
}
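If you work in a team, you may also want to keep Terraform state in a shared GCS bucket rather than on local disk. A sketch of a remote backend configuration (the bucket name is a placeholder you would replace with an existing bucket you own):

```
terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # placeholder - replace with your bucket
    prefix = "gke/dev"
  }
}
```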

Variables

variable "project_id" {
  description = "The ID of the GCP project"
  type        = string
  default     = ""
}

variable "region" {
  description = "The region where resources will be deployed"
  type        = string
  default     = "us-central1"
}

variable "cluster_name" {
  description = "The name of the GKE cluster"
  type        = string
  default     = "dev-gke-cluster"
}

variable "network" {
  description = "The name of the VPC network"
  type        = string
  default     = "dev-network"
}

variable "subnetwork" {
  description = "The name of the subnetwork"
  type        = string
  default     = "dev-subnetwork"
}

variable "cluster_secondary_range_name" {
  description = "The name of the secondary range for pods"
  type        = string
  default     = "pods-range"
}

variable "services_secondary_range_name" {
  description = "The name of the secondary range for services"
  type        = string
  default     = "services-range"
}

variable "master_ipv4_cidr_block" {
  description = "The CIDR block for the master"
  type        = string
  default     = "10.0.0.0/28"
}

variable "node_count" {
  description = "The number of nodes in the node pool"
  type        = number
  default     = 3
}

variable "machine_type" {
  description = "The machine type for the nodes"
  type        = string
  default     = "e2-standard-2"
}

variable "disk_size_gb" {
  description = "The disk size for the nodes"
  type        = number
  default     = 100
}

variable "disk_type" {
  description = "The disk type for the nodes"
  type        = string
  default     = "pd-standard"
}

variable "node_labels" {
  description = "The labels for the nodes"
  type        = map(string)
  default = {
    "env"  = "dev"
    "team" = "devops"
  }
}

variable "node_tags" {
  description = "The network tags for the nodes"
  type        = list(string)
  default     = ["gke-node", "dev"]
}

variable "maintenance_start_time" {
  description = "The start time for the maintenance window"
  type        = string
  default     = "2025-01-01T00:00:00Z"
}

variable "maintenance_end_time" {
  description = "The end time for the maintenance window"
  type        = string
  default     = "2026-01-01T00:00:00Z"
}

variable "maintenance_recurrence" {
  description = "The recurrence for the maintenance window"
  type        = string
  default     = "FREQ=WEEKLY;BYDAY=SA,SU"
}

variable "node_metadata" {
  description = "The metadata for the nodes"
  type        = map(string)
  default = {
    "disable-legacy-endpoints" = "true"
  }
}

variable "master_authorized_networks" {
  description = "The authorized networks for the master"
  type        = list(map(string))
  default = [
    {
      cidr_block   = "0.0.0.0/0" # Allows access from any IP; restrict this in production
      display_name = "all"
    }
  ]
}
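Rather than editing the defaults above, you can override them per environment with a terraform.tfvars file (an illustrative sketch; all values here are examples):

```
# environments/dev/terraform.tfvars - example values only
project_id   = "my-gcp-project" # replace with your project ID
region       = "us-central1"
cluster_name = "dev-gke-cluster"
node_count   = 3
machine_type = "e2-standard-2"
```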

GKE Cluster

resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  network    = var.network
  subnetwork = var.subnetwork

  ip_allocation_policy {
    cluster_secondary_range_name  = var.cluster_secondary_range_name
    services_secondary_range_name = var.services_secondary_range_name
  }

  master_authorized_networks_config {
    dynamic "cidr_blocks" {
      for_each = var.master_authorized_networks
      content {
        cidr_block   = cidr_blocks.value.cidr_block
        display_name = cidr_blocks.value.display_name
      }
    }
  }

  maintenance_policy {
    recurring_window {
      start_time = var.maintenance_start_time
      end_time   = var.maintenance_end_time
      recurrence = var.maintenance_recurrence
    }
  }
}
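To enable Workload Identity (recommended under Best Practices below), you can add a workload_identity_config block inside the cluster resource (a sketch; the workload pool name follows the standard PROJECT_ID.svc.id.goog convention):

```
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
```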

resource "google_container_node_pool" "primary_nodes" {
  name       = "${google_container_cluster.primary.name}-node-pool"
  location   = var.region
  cluster    = google_container_cluster.primary.name
  node_count = var.node_count

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/trace.append",
    ]

    machine_type = var.machine_type
    disk_size_gb = var.disk_size_gb
    disk_type    = var.disk_type
    labels       = var.node_labels
    tags         = var.node_tags
    metadata     = var.node_metadata
  }
}

Outputs

output "kubernetes_cluster_name" {
  value       = google_container_cluster.primary.name
  description = "GKE Cluster Name"
}

output "kubernetes_cluster_host" {
  value       = google_container_cluster.primary.endpoint
  description = "GKE Cluster Host"
}

output "region" {
  value       = var.region
  description = "GCP region of the cluster"
}

Best Practices

  1. Security:

    • Enable Workload Identity
    • Use Binary Authorization
    • Implement Network Policies
  2. Networking:

    • Use VPC-native clusters
    • Configure private clusters
    • Implement proper firewall rules
  3. Cost Optimization:

    • Use preemptible nodes when possible
    • Implement autoscaling
    • Right-size node pools
  4. Maintenance:

    • Enable auto-upgrades
    • Configure maintenance windows
    • Use node auto-repair
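Several of these recommendations map directly onto node pool settings. For example, autoscaling, auto-repair, and auto-upgrade can be enabled on the node pool from Step 1 (a sketch; the min/max node counts are illustrative):

```
resource "google_container_node_pool" "primary_nodes" {
  # ... existing configuration ...

  autoscaling {
    min_node_count = 1 # illustrative bounds
    max_node_count = 5
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}
```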

Common Operations

Creating the Cluster

terraform init
terraform plan
terraform apply

Getting Cluster Credentials

gcloud container clusters get-credentials $(terraform output -raw kubernetes_cluster_name) --region $(terraform output -raw region)

Destroying the Cluster

terraform destroy

Best Practices and Tips

  1. Cluster Management:

    • Use multiple node pools
    • Implement proper monitoring
    • Regular security audits
  2. Security:

    • Use Workload Identity
    • Enable network policies
    • Regular security updates
  3. Performance:

    • Configure autoscaling
    • Monitor resource usage
    • Use appropriate machine types

Conclusion

You’ve learned how to set up and manage Google Kubernetes Engine using Terraform. This setup provides:

  • Automated cluster deployment
  • Secure and scalable infrastructure
  • Best practices implementation
  • Easy cluster management and maintenance