Managing Azure Synapse Analytics with Terraform

Learn how to deploy and manage Azure Synapse Analytics using Terraform


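Azure Synapse Analytics is an enterprise analytics service that combines data warehousing and big data analytics in a single workspace, with dedicated and serverless SQL pools alongside Apache Spark pools. This guide shows how to provision and manage a Synapse workspace and its supporting resources with Terraform.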


Prerequisites

  • Azure subscription
  • Terraform installed
  • Azure CLI installed and signed in with az login
  • Basic understanding of data warehousing concepts

Project Structure

.
├── main.tf                  # Main Terraform configuration file
├── variables.tf             # Variable definitions
├── outputs.tf               # Output definitions
├── terraform.tfvars         # Variable values
└── modules/
    └── synapse/
        ├── main.tf          # Synapse-specific configuration
        ├── variables.tf     # Module variables
        ├── pools.tf         # SQL and Spark pool configurations
        └── outputs.tf       # Module outputs
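None of the examples below include a provider configuration, and Terraform needs one before it can create any resources. A minimal block for the root main.tf might look like the following. It assumes version 3.x of the AzureRM provider (several argument names used in this guide changed in 4.x) and authentication through the Azure CLI login listed in the prerequisites.

```hcl
terraform {
  required_version = ">= 1.3"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumption: pinned to the 3.x major version
    }
  }
}

# With no credentials set here, the provider falls back to the
# Azure CLI session created by `az login`.
provider "azurerm" {
  features {}
}
```

Run terraform init once after adding this block so Terraform downloads the provider.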

Basic Configuration

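The following example creates a resource group, a Data Lake Storage Gen2 account and filesystem that serve as the workspace's primary storage, and the Synapse workspace itself: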

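resource "azurerm_resource_group" "synapse_rg" {
  name     = "synapse-resources"
  location = "eastus"
}

# Storage account names must be globally unique: 3-24 lowercase letters and digits.
resource "azurerm_storage_account" "synapse_storage" {
  name                     = "synapsestorage"
  resource_group_name      = azurerm_resource_group.synapse_rg.name
  location                 = azurerm_resource_group.synapse_rg.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true # hierarchical namespace makes this a Data Lake Storage Gen2 account
}

resource "azurerm_storage_data_lake_gen2_filesystem" "synapse_fs" {
  name               = "synapsefilesystem"
  storage_account_id = azurerm_storage_account.synapse_storage.id
}

# Keep the SQL administrator password out of source control. Normally declared
# in variables.tf and supplied through an uncommitted terraform.tfvars or the
# TF_VAR_sql_administrator_login_password environment variable.
variable "sql_administrator_login_password" {
  type      = string
  sensitive = true
}

# Workspace names must also be globally unique.
resource "azurerm_synapse_workspace" "workspace" {
  name                                 = "synapse-workspace"
  resource_group_name                  = azurerm_resource_group.synapse_rg.name
  location                             = azurerm_resource_group.synapse_rg.location
  storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.synapse_fs.id
  sql_administrator_login              = "sqladmin"
  sql_administrator_login_password     = var.sql_administrator_login_password

  identity {
    type = "SystemAssigned"
  }
}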
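The workspace's system-assigned managed identity typically also needs data-plane access to its primary Data Lake Storage Gen2 account. Without it, workloads that run under that identity may fail to read or write the filesystem. A sketch of the role assignment follows; the resource label synapse_storage_access is a hypothetical name.

```hcl
# Hypothetical label; grants the workspace's managed identity read/write
# access to blobs in the primary storage account.
resource "azurerm_role_assignment" "synapse_storage_access" {
  scope                = azurerm_storage_account.synapse_storage.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_synapse_workspace.workspace.identity[0].principal_id
}
```

The identity that runs Terraform needs permission to create role assignments (for example, Owner or User Access Administrator on the scope) for this to apply.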

Advanced Features

Dedicated SQL Pool

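resource "azurerm_synapse_sql_pool" "sql_pool" {
  name                 = "sqlpool"
  synapse_workspace_id = azurerm_synapse_workspace.workspace.id
  sku_name             = "DW100c" # smallest performance level; compute is billed while the pool is online
  create_mode          = "Default"
  storage_account_type = "GRS"    # backup storage redundancy: LRS or GRS
}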
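Because a dedicated SQL pool's cost depends on its DWU performance level, it can help to move the SKU into variables.tf and validate it. The sketch below uses a hypothetical sql_pool_sku variable. The allowed list mirrors the SKUs documented for azurerm_synapse_sql_pool.

```hcl
# Hypothetical variable for variables.tf.
variable "sql_pool_sku" {
  type        = string
  description = "Performance level (DWU) of the dedicated SQL pool."
  default     = "DW100c"

  validation {
    condition = contains([
      "DW100c", "DW200c", "DW300c", "DW400c", "DW500c", "DW1000c",
      "DW1500c", "DW2000c", "DW2500c", "DW3000c", "DW5000c", "DW6000c",
      "DW7500c", "DW10000c", "DW15000c", "DW30000c"
    ], var.sql_pool_sku)
    error_message = "The sql_pool_sku value must be a valid dedicated SQL pool SKU, such as DW100c."
  }
}
```

With this in place, set sku_name = var.sql_pool_sku in the pool resource and override the value per environment in terraform.tfvars, for example sql_pool_sku = "DW200c".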

Apache Spark Pool

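resource "azurerm_synapse_spark_pool" "spark_pool" {
  name                 = "sparkpool" # letters and digits only, at most 15 characters
  synapse_workspace_id = azurerm_synapse_workspace.workspace.id
  node_size_family     = "MemoryOptimized"
  node_size            = "Small"
  spark_version        = "3.4" # check which Spark runtimes Synapse currently supports

  auto_scale {
    max_node_count = 10
    min_node_count = 3 # Synapse Spark pools need at least 3 nodes
  }

  # Pause the pool after 15 idle minutes so it stops incurring compute charges.
  auto_pause {
    delay_in_minutes = 15
  }
}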
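The project structure includes an outputs.tf. The sketch below shows what it might expose for the resources defined so far; the output names are hypothetical.

```hcl
output "synapse_workspace_id" {
  value = azurerm_synapse_workspace.workspace.id
}

# Map of endpoints keyed by "dev", "sql", "sqlOnDemand" and "web".
output "synapse_connectivity_endpoints" {
  value = azurerm_synapse_workspace.workspace.connectivity_endpoints
}

output "sql_pool_name" {
  value = azurerm_synapse_sql_pool.sql_pool.name
}

output "spark_pool_name" {
  value = azurerm_synapse_spark_pool.spark_pool.name
}
```

After terraform apply, terraform output synapse_connectivity_endpoints prints the endpoints that clients and tools connect to.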

Best Practices

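  1. Manage every Synapse resource in Terraform, with shared remote state, so each environment is deployed the same way
  2. Send workspace logs and metrics to Log Analytics (see Monitoring and Logging below)
  3. Prefer managed identities over stored keys and passwords, and keep any remaining secrets, such as the SQL administrator password, out of source control
  4. Size Spark pool auto-scale limits to the workload and enable auto-pause so idle pools stop incurring compute charges
  5. Plan backup and disaster recovery: dedicated SQL pools take automatic restore points, and their backup storage redundancy (LRS or GRS) is chosen when the pool is created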
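Consistent deployments across a team also depend on shared Terraform state. The following is a minimal sketch of a remote backend in an Azure Storage container. The resource group, storage account, and container names are hypothetical and must exist before you run terraform init.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"                # hypothetical, created beforehand
    storage_account_name = "tfstatestorage"            # hypothetical, globally unique
    container_name       = "tfstate"                   # hypothetical container name
    key                  = "synapse.terraform.tfstate" # state blob name
  }
}
```

Terraform state records sensitive values such as the SQL administrator password in plain text, so restrict access to the storage account that holds it.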

Security Considerations

Network Security

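Private endpoints let clients in your virtual network reach the workspace over private IP addresses. The following creates a virtual network, a subnet, and a private endpoint for the workspace's dedicated SQL endpoint: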

resource "azurerm_virtual_network" "synapse_vnet" {
  name                = "synapse-vnet"
  resource_group_name = azurerm_resource_group.synapse_rg.name
  location            = azurerm_resource_group.synapse_rg.location
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "synapse_subnet" {
  name                 = "synapse-subnet"
  resource_group_name  = azurerm_resource_group.synapse_rg.name
  virtual_network_name = azurerm_virtual_network.synapse_vnet.name
  address_prefixes     = ["10.0.1.0/24"]

  # Disables network policies for private endpoints in this subnet. This replaces the
  # deprecated enforce_private_link_endpoint_network_policies argument in azurerm 3.x;
  # in 4.x, use private_endpoint_network_policies = "Disabled" instead.
  private_endpoint_network_policies_enabled = false
}

resource "azurerm_private_endpoint" "synapse_pe" {
  name                = "synapse-pe"
  location            = azurerm_resource_group.synapse_rg.location
  resource_group_name = azurerm_resource_group.synapse_rg.name
  subnet_id           = azurerm_subnet.synapse_subnet.id

  private_service_connection {
    name                           = "synapse-privateserviceconnection"
    private_connection_resource_id = azurerm_synapse_workspace.workspace.id
    subresource_names              = ["Sql"] # dedicated SQL endpoint; "SqlOnDemand" and "Dev" are the other targets
    is_manual_connection           = false
  }
}
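For clients in the virtual network to resolve the workspace's SQL hostnames to private IP addresses, the private endpoint also needs a private DNS zone. The sketch below uses hypothetical resource labels and names. It creates the privatelink.sql.azuresynapse.net zone, links it to synapse_vnet, and adds a second endpoint for the serverless (SqlOnDemand) SQL endpoint that registers its record in that zone.

```hcl
# Zone used by both the Sql and SqlOnDemand private endpoints.
resource "azurerm_private_dns_zone" "synapse_sql" {
  name                = "privatelink.sql.azuresynapse.net"
  resource_group_name = azurerm_resource_group.synapse_rg.name
}

# Lets resources in synapse_vnet resolve names in the private zone.
resource "azurerm_private_dns_zone_virtual_network_link" "synapse_sql" {
  name                  = "synapse-sql-dns-link" # hypothetical
  resource_group_name   = azurerm_resource_group.synapse_rg.name
  private_dns_zone_name = azurerm_private_dns_zone.synapse_sql.name
  virtual_network_id    = azurerm_virtual_network.synapse_vnet.id
}

# Private endpoint for the serverless SQL endpoint, with automatic DNS registration.
resource "azurerm_private_endpoint" "synapse_ondemand_pe" {
  name                = "synapse-ondemand-pe" # hypothetical
  location            = azurerm_resource_group.synapse_rg.location
  resource_group_name = azurerm_resource_group.synapse_rg.name
  subnet_id           = azurerm_subnet.synapse_subnet.id

  private_service_connection {
    name                           = "synapse-ondemand-connection"
    private_connection_resource_id = azurerm_synapse_workspace.workspace.id
    subresource_names              = ["SqlOnDemand"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "synapse-sql-dns"
    private_dns_zone_ids = [azurerm_private_dns_zone.synapse_sql.id]
  }
}
```

The existing synapse_pe endpoint can register in the same zone by adding an equivalent private_dns_zone_group block.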

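Firewall Rules

Firewall rules control which addresses can reach the workspace's public endpoints:

# Allows other Azure services to reach the workspace. The 0.0.0.0 to 0.0.0.0
# range is a special value; it does not open the workspace to the internet.
# AllowAllWindowsAzureIps is the name the Azure portal uses for this rule.
resource "azurerm_synapse_firewall_rule" "allow_azure" {
  name                 = "AllowAllWindowsAzureIps"
  synapse_workspace_id = azurerm_synapse_workspace.workspace.id
  start_ip_address     = "0.0.0.0"
  end_ip_address       = "0.0.0.0"
}

# Allows a client IP range. 203.0.113.0/24 is a documentation range (RFC 5737);
# replace it with your own public addresses.
resource "azurerm_synapse_firewall_rule" "allow_client" {
  name                 = "AllowClient"
  synapse_workspace_id = azurerm_synapse_workspace.workspace.id
  start_ip_address     = "203.0.113.0"
  end_ip_address       = "203.0.113.255"
}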
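When several client ranges need access, a map variable combined with for_each keeps the rules in one place instead of requiring one resource per range. The sketch below uses a hypothetical synapse_allowed_ip_ranges variable whose keys become the rule names.

```hcl
# Hypothetical variable: rule name => IP range.
variable "synapse_allowed_ip_ranges" {
  type = map(object({
    start_ip_address = string
    end_ip_address   = string
  }))
  default = {
    office = {
      start_ip_address = "203.0.113.0"
      end_ip_address   = "203.0.113.255"
    }
  }
}

# One firewall rule per map entry.
resource "azurerm_synapse_firewall_rule" "allowed" {
  for_each = var.synapse_allowed_ip_ranges

  name                 = each.key
  synapse_workspace_id = azurerm_synapse_workspace.workspace.id
  start_ip_address     = each.value.start_ip_address
  end_ip_address       = each.value.end_ip_address
}
```

Adding or removing an entry in terraform.tfvars then creates or deletes only that rule.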

Monitoring and Logging

Send the workspace's logs and metrics to a Log Analytics workspace with a diagnostic setting:

resource "azurerm_monitor_diagnostic_setting" "synapse_diagnostics" {
  name                       = "synapse-diagnostics"
  target_resource_id         = azurerm_synapse_workspace.workspace.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.workspace.id

  # enabled_log replaces the older log block, which is deprecated
  # in later azurerm 3.x releases and removed in 4.x.
  enabled_log {
    category = "SQLSecurityAuditEvents"
  }

  enabled_log {
    category = "SynapseRbacOperations"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}
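The diagnostic setting refers to azurerm_log_analytics_workspace.workspace, which the guide does not define. Below is a minimal definition with a hypothetical name and retention period. It also includes a data source that lists the log categories the Synapse workspace supports, which helps you check category names before adding more enabled_log blocks.

```hcl
resource "azurerm_log_analytics_workspace" "workspace" {
  name                = "synapse-logs" # hypothetical name
  location            = azurerm_resource_group.synapse_rg.location
  resource_group_name = azurerm_resource_group.synapse_rg.name
  sku                 = "PerGB2018"
  retention_in_days   = 30 # assumption: adjust to your retention policy
}

# Reads the diagnostic log categories available on the Synapse workspace.
data "azurerm_monitor_diagnostic_categories" "synapse" {
  resource_id = azurerm_synapse_workspace.workspace.id
}

output "synapse_log_categories" {
  value = data.azurerm_monitor_diagnostic_categories.synapse.log_category_types
}
```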

Cost Management

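Create a budget on the resource group that emails a contact when actual spend exceeds 90% of the monthly amount:

resource "azurerm_consumption_budget_resource_group" "synapse_budget" {
  name              = "synapse-budget"
  resource_group_id = azurerm_resource_group.synapse_rg.id

  amount     = 5000 # in the billing account's currency
  time_grain = "Monthly"

  # time_period is required; start_date must be the first day of a month.
  time_period {
    start_date = formatdate("YYYY-MM-01'T'00:00:00'Z'", timestamp())
  }

  notification {
    enabled   = true
    threshold = 90.0 # percentage of amount
    operator  = "GreaterThan"

    contact_emails = [
      "admin@example.com"
    ]
  }

  # timestamp() changes on every run, so ignore start_date after creation.
  lifecycle {
    ignore_changes = [time_period]
  }
}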

Integration Examples

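Integration with Azure Data Factory: connect Data Factory to the dedicated SQL pool with a Synapse linked service.

resource "azurerm_data_factory" "adf" {
  name                = "synapse-adf" # must be globally unique
  location            = azurerm_resource_group.synapse_rg.location
  resource_group_name = azurerm_resource_group.synapse_rg.name
}

# connectivity_endpoints holds hostnames, not connection strings, so the string is
# assembled here. "sql" is the dedicated SQL endpoint; "sqlOnDemand" targets serverless.
# Consider the key_vault_password block instead of an inline password.
resource "azurerm_data_factory_linked_service_synapse" "linked_synapse" {
  name            = "linked-synapse"
  data_factory_id = azurerm_data_factory.adf.id

  connection_string = join(";", [
    "Integrated Security=False",
    "Data Source=${azurerm_synapse_workspace.workspace.connectivity_endpoints["sql"]}",
    "Initial Catalog=${azurerm_synapse_sql_pool.sql_pool.name}",
    "User ID=${azurerm_synapse_workspace.workspace.sql_administrator_login}",
    "Password=${azurerm_synapse_workspace.workspace.sql_administrator_login_password}"
  ])
}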

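Integration with Power BI: Power BI usually connects to the workspace's SQL endpoints with Azure AD credentials, so set an Azure AD administrator on the workspace. This example assigns the identity that runs Terraform:

# Reads the object and tenant IDs of the identity running Terraform.
data "azurerm_client_config" "current" {}

resource "azurerm_synapse_workspace_aad_admin" "workspace_admin" {
  synapse_workspace_id = azurerm_synapse_workspace.workspace.id
  login                = "AzureAD Admin"
  object_id            = data.azurerm_client_config.current.object_id
  tenant_id            = data.azurerm_client_config.current.tenant_id
}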

Conclusion

Managing Azure Synapse Analytics with Terraform lets you define the workspace, its SQL and Spark pools, networking, monitoring, and cost controls as code. By following the practices and configurations above, you can build secure and scalable analytics solutions in Azure.