Managing Azure Machine Learning with Terraform
Learn how to deploy and manage Azure Machine Learning workspaces using Terraform
Managing Azure Machine Learning with Terraform
Azure Machine Learning provides a cloud-based environment for training, deploying, and managing ML models. This guide shows you how to manage Azure Machine Learning resources using Terraform.
Video Tutorial
Prerequisites
- Azure subscription
- Terraform installed
- Azure CLI installed
- Basic understanding of Machine Learning concepts
Project Structure
.
├── main.tf # Main Terraform configuration file
├── variables.tf # Variable definitions
├── outputs.tf # Output definitions
├── terraform.tfvars # Variable values
└── modules/
└── machine-learning/
├── main.tf # Machine Learning specific configurations
├── variables.tf # Module variables
├── compute.tf # Compute cluster configurations
└── outputs.tf # Module outputs
Basic Configuration
Here’s a basic example of setting up an Azure Machine Learning workspace:
resource "azurerm_resource_group" "ml_rg" {
name = "ml-resources"
location = "eastus"
}
resource "azurerm_application_insights" "ml_insights" {
name = "ml-insights"
location = azurerm_resource_group.ml_rg.location
resource_group_name = azurerm_resource_group.ml_rg.name
application_type = "web"
}
resource "azurerm_key_vault" "ml_vault" {
name = "mlkeyvault"
location = azurerm_resource_group.ml_rg.location
resource_group_name = azurerm_resource_group.ml_rg.name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
}
resource "azurerm_storage_account" "ml_storage" {
name = "mlstorage"
location = azurerm_resource_group.ml_rg.location
resource_group_name = azurerm_resource_group.ml_rg.name
account_tier = "Standard"
account_replication_type = "LRS"
}
resource "azurerm_machine_learning_workspace" "ml_workspace" {
name = "mlworkspace"
location = azurerm_resource_group.ml_rg.location
resource_group_name = azurerm_resource_group.ml_rg.name
application_insights_id = azurerm_application_insights.ml_insights.id
key_vault_id = azurerm_key_vault.ml_vault.id
storage_account_id = azurerm_storage_account.ml_storage.id
identity {
type = "SystemAssigned"
}
}
Advanced Features
Compute Clusters
Create compute clusters for training:
resource "azurerm_machine_learning_compute_cluster" "compute_cluster" {
name = "cpu-cluster"
location = azurerm_resource_group.ml_rg.location
machine_learning_workspace_id = azurerm_machine_learning_workspace.ml_workspace.id
vm_priority = "LowPriority"
vm_size = "Standard_DS2_v2"
scale_settings {
min_node_count = 0
max_node_count = 4
scale_down_nodes_after_idle_duration = "PT30M"
}
identity {
type = "SystemAssigned"
}
}
Compute Instances
Create development compute instances:
resource "azurerm_machine_learning_compute_instance" "compute_instance" {
name = "dev-instance"
location = azurerm_resource_group.ml_rg.location
machine_learning_workspace_id = azurerm_machine_learning_workspace.ml_workspace.id
virtual_machine_size = "Standard_DS3_v2"
identity {
type = "SystemAssigned"
}
}
Best Practices
- Use Infrastructure as Code for consistent deployments
- Implement proper monitoring and logging
- Use managed identities for enhanced security
- Configure auto-scaling appropriately
- Implement proper backup and disaster recovery
Security Considerations
- Use Azure Key Vault for secrets management
- Implement network isolation using private endpoints
- Use managed identities instead of service principals
- Enable Azure Monitor for monitoring and alerting
- Regularly audit access and permissions
Network Configuration
Configure private endpoints for enhanced security:
resource "azurerm_virtual_network" "ml_vnet" {
name = "ml-vnet"
resource_group_name = azurerm_resource_group.ml_rg.name
location = azurerm_resource_group.ml_rg.location
address_space = ["10.0.0.0/16"]
}
resource "azurerm_subnet" "ml_subnet" {
name = "ml-subnet"
resource_group_name = azurerm_resource_group.ml_rg.name
virtual_network_name = azurerm_virtual_network.ml_vnet.name
address_prefixes = ["10.0.1.0/24"]
enforce_private_link_endpoint_network_policies = true
}
resource "azurerm_private_endpoint" "ml_pe" {
name = "ml-pe"
location = azurerm_resource_group.ml_rg.location
resource_group_name = azurerm_resource_group.ml_rg.name
subnet_id = azurerm_subnet.ml_subnet.id
private_service_connection {
name = "ml-privateserviceconnection"
private_connection_resource_id = azurerm_machine_learning_workspace.ml_workspace.id
subresource_names = ["amlworkspace"]
is_manual_connection = false
}
}
Monitoring and Logging
Configure diagnostics settings:
resource "azurerm_monitor_diagnostic_setting" "ml_diagnostics" {
name = "ml-diagnostics"
target_resource_id = azurerm_machine_learning_workspace.ml_workspace.id
log_analytics_workspace_id = azurerm_log_analytics_workspace.workspace.id
log {
category = "AmlComputeClusterEvent"
enabled = true
}
log {
category = "AmlComputeClusterNodeEvent"
enabled = true
}
metric {
category = "AllMetrics"
enabled = true
}
}
Cost Management
Implement cost management features:
resource "azurerm_monitor_action_group" "cost_alert" {
name = "cost-alert"
resource_group_name = azurerm_resource_group.ml_rg.name
short_name = "cost"
email_receiver {
name = "admin"
email_address = "admin@example.com"
}
}
resource "azurerm_consumption_budget_resource_group" "ml_budget" {
name = "ml-budget"
resource_group_id = azurerm_resource_group.ml_rg.id
amount = 1000
time_grain = "Monthly"
notification {
enabled = true
threshold = 90.0
operator = "GreaterThan"
contact_emails = [
"admin@example.com"
]
}
}
Conclusion
Azure Machine Learning with Terraform provides a powerful way to manage ML resources in Azure. By following these best practices and configurations, you can create secure and scalable machine learning environments in the cloud.