Cover image by Taylor Vick.


I recently came across an old module I had developed on Terraform v0.11.7 to deploy Linux (Ubuntu) virtual machines on Azure. Unfortunately it is no longer usable as-is and needs a fair amount of refactoring. During the porting process I spotted plenty of signs of my inexperience in the original code, and I also wanted to add Azure Managed Disks to the deployment.

Future-proofing the script means refactoring resources such as azurerm_virtual_machine, as per the docs:

Note: The azurerm_virtual_machine resource has been superseded by the azurerm_linux_virtual_machine and azurerm_windows_virtual_machine resources. The existing azurerm_virtual_machine resource will continue to be available throughout the 2.x releases however is in a feature-frozen state to maintain compatibility - new functionality will instead be added to the azurerm_linux_virtual_machine and azurerm_windows_virtual_machine resources.


In this post we are using the following versions:

terraform {
  required_version = ">= 0.12.25"

  required_providers {
    azurerm = ">= 2.10.0"
  }
}

Deploying Multiple VMs with Multiple Data Disks

Using count was the obvious answer (at first!) because it is so easy to use. I’m passing in a list of VM names to get some fine-grained control over what is deployed and destroyed. For example:


resource_group_name   = "my-test-rg"
instances             = ["vm-test-1", "vm-test-2", "vm-test-3"]
nb_disks_per_instance = 2
tags = {
  environment = "test"
}
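The variable declarations these values map onto are not shown here, so the following is just a plausible sketch with assumed types:

# variables.tf (hypothetical - declarations not shown in the original snippets)
variable "resource_group_name" {
  description = "Name of the resource group to deploy into"
  type        = string
}

variable "instances" {
  description = "Names of the VMs to create"
  type        = list(string)
}

variable "nb_disks_per_instance" {
  description = "Number of data disks to attach to each VM"
  type        = number
}

variable "tags" {
  description = "Tags applied to every resource"
  type        = map(string)
  default     = {}
}

With those in place, the count-based resources look like this: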

resource "azurerm_linux_virtual_machine" "vm" {
  count                 = length(var.instances)
  name                  = element(var.instances, count.index)
  resource_group_name   = azurerm_resource_group.rg.name
  location              = azurerm_resource_group.rg.location
  size                  = "Standard_D2s_v3"
  network_interface_ids = [element(azurerm_network_interface.nic.*.id, count.index)]
  admin_username        = "adminuser"
  admin_password        = "Password1234!@"

  disable_password_authentication = false

  os_disk {
    name                 = "osdisk-${element(var.instances, count.index)}-${count.index}"
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "18.04-LTS"
    version   = "latest"
  }

  tags = var.tags
}

resource "azurerm_managed_disk" "managed_disk" {
  count                = length(var.instances) * var.nb_disks_per_instance
  # name the disk after the VM it will be attached to (same index mapping as the attachment below)
  name                 = "datadisk-${element(var.instances, floor(count.index / var.nb_disks_per_instance))}-${count.index % var.nb_disks_per_instance}"
  location             = azurerm_resource_group.rg.location
  resource_group_name  = azurerm_resource_group.rg.name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = 10
  tags                 = var.tags
}

resource "azurerm_virtual_machine_data_disk_attachment" "managed_disk_attach" {
  count              = length(var.instances) * var.nb_disks_per_instance
  managed_disk_id    = azurerm_managed_disk.managed_disk.*.id[count.index]
  virtual_machine_id = azurerm_linux_virtual_machine.vm.*.id[ceil((count.index + 1) * 1.0 / var.nb_disks_per_instance) - 1]
  lun                = count.index + 10
  caching            = "ReadWrite"
}

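The index expression for virtual_machine_id does the heavy lifting: it maps each disk’s count.index onto the index of the VM that should own it. With nb_disks_per_instance = 2 you can sanity-check the mapping in terraform console (exact output formatting may differ between Terraform versions):

> [for i in range(6) : ceil((i + 1) * 1.0 / 2) - 1]
[
  0,
  0,
  1,
  1,
  2,
  2,
]

So disks 0 and 1 land on the first VM, 2 and 3 on the second, and 4 and 5 on the third.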
3 VMs each with 2 data disks! Awesome right? WRONG!

Problems with count

What if I remove vm-test-2 from the list of instances?

  • Problem 1: Pretty much all of the disks have to be amended. Because disk names and attachments are derived from list indexes, removing one VM shifts everything after it, and with managed disks that means forced replacements - not healthy for disks that may already hold data.

  • Problem 2: Each disk attached to a VM needs a unique LUN (Logical Unit Number). The script above does achieve this (6 data disks with LUNs [10, 11, 12, 13, 14, 15]), but because each LUN is derived from count.index, removing a VM shifts the LUNs of the remaining disks, and a changed LUN forces the attachment to be recreated. This is inefficient.

  • Problem 3: It doesn’t achieve what I want it to do - removing a single VM from the middle of the list should only destroy that VM and its disks, but with count every resource indexed after it is touched as well (see the short illustration after this list).
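To see why, compare what element() returns for the same index before and after vm-test-2 is removed from the list - a rough terraform console illustration (output formatting may differ between versions):

> element(["vm-test-1", "vm-test-2", "vm-test-3"], 2)
vm-test-3

> element(["vm-test-1", "vm-test-3"], 2)
vm-test-1

Index 2 now points at a different machine (element() wraps around rather than failing), so every VM, NIC, disk and attachment whose name or ID was derived from that index is updated or replaced.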

A Better Solution - for_each

After endlessly searching for solutions, I mashed up a few and found a way using for_each, for and locals.

for expressions were introduced in Terraform 0.12.0, and for_each on resources followed in 0.12.6.

I’ve abstracted away most of the evolved resources as the change is pretty straightforward - remove the references to count, add for_each at the top of each resource, and iterate over var.instances. As var.instances is a list, you need toset() to convert it to the set type that for_each expects.
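As a rough sketch of what that conversion looks like for the VM resource itself (this assumes the NIC resource has also been converted to for_each keyed by VM name, which matches the plan output shown later):

resource "azurerm_linux_virtual_machine" "vm" {
  # for_each wants a set or map; var.instances is a list, hence toset()
  for_each              = toset(var.instances)
  name                  = each.key
  resource_group_name   = azurerm_resource_group.rg.name
  location              = azurerm_resource_group.rg.location
  size                  = "Standard_D2s_v3"
  # assumes azurerm_network_interface.nic is also a for_each resource keyed by VM name
  network_interface_ids = [azurerm_network_interface.nic[each.key].id]
  admin_username        = "adminuser"
  admin_password        = "Password1234!@"

  disable_password_authentication = false

  os_disk {
    name                 = "osdisk-${each.key}"
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "18.04-LTS"
    version   = "latest"
  }

  tags = var.tags
}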

Next, introduce a locals block at the top of the configuration as follows:

locals {
  vm_datadiskdisk_count_map = { for k in toset(var.instances) : k => var.nb_disks_per_instance }
  luns                      = { for k in local.datadisk_lun_map : k.datadisk_name => k.lun }
  datadisk_lun_map = flatten([
    for vm_name, count in local.vm_datadiskdisk_count_map : [
      for i in range(count) : {
        datadisk_name = format("datadisk_%s_disk%02d", vm_name, i)
        lun           = i
      }
    ]
  ])
}

This acts like a temporary store for evaluating expressions before they are used in resources. For example, the luns expression above loops through datadisk_lun_map and creates a key/value pair for each disk name. Check the result using terraform console (you reference locals in resources as local.<name>):

➜ terraform console
> local.luns
{
  "datadisk_vm-test-1_disk00" = 0
  "datadisk_vm-test-1_disk01" = 1
  "datadisk_vm-test-2_disk00" = 0
  "datadisk_vm-test-2_disk01" = 1
  "datadisk_vm-test-3_disk00" = 0
  "datadisk_vm-test-3_disk01" = 1
}

Now I can use these locals in my resource blocks - here are the managed_disk and managed_disk_attach blocks:

resource "azurerm_managed_disk" "managed_disk" {
  for_each             = toset([for j in local.datadisk_lun_map : j.datadisk_name])
  name                 = each.key
  location             = azurerm_resource_group.rg.location
  resource_group_name  = azurerm_resource_group.rg.name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = 10
  tags                 = var.tags
}

resource "azurerm_virtual_machine_data_disk_attachment" "managed_disk_attach" {
  for_each           = toset([for j in local.datadisk_lun_map : j.datadisk_name])
  managed_disk_id    = azurerm_managed_disk.managed_disk[each.key].id
  virtual_machine_id = azurerm_linux_virtual_machine.vm[element(split("_", each.key), 1)].id
  lun                = lookup(local.luns, each.key)
  caching            = "ReadWrite"
}

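The only slightly magic line is the virtual_machine_id lookup: every disk key embeds the VM name between underscores, so splitting on "_" recovers it. A quick check in terraform console (output formatting may vary by version):

> split("_", "datadisk_vm-test-2_disk01")
[
  "datadisk",
  "vm-test-2",
  "disk01",
]
> element(split("_", "datadisk_vm-test-2_disk01"), 1)
vm-test-2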

When removing vm-test-2 from instances, the plan shows that ONLY the resources related to this particular machine are destroyed. No unnecessary updates or forced replacements are carried out.

➜ terraform plan
azurerm_linux_virtual_machine.vm["vm-test-2"] will be destroyed
azurerm_managed_disk.managed_disk["datadisk_vm-test-2_disk00"] will be destroyed
azurerm_managed_disk.managed_disk["datadisk_vm-test-2_disk01"] will be destroyed
azurerm_network_interface.nic["vm-test-2"] will be destroyed
azurerm_virtual_machine_data_disk_attachment.managed_disk_attach["datadisk_vm-test-2_disk00"] will be destroyed
azurerm_virtual_machine_data_disk_attachment.managed_disk_attach["datadisk_vm-test-2_disk01"] will be destroyed

Plan: 0 to add, 0 to change, 6 to destroy.


The solution seems to work well and provides a relatively simple way to manage larger deployments that require multiple data disks per VM. I can’t claim it’s the best-practice way of achieving this goal, so I’d be open to hearing feedback on how to improve it! :)