12 – Set Up Azure Monitoring And Alerting With Terraform (Hands-On Mini Project)

In this mini project we will build a real Azure VM → deploy a website → monitor it → get email alerts when something goes wrong — all using Terraform.

This is not just copy-paste infrastructure.
We will understand why each piece exists and what Azure is actually doing behind the scenes.

By the end you will know:

How Azure Monitor actually works 🧠
Difference between resource, metric, and action
How alerts really get triggered
How to simulate failures (CPU stress testing 🔥)

Architecture Overview
Step 1 — Networking Infrastructure
Step 2 — Create the Virtual Machine
Step 3 — Deploy Website Automatically (Remote-Exec)
Step 4 — Create Notification Channel (Action Group)
Step 5 — CPU Alert (High Usage)
Step 6 — Memory Alert
⚠️ Important Learning (Real-World Insight)
Final Result
What You Learned
Final Thoughts

Architecture Overview

We will build:

Component	Purpose
Resource Group	Container for everything
VNet + Subnet	Network for VM
NSG	Firewall rules
Public IP	Internet access
Linux VM	Runs website
Nginx	Sample application
Action Group	Notification channel
Metric Alerts	Detect problems

Flow:

Problem happens → Azure Monitor records the metric → Alert rule evaluates it and fires → Action group emails you 📧

Step 1 — Networking Infrastructure

We first create the foundation: network + firewall + IP + NIC

Resource Group

resource "azurerm_resource_group" "rg" {
  name     = "rgminipro090212"
  location = "Central US"
}

Virtual Network & Subnet

resource "azurerm_virtual_network" "vnet" {
  name                = "vnetminipro12212"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

resource "azurerm_subnet" "subnet" {
  name                 = "subnetminipro12100339"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.2.0/24"]
}

Network Security Group (Firewall)

We allow:

SSH (22) → remote login
HTTP (80) → website access

resource "azurerm_network_security_group" "nsg" {
  name                = "nsgminipro98922"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  security_rule {
    name                       = "SSH"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "HTTP"
    priority                   = 110
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "80"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

Public IP + Network Interface

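resource "azurerm_public_ip" "pip" {
  name                = "pipminipro1212909"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"   # Standard SKU requires Static allocation
  sku                 = "Standard" # Basic SKU public IPs are being retired by Azure
}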

resource "azurerm_network_interface" "nic" {
  name                = "nicminipro90909111"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.subnet.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.pip.id
  }
}

resource "azurerm_network_interface_security_group_association" "assoc" {
  network_interface_id      = azurerm_network_interface.nic.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}

❓ Important Concept — Where is NSG applied?

We attached the NSG to the NIC, not to the subnet.

How to verify in the portal:

Where to check	What you see
NIC → Networking	NSG attached
NSG → Subnets	Empty

Why?

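An NSG can be attached at 2 levels (NSGs are separate from the Azure Firewall service):

Level	Scope
Subnet NSG	Applies to every NIC (and so every VM) in that subnet
NIC NSG	Applies only to that one NIC (a single VM)

We used the NIC level because this project has only one VM. If both levels are used, inbound traffic must be allowed by both: the subnet NSG is evaluated first, then the NIC NSG.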
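For comparison, here is a minimal sketch of the subnet-level option, which this project does not use. The resource type is the azurerm provider's subnet association resource; the label subnet_assoc is just an illustrative name.

```hcl
# Alternative (not used in this project): attach the NSG to the subnet.
# Every NIC placed in azurerm_subnet.subnet is then filtered by these rules.
resource "azurerm_subnet_network_security_group_association" "subnet_assoc" {
  subnet_id                 = azurerm_subnet.subnet.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}
```

With this in place, the portal check flips: NSG → Subnets would list subnetminipro12100339 instead of being empty.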

Step 2 — Create the Virtual Machine

resource "azurerm_linux_virtual_machine" "vm" {
  name                  = "vmminipro343900"
  location              = azurerm_resource_group.rg.location
  resource_group_name   = azurerm_resource_group.rg.name
  size                  = "Standard_D2s_v3" # 2 vCPUs, 8 GiB RAM
  network_interface_ids = [azurerm_network_interface.nic.id]

  admin_username = "azureuser"

  admin_ssh_key {
    username   = "azureuser" # must match admin_username
    # Path to your own public key (this is the author's local Windows path)
    public_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  # Ubuntu 22.04 LTS; Ubuntu 18.04 LTS is past the end of its standard support
  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }
}

SSH Into the VM

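On Windows, OpenSSH refuses to use a private key that other accounts can read, so first lock down the key's permissions (PowerShell; <key> is the path to your private key file):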

icacls <key> /inheritance:r
icacls <key> /grant:r "$($env:USERNAME):(R)"
icacls <key> /remove "Authenticated Users" "BUILTIN\Users" "Everyone"

ssh -i <key> azureuser@<public-ip>

✔ VM verified working
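Rather than copying <public-ip> from the portal each time, you can let Terraform print it. A small sketch; the output name public_ip is an arbitrary choice.

```hcl
# Expose the VM's public IP so it is printed at the end of `terraform apply`
output "public_ip" {
  value = azurerm_public_ip.pip.ip_address
}
```

Afterwards, terraform output -raw public_ip prints just the address, ready to paste into the ssh command or the browser.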

Step 3 — Deploy Website Automatically (Remote-Exec)

Now Terraform becomes powerful 💥
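We configure the server automatically by adding this remote-exec provisioner inside the azurerm_linux_virtual_machine "vm" resource from Step 2: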

provisioner "remote-exec" {
  # Runs over SSH, once, right after the VM is created
  inline = [
    # Wait for cloud-init to finish first-boot setup (otherwise apt may still be busy)
    "echo waiting for cloud-init...",
    "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done",

    # Install the web server
    "sudo apt-get update -y",
    "sudo apt-get install -y nginx",

    # Replace the default page so we can see the deployment worked
    "echo '<h1>Terraform Monitoring Lab Working</h1>' | sudo tee /var/www/html/index.html",

    "sudo systemctl restart nginx",
    "sudo systemctl enable nginx"
  ]

  connection {
    type        = "ssh"
    user        = "azureuser"
    private_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1") # private half of key1.pub
    host        = azurerm_public_ip.pip.ip_address
  }
}

⚠️ Important learning:

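Provisioners like this one run only when their resource is created. Adding the block to a VM that already exists changes nothing on the next terraform apply, so we had to destroy and apply again. (terraform apply -replace="azurerm_linux_virtual_machine.vm" recreates just the VM instead of the whole stack.)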
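If you want to re-run the setup script without recreating the VM at all, one option (assuming Terraform 1.4 or newer, which ships the built-in terraform_data resource) is to move the provisioner out of the VM and into its own resource. This is a sketch, not the project's approach; the resource name website and the "v1" marker are hypothetical, and it replaces the provisioner inside the VM rather than adding a second one.

```hcl
# Sketch: website setup lives in its own resource instead of inside the VM.
# Replacing terraform_data.website re-runs its provisioner; the VM is untouched.
resource "terraform_data" "website" {
  # Replaced when the VM is recreated or when "v1" (hypothetical marker) is changed
  triggers_replace = [azurerm_linux_virtual_machine.vm.id, "v1"]

  # Port 22 must be open on the NIC before Terraform tries to SSH in
  depends_on = [azurerm_network_interface_security_group_association.assoc]

  # Connection settings used by the provisioner below
  connection {
    type        = "ssh"
    user        = "azureuser"
    private_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1")
    host        = azurerm_public_ip.pip.ip_address
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done",
      "sudo apt-get update -y",
      "sudo apt-get install -y nginx",
      "echo '<h1>Terraform Monitoring Lab Working</h1>' | sudo tee /var/www/html/index.html",
    ]
  }
}
```

With this layout, terraform apply -replace="terraform_data.website" re-runs only the script.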

Now open a browser:

http://<public-ip>

Website works 🎉

Step 4 — Create Notification Channel (Action Group)

We tell Azure:

When something breaks → email me

resource "azurerm_monitor_action_group" "ag" {
  name                = "agminipro9090"
  resource_group_name = azurerm_resource_group.rg.name
  short_name          = "alerts"

  email_receiver {
    name          = "sendtoadmin"
    email_address = "alankseb@gmail.com"
  }
}

Verify in portal:

Azure Monitor → Alerts → Action Groups
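Hard-coding an email address is fine for a lab, but a variable keeps it out of the code and makes the configuration reusable. A sketch under that assumption; the variable name alert_email is hypothetical, and this block replaces the Step 4 resource rather than sitting next to it.

```hcl
# Hypothetical input variable: who receives the alert emails
variable "alert_email" {
  type        = string
  description = "Email address notified by the action group"
}

resource "azurerm_monitor_action_group" "ag" {
  name                = "agminipro9090"
  resource_group_name = azurerm_resource_group.rg.name
  short_name          = "alerts" # at most 12 characters

  email_receiver {
    name          = "sendtoadmin"
    email_address = var.alert_email
  }
}
```

Pass the address at apply time, for example terraform apply -var="alert_email=you@example.com".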

Step 5 — CPU Alert (High Usage)

Now we create the actual monitoring rule.

resource "azurerm_monitor_metric_alert" "cpu_alert" {
  name                = "highcpualertminipro990922"
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when CPU usage is greater than 60%"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 60
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}
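The rule above relies on provider defaults for how often it is evaluated and over what window. Here is the same rule with those settings written out; assuming the azurerm provider's documented defaults (frequency PT1M, window_size PT5M, severity 3, auto_mitigate true), behavior is unchanged. Use it in place of the block above, not alongside it.

```hcl
resource "azurerm_monitor_metric_alert" "cpu_alert" {
  name                = "highcpualertminipro990922"
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when CPU usage is greater than 60%"

  frequency     = "PT1M" # evaluate the rule every minute
  window_size   = "PT5M" # average CPU over the last 5 minutes
  severity      = 3      # 0 = Critical ... 4 = Verbose
  auto_mitigate = true   # resolve the alert automatically once CPU drops again

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 60
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}
```

The 5-minute window is why a short CPU spike is not enough: load has to stay high long enough to pull the average above 60%.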

Test the Alert 🔥

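SSH into the VM and generate CPU load:

sudo apt-get install stress -y
stress --cpu 6 --timeout 300   # 6 busy workers on a 2-vCPU VM, stop after 300 seconds

Wait about 5 minutes… The rule compares the average CPU over the evaluation window (5 minutes by default) with 60%, so it fires only under sustained load, and the email can take a few extra minutes to arrive.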

📧 You receive email:

Azure Monitor alert triggered

Congratulations — you built real monitoring.

Step 6 — Memory Alert

We add a second rule, this time on available memory:

resource "azurerm_monitor_metric_alert" "memory_alert" {
  name                = "lowmemoryalertminipro9090223333"
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when available memory drops below 1 GiB"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Available Memory Bytes" # free RAM, measured in bytes
    aggregation      = "Average"
    operator         = "LessThan"
    threshold        = 1073741824 # example threshold: 1 GiB = 1024 * 1024 * 1024 bytes
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}

⚠️ Important Learning (Real-World Insight)

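During testing I discovered:

Azure VM platform metrics do NOT expose filesystem usage (how full the disk is) by default. They include disk throughput and operation counts, but not free space.

"Available Memory Bytes" measures free RAM, so this rule is a memory alert, not a disk-space alert.

Real disk-space monitoring typically requires:

Azure Monitor Agent on the VM (collects guest OS counters such as free disk space)
A data collection rule that sends those counters to a Log Analytics workspace
Log-based alerts that query the workspace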

This was one of the biggest learning moments in this project 🧠
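As a starting point for the Azure Monitor Agent item above, here is a sketch that installs the agent as a VM extension. Assumptions: the VM gets a system-assigned managed identity (the agent uses a managed identity to authenticate), and type_handler_version "1.0" with minor-version auto-upgrade is acceptable. On its own this collects nothing; the data collection rule, workspace, and log alert are still needed.

```hcl
# Prerequisite, added inside azurerm_linux_virtual_machine.vm:
#   identity {
#     type = "SystemAssigned"
#   }

# Installs the Azure Monitor Agent on the Linux VM
resource "azurerm_virtual_machine_extension" "ama" {
  name                       = "AzureMonitorLinuxAgent"
  virtual_machine_id         = azurerm_linux_virtual_machine.vm.id
  publisher                  = "Microsoft.Azure.Monitor"
  type                       = "AzureMonitorLinuxAgent"
  type_handler_version       = "1.0"
  auto_upgrade_minor_version = true
}
```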

Final Result

We built a complete monitoring pipeline:

Stage	What Happens
CPU spike	Azure Monitor records high Percentage CPU for the VM
Alert rule fires	The 5-minute average crosses 60% and the action group is triggered
Email sent	The admin is notified

What You Learned

✔ Terraform provisioning
✔ Remote configuration
✔ Azure networking
✔ Monitoring architecture
✔ The real difference between metric and log alerts

Final Thoughts

This project transforms Terraform from:

“tool that creates resources”

into

“tool that builds reliable production systems”

Because infrastructure without monitoring is just waiting to fail.

Happy learning 🚀

TechMilestoneHub