12 – Set Up Azure Monitoring And Alerting With Terraform (Hands-On Mini Project)

In this mini project we will build a real Azure VM → deploy a website → monitor it → get email alerts when something goes wrong — all using Terraform.

This is not just copy-paste infrastructure.
We will understand why each piece exists and what Azure is actually doing behind the scenes.

By the end you will know:

  • How Azure Monitor actually works 🧠
  • The difference between a resource, a metric, and an action group
  • How alerts really get triggered
  • How to simulate failures (CPU stress testing 🔥)

Table of Contents

  1. Architecture Overview
  2. Step 1 — Networking Infrastructure
  3. Step 2 — Create the Virtual Machine
  4. Step 3 — Deploy Website Automatically (Remote-Exec)
  5. Step 4 — Create Notification Channel (Action Group)
  6. Step 5 — CPU Alert (High Usage)
  7. Step 6 — Memory Alert
  8. ⚠️ Important Learning (Real-World Insight)
  9. Final Result
  10. What You Learned
  11. Final Thoughts

Architecture Overview

We will build:

| Component | Purpose |
|---|---|
| Resource Group | Container for everything |
| VNet + Subnet | Network for VM |
| NSG | Firewall rules |
| Public IP | Internet access |
| Linux VM | Runs website |
| Nginx | Sample application |
| Action Group | Notification channel |
| Metric Alerts | Detect problems |

Flow:

Problem happens → Azure detects metric → Alert rule triggers → Action group emails you 📧
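
The snippets in this project assume the AzureRM provider is already configured, which the walkthrough does not show. A minimal sketch of that setup might look like the following. The version constraint is an assumption, not something taken from the original project, and authentication is assumed to come from an existing az login session.

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Assumed version range; pin to whatever you have tested with.
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  # The features block is required, even when it is empty.
  features {}
}
```

With this in place, terraform init downloads the provider before the first plan or apply.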


Step 1 — Networking Infrastructure

We first create the foundation: network + firewall + IP + NIC

Resource Group

resource "azurerm_resource_group" "rg" {
  name     = "rgminipro090212"
  location = "Central US"
}

Virtual Network & Subnet

resource "azurerm_virtual_network" "vnet" {
  name                = "vnetminipro12212"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

resource "azurerm_subnet" "subnet" {
  name                 = "subnetminipro12100339"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.2.0/24"]
}

Network Security Group (Firewall)

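We allow:

  • SSH (22) → remote login
  • HTTP (80) → website access

Both rules accept traffic from any source ("*"). That is acceptable for a short-lived lab, but for anything longer-lived the SSH rule should be limited to your own IP address.

resource "azurerm_network_security_group" "nsg" {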
  name                = "nsgminipro98922"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  security_rule {
    name                       = "SSH"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "HTTP"
    priority                   = 110
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "80"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}
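
One way to avoid leaving port 22 open to the whole internet is to pass the allowed range in as a variable. The sketch below is only an illustration: the variable name ssh_source_cidr is hypothetical and not part of the original project.

```hcl
# Hypothetical variable: the CIDR range allowed to reach port 22.
variable "ssh_source_cidr" {
  type        = string
  description = "CIDR allowed to SSH into the VM, e.g. your public IP with /32"

  validation {
    # cidrhost() fails on anything that is not valid CIDR notation,
    # and can() turns that failure into false.
    condition     = can(cidrhost(var.ssh_source_cidr, 0))
    error_message = "The ssh_source_cidr value must be a valid CIDR block such as 203.0.113.10/32."
  }
}

# In the "SSH" security_rule, the wildcard would then become:
#   source_address_prefix = var.ssh_source_cidr
```

The HTTP rule can stay open, since the website is meant to be public.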

Public IP + Network Interface

resource "azurerm_public_ip" "pip" {
  name                = "pipminipro1212909"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
}

resource "azurerm_network_interface" "nic" {
  name                = "nicminipro90909111"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.subnet.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.pip.id
  }
}

resource "azurerm_network_interface_security_group_association" "assoc" {
  network_interface_id      = azurerm_network_interface.nic.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}

❓ Important Concept — Where is NSG applied?

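We attached the NSG to the NIC, not to the subnet.

How to verify in the portal:

| Where to check | What you see |
|---|---|
| NIC → Networking | NSG attached |
| NSG → Subnets | Empty |

Why?

An NSG can be associated at two levels:

| Level | Scope |
|---|---|
| Subnet NSG | Applies to every NIC in the subnet |
| NIC NSG | Applies to a single VM's network interface |

If both are present, inbound traffic must be allowed by the subnet NSG first and then by the NIC NSG. We used the NIC level because this project has only one VM.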
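
With several VMs, the subnet level is usually the more convenient place to attach the rules. As a sketch of that alternative (the resource label subnet_assoc is made up for this example), the same NSG could be associated with the subnet instead:

```hcl
# Alternative to the NIC-level association: attach the same NSG to the subnet,
# so its rules cover every NIC placed in that subnet.
resource "azurerm_subnet_network_security_group_association" "subnet_assoc" {
  subnet_id                 = azurerm_subnet.subnet.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}
```

After applying this, the portal view NSG → Subnets would list the subnet instead of being empty.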


Step 2 — Create the Virtual Machine

resource "azurerm_linux_virtual_machine" "vm" {
  name                  = "vmminipro343900"
  location              = azurerm_resource_group.rg.location
  resource_group_name   = azurerm_resource_group.rg.name
  size                  = "Standard_D2s_v3"
  network_interface_ids = [azurerm_network_interface.nic.id]

  admin_username = "azureuser"

  admin_ssh_key {
    username   = "azureuser"
    public_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "18.04-LTS"
    version   = "latest"
  }
}

SSH Into the VM

On Windows, OpenSSH refuses a private key that other users can read, so restrict its permissions first (PowerShell):

icacls <key> /inheritance:r
icacls <key> /grant:r "$($env:USERNAME):(R)"
icacls <key> /remove "Authenticated Users" "BUILTIN\Users" "Everyone"

Login:

ssh -i <key> azureuser@<public-ip>

✔ VM verified working
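
To avoid looking up <public-ip> in the portal, Terraform can print it after each apply. A small sketch, where the output names are my own choice rather than part of the original project:

```hcl
output "public_ip" {
  description = "Public IP address of the VM"
  value       = azurerm_public_ip.pip.ip_address
}

output "ssh_command" {
  description = "Ready-to-paste SSH command; <key> still has to be filled in"
  value       = "ssh -i <key> azureuser@${azurerm_public_ip.pip.ip_address}"
}
```

Running terraform output public_ip then shows the address at any time without another apply.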


Step 3 — Deploy Website Automatically (Remote-Exec)

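Now Terraform becomes powerful 💥
We configure the server automatically by adding a provisioner block inside the azurerm_linux_virtual_machine.vm resource from Step 2.

# Goes inside resource "azurerm_linux_virtual_machine" "vm" { ... }
provisioner "remote-exec" {
  inline = [
    # Wait until cloud-init has finished its first-boot work
    "echo waiting for cloud-init...",
    "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done",

    # Install nginx
    "sudo apt-get update -y",
    "sudo apt-get install -y nginx",

    # Replace the default page with our own
    "echo '<h1>Terraform Monitoring Lab Working</h1>' | sudo tee /var/www/html/index.html",

    # Start nginx now and on every boot
    "sudo systemctl restart nginx",
    "sudo systemctl enable nginx"
  ]

  # SSH connection Terraform uses to run the commands above
  connection {
    type        = "ssh"
    user        = "azureuser"
    private_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1")
    host        = azurerm_public_ip.pip.ip_address
  }
}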

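⚠️ Important learning:

Creation-time provisioners run only when the resource is first created, not on later applies.
Because the VM already existed, we had to destroy it and apply again (terraform apply -replace="azurerm_linux_virtual_machine.vm" recreates just the VM).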
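
If recreating the VM for every script change feels heavy, one option is to move the provisioner into a separate resource that can be replaced on its own. The sketch below assumes Terraform 1.4 or later (where terraform_data is built in); the resource label nginx_setup and the "v1" marker are hypothetical.

```hcl
# Assumes Terraform 1.4 or later, where terraform_data is built in.
resource "terraform_data" "nginx_setup" {
  # Any change to these values replaces this resource,
  # which re-runs the provisioner without touching the VM.
  triggers_replace = [
    azurerm_linux_virtual_machine.vm.id,
    "v1", # hypothetical revision marker; bump it to force a re-run
  ]

  connection {
    type        = "ssh"
    user        = "azureuser"
    private_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1")
    host        = azurerm_public_ip.pip.ip_address
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done",
      "sudo apt-get update -y",
      "sudo apt-get install -y nginx",
      "sudo systemctl enable --now nginx",
    ]
  }
}
```

The trade-off is the same as with any provisioner: Terraform only knows that the commands ran, not what state they left the server in.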

Now open browser:

http://<public-ip>

Website works 🎉


Step 4 — Create Notification Channel (Action Group)

We tell Azure:

When something breaks → email me

resource "azurerm_monitor_action_group" "ag" {
  name                = "agminipro9090"
  resource_group_name = azurerm_resource_group.rg.name
  short_name          = "alerts"

  email_receiver {
    name          = "sendtoadmin"
    email_address = "alankseb@gmail.com"
  }
}

Verify in portal:

Azure Monitor → Alerts → Action Groups
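
Hard-coding the recipient works for a lab, but a variable keeps personal addresses out of the code. A minimal sketch, assuming a variable named alert_email (my naming, not the original project's):

```hcl
# Hypothetical variable holding the address that should receive alerts.
variable "alert_email" {
  type        = string
  description = "Email address that receives Azure Monitor alerts"
}

# Inside email_receiver, the hard-coded address would then become:
#   email_address = var.alert_email
```

The value can then be supplied at apply time, for example with -var="alert_email=you@example.com" or through a terraform.tfvars file.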


Step 5 — CPU Alert (High Usage)

Now we create the actual monitoring rule.

resource "azurerm_monitor_metric_alert" "cpu_alert" {
  name                = "highcpualertminipro990922"
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when CPU usage is greater than 60%"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 60
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}

Test the Alert 🔥

SSH into VM:

sudo apt-get install stress -y
stress --cpu 6 --timeout 300

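Wait about 5 minutes. Because the rule sets no window_size, it averages CPU over the provider's default 5-minute window, so the load has to be sustained before the alert fires.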

📧 You receive email:

Azure Monitor alert triggered

Congratulations — you built real monitoring.
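
The rule above relies on provider defaults for how often it is evaluated and how much history it looks at. As a sketch, a variant that spells those settings out could look like this. The resource label, the name, and the severity value are illustrative assumptions; the frequency and window_size shown are the provider defaults.

```hcl
resource "azurerm_monitor_metric_alert" "cpu_alert_explicit" {
  name                = "highcpualert-explicit" # hypothetical name
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when CPU usage is greater than 60%"

  severity    = 2      # 0 (critical) to 4 (verbose); the provider default is 3
  frequency   = "PT1M" # how often the rule is evaluated
  window_size = "PT5M" # how much history each evaluation averages over

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 60
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}
```

A shorter window makes the alert react faster to the stress test, at the cost of more noise from brief spikes.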


Step 6 — Memory Alert

We add another rule:

resource "azurerm_monitor_metric_alert" "memory_alert" {
  name                = "lowmemoryalertminipro9090223333"
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when available memory is less than 1 GiB"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Available Memory Bytes"
    aggregation      = "Average"
    operator         = "LessThan"
    # Available Memory Bytes is reported in bytes, so the threshold must be a
    # byte count. 1073741824 = 1 GiB (an illustrative value; tune it per VM size).
    threshold = 1073741824
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}
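
Since Available Memory Bytes is measured in bytes, a percentage-style threshold has to be converted first. Here is a small sketch of that arithmetic, assuming the 8 GiB of RAM that Standard_D2s_v3 provides; the local names and the 20% figure are illustrative only.

```hcl
locals {
  # Standard_D2s_v3 has 8 GiB of RAM.
  vm_memory_bytes = 8 * 1024 * 1024 * 1024

  # 20% of total memory, expressed in bytes: 1717986918 (about 1.6 GiB).
  memory_threshold_bytes = floor(local.vm_memory_bytes * 0.2)
}
```

The criteria block could then use threshold = local.memory_threshold_bytes instead of a hard-coded number.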

⚠️ Important Learning (Real-World Insight)

During testing I discovered:

Azure platform metrics for a VM do NOT include filesystem usage. They cover disk throughput and IOPS, but not how full a disk is.

The alert above therefore monitors available memory (RAM), not free disk space.

Real disk monitoring requires:

  • Azure Monitor Agent
  • Log Analytics
  • Log-based alerts
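
To give a feel for the log-based route, here is a rough sketch of a Log Analytics workspace and a log alert on disk free space. It makes several assumptions: the Azure Monitor Agent and a data collection rule (neither shown) are already sending performance counters to this workspace, the counters land in the Perf table under the names used in the query, and the provider version is recent enough to include the scheduled query rules v2 resource. Resource labels and names are hypothetical.

```hcl
resource "azurerm_log_analytics_workspace" "law" {
  name                = "lawminipro-example" # hypothetical name
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_monitor_scheduled_query_rules_alert_v2" "disk_free" {
  name                = "lowdiskfree-example" # hypothetical name
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_log_analytics_workspace.law.id]
  description         = "Alert when disk free space is less than 20%"

  severity             = 2
  evaluation_frequency = "PT5M"
  window_duration      = "PT5M"

  criteria {
    # Assumed table and counter names; check what your agent actually sends.
    query = <<-KQL
      Perf
      | where ObjectName == "Logical Disk" and CounterName == "% Free Space"
      | summarize FreePct = avg(CounterValue) by Computer, InstanceName
    KQL

    time_aggregation_method = "Average"
    metric_measure_column   = "FreePct"
    operator                = "LessThan"
    threshold               = 20
  }

  action {
    action_groups = [azurerm_monitor_action_group.ag.id]
  }
}
```

Unlike the metric alerts earlier, this rule runs a query on a schedule, so it is only as good as the data the agent delivers to the workspace.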

This was one of the biggest learning moments in this project 🧠


Final Result

We built a complete monitoring pipeline:

| Event | What Happens |
|---|---|
| CPU spike | Azure detects |
| Alert rule fires | Action group triggered |
| Email sent | Admin notified |

What You Learned

✔ Terraform provisioning
✔ Remote configuration
✔ Azure networking
✔ Monitoring architecture
✔ The real difference between metric and log alerts


Final Thoughts

This project transforms Terraform from:

“tool that creates resources”

into

“tool that builds reliable production systems”

Because infrastructure without monitoring is just waiting to fail.


Happy learning 🚀

TechMilestoneHub

Build Skills, Unlock Milestones

© 2025 TechMilestoneHub

