In this mini project we will build a real Azure VM → deploy a website → monitor it → get email alerts when something goes wrong — all using Terraform.
This is not just copy-paste infrastructure.
We will understand why each piece exists and what Azure is actually doing behind the scenes.
By the end you will know:
- How Azure Monitor actually works 🧠
- The difference between a resource, a metric, and an action
- How alerts really get triggered
- How to simulate failures (CPU stress testing 🔥)
Table of Contents
- Architecture Overview
- Step 1 — Networking Infrastructure
- Step 2 — Create the Virtual Machine
- Step 3 — Deploy Website Automatically (Remote-Exec)
- Step 4 — Create Notification Channel (Action Group)
- Step 5 — CPU Alert (High Usage)
- Step 6 — Memory Alert
- ⚠️ Important Learning (Real-World Insight)
- Final Result
- What You Learned
- Final Thoughts
Architecture Overview
We will build:
| Component | Purpose |
|---|---|
| Resource Group | Container for everything |
| VNet + Subnet | Network for VM |
| NSG | Firewall rules |
| Public IP | Internet access |
| Linux VM | Runs website |
| Nginx | Sample application |
| Action Group | Notification channel |
| Metric Alerts | Detect problems |
Flow:
Problem happens → metric crosses the threshold → alert rule fires → action group emails you 📧
Step 1 — Networking Infrastructure
We first create the foundation: network + firewall + IP + NIC
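Before any of the resources below can be created, Terraform needs to know which provider to use. The post does not show this part, so here is a minimal sketch. The version constraint is an assumption (pick whatever your project has been tested with), and newer major versions of the provider may need extra settings such as a subscription ID.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed version pin, not taken from the original project
    }
  }
}

# The empty features block is required by the azurerm provider.
provider "azurerm" {
  features {}
}
```

With this in place, terraform init downloads the provider and the blocks in the following steps can be applied.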
Resource Group
resource "azurerm_resource_group" "rg" {
  name     = "rgminipro090212"
  location = "Central US"
}
Virtual Network & Subnet
resource "azurerm_virtual_network" "vnet" {
  name                = "vnetminipro12212"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

resource "azurerm_subnet" "subnet" {
  name                 = "subnetminipro12100339"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.2.0/24"]
}
Network Security Group (Firewall)
We allow:
- SSH (22) → remote login
- HTTP (80) → website access
resource "azurerm_network_security_group" "nsg" {
  name                = "nsgminipro98922"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  security_rule {
    name                       = "SSH"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "HTTP"
    priority                   = 110
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "80"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}
Public IP + Network Interface
resource "azurerm_public_ip" "pip" {
  name                = "pipminipro1212909"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
}

resource "azurerm_network_interface" "nic" {
  name                = "nicminipro90909111"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.subnet.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.pip.id
  }
}

resource "azurerm_network_interface_security_group_association" "assoc" {
  network_interface_id      = azurerm_network_interface.nic.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}
❓ Important Concept — Where is NSG applied?
We attached the NSG to the NIC, not to the subnet.
How to verify in portal:
| Where to check | What you see |
|---|---|
| NIC → Networking | NSG attached |
| NSG → Subnets | Empty |
Why?
NSGs can be attached at two levels:
| Level | Scope |
|---|---|
| Subnet NSG | Applies to all VMs in the subnet |
| NIC NSG | Applies to a single VM's network interface |
We attached it at the NIC level because this project has only one VM.
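For comparison, here is a sketch of the subnet-level alternative. It is not part of this project; the resource label subnet_assoc is a hypothetical name. With it, the same rules would apply to every VM placed in the subnet, and the portal would list the subnet under NSG → Subnets.

```hcl
# Alternative to the NIC-level association: attach the NSG to the whole subnet.
# "subnet_assoc" is a hypothetical label used only for this illustration.
resource "azurerm_subnet_network_security_group_association" "subnet_assoc" {
  subnet_id                 = azurerm_subnet.subnet.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}
```

This is usually the more convenient choice once several VMs share one set of firewall rules.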
Step 2 — Create the Virtual Machine
resource "azurerm_linux_virtual_machine" "vm" {
  name                  = "vmminipro343900"
  location              = azurerm_resource_group.rg.location
  resource_group_name   = azurerm_resource_group.rg.name
  size                  = "Standard_D2s_v3"
  network_interface_ids = [azurerm_network_interface.nic.id]
  admin_username        = "azureuser"

  admin_ssh_key {
    username   = "azureuser"
    public_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "18.04-LTS"
    version   = "latest"
  }
}
SSH Into the VM
Fix the SSH private key's permissions on Windows (run in PowerShell):
icacls <key> /inheritance:r
icacls <key> /grant:r "$($env:USERNAME):(R)"
icacls <key> /remove "Authenticated Users" "BUILTIN\Users" "Everyone"
Login:
ssh -i <key> azureuser@<public-ip>
✔ VM verified working
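To avoid looking up the address in the portal, an output block can print it after each apply. This is a small optional addition; the output name public_ip is my own choice.

```hcl
# Prints the VM's public IP after "terraform apply".
# The output name "public_ip" is an arbitrary choice for this sketch.
output "public_ip" {
  description = "Public IP address of the VM"
  value       = azurerm_public_ip.pip.ip_address
}
```

Running terraform output public_ip then gives the value to use in place of <public-ip>.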
Step 3 — Deploy Website Automatically (Remote-Exec)
Now Terraform becomes powerful 💥
We configure the server automatically by adding a remote-exec provisioner inside the azurerm_linux_virtual_machine "vm" resource block from Step 2:
provisioner "remote-exec" {
  inline = [
    "echo waiting for cloud-init...",
    "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done",
    "sudo apt-get update -y",
    "sudo apt-get install -y nginx",
    "echo '<h1>Terraform Monitoring Lab Working</h1>' | sudo tee /var/www/html/index.html",
    "sudo systemctl restart nginx",
    "sudo systemctl enable nginx"
  ]

  connection {
    type        = "ssh"
    user        = "azureuser"
    private_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1")
    host        = azurerm_public_ip.pip.ip_address
  }
}
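⚠️ Important learning:
Provisioners run only when the resource is created, not on later applies.
Because the VM already existed, we had to destroy it and apply again so the provisioner would run. (A lighter option is terraform apply -replace=azurerm_linux_virtual_machine.vm, which recreates only the VM.)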
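If recreating the VM every time the site changes feels too heavy, one possible workaround is to move the provisioner into a separate terraform_data resource. This is a sketch of a different technique from the one used in this project, and it assumes Terraform 1.4 or newer. The local value site_version and the label deploy_site are hypothetical names introduced for illustration.

```hcl
# Hypothetical helper: change this value to re-run the provisioner below.
locals {
  site_version = "1"
}

resource "terraform_data" "deploy_site" {
  # When this value changes, Terraform replaces this resource and
  # re-runs its provisioner. The VM itself is left untouched.
  triggers_replace = [local.site_version]

  # Same SSH connection settings as the provisioner in the VM resource.
  connection {
    type        = "ssh"
    user        = "azureuser"
    private_key = file("C:/Alan/MyWork/linuxvms/mykeys/key1")
    host        = azurerm_public_ip.pip.ip_address
  }

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get update -y",
      "sudo apt-get install -y nginx",
      "echo '<h1>Terraform Monitoring Lab Working</h1>' | sudo tee /var/www/html/index.html",
      "sudo systemctl restart nginx",
    ]
  }

  # Make sure the VM exists before trying to connect to it.
  depends_on = [azurerm_linux_virtual_machine.vm]
}
```

The trade-off is one more resource to reason about, in exchange for being able to redeploy the page by bumping site_version and running terraform apply.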
Now open a browser:
http://<public-ip>
Website works 🎉
Step 4 — Create Notification Channel (Action Group)
We tell Azure:
When something breaks → email me
resource "azurerm_monitor_action_group" "ag" {
  name                = "agminipro9090"
  resource_group_name = azurerm_resource_group.rg.name
  short_name          = "alerts"

  email_receiver {
    name          = "sendtoadmin"
    email_address = "alankseb@gmail.com"
  }
}
Verify in portal:
Azure Monitor → Alerts → Action Groups
Step 5 — CPU Alert (High Usage)
Now we create the actual monitoring rule.
resource "azurerm_monitor_metric_alert" "cpu_alert" {
  name                = "highcpualertminipro990922"
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when CPU usage is greater than 60%"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 60
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}
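The rule above relies on the provider's defaults for how often it is evaluated and how much history it looks at. To make the trigger mechanics visible, here is a sketch of the same rule with those settings written out. The resource label, the name, and the chosen values are illustrative assumptions; as far as I know the frequency and window shown here match the defaults, but check the provider documentation for your version. Use it in place of the rule above, not alongside it.

```hcl
# Hypothetical variant of the CPU rule with the evaluation timing spelled out.
resource "azurerm_monitor_metric_alert" "cpu_alert_explicit" {
  name                = "highcpualert-explicit-example" # hypothetical name
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when average CPU over 5 minutes is greater than 60%"

  severity    = 2      # 0 is most severe, 4 is least severe
  frequency   = "PT1M" # how often the rule is evaluated (ISO 8601 duration)
  window_size = "PT5M" # how much history each evaluation aggregates

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 60
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}
```

Read this way, the alert fires when the average of Percentage CPU over the last five minutes is above 60, checked once a minute. That is why the stress test needs to run for several minutes before an email arrives.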
Test the Alert 🔥
SSH into VM:
sudo apt-get install stress -y
stress --cpu 6 --timeout 300
Wait 5 minutes…
📧 You receive email:
Azure Monitor alert triggered
Congratulations — you built real monitoring.
Step 6 — Memory Alert
We add another rule:
resource "azurerm_monitor_metric_alert" "memory_alert" {
  name                = "lowmemoryalertminipro9090223333"
  resource_group_name = azurerm_resource_group.rg.name
  scopes              = [azurerm_linux_virtual_machine.vm.id]
  description         = "Alert when available memory is less than 1 GiB"

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Available Memory Bytes"
    aggregation      = "Average"
    operator         = "LessThan"
    # Available Memory Bytes is reported in bytes; 1073741824 = 1 GiB.
    # The value is illustrative; tune it to the VM size.
    threshold        = 1073741824
  }

  action {
    action_group_id = azurerm_monitor_action_group.ag.id
  }
}
⚠️ Important Learning (Real-World Insight)
During testing I discovered:
Azure VM platform metrics do NOT expose filesystem (disk space) usage by default.
The Available Memory Bytes metric used above measures free RAM in bytes, not free disk space.
Real disk monitoring requires:
- Azure Monitor Agent
- Log Analytics
- Log-based alerts
This was one of the biggest learning moments in this project 🧠
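To give a feel for what the log-based route looks like, here is a rough sketch of the last two pieces: a Log Analytics workspace and a log alert that queries it. It was not part of this project. It assumes the Azure Monitor Agent and a data collection rule (neither shown here) are already sending Linux performance counters to the workspace's Perf table. The resource names, the counter names in the query, and the 20% threshold are all assumptions that depend on how collection is configured.

```hcl
# Workspace that would receive guest-level performance data.
resource "azurerm_log_analytics_workspace" "law" {
  name                = "law-minipro-example" # hypothetical name
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

# Log-based alert: fires when any sample in the window reports < 20% free disk.
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "disk_free_alert" {
  name                = "lowdiskfree-example" # hypothetical name
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  scopes              = [azurerm_log_analytics_workspace.law.id]
  severity            = 2

  evaluation_frequency = "PT5M" # how often the query runs
  window_duration      = "PT5M" # how far back each run looks

  criteria {
    # Counter names are assumptions; they must match what the
    # data collection rule actually sends to the Perf table.
    query = <<-QUERY
      Perf
      | where ObjectName == "Logical Disk" and CounterName == "% Free Space"
      | where CounterValue < 20
      QUERY

    # Count the matching rows and alert when there is at least one.
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 0
  }

  action {
    action_groups = [azurerm_monitor_action_group.ag.id]
  }
}
```

Note the difference from the metric alerts: the rule is scoped to the workspace and evaluates a query, instead of watching a platform metric on the VM. Azure may also reject the rule at creation time if the Perf table has not received any data yet.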
Final Result
We built a complete monitoring pipeline:
| Event | What Happens |
|---|---|
| CPU spike | Azure detects |
| Alert rule fires | Action group triggered |
| Email sent | Admin notified |
What You Learned
✔ Terraform provisioning
✔ Remote configuration
✔ Azure networking
✔ Monitoring architecture
✔ Real difference between metric and log alerts
Final Thoughts
This project transforms Terraform from:
“tool that creates resources”
into
“tool that builds reliable production systems”
Because infrastructure without monitoring is just waiting to fail.
Happy learning 🚀
