A real-world in-place upgrade pattern using AWSEC2-CloneInstanceAndUpgradeWindows, ENI reuse, and a few hard-earned lessons.
If you've ever been asked to upgrade a production Windows Server on AWS, you know the real challenge isn't the OS upgrade itself — it's everything around it: keeping the IP, not breaking MAC-bound licensing, having a rollback path, and minimising downtime. This post walks through exactly how we handled all four constraints in a single automated workflow.
The ask came in with four hard constraints attached. The instance was Windows Server 2016 and needed to become 2022, but nothing about its network identity could change: the private IP had to survive, the MAC address could not move (licensing was bound to it), there had to be a clean rollback path, and downtime had to be kept to a minimum.
The total runtime came in at approximately 2 hours end-to-end, including all validation steps. Plan your downtime window accordingly — especially if your instance has large EBS volumes.
AWS ships a purpose-built SSM Automation document for exactly this scenario. Rather than manually orchestrating upgrade steps, we used the pre-built runbook directly from the Systems Manager Documents library.
The document handles the heavy lifting end-to-end: cloning the source instance, spinning up a temporary worker, running the in-place upgrade, and producing a post-upgrade AMI you launch from. The running instance stays untouched throughout. These are the parameters we supplied:
```json
{
  "InstanceId": "i-0abc1234def56789a",
  "IamInstanceProfile": "cloudwatch-agent-role",
  "SubnetId": "subnet-0aa1bb2cc3dd4ee5f",
  "TargetWindowVersion": "2022",
  "KeepPreUpgradeImageBackUp": false,
  "RebootInstanceBeforeTakingImage": false
}
```
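If you'd rather trigger the run from code than from the console, a minimal boto3 sketch looks like the following. The parameter names simply mirror the JSON above, and the IDs are the same placeholders.

```python
import boto3

ssm = boto3.client("ssm")

# Start the upgrade automation; SSM automation parameters are passed as lists of strings.
response = ssm.start_automation_execution(
    DocumentName="AWSEC2-CloneInstanceAndUpgradeWindows",
    Parameters={
        "InstanceId": ["i-0abc1234def56789a"],
        "IamInstanceProfile": ["cloudwatch-agent-role"],
        "SubnetId": ["subnet-0aa1bb2cc3dd4ee5f"],
        "TargetWindowVersion": ["2022"],
        "KeepPreUpgradeImageBackUp": ["false"],
        "RebootInstanceBeforeTakingImage": ["false"],
    },
)

execution_id = response["AutomationExecutionId"]
print(f"Automation started: {execution_id}")
```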
IAM heads-up: The instance profile needs at minimum AmazonSSMManagedInstanceCore. If your EBS volumes are encrypted with a customer-managed KMS key, also grant kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey — missing these causes AMI creation to silently fail on volume snapshot.
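For completeness, here is one way to wire those permissions up with boto3. The role name is assumed to match the instance profile above, and the inline policy name and KMS key ARN are placeholders for your own values.

```python
import json
import boto3

iam = boto3.client("iam")

ROLE_NAME = "cloudwatch-agent-role"  # role behind the instance profile above (assumed name)
KMS_KEY_ARN = "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE"  # placeholder

# Baseline SSM permissions the source and worker instances need.
iam.attach_role_policy(
    RoleName=ROLE_NAME,
    PolicyArn="arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
)

# Extra KMS permissions when the EBS volumes use a customer-managed key;
# without them the AMI creation fails silently at the snapshot step.
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="ebs-cmk-access",  # hypothetical inline policy name
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
            "Resource": KMS_KEY_ARN,
        }],
    }),
)
```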
The automation runs as a chain of discrete, inspectable steps: trigger, clone the source instance, bring up the temporary worker, run the in-place upgrade, and finally bake the post-upgrade AMI.
After two failed attempts (more on that below), the third run completed successfully. The execution list tells the story clearly:
The successful execution ran 53 steps total — 51 succeeded, 2 failed but were handled by the document's built-in branching logic (the failures were on skipped upgrade paths, not actual errors).
The automation names the output AMI automatically using the pattern AWSEC2_UPGRADED_AMI_TO_<VERSION>_FOR_INSTANCE_<instance-id>_<execution-id>. Make a note of the AMI ID from the Outputs panel; you'll need it when launching the new instance.
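If you're driving the run from code, the same execution ID lets you poll for completion and read the Outputs map programmatically. The exact output key names are set by the document, so this sketch just prints everything it finds.

```python
import time
import boto3

ssm = boto3.client("ssm")
execution_id = "REPLACE-WITH-EXECUTION-ID"  # returned by start_automation_execution

# Poll until the automation reaches a terminal state.
while True:
    execution = ssm.get_automation_execution(
        AutomationExecutionId=execution_id
    )["AutomationExecution"]
    status = execution["AutomationExecutionStatus"]
    if status in ("Success", "Failed", "Cancelled", "TimedOut"):
        break
    time.sleep(60)

print(f"Automation finished with status: {status}")

# The post-upgrade AMI ID is in the Outputs map; key names come from the document itself.
for key, values in execution.get("Outputs", {}).items():
    print(key, values)
```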
Halfway through the first two attempts, the automation stalled. The worker instance was up, the subnet was public, and yet the SSM Agent never came online. The automation just sat there waiting.
Public subnet ≠ public IP. This is the most common cause of "SSM Agent offline" during automation. If you must stay in a public subnet, enable auto-assign public IP. The production-grade answer is SSM VPC Endpoints — the instance reaches SSM over private networking and never needs internet access at all.
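Checking and flipping the subnet attribute is quick with boto3; the subnet ID here is the same placeholder used in the parameters above.

```python
import boto3

ec2 = boto3.client("ec2")
SUBNET_ID = "subnet-0aa1bb2cc3dd4ee5f"  # placeholder from the parameter block

# Check whether launches in this subnet get a public IPv4 address by default.
subnet = ec2.describe_subnets(SubnetIds=[SUBNET_ID])["Subnets"][0]
print("MapPublicIpOnLaunch:", subnet["MapPublicIpOnLaunch"])

# Enable it before starting the automation so the worker instance can reach SSM.
ec2.modify_subnet_attribute(
    SubnetId=SUBNET_ID,
    MapPublicIpOnLaunch={"Value": True},
)
```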
This is the constraint that makes the workflow genuinely interesting. AWS does not let you manually set a MAC address on an EC2 instance. The MAC is bound to the Elastic Network Interface (ENI), and each ENI has its own MAC for life. So how do you preserve a MAC across an upgrade that effectively launches a new instance?
Answer: reuse the original ENI. An ENI can be detached from one instance and attached to another. The new instance inherits the same MAC address and private IP — no downstream systems need to be updated.
The primary ENI (eth0) cannot be detached from a running instance, only from a stopped or terminated one. Plan this into your cutover window. Secondary ENIs (eth1 and up) don't carry this restriction and can be moved between running instances.
The MAC address in AWS is a property of the ENI, not the instance. When you move an ENI to a new instance, you move the MAC with it. The new OS sees the same network interface it would have seen on the old instance — same MAC, same IP, same ARP entries from the network's perspective.
| Property | Tied To | Survives ENI Reuse? |
|---|---|---|
| MAC address | ENI | ✓ Yes |
| Private IP address | ENI | ✓ Yes |
| Elastic IP (if attached) | ENI | ✓ Yes |
| Security groups | ENI | ✓ Yes |
| Instance ID | Instance | ✗ Changes |
| Instance metadata (IMDSv2 token) | Instance | ✗ Changes |
A public subnet without a public IP was the most common cause of the SSM Agent going offline during our automation runs. If you stay in a public subnet, verify auto-assign public IPv4 is enabled before you kick off the automation. Better still, move to a private subnet with SSM VPC Endpoints: the instance reaches SSM over private networking with no internet dependency at all.
If you care about MAC and IP retention, the ENI steps are the entire game. The single most important thing: set DeleteOnTermination = false on the primary ENI before you terminate the original instance. Miss that and the MAC is gone permanently.
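A sketch of that cutover with boto3, assuming the upgraded AMI ID from the automation outputs; the instance type is a placeholder and error handling is omitted.

```python
import boto3

ec2 = boto3.client("ec2")

SOURCE_INSTANCE_ID = "i-0abc1234def56789a"
UPGRADED_AMI_ID = "ami-REPLACE"    # from the automation Outputs panel
INSTANCE_TYPE = "m5.xlarge"        # placeholder

# Find the primary ENI (device index 0) on the source instance.
instance = ec2.describe_instances(InstanceIds=[SOURCE_INSTANCE_ID])[
    "Reservations"][0]["Instances"][0]
primary_eni = next(
    ni for ni in instance["NetworkInterfaces"]
    if ni["Attachment"]["DeviceIndex"] == 0
)
eni_id = primary_eni["NetworkInterfaceId"]

# The critical step: make sure the ENI survives termination of the old instance.
ec2.modify_network_interface_attribute(
    NetworkInterfaceId=eni_id,
    Attachment={
        "AttachmentId": primary_eni["Attachment"]["AttachmentId"],
        "DeleteOnTermination": False,
    },
)

# Stop and terminate the original instance (the cutover window starts here),
# then launch the replacement from the upgraded AMI with the same ENI as eth0.
ec2.stop_instances(InstanceIds=[SOURCE_INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[SOURCE_INSTANCE_ID])
ec2.terminate_instances(InstanceIds=[SOURCE_INSTANCE_ID])
ec2.get_waiter("instance_terminated").wait(InstanceIds=[SOURCE_INSTANCE_ID])

ec2.run_instances(
    ImageId=UPGRADED_AMI_ID,
    InstanceType=INSTANCE_TYPE,
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{"DeviceIndex": 0, "NetworkInterfaceId": eni_id}],
)
```

Because the ENI carries the subnet, private IP, MAC, and security groups with it, the new instance comes up with the original network identity and no SubnetId needs to be passed to run_instances.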
The automation creates a pre-upgrade AMI automatically — that's your primary rollback. We also took a manual AMI immediately before kicking off the automation. Belt and suspenders. An extra AMI costs almost nothing and removes all pressure from the cutover decision.
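The manual backup is a single call. NoReboot avoids taking the source down just to capture the image; the AMI name and description here are only examples.

```python
import boto3

ec2 = boto3.client("ec2")

# Belt-and-suspenders AMI of the original instance, taken before the automation runs.
image = ec2.create_image(
    InstanceId="i-0abc1234def56789a",
    Name="pre-upgrade-manual-backup-ws2016",  # example name
    Description="Manual rollback image taken before the 2016 to 2022 upgrade",
    NoReboot=True,  # don't reboot production just to take the image
)
print("Rollback AMI:", image["ImageId"])
```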
The automation is unattended, but it isn't instant. The ~2 hour runtime includes cloning, upgrading, rebooting, and all the intermediate sleep/wait steps. Add time for your own validation after launch before you decommission the old AMIs.
Run the upgrade entirely inside a private subnet and reach SSM through VPC Interface Endpoints (ssm, ssmmessages, ec2messages). No public IP, no IGW, no NAT required for SSM. It's more secure, removes the entire "public subnet but no public IP" failure mode, and is the pattern AWS recommends for regulated workloads.
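Creating the three interface endpoints is mechanical. The VPC, subnet, security group, and region below are placeholders, and the endpoint security group needs to allow HTTPS (443) from the instances that will use it.

```python
import boto3

ec2 = boto3.client("ec2")

VPC_ID = "vpc-REPLACE"
PRIVATE_SUBNET_IDS = ["subnet-REPLACE"]
ENDPOINT_SG_ID = "sg-REPLACE"   # must allow inbound 443 from the instance's security group
REGION = "eu-west-1"            # placeholder; match your own region

# SSM needs all three of these interface endpoints for Run Command and Session Manager.
for service in ("ssm", "ssmmessages", "ec2messages"):
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=VPC_ID,
        ServiceName=f"com.amazonaws.{REGION}.{service}",
        SubnetIds=PRIVATE_SUBNET_IDS,
        SecurityGroupIds=[ENDPOINT_SG_ID],
        PrivateDnsEnabled=True,
    )
```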