AWS Systems Manager EC2 Windows Server

Upgrading Windows Server 2016 → 2022 on AWS Without Losing the MAC Address

A real-world in-place upgrade pattern using AWSEC2-CloneInstanceAndUpgradeWindows, ENI reuse, and a few hard-earned lessons.

Vishnu Rachapudi
Cloud & AI Engineer · AWS Community Builder (Security)
Muhammad Ahmed
Co-Author · Cloud Engineer
April 2026
10 min read

If you've ever been asked to upgrade a production Windows Server on AWS, you know the real challenge isn't the OS upgrade itself — it's everything around it: keeping the IP, not breaking MAC-bound licensing, having a rollback path, minimising downtime, and losing no data. This post walks through exactly how we handled all five constraints in a single automated workflow.

In this post
  1. Background & constraints
  2. The automation: AWSEC2-CloneInstanceAndUpgradeWindows
  3. The issue we hit: SSM Agent offline
  4. The hard part: keeping the MAC address
  5. Lessons learned & what we'd do differently
Section 01

Background & Constraints

The ask came in with five hard constraints attached. The instance was Windows Server 2016, it needed to become 2022, and nothing about its network identity could change. Here's why each constraint mattered:

🔗
Same Private IP
Firewall rules, DNS entries, and app configs were all keyed to this address.
🔑
Same MAC Address
Application licensing was bound to the network interface MAC — change it and the app stops working.
Minimal Downtime
The instance served production traffic. Cutover needed to be surgical, not a weekend-long maintenance window.
🔄
Rollback Path
No upgrade is risk-free. We needed a clear way back if something went wrong post-cutover.
Zero Data Loss
All volumes, all data, all configuration — untouched through the entire process.

The total runtime came in at approximately 2 hours end-to-end, including all validation steps. Plan your downtime window accordingly — especially if your instance has large EBS volumes.

Section 02

The Automation: AWSEC2-CloneInstanceAndUpgradeWindows

AWS ships a purpose-built SSM Automation document for exactly this scenario. Rather than manually orchestrating upgrade steps, we used the pre-built runbook directly from the Systems Manager Documents library.

AWSEC2-CloneInstanceAndUpgradeWindows document content in AWS Systems Manager
The SSM document AWSEC2-CloneInstanceAndUpgradeWindows — version 45 (default). Supports upgrade paths from Windows Server 2008 R2 up to a Windows Server 2025 target.

The document handles the heavy lifting end-to-end: cloning the source instance, spinning up a temporary worker, running the in-place upgrade, and producing a post-upgrade AMI you launch from. The running instance stays untouched throughout.

Input Parameters We Provided

Input parameters for the SSM automation execution
Execution input parameters — instance ID, IAM profile, subnet, and target Windows version set to 2022. All real IDs replaced with dummy values below.
📄 automation-input-parameters.json
{
  "InstanceId": "i-0abc1234def56789a",
  "IamInstanceProfile": "cloudwatch-agent-role",
  "SubnetId": "subnet-0aa1bb2cc3dd4ee5f",
  "TargetWindowVersion": "2022",
  "KeepPreUpgradeImageBackUp": false,
  "RebootInstanceBeforeTakingImage": false
}

IAM heads-up: The instance profile needs at minimum AmazonSSMManagedInstanceCore. If your EBS volumes are encrypted with a customer-managed KMS key, also grant kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey — missing these causes AMI creation to silently fail on volume snapshot.
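For repeatability, we found it useful to trigger the runbook from code instead of the console. Here is a minimal sketch using the SSM StartAutomationExecution API; the helper name and parameter plumbing are ours, the parameter keys mirror the JSON above, and the client is injected (e.g. `boto3.client("ssm")`) so the function is easy to dry-run:

```python
def start_upgrade(ssm, instance_id, instance_profile, subnet_id):
    """Kick off the AWSEC2-CloneInstanceAndUpgradeWindows runbook.

    `ssm` is an SSM client such as boto3.client("ssm"); note that SSM
    automation parameters are passed as lists of strings.
    """
    resp = ssm.start_automation_execution(
        DocumentName="AWSEC2-CloneInstanceAndUpgradeWindows",
        Parameters={
            "InstanceId": [instance_id],
            "IamInstanceProfile": [instance_profile],
            "SubnetId": [subnet_id],
            "TargetWindowVersion": ["2022"],
        },
    )
    # The execution ID is what you track in the SSM console afterwards.
    return resp["AutomationExecutionId"]
```

Keep the returned execution ID: it shows up again later in the output AMI's name.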

Execution Flow

The automation runs as a chain of discrete, inspectable steps. Here's how the workflow progresses from trigger to post-upgrade AMI:

📷
Pre-upgrade AMI created
Snapshot of the source instance — your rollback insurance
Worker instance launched
Temporary EC2 cloned from the pre-upgrade AMI — production instance untouched
In-place upgrade runs on the worker
Pre-checks, disk compatibility, 2016 → 2022 upgrade, reboot cycles
Post-upgrade AMI produced
Worker terminated — new AMI ready to launch the replacement instance from
SSM Runbook visual design view showing the automation flow
The runbook design view — the automation spans dozens of steps across multiple pages, with Loop, Branch, Sleep, Pause, and Approve flow controls among them.

What a Successful Execution Looks Like

After two failed attempts (more on that below), the third run completed successfully. The execution list tells the story clearly:

SSM Automation executions list showing two failed and one successful execution
Three execution attempts visible — the first two failed, the third completed successfully after fixing the SSM reachability issue.

The successful execution ran 53 steps — 51 succeeded, and the 2 reported as failed were handled by the document's built-in branching logic: they sat on upgrade paths that didn't apply to our source OS, so they were not actual errors.

Execution detail showing 53 steps, 51 succeeded, 2 failed
Execution detail for the successful run — 68 total steps across 7 pages, overall status: Success.
First page of executed steps including describeOriginalInstanceDetails, branchOnImdsV2Required, and assertRootVolumeIsEbs
First page of executed steps — the automation starts by inspecting the source instance before any changes are made.

The Output: Post-Upgrade AMI

Execution outputs showing the post-upgrade AMI ID
Execution outputs — the post-upgrade AMI ID is the key artifact. This is what you launch the new production instance from.
AMI naming convention

The automation names the output AMI automatically using the pattern AWSEC2_UPGRADED_AMI_TO_<VERSION>_FOR_INSTANCE_<instance-id>_<execution-id>. Keep note of the AMI ID from the Outputs panel — you'll need it for the new instance launch.
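Because the name is deterministic, you can reconstruct it and find the image by name even if you lose track of the Outputs panel. A small helper (the function name is ours) that builds the documented pattern:

```python
def expected_ami_name(target_version, instance_id, execution_id):
    """Build the AMI name the automation assigns to its output image,
    following the documented pattern:
    AWSEC2_UPGRADED_AMI_TO_<VERSION>_FOR_INSTANCE_<instance-id>_<execution-id>
    """
    return (
        f"AWSEC2_UPGRADED_AMI_TO_{target_version}"
        f"_FOR_INSTANCE_{instance_id}_{execution_id}"
    )

# Example with the dummy IDs used in this post:
# expected_ami_name("2022", "i-0abc1234def56789a", "exec-123")
# -> "AWSEC2_UPGRADED_AMI_TO_2022_FOR_INSTANCE_i-0abc1234def56789a_exec-123"
```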

Section 03

The Issue We Hit: SSM Agent Offline

Halfway through the first two attempts, the automation stalled. The worker instance was up, the subnet was public, and yet the SSM Agent never came online. The automation just sat there waiting.

Symptoms we saw
Public IP was not auto-assigned to the worker
Instance couldn't reach SSM endpoints
SSM Agent status: offline
Automation: stuck at the worker health check step
Root cause
Subnet had Auto-assign public IPv4 set to Disabled
A public subnet does not automatically assign a public IP
Fix: enable auto-assign at subnet level, relaunch
SSM came online, automation resumed

Public subnet ≠ public IP. This is the most common cause of "SSM Agent offline" during automation. If you must stay in a public subnet, enable auto-assign public IP. The production-grade answer is SSM VPC Endpoints — the instance reaches SSM over private networking and never needs internet access at all.
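Checking (and fixing) the subnet attribute before you kick off the automation is cheap insurance. A sketch using the EC2 DescribeSubnets and ModifySubnetAttribute APIs; the helper name is ours, and `ec2` is assumed to be a boto3 EC2 client, injected so the logic can be dry-run:

```python
def ensure_auto_assign_public_ip(ec2, subnet_id):
    """Enable auto-assign public IPv4 on the subnet if it is off.

    Returns True if the attribute had to be changed, False if it was
    already enabled. `ec2` is an EC2 client, e.g. boto3.client("ec2").
    """
    subnet = ec2.describe_subnets(SubnetIds=[subnet_id])["Subnets"][0]
    if subnet["MapPublicIpOnLaunch"]:
        return False  # already enabled, nothing to do
    ec2.modify_subnet_attribute(
        SubnetId=subnet_id,
        MapPublicIpOnLaunch={"Value": True},
    )
    return True
```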

Section 04

The Hard Part: Keeping the MAC Address

This is the constraint that makes the workflow genuinely interesting. AWS does not let you manually set a MAC address on an EC2 instance. The MAC is bound to the Elastic Network Interface (ENI), and each ENI has its own MAC for life. So how do you preserve a MAC across an upgrade that effectively launches a new instance?

Answer: reuse the original ENI. An ENI can be detached from one instance and attached to another. The new instance inherits the same MAC address and private IP — no downstream systems need to be updated.

The Exact Steps

Set DeleteOnTermination = false on the primary ENI
Do this BEFORE terminating the original instance · without this, terminating destroys the ENI and the MAC goes with it
Stop or terminate the original instance
The primary ENI cannot be detached while the instance is running · stop or terminate first
Detach the primary ENI
Now detachable since the instance is stopped/terminated · note the ENI ID
Launch the new instance from the post-upgrade AMI
Specify the preserved ENI at device index 0 in the launch request · an existing ENI can only become the primary interface when it's supplied at launch
The preserved ENI becomes the primary network interface
Same MAC, same private IP · no firewall rules, DNS entries, or licensing bindings need updating

The primary ENI (eth0) cannot be detached from a running instance — only from a stopped or terminated one. Plan this into your cutover window. The secondary ENIs (eth1+) can be detached from running instances without this restriction.
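The steps above collapse into one short boto3 sequence. The sketch below is ours, not part of the AWS runbook: the helper name and parameters are made up, waiters and error handling are elided, and `ec2` is assumed to be a boto3 EC2 client (injected so the flow is easy to dry-run). It follows the terminate-then-launch path, supplying the preserved ENI at device index 0 in the launch request:

```python
def cut_over(ec2, old_instance_id, eni_id, post_upgrade_ami, instance_type):
    """Swap instances while preserving the ENI (and its MAC address).

    Sketch only: production use needs waiters between terminate and
    launch, plus validation before decommissioning anything.
    """
    # 1. Make sure terminating the old instance does NOT delete the ENI.
    eni = ec2.describe_network_interfaces(
        NetworkInterfaceIds=[eni_id])["NetworkInterfaces"][0]
    ec2.modify_network_interface_attribute(
        NetworkInterfaceId=eni_id,
        Attachment={
            "AttachmentId": eni["Attachment"]["AttachmentId"],
            "DeleteOnTermination": False,
        },
    )
    # 2. Terminate the original instance; the ENI survives as 'available'.
    #    (Wait on the instance_terminated waiter here in real use.)
    ec2.terminate_instances(InstanceIds=[old_instance_id])
    # 3. Launch the replacement from the post-upgrade AMI with the
    #    preserved ENI at device index 0: same MAC, same private IP.
    resp = ec2.run_instances(
        ImageId=post_upgrade_ami,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        NetworkInterfaces=[{"DeviceIndex": 0, "NetworkInterfaceId": eni_id}],
    )
    return resp["Instances"][0]["InstanceId"]
```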

Why ENI Reuse Works

The MAC address in AWS is a property of the ENI, not the instance. When you move an ENI to a new instance, you move the MAC with it. The new OS sees the same network interface it would have seen on the old instance — same MAC, same IP, same ARP entries from the network's perspective.

Property                         | Tied To  | Survives ENI Reuse?
MAC address                      | ENI      | ✓ Yes
Private IP address               | ENI      | ✓ Yes
Elastic IP (if attached)         | ENI      | ✓ Yes
Security groups                  | ENI      | ✓ Yes
Instance ID                      | Instance | ✗ Changes
Instance metadata (IMDSv2 token) | Instance | ✗ Changes
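A quick post-cutover check makes this concrete: record the MAC and private IP before the swap, then read the ENI back via DescribeNetworkInterfaces afterwards and compare. A small sketch (the helper name is ours; `ec2` is assumed to be a boto3 EC2 client):

```python
def verify_network_identity(ec2, eni_id, expected_mac, expected_ip):
    """Confirm the preserved ENI still carries the MAC and private IP
    recorded before cutover. Returns True when both match."""
    eni = ec2.describe_network_interfaces(
        NetworkInterfaceIds=[eni_id])["NetworkInterfaces"][0]
    # MAC comparison is case-insensitive; AWS reports lowercase hex.
    return (eni["MacAddress"].lower() == expected_mac.lower()
            and eni["PrivateIpAddress"] == expected_ip)
```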
Section 05

Lessons Learned & What We'd Do Differently

1. Public subnet ≠ public IP

The most common cause of SSM Agent going offline during automation. If you stay in a public subnet, verify auto-assign public IPv4 is enabled before you kick off the automation. Better still, move to a private subnet with SSM VPC Endpoints — the instance reaches SSM over private networking with no internet dependency at all.

2. ENI handling is not optional

If you care about MAC and IP retention, the ENI steps are the entire game. The single most important thing: set DeleteOnTermination = false on the primary ENI before you terminate the original instance. Miss that and the MAC is gone permanently.

3. Always have dual backups

The automation creates a pre-upgrade AMI automatically — that's your primary rollback. We also took a manual AMI immediately before kicking off the automation. Belt and suspenders. An extra AMI costs almost nothing and removes all pressure from the cutover decision.

4. Plan the downtime window honestly

The automation is unattended, but it isn't instant. The ~2 hour runtime includes cloning, upgrading, rebooting, and all the intermediate sleep/wait steps. Add time for your own validation after launch before you decommission the old AMIs.

What we'd do differently next time

Run the upgrade entirely inside a private subnet and reach SSM through VPC Interface Endpoints (ssm, ssmmessages, ec2messages). No public IP, no IGW, no NAT required for SSM. It's more secure, removes the entire "public subnet but no public IP" failure mode, and is the pattern AWS recommends for regulated workloads.
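The three interface endpoints can be created up front with the CreateVpcEndpoint API. A sketch of that setup; the helper name is ours, the VPC, subnet, and security group IDs are assumptions you'd supply, and `ec2` is an injected EC2 client:

```python
# The three services the SSM Agent needs for fully private access.
SSM_SERVICES = ("ssm", "ssmmessages", "ec2messages")

def create_ssm_endpoints(ec2, region, vpc_id, subnet_ids, sg_id):
    """Create interface endpoints for private SSM connectivity.

    Returns the new endpoint IDs. PrivateDnsEnabled must stay True so
    the agent's default endpoint hostnames resolve to the endpoints.
    """
    endpoint_ids = []
    for svc in SSM_SERVICES:
        resp = ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=f"com.amazonaws.{region}.{svc}",
            SubnetIds=subnet_ids,
            SecurityGroupIds=[sg_id],
            PrivateDnsEnabled=True,
        )
        endpoint_ids.append(resp["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```

The security group on the endpoints needs to allow inbound HTTPS (443) from the instance's subnet for the agent to connect.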

TL;DR Checklist

Take a manual AMI before you start
In addition to the automation's built-in pre-upgrade snapshot
Verify IAM has AmazonSSMManagedInstanceCore + KMS permissions
Missing KMS perms causes AMI creation to silently fail on encrypted volumes
Ensure SSM reachability before kicking off
Public IP (if public subnet) OR VPC endpoints (if private subnet)
Set DeleteOnTermination = false on the primary ENI
Do this before terminating — it cannot be undone after the fact
Stop/terminate → detach ENI → launch new instance → attach ENI
In that order — primary ENI can only be detached from a stopped/terminated instance
Validate MAC, IP, and licensing before decommissioning old AMIs
Keep the pre-upgrade and post-upgrade AMIs around until you're confident
Have questions or hit a different edge case?
Have you run MAC-bound licensing across a cloud migration? I'd love to hear how you handled it.
Did you use VPC endpoints instead of public IPs for SSM? Drop a comment with your setup.
Running 2008 R2 and need the two-hop upgrade path? Reach out — we've navigated that too.