A real-world in-place upgrade pattern using AWSEC2-CloneInstanceAndUpgradeWindows, ENI reuse, and a few hard-earned lessons.
If you've ever been asked to upgrade a production Windows Server on AWS, you know the real challenge isn't the OS upgrade itself — it's everything around it: keeping the IP, not breaking MAC-bound licensing, having a rollback path, and minimising downtime. This post walks through exactly how we handled all four constraints in a single automated workflow.
The ask came in with four hard constraints attached. The instance was Windows Server 2016 and needed to become 2022, but nothing about its network identity could change: the private IP had to survive, the MAC address could not move (licensing was bound to it), there had to be a clean rollback path, and downtime had to be kept to a minimum.
The total runtime came in at approximately 2 hours end-to-end, including all validation steps. Plan your downtime window accordingly — especially if your instance has large EBS volumes.
AWS ships a purpose-built SSM Automation document for exactly this scenario. Rather than manually orchestrating upgrade steps, we used the pre-built runbook directly from the Systems Manager Documents library.
The document handles the heavy lifting end-to-end: cloning the source instance, spinning up a temporary worker, running the in-place upgrade, and producing a post-upgrade AMI you launch from. The running instance stays untouched throughout. These are the parameters we supplied:
```json
{
  "InstanceId": "i-0abc1234def56789a",
  "IamInstanceProfile": "cloudwatch-agent-role",
  "SubnetId": "subnet-0aa1bb2cc3dd4ee5f",
  "TargetWindowVersion": "2022",
  "KeepPreUpgradeImageBackUp": false,
  "RebootInstanceBeforeTakingImage": false
}
```
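If you'd rather trigger the run from code than from the console, a minimal boto3 sketch looks like the following. The parameter names simply mirror the JSON above, and the IDs are the same placeholders.

```python
import boto3

ssm = boto3.client("ssm")

# Start the upgrade automation; SSM automation parameters are passed as lists of strings.
response = ssm.start_automation_execution(
    DocumentName="AWSEC2-CloneInstanceAndUpgradeWindows",
    Parameters={
        "InstanceId": ["i-0abc1234def56789a"],
        "IamInstanceProfile": ["cloudwatch-agent-role"],
        "SubnetId": ["subnet-0aa1bb2cc3dd4ee5f"],
        "TargetWindowVersion": ["2022"],
        "KeepPreUpgradeImageBackUp": ["false"],
        "RebootInstanceBeforeTakingImage": ["false"],
    },
)

execution_id = response["AutomationExecutionId"]
print(f"Automation started: {execution_id}")
```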
IAM heads-up: The instance profile needs at minimum AmazonSSMManagedInstanceCore. If your EBS volumes are encrypted with a customer-managed KMS key, also grant kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey — missing these causes AMI creation to silently fail on volume snapshot.
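For completeness, here is one way to wire those permissions up with boto3. The role name is assumed to match the instance profile above, and the inline policy name and KMS key ARN are placeholders for your own values.

```python
import json
import boto3

iam = boto3.client("iam")

ROLE_NAME = "cloudwatch-agent-role"  # role behind the instance profile above (assumed name)
KMS_KEY_ARN = "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE"  # placeholder

# Baseline SSM permissions the source and worker instances need.
iam.attach_role_policy(
    RoleName=ROLE_NAME,
    PolicyArn="arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
)

# Extra KMS permissions when the EBS volumes use a customer-managed key;
# without them the AMI creation fails silently at the snapshot step.
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="ebs-cmk-access",  # hypothetical inline policy name
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
            "Resource": KMS_KEY_ARN,
        }],
    }),
)
```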
The automation runs as a chain of discrete, inspectable steps: trigger, clone the source instance, bring up the temporary worker, run the in-place upgrade, and finally bake the post-upgrade AMI.
After two failed attempts (more on that below), the third run completed successfully. The execution list tells the story clearly:
The successful execution ran 53 steps total — 51 succeeded, 2 failed but were handled by the document's built-in branching logic (the failures were on skipped upgrade paths, not actual errors).
The automation names the output AMI automatically using the pattern AWSEC2_UPGRADED_AMI_TO_<VERSION>_FOR_INSTANCE_<instance-id>_<execution-id>. Make a note of the AMI ID from the Outputs panel; you'll need it when launching the new instance.
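If you're driving the run from code, the same execution ID lets you poll for completion and read the Outputs map programmatically. The exact output key names are set by the document, so this sketch just prints everything it finds.

```python
import time
import boto3

ssm = boto3.client("ssm")
execution_id = "REPLACE-WITH-EXECUTION-ID"  # returned by start_automation_execution

# Poll until the automation reaches a terminal state.
while True:
    execution = ssm.get_automation_execution(
        AutomationExecutionId=execution_id
    )["AutomationExecution"]
    status = execution["AutomationExecutionStatus"]
    if status in ("Success", "Failed", "Cancelled", "TimedOut"):
        break
    time.sleep(60)

print(f"Automation finished with status: {status}")

# The post-upgrade AMI ID is in the Outputs map; key names come from the document itself.
for key, values in execution.get("Outputs", {}).items():
    print(key, values)
```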
Halfway through the first two attempts, the automation stalled. The worker instance was up, the subnet was public, and yet the SSM Agent never came online. The automation just sat there waiting.
Public subnet ≠ public IP. This is the most common cause of "SSM Agent offline" during automation. If you must stay in a public subnet, enable auto-assign public IP. The production-grade answer is SSM VPC Endpoints — the instance reaches SSM over private networking and never needs internet access at all.
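Checking and flipping the subnet attribute is quick with boto3; the subnet ID here is the same placeholder used in the parameters above.

```python
import boto3

ec2 = boto3.client("ec2")
SUBNET_ID = "subnet-0aa1bb2cc3dd4ee5f"  # placeholder from the parameter block

# Check whether launches in this subnet get a public IPv4 address by default.
subnet = ec2.describe_subnets(SubnetIds=[SUBNET_ID])["Subnets"][0]
print("MapPublicIpOnLaunch:", subnet["MapPublicIpOnLaunch"])

# Enable it before starting the automation so the worker instance can reach SSM.
ec2.modify_subnet_attribute(
    SubnetId=SUBNET_ID,
    MapPublicIpOnLaunch={"Value": True},
)
```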
This is the constraint that makes the workflow genuinely interesting. AWS does not let you manually set a MAC address on an EC2 instance. The MAC is bound to the Elastic Network Interface (ENI), and each ENI has its own MAC for life. So how do you preserve a MAC across an upgrade that effectively launches a new instance?
Answer: reuse the original ENI. An ENI can be detached from one instance and attached to another. The new instance inherits the same MAC address and private IP — no downstream systems need to be updated.
The primary ENI (eth0) cannot be detached from a running instance, only from a stopped or terminated one. Plan this into your cutover window. Secondary ENIs (eth1 and up) don't carry this restriction and can be moved between running instances.
The MAC address in AWS is a property of the ENI, not the instance. When you move an ENI to a new instance, you move the MAC with it. The new OS sees the same network interface it would have seen on the old instance — same MAC, same IP, same ARP entries from the network's perspective.
| Property | Tied To | Survives ENI Reuse? |
|---|---|---|
| MAC address | ENI | ✓ Yes |
| Private IP address | ENI | ✓ Yes |
| Elastic IP (if attached) | ENI | ✓ Yes |
| Security groups | ENI | ✓ Yes |
| Instance ID | Instance | ✗ Changes |
| Instance metadata (IMDSv2 token) | Instance | ✗ Changes |
A public subnet without a public IP was the most common cause of the SSM Agent going offline during our automation runs. If you stay in a public subnet, verify auto-assign public IPv4 is enabled before you kick off the automation. Better still, move to a private subnet with SSM VPC Endpoints: the instance reaches SSM over private networking with no internet dependency at all.
If you care about MAC and IP retention, the ENI steps are the entire game. The single most important thing: set DeleteOnTermination = false on the primary ENI before you terminate the original instance. Miss that and the MAC is gone permanently.
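A sketch of that cutover with boto3, assuming the upgraded AMI ID from the automation outputs; the instance type is a placeholder and error handling is omitted.

```python
import boto3

ec2 = boto3.client("ec2")

SOURCE_INSTANCE_ID = "i-0abc1234def56789a"
UPGRADED_AMI_ID = "ami-REPLACE"    # from the automation Outputs panel
INSTANCE_TYPE = "m5.xlarge"        # placeholder

# Find the primary ENI (device index 0) on the source instance.
instance = ec2.describe_instances(InstanceIds=[SOURCE_INSTANCE_ID])[
    "Reservations"][0]["Instances"][0]
primary_eni = next(
    ni for ni in instance["NetworkInterfaces"]
    if ni["Attachment"]["DeviceIndex"] == 0
)
eni_id = primary_eni["NetworkInterfaceId"]

# The critical step: make sure the ENI survives termination of the old instance.
ec2.modify_network_interface_attribute(
    NetworkInterfaceId=eni_id,
    Attachment={
        "AttachmentId": primary_eni["Attachment"]["AttachmentId"],
        "DeleteOnTermination": False,
    },
)

# Stop and terminate the original instance (the cutover window starts here),
# then launch the replacement from the upgraded AMI with the same ENI as eth0.
ec2.stop_instances(InstanceIds=[SOURCE_INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[SOURCE_INSTANCE_ID])
ec2.terminate_instances(InstanceIds=[SOURCE_INSTANCE_ID])
ec2.get_waiter("instance_terminated").wait(InstanceIds=[SOURCE_INSTANCE_ID])

ec2.run_instances(
    ImageId=UPGRADED_AMI_ID,
    InstanceType=INSTANCE_TYPE,
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{"DeviceIndex": 0, "NetworkInterfaceId": eni_id}],
)
```

Because the ENI carries the subnet, private IP, MAC, and security groups with it, the new instance comes up with the original network identity and no SubnetId needs to be passed to run_instances.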
The automation creates a pre-upgrade AMI automatically — that's your primary rollback. We also took a manual AMI immediately before kicking off the automation. Belt and suspenders. An extra AMI costs almost nothing and removes all pressure from the cutover decision.
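The manual backup is a single call. NoReboot avoids taking the source down just to capture the image; the AMI name and description here are only examples.

```python
import boto3

ec2 = boto3.client("ec2")

# Belt-and-suspenders AMI of the original instance, taken before the automation runs.
image = ec2.create_image(
    InstanceId="i-0abc1234def56789a",
    Name="pre-upgrade-manual-backup-ws2016",  # example name
    Description="Manual rollback image taken before the 2016 to 2022 upgrade",
    NoReboot=True,  # don't reboot production just to take the image
)
print("Rollback AMI:", image["ImageId"])
```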
The automation is unattended, but it isn't instant. The ~2 hour runtime includes cloning, upgrading, rebooting, and all the intermediate sleep/wait steps. Add time for your own validation after launch before you decommission the old AMIs.
Run the upgrade entirely inside a private subnet and reach SSM through VPC Interface Endpoints (ssm, ssmmessages, ec2messages). No public IP, no IGW, no NAT required for SSM. It's more secure, removes the entire "public subnet but no public IP" failure mode, and is the pattern AWS recommends for regulated workloads.
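Creating the three interface endpoints is mechanical. The VPC, subnet, security group, and region below are placeholders, and the endpoint security group needs to allow HTTPS (443) from the instances that will use it.

```python
import boto3

ec2 = boto3.client("ec2")

VPC_ID = "vpc-REPLACE"
PRIVATE_SUBNET_IDS = ["subnet-REPLACE"]
ENDPOINT_SG_ID = "sg-REPLACE"   # must allow inbound 443 from the instance's security group
REGION = "eu-west-1"            # placeholder; match your own region

# SSM needs all three of these interface endpoints for Run Command and Session Manager.
for service in ("ssm", "ssmmessages", "ec2messages"):
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=VPC_ID,
        ServiceName=f"com.amazonaws.{REGION}.{service}",
        SubnetIds=PRIVATE_SUBNET_IDS,
        SecurityGroupIds=[ENDPOINT_SG_ID],
        PrivateDnsEnabled=True,
    )
```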