Setting up multi-region active–passive DR for EKS using Velero, S3 Cross-Region Replication, Helm, and IRSA — with every command, every error, and every fix documented.
Disaster Recovery isn't optional anymore — it's table stakes. In this guide, I walk through building a production-grade, multi-region DR setup for Amazon EKS using Velero. Everything is installed via Helm, secured with IRSA (no static credentials), and backed by S3 with Cross-Region Replication. I've documented every step including the real errors I hit and how I fixed them.
The setup follows an Active–Passive pattern. The primary cluster in us-east-1 runs all workloads. Velero backs up cluster state to S3, which replicates to a DR bucket in ap-south-1. A standby EKS cluster in Mumbai reads from the replicated bucket and restores workloads on demand.
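End to end, the flow looks like this (bucket names match the ones created later):

```
Primary EKS (us-east-1) --Velero backup--> s3://velero-primary-bucket (us-east-1)
                                                  |
                                       Cross-Region Replication
                                                  v
DR EKS (ap-south-1) <--Velero restore--  s3://velero-dr-bucket (ap-south-1)
```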
Two clusters — primary in US East, DR in Mumbai. eksctl handles VPC, subnets, IAM roles, and node groups automatically.
```bash
# Primary Cluster
eksctl create cluster --name primary-cluster --region us-east-1 \
  --version 1.35 --nodegroup-name primary-nodes \
  --node-type t3.medium --nodes 1 --managed

# DR Cluster
eksctl create cluster --name dr-cluster --region ap-south-1 \
  --version 1.35 --nodegroup-name dr-nodes \
  --node-type t3.medium --nodes 1 --managed
```
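Once both finish, confirm eksctl wrote a kubeconfig context for each and the nodes are Ready. The primary context name below is inferred from eksctl's usual `<user>@<cluster>.<region>.eksctl.io` pattern, matching the DR context used later:

```bash
# List the contexts eksctl created, then check nodes in each cluster
kubectl config get-contexts
kubectl config use-context iam-root-account@primary-cluster.us-east-1.eksctl.io
kubectl get nodes
kubectl config use-context iam-root-account@dr-cluster.ap-south-1.eksctl.io
kubectl get nodes
```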
Cost: Two EKS clusters with t3.medium ~ $5–6/day. Tear down after practicing.
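When you're done, a teardown sketch (versioned buckets keep old object versions, so emptying them can take an extra pass):

```bash
# Delete both clusters (removes the nodegroups, VPCs, and IAM roles eksctl created)
eksctl delete cluster --name primary-cluster --region us-east-1
eksctl delete cluster --name dr-cluster --region ap-south-1

# --force deletes current objects; any remaining object versions must be
# purged (console or delete-objects) before the bucket will actually remove
aws s3 rb s3://velero-primary-bucket --force
aws s3 rb s3://velero-dr-bucket --force
```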
Helm values are stored in GitHub for version-controlled, repeatable deployments. The primary cluster's values:
```yaml
# values-primary.yaml
configuration:
  backupStorageLocation:
    - bucket: velero-primary-bucket
      provider: aws
      config:
        region: us-east-1
  volumeSnapshotLocation:
    - provider: aws
      config:
        region: us-east-1
serviceAccount:
  server:
    name: velero
    create: false
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.9.0
    # the plugin image copies its binary into /target; the chart's
    # "plugins" volume must be mounted there or the init container fails
    volumeMounts:
      - mountPath: /target
        name: plugins
```
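One step the values file doesn't capture: the IRSA service account has to exist before the Helm release (per the lessons table below). A sketch, assuming a customer-managed policy named VeleroAccessPolicy (a placeholder) that grants the S3 and EC2 snapshot permissions from the velero-plugin-for-aws docs:

```bash
# One-time per cluster: enable the OIDC provider so IRSA can issue tokens
eksctl utils associate-iam-oidc-provider --cluster primary-cluster \
  --region us-east-1 --approve

# Create the velero service account; eksctl creates the IAM role and
# adds the eks.amazonaws.com/role-arn annotation automatically.
# VeleroAccessPolicy is a placeholder for your Velero S3/EC2 policy.
eksctl create iamserviceaccount \
  --cluster primary-cluster --region us-east-1 \
  --namespace velero --name velero \
  --attach-policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/VeleroAccessPolicy \
  --approve
```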
Gotcha: the Velero Helm chart looks for a credentials file at /credentials/cloud by default. With IRSA there is no Secret to mount, so set credentials.useSecret=false.
```bash
# Add the chart repo once
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update

helm upgrade --install velero vmware-tanzu/velero --namespace velero \
  -f values-primary.yaml \
  --set credentials.useSecret=false \
  --set serviceAccount.server.create=false \
  --set serviceAccount.server.name=velero \
  --set serviceAccount.server.annotations."eks\.amazonaws\.com/role-arn"="<ROLE_ARN>"
```
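Sanity-check the install (the BackupStorageLocation should report Available), then put a demo workload on the primary cluster to protect. The nginx commands are a sketch matching the app that gets restored later:

```bash
kubectl get pods -n velero
velero backup-location get        # PHASE should be Available

# Demo workload: nginx behind a LoadBalancer in the "demo" namespace
kubectl create namespace demo
kubectl create deployment nginx --image=nginx -n demo
kubectl expose deployment nginx --type=LoadBalancer --port=80 -n demo
```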
CRR is the heart of DR. Every backup Velero writes to the primary bucket gets automatically replicated to Mumbai.
```bash
aws s3 mb s3://velero-primary-bucket --region us-east-1
aws s3 mb s3://velero-dr-bucket --region ap-south-1

# CRR requires versioning on both source and destination
aws s3api put-bucket-versioning --bucket velero-primary-bucket \
  --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket velero-dr-bucket \
  --versioning-configuration Status=Enabled
```
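The commands above don't show the replication rule itself. A sketch, assuming a pre-created IAM role (velero-crr-role is a placeholder) that s3.amazonaws.com can assume, with read permissions on the source bucket and replicate permissions on the destination:

```bash
# V2 replication config: replicate everything to the Mumbai bucket
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::<ACCOUNT_ID>:role/velero-crr-role",
  "Rules": [
    {
      "ID": "velero-dr-replication",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::velero-dr-bucket" }
    }
  ]
}
EOF
aws s3api put-bucket-replication --bucket velero-primary-bucket \
  --replication-configuration file://replication.json

# Later, check an object's status (PENDING -> COMPLETED); the key path
# assumes Velero's default layout with no bucket prefix configured
aws s3api head-object --bucket velero-primary-bucket \
  --key backups/dr-backup/velero-backup.json --query ReplicationStatus
```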
CRR is forward-only: only objects written after replication is enabled get copied, so any backup taken earlier never reaches the DR bucket. Take a fresh one once CRR is in place:

```bash
velero backup create test-backup --include-namespaces demo
```
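The standby cluster also needs its own Velero install pointed at the replicated bucket. This sketch assumes a values-dr.yaml mirroring values-primary.yaml but with bucket velero-dr-bucket and region ap-south-1 (setting the BSL's accessMode to ReadOnly there keeps the standby from ever writing back), plus an IRSA role created on dr-cluster the same way as above:

```bash
# On the DR cluster context
helm upgrade --install velero vmware-tanzu/velero --namespace velero \
  -f values-dr.yaml \
  --set credentials.useSecret=false \
  --set serviceAccount.server.create=false \
  --set serviceAccount.server.name=velero \
  --set serviceAccount.server.annotations."eks\.amazonaws\.com/role-arn"="<DR_ROLE_ARN>"
```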
With Velero now running on both clusters, run the drill: take a backup on the primary (after CRR is enabled) so it replicates to the DR bucket, then restore it in Mumbai.
```bash
# On primary cluster
velero backup create dr-backup --include-namespaces demo
velero backup get

# Switch to DR cluster
kubectl config use-context iam-root-account@dr-cluster.ap-south-1.eksctl.io

# Verify backup visible via CRR
velero backup get

# Restore
velero restore create --from-backup dr-backup
```
DR fully validated! The nginx deployment and LoadBalancer service were restored in Mumbai from a us-east-1 backup, and the app is reachable via the DR region's ELB endpoint.
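To see it for yourself, list the restored resources and pull the ELB hostname from the Service (names assume the nginx demo app from earlier):

```bash
kubectl get deploy,svc -n demo
kubectl get svc nginx -n demo \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```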
The errors I hit, and the fixes:

- Velero was installed before the IRSA service account existed, so the pod came up without AWS credentials. Fix: create the IRSA service account before the Helm install, or rerun eksctl create iamserviceaccount with --override-existing-serviceaccounts and restart the Velero pod.
- credentialsFile does not exist: stat /credentials/cloud. Fix: set credentials.useSecret=false so the chart stops expecting a mounted Secret.
- A stale backup lingered from when the BackupStorageLocation was unavailable. Fix: velero backup delete test-backup --confirm, then create a fresh one.
| Lesson | Detail |
|---|---|
| IRSA before Helm | Create service account first, then install Velero |
| Disable credential Secret | credentials.useSecret=false when using IRSA |
| CRR is forward-only | Only new objects replicate — always take fresh backup after enabling |
| Context discipline | Always check kubectl config current-context |
All Helm values and configs: github.com/aquavis12/velero-config