The CloudFormation Export Lock: How CDK's Simplest Feature Creates Your Hardest Outage
Your CDK stacks pass certificates, hosted zones, and database tables between each other with a single line of code. Under the hood, each of those lines creates an invisible lock that will block you at the worst possible moment.
The Outage That Could Have Been Worse
At 3am on a Thursday, our monitoring lit up. HTTPS was broken across staging. The ACM certificates had expired.
The root cause was simple: our certificates used email validation. AWS sends renewal emails 45 days before expiry. Nobody approved them. The certs expired. Services went down.
The fix was also simple: approve the emails, wait for renewal. We were back up within the hour.
But we got lucky. If our domain’s MX records had changed, or the approval mailbox had been decommissioned, or this had happened during a domain migration - we would have been stuck. Email validation is a human-in-the-loop process for infrastructure that needs to be always-on.
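The permanent fix was obvious: switch to DNS validation. ACM asks for a CNAME record in the domain’s hosted zone (CDK can create it in Route 53 for you), verifies it automatically, and keeps renewing the certificate without human intervention for as long as that record stays in place.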
We had no idea what we were about to walk into.
“Cannot Update Export”
We created new DNS-validated certificates alongside the old ones. Each certificate’s ARN is stored in SSM Parameter Store so our CDK stacks can find it. The plan:
- Create new DNS-validated certs (new SSM params at /certificate/v2/ paths)
- Update the original SSM params to point to the new cert ARNs
- Redeploy CDK stacks - they pick up the new certs
- Delete old certificate stacks
Step 2 succeeded. Step 3 did not.
UPDATE_ROLLBACK_IN_PROGRESS | DomainStack
Cannot update export DomainStack:ExportsOutputRefCertificateArnParamParameter4277DB65
as it is in use by WebhooksStack.
CloudFormation refused to deploy. The certificate ARN had changed in SSM, which changed a cross-stack export value, which CloudFormation won’t allow while any other stack imports it.
We hadn’t created this export explicitly. CDK created it for us.
The Line of Code That Locks Your Infrastructure
In our CDK app, DomainStack reads the certificate ARN from SSM and exposes it as a property:
// DomainStack
this.certificate = Certificate.fromCertificateArn(this, "Certificate", certArn);
Other stacks receive it as a prop:
// In the CDK app
const webhooksStack = new WebhooksStack(app, "WebhooksStack", {
  domainResources: {
    certificate: domainStack.certificate, // This line creates the lock
    hostedZone: domainStack.hostedZone,
  },
});
This looks like passing a variable between two TypeScript objects. It’s not. CDK translates this into:
DomainStack template:
Outputs:
  ExportsOutputRefCertificateArnParam4277DB65:
    Value: !Ref CertificateArnParam
    Export:
      Name: DomainStack:ExportsOutputRefCertificateArnParam4277DB65
WebhooksStack template:
Resources:
  ApiDomain:
    Properties:
      CertificateArn:
        Fn::ImportValue: DomainStack:ExportsOutputRefCertificateArnParam4277DB65
CloudFormation’s rule is absolute: you cannot change an exported value while any stack imports it. You can’t update the producer. You can’t update the consumer in the same deployment. You’re locked.
The CDK documentation doesn’t warn you about this. The TypeScript compiler doesn’t warn you about this. You find out when you need to rotate a certificate in production.
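To make the rule concrete, here is a toy model of it in plain TypeScript. It is a sketch of the behaviour described above, not CloudFormation's implementation: the `DeployedStack` shape, the `blockedExportChanges` function, and the placeholder ARN strings are all hypothetical.

```typescript
// Toy model (hypothetical types and names) of the export rule described above.
interface DeployedStack {
  exports: Record<string, string>; // export name -> currently deployed value
  imports: string[]; // export names this stack uses via Fn::ImportValue
}

// Return one message per export of `producer` that would change or disappear
// while another deployed stack still imports it.
function blockedExportChanges(
  producer: string,
  nextExports: Record<string, string>,
  deployed: Record<string, DeployedStack>
): string[] {
  const messages: string[] = [];
  const current = deployed[producer].exports;
  for (const name of Object.keys(current)) {
    if (nextExports[name] === current[name]) continue; // unchanged exports are fine
    const importers = Object.keys(deployed).filter(
      (stack) => stack !== producer && deployed[stack].imports.indexOf(name) !== -1
    );
    if (importers.length > 0) {
      messages.push(
        `Cannot update export ${name} as it is in use by ${importers.join(", ")}.`
      );
    }
  }
  return messages;
}

const exportName = "DomainStack:ExportsOutputRefCertificateArnParam4277DB65";
const deployedStacks: Record<string, DeployedStack> = {
  DomainStack: { exports: { [exportName]: "old-cert-arn" }, imports: [] },
  WebhooksStack: { exports: {}, imports: [exportName] },
};

// The SSM value changed, so the value behind the export changes too.
const nextExports: Record<string, string> = { [exportName]: "new-cert-arn" };
const blocked = blockedExportChanges("DomainStack", nextExports, deployedStacks);
```

In this model the update is blocked for exactly as long as WebhooksStack lists the export in its imports, which is the situation the error message above describes.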
The Fix: SSM as a Decoupling Layer
The solution is to have each stack read from SSM Parameter Store directly, instead of receiving values as cross-stack props:
// Before: cross-stack prop (creates CloudFormation export/import lock)
const webhooksStack = new WebhooksStack(app, "WebhooksStack", {
  domainResources: {
    certificate: domainStack.certificate,
  },
});
// After: each stack reads SSM independently (no cross-stack dependency)
import * as cdk from "aws-cdk-lib";
import { Certificate } from "aws-cdk-lib/aws-certificatemanager";
import { Construct } from "constructs";
export class WebhooksStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    // getCertificateArn is our own helper; it reads the ARN from SSM
    const certArn = getCertificateArn(this);
    const certificate = Certificate.fromCertificateArn(this, "Certificate", certArn);
  }
}
With this pattern, changing the SSM parameter value and redeploying just works. No exports, no imports, no locks.
Ironically, some of our other CDK apps already used this pattern. They read from SSM directly and were never locked. Only the shared-core stacks used cross-stack props - and they were the ones that broke.
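The post does not show how getCertificateArn works. One way such a helper could be written (an assumption, not the author's code) is with CDK's `ssm.StringParameter.valueForStringParameter(scope, name)`, which adds a CloudFormation parameter of type `AWS::SSM::Parameter::Value<String>` to the consuming stack. The sketch below builds that template fragment by hand in plain TypeScript, to show that the consumer ends up referencing an SSM path and nothing from another stack. The logical ID `CertificateArnParam` and the function name are hypothetical; the path is the one mentioned later in this post.

```typescript
// Hypothetical sketch: the consuming stack's template when it reads the ARN
// from SSM itself. Logical IDs and function names are illustrative assumptions.
interface SsmBackedParameter {
  Type: "AWS::SSM::Parameter::Value<String>";
  Default: string; // SSM parameter name; CloudFormation resolves it at deploy time
}

function consumerTemplateFragment(ssmPath: string) {
  const certificateArnParam: SsmBackedParameter = {
    Type: "AWS::SSM::Parameter::Value<String>",
    Default: ssmPath,
  };
  return {
    Parameters: { CertificateArnParam: certificateArnParam },
    Resources: {
      // Mirrors the ApiDomain excerpt shown earlier; Type and other properties omitted.
      ApiDomain: {
        Properties: { CertificateArn: { Ref: "CertificateArnParam" } },
      },
    },
  };
}

const fragment = consumerTemplateFragment("/certificate/local/arn");

// No Fn::ImportValue anywhere in the fragment, so no export is pinned by it.
const hasImport = JSON.stringify(fragment).indexOf("Fn::ImportValue") !== -1;
```

One caveat if a helper works this way: the template text does not change when only the SSM value changes, so `cdk deploy` may report no changes. The `--force` flag deploys even when templates are identical.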
What We Didn’t Expect
The refactor should have been straightforward. It wasn’t.
Deploying the fix requires deploying in the wrong order
To break a cross-stack dependency, you need to:
- Deploy consumers first (so they stop importing the export)
- Deploy the producer second (so it can remove the export)
But CDK’s --all flag deploys in dependency order - producer first, consumer second. The exact opposite of what you need.
We had to use cdk deploy AuthStack WebhooksStack --exclusively to skip the dependency chain, then deploy DomainStack separately.
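The ordering can be sketched as a small helper. This is an illustration, not part of the CDK CLI: the `ImportGraph` type and the `teardownOrder` function are hypothetical, and the graph has to be written out by hand. It returns stacks so that every consumer comes before the producers it imports from, the reverse of what `--all` does.

```typescript
// Hypothetical helper: order stacks so each consumer deploys before the
// producers it imports from.
type ImportGraph = Record<string, string[]>; // consumer -> producers it imports from

function teardownOrder(imports: ImportGraph): string[] {
  // For every stack, count how many import edges still point at it.
  const pendingConsumers: Record<string, number> = {};
  for (const consumer of Object.keys(imports)) {
    if (!(consumer in pendingConsumers)) pendingConsumers[consumer] = 0;
    for (const producer of imports[consumer]) {
      pendingConsumers[producer] = (pendingConsumers[producer] || 0) + 1;
    }
  }

  // Stacks nobody imports from can go first; a producer becomes ready once
  // all of its consumers have been scheduled.
  const allStacks = Object.keys(pendingConsumers).sort();
  const ready = allStacks.filter((stack) => pendingConsumers[stack] === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const stack = ready.shift() as string;
    order.push(stack);
    for (const producer of imports[stack] || []) {
      pendingConsumers[producer] -= 1;
      if (pendingConsumers[producer] === 0) ready.push(producer);
    }
  }
  if (order.length !== allStacks.length) {
    throw new Error("Import cycle detected: no valid order exists");
  }
  return order;
}

const deployOrder = teardownOrder({
  AuthStack: ["DomainStack"],
  WebhooksStack: ["DomainStack"],
});
// deployOrder is ["AuthStack", "WebhooksStack", "DomainStack"]
```

Each name in the result is then deployed with `--exclusively`, so CDK does not pull the producer in ahead of its consumers.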
CloudFormation cares about template encoding, not just values
When we changed hostedZoneName from a cross-stack import to a local variable, the template changed from:
{ "Fn::ImportValue": "DomainStack:HostedZoneName" }
to:
"staging.example.co.uk"
Same value. Different encoding. CloudFormation treated this as a change to every resource that referenced it - including Cognito UserPoolDomain, which rejects in-place updates. The deployment failed, the stack rolled back, and the rollback failed too.
The fix: keep hostedZoneName as a cross-stack prop (it’s a stable string that never changes), and only decouple the values that actually rotate (certificate ARN, hosted zone ID).
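A simplified model of the mismatch, in plain TypeScript. It is not CloudFormation's diff algorithm: the `TemplateValue` type and the `resolveValue` helper are hypothetical and only cover the two shapes shown above.

```typescript
// Hypothetical model: a property is either a literal string or an import.
type TemplateValue = string | { "Fn::ImportValue": string };

// What the property evaluates to at deploy time, given the current exports.
function resolveValue(
  value: TemplateValue,
  exportValues: Record<string, string>
): string {
  return typeof value === "string" ? value : exportValues[value["Fn::ImportValue"]];
}

const currentExports = { "DomainStack:HostedZoneName": "staging.example.co.uk" };
const beforeValue: TemplateValue = { "Fn::ImportValue": "DomainStack:HostedZoneName" };
const afterValue: TemplateValue = "staging.example.co.uk";

// The resolved values match...
const sameResolvedValue =
  resolveValue(beforeValue, currentExports) === resolveValue(afterValue, currentExports);

// ...but the template text differs, and that textual difference is what
// showed up as a modification in the incident described above.
const sameTemplateText = JSON.stringify(beforeValue) === JSON.stringify(afterValue);
```

Running `cdk diff` before deploying shows which resources a change like this would touch.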
Ghost dependencies from past deployments
When AuthStack’s rollback failed, we discovered it couldn’t roll back because other stacks in other CDK apps were importing its exports. These imports were from old deployments - the current CDK code had long since switched to SSM reads. But the deployed CloudFormation templates still had Fn::ImportValue references.
We had to redeploy those stacks (with current code that doesn’t import) before the rollback could complete.
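A minimal sketch for finding these leftovers, assuming the deployed templates have already been fetched and parsed as JSON (for example with `aws cloudformation get-template`). The function names and the `BillingStack` entry are hypothetical. Only literal export names are collected; names built with other intrinsic functions are skipped.

```typescript
// Hypothetical helper: walk a parsed template and collect every literal
// export name used in Fn::ImportValue.
function findImports(node: unknown, found: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) findImports(item, found);
  } else if (node !== null && typeof node === "object") {
    const record = node as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (key === "Fn::ImportValue" && typeof value === "string") {
        if (found.indexOf(value) === -1) found.push(value);
      } else {
        findImports(value, found);
      }
    }
  }
  return found;
}

// Group the results: export name -> deployed stacks that still import it.
function importersByExport(
  templates: Record<string, unknown>
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const stackName of Object.keys(templates)) {
    for (const exportName of findImports(templates[stackName])) {
      (result[exportName] = result[exportName] || []).push(stackName);
    }
  }
  return result;
}

const deployedTemplates = {
  WebhooksStack: {
    Resources: {
      ApiDomain: {
        Properties: {
          CertificateArn: {
            "Fn::ImportValue": "DomainStack:ExportsOutputRefCertificateArnParam4277DB65",
          },
        },
      },
    },
  },
  BillingStack: { Resources: {} }, // hypothetical stack with no imports
};

const staleImporters = importersByExport(deployedTemplates);
```

For a single export, `aws cloudformation list-imports --export-name <name>` answers the same question directly against the account.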
Orphaned resources block new ones
When we updated our infrastructure-as-code to write SSM parameters at the original paths (instead of /v2/ paths), CloudFormation refused: “Resource of type AWS::SSM::Parameter with identifier /certificate/local/arn already exists.” The parameter existed from a stack that had been deleted months earlier. CloudFormation treats a renamed parameter as a create, not an update.
We had to manually delete the orphaned parameters before the deployment could proceed.
The Pattern: When to Use Props vs SSM
Not all cross-stack references are dangerous. The test is simple:
“Will I ever need to change this value without redeploying the producing stack?”
| Value | Changes? | Use |
|---|---|---|
| Certificate ARN | Yes (rotation, migration) | SSM |
| Hosted zone ID | Rarely, but possible | SSM |
| Domain name string | Never | Props are fine |
| DynamoDB table ARN | Never (table is permanent) | Props are fine |
| S3 bucket name | Never | Props are fine |
If a value might change independently of the stack that creates it - certificates, secrets, feature flags, configuration - read it from SSM. If it’s truly permanent and defined by the stack itself - table ARNs, bucket names, queue URLs - cross-stack props are fine.
The Uncomfortable Question
Open your CDK app. Search for patterns like:
someStack.someProperty // passed to another stack's props
Each one that carries a deploy-time value (a Ref, an ARN, another resource attribute) becomes a CloudFormation export. Each export is a lock. Ask yourself:
- What happens when this certificate expires and I need to replace it?
- What happens when I need to migrate this database?
- What happens when this hosted zone changes?
If the answer is “I’ll just update the value and redeploy” - you can’t. CloudFormation won’t let you. You’ll discover this at 3am, in production, during an incident.
We got lucky. Our outage was a warning. Yours might not be.
This post is based on a real incident involving ACM certificate migration across a multi-account AWS organization using CDK. The infrastructure patterns described - both the problems and the solutions - are common in any CDK codebase with multiple stacks.
The Hidden Lock That Blocks Infrastructure Changes at the Worst Moment
Fixing an expired security certificate should take an hour. It did. But the proper fix — the one that prevents this from happening again — took days and surfaced four separate infrastructure traps we’d unknowingly walked into.
3am on a Thursday
Our monitoring woke us up. HTTPS was broken across staging. The security certificates had expired.
The root cause was embarrassingly simple: our certificates used email validation. AWS sends renewal reminder emails 45 days before expiry. Nobody had approved them. The certificates expired. Services went down.
The fix was also simple: approve the emails, wait for renewal. We were back up within the hour.
But we’d gotten lucky. If the approval inbox had been decommissioned, or if we’d been in the middle of a domain migration, we’d have been genuinely stuck. Email validation is a “human must approve this” process for infrastructure that’s supposed to run itself.
The permanent fix was obvious: switch to DNS validation, where AWS renews certificates automatically by checking a DNS record that stays in place. No human required.
We had no idea what we were walking into.
The Invisible Locks
Our cloud infrastructure is built using a tool called CDK, which lets you write infrastructure as code. When you have multiple “stacks” (think: self-contained bundles of cloud resources), they often need to share values — a security certificate here, a domain name there.
CDK makes sharing look easy. You write what looks like a simple variable assignment, and the certificate ARN (a unique identifier) flows from one stack to another. The problem is what happens under the hood.
CloudFormation — the AWS service that actually manages all of this — translates that simple variable assignment into something called an export and import. The producing stack exports a value. The consuming stack imports it. And CloudFormation’s rule is absolute: you cannot change an exported value while any stack is still importing it.
We hadn’t created these exports explicitly. CDK had created them for us, silently, from our innocent-looking variable assignments.
So when we tried to update our certificates and push the new values through, CloudFormation refused. We were locked. The documentation doesn’t warn you about this. The code editor doesn’t warn you about this. You find out when you need to rotate a certificate in production at 3am.
Trying to Escape
The fix is conceptually straightforward: instead of stacks passing values to each other directly, each stack reads the value it needs from a central configuration store (AWS SSM Parameter Store). Update the configuration store, redeploy, everything picks up the new value. No exports, no imports, no locks.
Ironically, some of our other infrastructure was already using this pattern. Only the “shared core” stacks used the direct passing approach — and they were exactly the ones that broke.
The refactor should have been simple. It wasn’t.
The deployment order was backwards. To break a dependency chain, you need to update the consumers first (so they stop importing the value), then update the producer (so it can remove the export). But CDK’s automatic deployment tool does it the other way around: producer first, consumers second. We had to manually specify which stacks to deploy in which order.
Same value, different encoding, different outcome. When we changed one value from a cross-stack import to a local variable, the underlying template changed even though the actual value was identical. CloudFormation saw this as a modification to every resource that used it — including one that rejects in-place updates entirely. The deployment failed. The rollback failed too.
Old deployments leave phantom traces. When our rollback failed, we discovered that stacks from completely different parts of our infrastructure were still importing values from old deployments. The current code had long since stopped using those imports. But the deployed templates hadn’t been updated to match. We had to redeploy those stacks (with the current, non-importing code) before our rollback could complete.
Deleted stacks leave orphaned resources. When we tried to recreate certain configuration parameters at their original paths, CloudFormation refused because those parameters already existed — left over from a stack that had been deleted months earlier. We had to manually clean up resources that CloudFormation had forgotten it ever created.
The Rule That Would Have Prevented This
Not all cross-stack references are dangerous. There’s one question that predicts whether you’ll hit this problem:
“Will I ever need to change this value without rebuilding the stack that creates it?”
Security certificates rotate. Domain configuration changes during migrations. These should be read from a central store, not passed between stacks directly.
A database table’s identifier never changes — the table is permanent, defined by the stack that creates it. Bucket names don’t change. These are fine to pass directly.
The failure mode is predictable: things that look stable but aren’t. Certificates feel permanent until you need to replace them. Configuration feels stable until you need to migrate. The lock only bites you when you’re already in an incident and can least afford to discover it.
The Uncomfortable Closer
The pattern we fell into is common because CDK makes it invisible. The simple, readable code hides the CloudFormation machinery underneath. Every time you pass a value from one stack to another, you’re creating a lock. Most of the time, you’ll never notice. The one time you do notice will be at 3am, in production, when you urgently need to change something.
Our outage was a warning. The certificates expired in staging, not production. The fix was annoying but survivable. We had time to find all four of the hidden traps and clean them up properly.
Next time might not be so forgiving.
This is part of an ongoing series about building a startup’s engineering platform. For context on the broader infrastructure setup, see the self-hosted runners post.