When ESM Meets CJS: How Silent Bundling Failures Broke 98 Lambdas

We migrated our Lambda deployments from CDK to OpenTofu. Everything looked green — CI passed, staging deployed, production promoted. Days later, we discovered 98 out of 160 Lambda handlers were silently broken. Here’s how three invisible failures stacked up to create a production incident nobody saw coming.


The Setup

Our backend is a monorepo with ~160 Lambda handlers, all TypeScript, bundled with tsup/esbuild into CommonJS. Large shared dependencies — @aws-sdk, Sentry, ElectroDB, Pino, Zod — live in a Lambda Layer so they’re not duplicated across every handler.

As part of our road to continuous delivery, we moved to pre-built artifacts uploaded to S3 and deployed via OpenTofu, replacing CDK’s NodejsFunction bundling at deploy time. Faster deploys, deterministic artifacts, separated concerns.

That separation is what got us.


The Silent Failure

Somewhere around May 7th — eleven days after production launch — a dependency update pulled in @middy/core v6, which is ESM-only. Its package.json has "type": "module". We’d been on v5 (CJS-compatible) until then. A routine version bump. Happens all the time.

Here’s what tsup/esbuild does when you tell it to produce CJS output and it encounters an ESM-only package:

Nothing.

No error. No warning. It silently leaves a require("@middy/core") in the output as if it were an intentional external — like @aws-sdk in our Lambda Layer. The bundle builds. The exit code is 0. The output looks right if you don’t squint.

The result: 98 handler bundles with unresolved require() calls that crash on Lambda cold start.

Runtime.ImportModuleError: Error: Cannot find module '@middy/core'

Every handler that used Middy was broken. We just didn’t know it yet.


The Triple Mask

Three independent systems conspired to hide the problem for over a week.

Mask 1: Turbo Remote Cache

We use Turborepo with S3 remote caching for build artifacts. Old builds — from before Middy went ESM-only — were cached and kept being served. As long as a handler’s source files didn’t change, Turbo returned the old working bundle.

This is the insidious part: the broken build config was already in place, but most handlers never got rebuilt. First cache miss? Broken bundle deployed. But you don’t know which deploy introduced it because the build tool says everything’s fine.

Mask 2: Warm Lambda Instances

Lambda keeps instances warm for a while after the last invocation. Roughly 15 minutes is a common rule of thumb, though AWS does not guarantee a fixed duration. Broken bundles only crash on cold start, when the runtime loads index.js for the first time. Warm instances from before the broken deploy continued serving requests normally.

Gradual cold start failures looked like intermittent issues, not a systemic problem. CloudWatch showed occasional errors mixed with healthy invocations. The kind of noise you monitor but don’t panic about.

Mask 3: CI Passed

pnpm turbo run build    # ✅ Turbo cache hit — returned old working bundle
pnpm turbo run typecheck # ✅ TypeScript doesn't check runtime module resolution
pnpm turbo run test      # ✅ Vitest resolves modules differently than Lambda runtime

Every gate was green. No assertion in our pipeline checked that the built bundles were actually self-contained.
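One way to close that gap is to load each built bundle in CI, the same way the Lambda runtime does on cold start. The sketch below is an illustration under assumptions, not what our pipeline ran: checkBundleLoads and failedBundles are hypothetical helpers, and the check only means something when it runs against a directory laid out like the Lambda environment (the bundle plus the Layer's node_modules and nothing else). With the full CI node_modules on the module path, a missing package can still resolve and hide the problem.

```typescript
import { createRequire } from "node:module";
import { resolve } from "node:path";

export interface LoadResult {
  bundle: string;
  ok: boolean;
  error?: string;
}

// Hypothetical helper: load one built CJS bundle with require(), as the Lambda
// runtime does on cold start, and report the failure instead of throwing.
export function checkBundleLoads(bundlePath: string): LoadResult {
  const absolute = resolve(bundlePath);
  // createRequire provides a CommonJS require() whether this script itself
  // runs as CJS or as ESM.
  const requireFrom = createRequire(absolute);
  try {
    requireFrom(absolute);
    return { bundle: bundlePath, ok: true };
  } catch (err) {
    const code = (err as { code?: string }).code ?? "UNKNOWN";
    const firstLine = String((err as Error).message).split("\n")[0];
    return { bundle: bundlePath, ok: false, error: `${code}: ${firstLine}` };
  }
}

// Hypothetical helper: return only the bundles that failed to load.
export function failedBundles(bundlePaths: string[]): LoadResult[] {
  return bundlePaths.map(checkBundleLoads).filter((result) => !result.ok);
}
```

Loading a handler also runs its top-level code, so a real pipeline would probably run this in a fresh process per bundle rather than in the CI script's own process.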


Discovery

The incident surfaced through an unrelated bug. We were fixing verification emails that pointed to the wrong URL — a separate task entirely. That fix pulled in @kitajs/html, another ESM-only package, to replace react-email in a Cognito trigger.

The @kitajs/html import invalidated the Turbo cache entry for that handler, which forced a fresh build. This time esbuild failed loudly: it couldn’t resolve @kitajs/html/jsx-runtime. That was the first honest error we’d seen.

Investigating that one handler led us to check the others. CloudWatch confirmed the blast radius: every Lambda deployed via OpenTofu that used @middy/core was crashing on cold start. 98 out of 160 handlers.

Timeline:

Time   Event
16:10  Registration flow tested: customMessage trigger crashes
16:27  CloudWatch confirms: Cannot find module @middy/core
17:08  Blast radius confirmed: 98/160 handlers broken
18:34  PR #1 merged (Layer approach), fails in staging
19:00  PR #2 merged (noExternal approach), all Lambdas cold-start clean
19:15  Staging smoke tests pass, production promotion triggered

Three hours from discovery to production fix. The incident had been silently brewing for eight days.


The Fix: Two Attempts

Attempt 1: Add ESM Packages to the Lambda Layer

The instinct was obvious: put @middy/core in the Lambda Layer alongside @aws-sdk, declare it as an external.

This failed immediately at runtime:

Error [ERR_REQUIRE_ESM]: require() of ES Module not supported.
Instead change the require to a dynamic import() which is available in all CommonJS modules.

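On the Node.js runtime we were deploying to, CJS handlers can’t require() ESM modules, as the error above shows. Moving the package into the Layer changes where the module is found, not whether require() can load it. (Newer Node.js releases have started to allow require() of some ES modules, but that was not something we could rely on.) The Layer approach only works for CJS-compatible packages. This was a dead end.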

Attempt 2: Force-Bundle via noExternal ✅

The root cause was tsup’s default behaviour: it auto-externalizes everything in node_modules. For CJS-compatible packages, this is fine — they resolve at runtime from the Layer. For ESM-only packages, it’s silent failure.

The fix was explicit: tell tsup to bundle specific packages instead of externalizing them.

// packages/backend/shared/tsup.base.ts
import type { Options } from "tsup";

export const baseConfig: Options = {
  format: "cjs",
  // Force-bundle these packages instead of leaving them as runtime require() calls.
  // Any ESM-only package must be listed here.
  noExternal: [
    "@middy/core",
    "@middy/http-json-body-parser",
    "@middy/http-error-handler",
    "@middy/input-output-logger",
  ],
  // esbuild inlines the ESM code and converts it to CJS.
  // This is what esbuild is good at; it just needs to be told.
};

esbuild handles ESM-to-CJS conversion perfectly when you explicitly tell it to bundle the package. The problem was never esbuild’s capability — it was tsup’s default of treating all node_modules as externals, and esbuild’s silence when it can’t resolve one.
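A list of exact package names has one weakness: the next @middy middleware someone imports is not in it. A possible variant, sketched here on the assumption that tsup's noExternal accepts RegExp entries alongside strings, is to match the whole scope. The names esmOnlyPackages and isForceBundled are hypothetical, and the helper only mirrors the intent of the list so the patterns can be unit tested; it is not tsup's own matching code.

```typescript
// Hypothetical variant of the noExternal list: one scope-wide pattern instead
// of naming each @middy package individually.
export const esmOnlyPackages: (string | RegExp)[] = [
  /^@middy\//, // every package in the @middy scope
];

// Sanity-check helper: would this import specifier be force-bundled?
// Strings match the package itself or any subpath of it; regexes are tested as-is.
export function isForceBundled(
  specifier: string,
  patterns: (string | RegExp)[] = esmOnlyPackages,
): boolean {
  return patterns.some((pattern) =>
    typeof pattern === "string"
      ? specifier === pattern || specifier.startsWith(`${pattern}/`)
      : pattern.test(specifier),
  );
}
```

In the config, the array would be passed as noExternal: esmOnlyPackages. The trade-off is that a scope-wide pattern also bundles any @middy package that is still CJS-compatible, which costs bundle size but not correctness.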


The CI Safety Net

Fixing the immediate problem was three hours of work. Making sure it never happens again was another hour. We enabled esbuild’s metafile option, which outputs a JSON file listing every dependency and whether it was bundled or left external.

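// tsup.base.ts
export const baseConfig: Options = {
  // ...
  esbuildOptions(options) {
    // Ask esbuild to record every input, output, and external import.
    // tsup also exposes a top-level `metafile: true` option (the --metafile CLI flag).
    options.metafile = true;
  },
};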

Then a CI script that reads every metafile and validates that every external import is either a Node.js builtin or a package in our Lambda Layer:

#!/usr/bin/env bash
# .github/scripts/validate-bundle-deps.sh
set -euo pipefail

# Packages (or scopes) provided by the Lambda Layer
LAYER_PACKAGES="@aws-sdk @sentry electrodb pino zod"
ERRORS=0

# Read find's output line by line so paths containing spaces survive
while IFS= read -r metafile; do
  # Extract all external imports from esbuild's dependency graph.
  # Each output's "imports" is an array of { path, kind, external } entries.
  externals=$(node -e '
    const meta = JSON.parse(require("fs").readFileSync(process.argv[1], "utf8"));
    const exts = new Set();
    for (const info of Object.values(meta.outputs)) {
      for (const imp of info.imports || []) {
        if (imp.external) exts.add(imp.path);
      }
    }
    console.log([...exts].join("\n"));
  ' "$metafile")

  while IFS= read -r ext; do
    [[ -z "$ext" ]] && continue
    # Allow Node builtins: exact name or subpath (fs, fs/promises), but not fs-extra
    [[ "$ext" =~ ^(node:.+|(fs|path|crypto|util|stream|events|http|https|url|os|child_process)(/.+)?)$ ]] && continue
    # Allow Layer packages: exact name or subpath, so "zod" does not match "zod-to-json-schema"
    for layer_pkg in $LAYER_PACKAGES; do
      [[ "$ext" == "$layer_pkg" || "$ext" == "$layer_pkg"/* ]] && continue 2
    done
    echo "ERROR: Unexpected external '$ext' in $metafile"
    ERRORS=$((ERRORS + 1))
  done <<< "$externals"
done < <(find packages/backend -name "metafile-*.json")

if [[ $ERRORS -gt 0 ]]; then
  echo "Found $ERRORS unexpected externals. These packages must be bundled (add to noExternal) or added to the Lambda Layer."
  exit 1
fi

169 handlers validated on every PR and deploy. This script would have caught the Middy issue on the very first PR that shipped a broken bundle — days before we discovered it manually.

The key insight: esbuild already knows exactly what it bundled and what it left external. The metafile is a complete dependency graph. No grep heuristics, no regex parsing of bundle output, no false positives. You’re reading esbuild’s own accounting.
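The same check can be written as a pure function over the parsed metafile, which makes it easy to unit test. This is a sketch, not the script we run: findUnexpectedExternals and its helpers are hypothetical names, the interfaces cover only the subset of esbuild's metafile shape the check reads, and the Layer list is copied from the bash script above.

```typescript
import { builtinModules } from "node:module";

// Subset of esbuild's metafile shape that this check relies on.
interface MetafileImport {
  path: string;
  kind: string;
  external?: boolean;
}
interface Metafile {
  outputs: Record<string, { imports?: MetafileImport[] }>;
}

// Packages (or scopes) assumed to be provided by the Lambda Layer.
const LAYER_PACKAGES = ["@aws-sdk", "@sentry", "electrodb", "pino", "zod"];

function isNodeBuiltin(specifier: string): boolean {
  return specifier.startsWith("node:") || builtinModules.includes(specifier);
}

// Exact package name or a subpath of it, so "zod" does not match "zod-to-json-schema".
function isInLayer(specifier: string, layer: string[]): boolean {
  return layer.some((pkg) => specifier === pkg || specifier.startsWith(`${pkg}/`));
}

// Returns every external import that is neither a Node builtin nor in the Layer.
export function findUnexpectedExternals(
  meta: Metafile,
  layer: string[] = LAYER_PACKAGES,
): string[] {
  const unexpected = new Set<string>();
  for (const output of Object.values(meta.outputs)) {
    for (const imp of output.imports ?? []) {
      if (!imp.external) continue;
      if (isNodeBuiltin(imp.path) || isInLayer(imp.path, layer)) continue;
      unexpected.add(imp.path);
    }
  }
  return Array.from(unexpected).sort();
}
```

A CI wrapper would parse each metafile-*.json, call the function, and exit non-zero if any returned list is non-empty.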


Lessons

1. esbuild’s silence is dangerous. It doesn’t distinguish between “I externalized this because you asked” and “I couldn’t bundle this so I gave up.” Both look identical in the output. Both produce exit code 0.

2. CJS can’t require() ESM. Putting ESM-only packages in a Lambda Layer doesn’t work. They must be bundled inline. esbuild handles the ESM-to-CJS conversion — it just needs to be told explicitly.

3. Build caches hide build breakage. If your CI only builds changed files and caches the rest, a broken build config can ship for days before a cache miss reveals it. We learned this lesson differently with the 502s — now it bit us from the other side.

4. Warm instances hide runtime breakage. Lambda cold starts are infrequent enough that broken deploys can survive for days on warm instances. Your monitoring shows intermittent errors, not a systemic outage.

5. Assert on build outputs, not just build success. exit 0 from the build tool doesn’t mean the output is correct. Scan the artifacts. The build succeeded — the bundle was wrong.

6. Metafile is your friend. ~65 lines of bash + inline Node.js. Validates 169 handlers. Catches phantom externals before they reach production. Trivial to add, impossible to justify not having after this incident.


The Uncomfortable Question

Check your tsup or esbuild config. Is format: 'cjs'? Are you relying on node_modules auto-externalization?

Now go through the packages pinned in your package-lock.json or pnpm-lock.yaml and look for ones whose own package.json recently gained "type": "module". Middy did it. Chalk did it years ago. More packages are going ESM-only every month.
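That audit can be roughly automated by reading each installed package's package.json. The sketch below is a heuristic, not a definitive test: looksEsmOnly and PackageManifest are hypothetical names, and the rule it applies ("type": "module", no "require" condition anywhere in exports, no .cjs main) will misclassify packages that ship CJS in less conventional ways.

```typescript
// Subset of package.json fields the heuristic looks at.
interface PackageManifest {
  name?: string;
  type?: string;
  main?: string;
  exports?: unknown;
}

// Recursively look for a "require" (CJS) condition anywhere in the exports map.
function hasRequireCondition(exportsField: unknown): boolean {
  if (exportsField === null || typeof exportsField !== "object") return false;
  if (Array.isArray(exportsField)) {
    return exportsField.some((entry) => hasRequireCondition(entry));
  }
  return Object.entries(exportsField as Record<string, unknown>).some(
    ([key, value]) => key === "require" || hasRequireCondition(value),
  );
}

// Heuristic: the package declares itself ESM and offers no obvious CJS entry point.
export function looksEsmOnly(manifest: PackageManifest): boolean {
  if (manifest.type !== "module") return false;
  if (hasRequireCondition(manifest.exports)) return false;
  if (typeof manifest.main === "string" && manifest.main.endsWith(".cjs")) return false;
  return true;
}
```

Running this over the manifests of your direct dependencies gives a candidate list for noExternal; each hit still deserves a manual look before it goes into the config.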

Your build tool won’t tell you when it happens. Your tests won’t catch it. Your CI will stay green. Your Lambdas will keep running — until they don’t.

The fix takes an hour. The metafile validation takes another hour. The incident that finds you first takes longer.


This post is part of an ongoing series about building a startup’s engineering platform. The Turbo cache 502 post covers an earlier encounter with our build caching setup, and Road to Continuous Delivery covers the pipeline evolution that set the stage for this incident.