We Rewrote Our Rust Lambdas in TypeScript (And It Was the Right Call)

Against the Grain

There’s a genre of tech blog post that follows a reliable formula: “We rewrote X in Rust and it’s 47x faster.” The Hacker News crowd loves it. The title alone gets 400 upvotes.

This is not that post.

We had six Lambda functions written in Rust. They worked. Then I decided to rewrite them all in TypeScript. Not because Rust is bad - it’s a genuinely great language. I rewrote them because Rust was solving a problem we didn’t have, while creating problems we couldn’t ignore.

Why I Chose Rust in the First Place

Our operations module handles email processing, vector search, and a chat interface for internal staff. The initial architecture looked like this:

  1. email-processor (2,561 LOC) - receive inbound email, parse it, classify with Bedrock, generate embeddings, store everything
  2. vector-writer (444 LOC) - write embedding vectors to S3-backed storage
  3. contact-enrichment (647 LOC) - cross-account user linking via STS assume-role
  4. vector-search (476 LOC) - semantic search over email history
  5. chat (347 LOC) - streaming chat with email context via Bedrock
  6. email-core (1,071 LOC) - shared types, DynamoDB helpers, AWS clients

I chose Rust for the reasons you’d expect: performance, type safety, low memory footprint on Lambda. The plan was to use cargo-lambda for builds, target aarch64-unknown-linux-musl for ARM64 Lambdas, and enjoy sub-10ms cold starts.

On paper, it was a clean architecture. In practice, it was about to become our biggest CI headache.

The CI Spiral

The first sign of trouble was the PR checks workflow. We use self-hosted ARM64 runners on AWS (to match our Lambda architecture). Getting cargo-lambda working on these runners required the Rust toolchain to be pre-baked into the runner AMI.

That sounds straightforward. It wasn’t.

The first attempt relied on cargo-lambda being available as a cargo subcommand. But cargo only finds a subcommand if an executable named cargo-lambda sits in its own bin directory or somewhere on PATH, and that didn’t line up with where our AMI baking tools put binaries. The fix: invoke cargo-lambda directly instead of cargo lambda. That worked locally and broke on CI.

The second attempt: install cargo-lambda via pip during the CI job. But the runner image didn’t have pip. So we added pip to the AMI. That added 4 minutes to the AMI bake. Then pip installed cargo-lambda to a location that wasn’t on PATH.

The third attempt: download the cargo-lambda binary directly and extract it to /usr/local/bin. This needed sudo. Our runners had sudo, but the PATH still didn’t include the cargo bin directory, so cargo couldn’t find its own subcommands.

The fourth attempt: symlink cargo-lambda into cargo’s bin directory. This worked - until the next AMI bake, which used a different Rust toolchain version with a different bin path.

We were five CI fix commits deep, each one fixing the previous fix. Four of them, in order:

fix(ci): install cargo-lambda via pip instead of relying on AMI
fix(ci): download cargo-lambda binary directly (no pip on runner)
fix(ci): add sudo for tar extract to /usr/local/bin
fix(ci): use full path for cargo-lambda to avoid shell resolution issues

Every commit was a one-line change followed by a 12-minute wait for CI feedback. We’d already burned a full day on toolchain plumbing.

Then came the real kicker: cargo test caused OOM kills on our runners. By default Cargo compiles with one parallel job per CPU core, and building every test binary that way was more than our 4GB runners could handle. We split tests into a per-package matrix with parallelism limited to 2 - which meant test feedback took 8 minutes for code that ran in 200ms.

The Uncomfortable Question

At some point, between the fifth CI fix and the OOM investigation, I stopped and asked the question we should have asked at the start:

What is Rust actually giving us here?

All six Lambdas were I/O-bound. Every single one. Their job was to call AWS APIs - DynamoDB reads and writes, S3 gets and puts, Bedrock inference calls, EventBridge event publishing, STS assume-role, WorkMail message fetching. The hot path in every function was waiting for a network response.

Rust’s value proposition is compute performance: zero-cost abstractions, no garbage collector, predictable latency. That matters when you’re parsing binary protocols, doing image processing, or running tight loops over large datasets. It does not matter when your Lambda spends 95% of its wall-clock time waiting for DynamoDB to respond.
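To put a number on that, here’s a back-of-the-envelope sketch using Amdahl’s law. The 95% wait figure is the one quoted above; the 10x compute speedup is an assumption picked purely for illustration, not something we measured, and overallSpeedup is a hypothetical helper.

```typescript
// Amdahl's law: only the compute share of an invocation gets faster.
// ioFraction is the share of wall-clock time spent waiting on the network.
function overallSpeedup(ioFraction: number, computeSpeedup: number): number {
  const computeFraction = 1 - ioFraction;
  return 1 / (ioFraction + computeFraction / computeSpeedup);
}

// Assumed for illustration: a faster runtime makes the compute part 10x quicker.
const withTenX = overallSpeedup(0.95, 10);

// Upper bound: compute becomes free and only the network wait remains.
const ceiling = overallSpeedup(0.95, Infinity);

console.log(withTenX.toFixed(3)); // 1.047
console.log(ceiling.toFixed(3)); // 1.053
```

Under those assumptions, even an infinitely fast runtime makes the invocation only about 5% faster end to end.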

The cold start advantage was real - Rust Lambdas cold-started in ~30ms vs ~300ms for Node.js. But our Lambdas weren’t latency-critical. They processed inbound emails. Nobody notices if email classification takes 300ms longer.

Meanwhile, the costs were very real:

  • CI complexity: the Rust toolchain, cargo-lambda, musl cross-compilation targets, and test parallelism all required special handling on self-hosted runners
  • AMI coupling: every Rust toolchain update meant a new AMI had to be baked, tested, and rolled out across runners
  • Team friction: our primary language is TypeScript - the entire rest of the stack (CDK infrastructure, frontend apps, all other backend services) is TypeScript. Context-switching to Rust for one module meant maintaining fluency in a language we used 5% of the time
  • Ecosystem mismatch: the Node.js ecosystem for AWS SDK, AI/ML tooling, and email parsing is vastly larger and more actively maintained than the Rust equivalents

We were paying a real, recurring tax for a theoretical performance benefit that didn’t apply to our workload.

The Migration

I chose to migrate per-Lambda, not big-bang. Each Lambda got its own PR, its own deployment, and its own validation. The Lambdas communicated via EventBridge events, so we could swap implementations one at a time without breaking the pipeline.
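What makes that kind of swap safe is that each Lambda depends only on the shape of the events it consumes and emits, not on the language of whatever produced them. Here’s a minimal sketch of that idea. The source, detail-type, and detail fields are the standard EventBridge envelope; the EmailStored name and the detail fields are hypothetical stand-ins, not our real event schema.

```typescript
// The envelope fields every EventBridge event carries (trimmed to what we use).
interface EventBridgeEnvelope<T> {
  source: string;
  'detail-type': string;
  detail: T;
}

// Hypothetical contract between two Lambdas in the pipeline.
interface EmailStoredDetail {
  messageId: string;
  s3Key: string;
}

// Validate at the boundary: any producer that emits this shape is acceptable,
// whether it was compiled from Rust or bundled from TypeScript.
function parseEmailStored(event: EventBridgeEnvelope<unknown>): EmailStoredDetail {
  if (event['detail-type'] !== 'EmailStored') {
    throw new Error(`Unexpected detail-type: ${event['detail-type']}`);
  }
  const d = event.detail as Partial<EmailStoredDetail> | null | undefined;
  if (!d || typeof d.messageId !== 'string' || typeof d.s3Key !== 'string') {
    throw new Error('Malformed EmailStored detail');
  }
  return { messageId: d.messageId, s3Key: d.s3Key };
}

// Example input, shaped like what a producer Lambda would publish.
const parsed = parseEmailStored({
  source: 'example.email-pipeline',
  'detail-type': 'EmailStored',
  detail: { messageId: 'm-1', s3Key: 'inbound/m-1' },
});
```

As long as the replacement Lambda accepts and emits the same shapes, its neighbours never need to know it changed.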

The migration order followed the dependency graph, bottom-up:

  1. Scaffold - set up the TypeScript workspace alongside the existing Rust code (ts-src/ directory)
  2. CDK constructs - replace RustLambda custom construct with CDK’s built-in NodejsFunction
  3. Shared library - types, DynamoDB helpers (using ElectroDB), AWS SDK clients
  4. vector-writer - simplest Lambda, good proof of concept
  5. contact-enrichment - cross-account STS logic
  6. email-processor - the big one (2,561 LOC of Rust → ~200 lines of TypeScript)
  7. vector-search - semantic search
  8. chat - streaming responses with tool calling
  9. Remove Rust - delete Cargo.toml, Cargo.lock, all .rs files, simplify CI

The final commit message for step 9 told the whole story:

feat: remove Rust workspace and simplify CI

Removes all Rust code (Cargo.toml, Cargo.lock, src/*.rs, email-core
shared library). Renames ts-src/ to src/ now that Rust is gone.
Simplifies CI to TypeScript-only: typecheck + test + cdk synth.
No more cargo-lambda, Rust toolchain, or musl targets.

That commit deleted 7,948 lines from Cargo.lock alone.

What TypeScript Actually Looks Like Here

The thing that made this migration feel obvious in hindsight was how much cleaner the TypeScript versions were - not because TypeScript is a better language (it isn’t, in the general case), but because we were writing I/O glue code, and TypeScript’s ecosystem is purpose-built for it.

Here’s our email classification function. In Rust, this was ~120 lines of Bedrock client setup, request serialization, response deserialization, and error handling. In TypeScript with the Vercel AI SDK:

import { bedrock } from '@ai-sdk/amazon-bedrock';
import { generateObject } from 'ai';
import { z } from 'zod';

async function classifyEmail(subject: string, from: string, body: string) {
  const { object } = await generateObject({
    model: bedrock('anthropic.claude-haiku-4-5-20251001-v1:0'),
    schema: z.object({
      classification: z.enum(['BUSINESS', 'NOISE', 'OPERATIONAL']),
    }),
    prompt: `Classify this email as BUSINESS, OPERATIONAL, or NOISE.
Subject: ${subject}
From: ${from}
Email content:
${body.slice(0, 8000)}`,
  });
  return object.classification;
}

That’s it. Type-safe structured output from Bedrock in about 15 lines. The Zod schema does double duty: it validates the model’s response at runtime, and it is the source of the TypeScript type. The AI SDK handles Bedrock’s API shape, retries, and response parsing.
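If the "one schema, two jobs" idea is unfamiliar, here’s a dependency-free sketch of it. This is not how Zod is implemented; it’s a hand-rolled analogue in which a single runtime array drives both the check and the static type. parseClassification is a hypothetical helper.

```typescript
// One runtime value is the single source of truth.
const CLASSIFICATIONS = ['BUSINESS', 'NOISE', 'OPERATIONAL'] as const;

// The static type is derived from that value, so the two cannot drift apart.
type Classification = (typeof CLASSIFICATIONS)[number];

// Runtime check: returns a value typed as Classification, or throws.
function parseClassification(value: unknown): Classification {
  const match = CLASSIFICATIONS.find((c) => c === value);
  if (match === undefined) {
    throw new Error(`Unexpected classification: ${String(value)}`);
  }
  return match;
}

// Untrusted input (for example, parsed model output) goes in as unknown.
const label: Classification = parseClassification(JSON.parse('"NOISE"'));
```

With Zod, the derived type comes from z.infer<typeof schema>, and the library covers nested objects, optionals, and error reporting that this sketch leaves out.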

The chat Lambda went from ~350 lines of Rust (with manual tool-call parsing, Bedrock streaming protocol handling, and careful lifetime management for the streaming response) to this pattern:

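import { bedrock } from '@ai-sdk/amazon-bedrock';
import { generateText, tool } from 'ai';
import { z } from 'zod';

// SYSTEM_PROMPT, conversationHistory, executeSearchContractors and
// getContractorDetail are defined elsewhere and omitted here.
// parameters and maxSteps are the option names in AI SDK 4.x.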
const { text } = await generateText({
  model: bedrock('anthropic.claude-haiku-4-5-20251001-v1:0'),
  system: SYSTEM_PROMPT,
  messages: conversationHistory,
  tools: {
    searchContractors: tool({
      description: 'Search accredited utility contractors by region, scope, or scheme',
      parameters: z.object({
        region: z.string().optional(),
        scopeSearch: z.string().optional(),
        scheme: z.string().optional(),
      }),
      execute: async (params) => executeSearchContractors(params),
    }),
    getContractorDetail: tool({
      description: 'Get full details of a specific contractor',
      parameters: z.object({ contractorId: z.string() }),
      execute: async ({ contractorId }) => getContractorDetail(contractorId),
    }),
  },
  maxSteps: 5,
});

Tool calling with automatic multi-step execution. The AI SDK calls the tool, feeds the result back to the model, and repeats up to maxSteps times. In Rust, we were hand-rolling the tool-call loop, parsing Bedrock’s JSON tool-use blocks, dispatching to handler functions, and managing the conversation state manually.
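For a sense of what that loop involves, here’s a stripped-down TypeScript sketch of the same control flow. It is neither the AI SDK’s implementation nor our old Rust code: every name in it (Model, ModelReply, runToolLoop) is hypothetical, and the model is any function you pass in, so it runs without Bedrock. Throwing when the step limit is hit is a simplification made here, not a claim about what the SDK does.

```typescript
type ToolCall = { toolName: string; args: Record<string, unknown> };
type ModelReply =
  | { kind: 'text'; text: string }
  | { kind: 'tool-call'; call: ToolCall };
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type Model = (messages: Message[]) => Promise<ModelReply>;
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

async function runToolLoop(
  model: Model,
  tools: Record<string, ToolHandler>,
  messages: Message[],
  maxSteps: number,
): Promise<string> {
  const history = messages.slice(); // don't mutate the caller's array
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(history);
    if (reply.kind === 'text') return reply.text; // the model produced a final answer

    // Dispatch the requested tool by name.
    const handler = tools[reply.call.toolName];
    if (!handler) throw new Error(`Unknown tool: ${reply.call.toolName}`);
    const result = await handler(reply.call.args);

    // Feed the call and its result back so the next model turn can use them.
    history.push({ role: 'assistant', content: JSON.stringify(reply.call) });
    history.push({ role: 'tool', content: JSON.stringify(result) });
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}
```

Every branch in that function is something we had to write, test, and maintain by hand in the Rust version, on top of parsing Bedrock’s wire format.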

The CDK Simplification

Perhaps the most satisfying change was in the infrastructure code. Our custom RustLambda CDK construct handled cargo-lambda invocation, musl target selection, binary path resolution, and ARM64 cross-compilation. It was ~80 lines of build configuration.

The replacement:

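import { Duration } from 'aws-cdk-lib';
import { Architecture, Runtime } from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

// Inside a Stack or Construct, where `this` is the construct scope
new NodejsFunction(this, 'EmailProcessor', {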
  entry: '../lambdas/src/handlers/email-processor.ts',
  runtime: Runtime.NODEJS_22_X,
  architecture: Architecture.ARM_64,
  timeout: Duration.minutes(5),
  memorySize: 512,
});

NodejsFunction uses esbuild under the hood. No special tooling. No binary compilation. No cross-compilation targets. esbuild installs from npm as a prebuilt native binary for the host platform, so nothing has to be compiled on the runner. Our CI went from needing Rust toolchain + cargo-lambda + musl targets to needing… nothing extra. Just Node.js and pnpm, which we already had.

The Results

CI time: PR checks dropped from ~18 minutes (Rust typecheck + build + test + CDK synth) to ~6 minutes (TypeScript typecheck + test + CDK synth). The Rust build step alone was 8 minutes.

AMI complexity: removed entirely. No more Rust toolchain in the runner image. One fewer thing to version, bake, test, and roll out.

Lines of code: ~5,500 LOC of Rust became ~2,800 LOC of TypeScript. Not because we cut features - because the ecosystem did the heavy lifting. The Vercel AI SDK, ElectroDB, and the AWS SDK v3 are exceptionally well-designed libraries.

Cold starts: went from ~30ms to ~300ms. Nobody noticed or cared.

Developer velocity: one language across the entire stack. No context switching. Any engineer can debug any service. New hires need to know TypeScript, not TypeScript-and-also-Rust.

The Lesson

The Rust rewrite meme exists because Rust is genuinely excellent at what it does. If your Lambda is CPU-bound - image processing, data transformation, cryptographic operations, real-time computation - Rust is a fantastic choice and the cold start advantage matters.

But most Lambdas aren’t CPU-bound. Most Lambdas are glue. They receive an event, call some APIs, transform some data, and put the result somewhere. For that workload, the language runtime’s performance is noise compared to network latency. What matters is ecosystem quality, tooling simplicity, and team productivity.

I went against the trend, and the result was less code, faster CI, simpler infrastructure, and a team that could move faster. Sometimes the most pragmatic engineering decision is the unfashionable one.