We Rewrote Our Rust Lambdas in TypeScript (And It Was the Right Call)

2026-03-23 · 9 min read

Against the Grain

There’s a genre of tech blog post that follows a reliable formula: “We rewrote X in Rust and it’s 47x faster.” The Hacker News crowd loves it. The title alone gets 400 upvotes.

This is not that post.

We had six Lambda functions written in Rust. They worked. Then I decided to rewrite them all in TypeScript. Not because Rust is bad - it’s a genuinely great language. I rewrote them because Rust was solving a problem we didn’t have, while creating problems we couldn’t ignore.

Why I Chose Rust in the First Place

Our operations module handles email processing, vector search, and a chat interface for internal staff. The initial architecture looked like this:

email-processor (2,561 LOC) - receive inbound email, parse it, classify with Bedrock, generate embeddings, store everything
vector-writer (444 LOC) - write embedding vectors to S3-backed storage
contact-enrichment (647 LOC) - cross-account user linking via STS assume-role
vector-search (476 LOC) - semantic search over email history
chat (347 LOC) - streaming chat with email context via Bedrock
email-core (1,071 LOC) - shared types, DynamoDB helpers, AWS clients

I chose Rust for the reasons you’d expect: performance, type safety, low memory footprint on Lambda. The plan was to use cargo-lambda for builds, target aarch64-unknown-linux-musl for ARM64 Lambdas, and enjoy sub-10ms cold starts.

On paper, it was a clean architecture. In practice, it was about to become our biggest CI headache.

The CI Spiral

The first sign of trouble was the PR checks workflow. We use self-hosted ARM64 runners on AWS (to match our Lambda architecture). Getting cargo-lambda working on these runners required the Rust toolchain to be pre-baked into the runner AMI.

That sounds straightforward. It wasn’t.

The first attempt relied on cargo-lambda being available as a cargo subcommand. But cargo’s subcommand discovery has opinions about PATH resolution that don’t align with how AMI baking tools lay down binaries. The fix: invoke cargo-lambda directly instead of cargo lambda. That worked locally, broke on CI.

The second attempt: install cargo-lambda via pip during the CI job. But the runner image didn’t have pip. So we added pip to the AMI. That added 4 minutes to the AMI bake. Then pip installed cargo-lambda to a location that wasn’t on PATH.

The third attempt: download the cargo-lambda binary directly and extract it to /usr/local/bin. This needed sudo. Our runners had sudo, but the PATH still didn’t include the cargo bin directory, so cargo couldn’t find its own subcommands.

The fourth attempt: symlink cargo-lambda into cargo’s bin directory. This worked - until the next AMI bake, which used a different Rust toolchain version with a different bin path.

We ended up five CI fix commits deep, each one fixing the previous fix. Four of them, from the log:

fix(ci): install cargo-lambda via pip instead of relying on AMI
fix(ci): download cargo-lambda binary directly (no pip on runner)
fix(ci): add sudo for tar extract to /usr/local/bin
fix(ci): use full path for cargo-lambda to avoid shell resolution issues

Every commit was a one-line change followed by a 12-minute wait for CI feedback. We’d already burned a full day on toolchain plumbing.

Then came the real kicker: cargo test caused OOM kills on our runners. Cargo compiles test binaries in parallel by default (one build job per CPU core), and our 4 GB runners couldn’t handle it. We split tests into a per-package matrix with parallelism limited to 2 - which meant test feedback took 8 minutes, for code that ran in 200ms.

The Uncomfortable Question

At some point, between the fifth CI fix and the OOM investigation, I stopped and asked the question we should have asked at the start:

What is Rust actually giving us here?

All six Lambdas were I/O-bound. Every single one. Their job was to call AWS APIs - DynamoDB reads and writes, S3 gets and puts, Bedrock inference calls, EventBridge event publishing, STS assume-role, WorkMail message fetching. The hot path in every function was waiting for a network response.

Rust’s value proposition is compute performance: zero-cost abstractions, no garbage collector, predictable latency. That matters when you’re parsing binary protocols, doing image processing, or running tight loops over large datasets. It does not matter when your Lambda spends 95% of its wall-clock time waiting for DynamoDB to respond.
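To put a rough number on that: if a fraction f of wall-clock time is network wait that no language can speed up, making the remaining compute s times faster gives an overall speedup of 1 / (f + (1 - f) / s). Here is a small sketch of that arithmetic using the 95% figure above. The function name is invented for illustration, and the 47x is borrowed from the headline trope, not measured.

```typescript
// Amdahl's-law style bound: only the compute share of wall-clock time
// benefits from a faster language runtime.
// ioFraction: share of wall-clock time spent waiting on the network (0..1).
// computeSpeedup: how many times faster the compute portion becomes.
function overallSpeedup(ioFraction: number, computeSpeedup: number): number {
  if (ioFraction < 0 || ioFraction > 1) {
    throw new RangeError('ioFraction must be between 0 and 1');
  }
  if (computeSpeedup <= 0) {
    throw new RangeError('computeSpeedup must be positive');
  }
  const computeFraction = 1 - ioFraction;
  return 1 / (ioFraction + computeFraction / computeSpeedup);
}

// 95% I/O wait, compute made 47x faster: roughly 1.05x overall.
const withFastCompute = overallSpeedup(0.95, 47);

// Even infinitely fast compute is capped at 1 / 0.95, roughly 1.053x.
const upperBound = overallSpeedup(0.95, Infinity);
```

In other words, under that assumption the best possible runtime buys about five percent, and everything else is the network.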

The cold start advantage was real - Rust Lambdas cold-started in ~30ms vs ~300ms for Node.js. But our Lambdas weren’t latency-critical. They processed inbound emails. Nobody notices if email classification takes 300ms longer.

Meanwhile, the costs were very real:

CI complexity: the Rust toolchain, cargo-lambda, musl cross-compilation targets, and test parallelism all required special handling on self-hosted runners
AMI coupling: every Rust toolchain update meant a new AMI had to be baked, tested, and rolled out across runners
Team friction: our primary language is TypeScript - the entire rest of the stack (CDK infrastructure, frontend apps, all other backend services) is TypeScript. Context-switching to Rust for one module meant maintaining fluency in a language we used 5% of the time
Ecosystem mismatch: the Node.js ecosystem for AWS SDK, AI/ML tooling, and email parsing is vastly larger and more actively maintained than the Rust equivalents

We were paying a real, recurring tax for a theoretical performance benefit that didn’t apply to our workload.

The Migration

I chose to migrate per-Lambda, not big-bang. Each Lambda got its own PR, its own deployment, and its own validation. The Lambdas communicated via EventBridge events, so we could swap implementations one at a time without breaking the pipeline.
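What makes that kind of swap safe is that the only contract between functions is the shape of the event. Below is a minimal sketch of the idea. The event names (ops.email, EmailClassified), the detail fields, and the EventPublisher interface are all invented for illustration and stand in for the real EventBridge client, so the example runs without AWS.

```typescript
// The contract between Lambdas: a plain, serializable event.
// Field and event names here are hypothetical, for illustration only.
interface EmailClassifiedDetail {
  messageId: string;
  classification: 'BUSINESS' | 'NOISE' | 'OPERATIONAL';
}

// Shape modelled loosely on an EventBridge PutEvents entry.
interface EventEntry {
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
}

// Stand-in for the real EventBridge client (an assumption of this sketch).
interface EventPublisher {
  publish(entry: EventEntry): Promise<void>;
}

function toEntry(detail: EmailClassifiedDetail): EventEntry {
  return {
    Source: 'ops.email',
    DetailType: 'EmailClassified',
    Detail: JSON.stringify(detail),
  };
}

// Downstream consumers only ever see the entry, so they cannot tell
// whether a Rust or a TypeScript producer emitted it.
async function emitClassified(
  publisher: EventPublisher,
  detail: EmailClassifiedDetail,
): Promise<EventEntry> {
  const entry = toEntry(detail);
  await publisher.publish(entry);
  return entry;
}
```

As long as the new implementation emits the same entry for the same input, the rest of the pipeline doesn’t need to change when one function is swapped out.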

The migration order followed the dependency graph, bottom-up:

1. Scaffold - set up the TypeScript workspace alongside the existing Rust code (ts-src/ directory)
2. CDK constructs - replace RustLambda custom construct with CDK’s built-in NodejsFunction
3. Shared library - types, DynamoDB helpers (using ElectroDB), AWS SDK clients
4. vector-writer - simplest Lambda, good proof of concept
5. contact-enrichment - cross-account STS logic
6. email-processor - the big one (2,561 LOC of Rust → ~200 lines of TypeScript)
7. vector-search - semantic search
8. chat - streaming responses with tool calling
9. Remove Rust - delete Cargo.toml, Cargo.lock, all .rs files, simplify CI

The final commit message for step 9 told the whole story:

feat: remove Rust workspace and simplify CI

Removes all Rust code (Cargo.toml, Cargo.lock, src/*.rs, email-core
shared library). Renames ts-src/ to src/ now that Rust is gone.
Simplifies CI to TypeScript-only: typecheck + test + cdk synth.
No more cargo-lambda, Rust toolchain, or musl targets.

That commit deleted 7,948 lines from Cargo.lock alone.

What TypeScript Actually Looks Like Here

The thing that made this migration feel obvious in hindsight was how much cleaner the TypeScript versions were - not because TypeScript is a better language (it isn’t, in the general case), but because we were writing I/O glue code, and TypeScript’s ecosystem is purpose-built for it.

Here’s our email classification function. In Rust, this was ~120 lines of Bedrock client setup, request serialization, response deserialization, and error handling. In TypeScript with the Vercel AI SDK:

import { bedrock } from '@ai-sdk/amazon-bedrock';
import { generateObject } from 'ai';
import { z } from 'zod';

async function classifyEmail(subject: string, from: string, body: string) {
  const { object } = await generateObject({
    model: bedrock('anthropic.claude-haiku-4-5-20251001-v1:0'),
    schema: z.object({
      classification: z.enum(['BUSINESS', 'NOISE', 'OPERATIONAL']),
    }),
    prompt: `Classify this email as BUSINESS, OPERATIONAL, or NOISE.
Subject: ${subject}
From: ${from}
Email content:
${body.slice(0, 8000)}`,
  });
  return object.classification;
}

That’s it. Type-safe structured output from Bedrock in 15 lines. The Zod schema is both the runtime validation and the TypeScript type. The AI SDK handles Bedrock’s API shape, retries, and response parsing.
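One payoff of that typed result shows up downstream: because the return value is a union of three string literals, the compiler can check that every case is handled. This is a sketch of what a caller might look like. The routing targets are hypothetical and not taken from the real pipeline.

```typescript
// Mirrors the three values of the z.enum in the schema above.
type Classification = 'BUSINESS' | 'NOISE' | 'OPERATIONAL';

// Hypothetical follow-up actions, invented for this example.
type Route = 'index-and-notify' | 'index-only' | 'discard';

function routeFor(classification: Classification): Route {
  switch (classification) {
    case 'BUSINESS':
      return 'index-and-notify';
    case 'OPERATIONAL':
      return 'index-only';
    case 'NOISE':
      return 'discard';
    default: {
      // If a new enum member is added and not handled above, this
      // assignment stops compiling, so the gap is caught at build time.
      const unhandled: never = classification;
      throw new Error(`Unhandled classification: ${String(unhandled)}`);
    }
  }
}
```

With Zod, the union would normally be derived from the schema with z.infer rather than written out by hand. It is spelled out here only to keep the sketch dependency-free.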

The chat Lambda went from ~350 lines of Rust (with manual tool-call parsing, Bedrock streaming protocol handling, and careful lifetime management for the streaming response) to this pattern:

import { bedrock } from '@ai-sdk/amazon-bedrock';
import { generateText, tool } from 'ai';
import { z } from 'zod';

const { text } = await generateText({
  model: bedrock('anthropic.claude-haiku-4-5-20251001-v1:0'),
  system: SYSTEM_PROMPT,
  messages: conversationHistory,
  tools: {
    searchContractors: tool({
      description: 'Search accredited utility contractors by region, scope, or scheme',
      parameters: z.object({
        region: z.string().optional(),
        scopeSearch: z.string().optional(),
        scheme: z.string().optional(),
      }),
      execute: async (params) => executeSearchContractors(params),
    }),
    getContractorDetail: tool({
      description: 'Get full details of a specific contractor',
      parameters: z.object({ contractorId: z.string() }),
      execute: async ({ contractorId }) => getContractorDetail(contractorId),
    }),
  },
  maxSteps: 5,
});

Tool calling with automatic multi-step execution. The AI SDK calls the tool, feeds the result back to the model, and repeats up to maxSteps times. In Rust, we were hand-rolling the tool-call loop, parsing Bedrock’s JSON tool-use blocks, dispatching to handler functions, and managing the conversation state manually.
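For a sense of what that loop involves, here is a stripped-down sketch of multi-step tool calling. It is neither the AI SDK’s implementation nor our old Rust code. The ModelTurn shape, the callModel signature, and the tool registry are simplified assumptions made up for illustration.

```typescript
// A model turn either asks for a tool call or returns final text.
type ModelTurn =
  | { kind: 'tool-call'; toolName: string; args: unknown }
  | { kind: 'text'; text: string };

type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type Tool = (args: unknown) => Promise<unknown>;
type CallModel = (messages: Message[]) => Promise<ModelTurn>;

async function runToolLoop(
  callModel: CallModel,
  tools: Record<string, Tool>,
  history: Message[],
  maxSteps: number,
): Promise<string> {
  const messages = [...history]; // copy so the caller's history is untouched
  for (let step = 0; step < maxSteps; step++) {
    const turn = await callModel(messages);
    if (turn.kind === 'text') {
      return turn.text; // model is done, no more tools requested
    }
    const tool = tools[turn.toolName];
    if (!tool) {
      throw new Error(`Unknown tool: ${turn.toolName}`);
    }
    // Run the tool and feed its result back as conversation state.
    const result = await tool(turn.args);
    messages.push({ role: 'assistant', content: JSON.stringify(turn) });
    messages.push({ role: 'tool', content: JSON.stringify(result) });
  }
  // Design choice of this sketch: fail loudly when the step budget runs out.
  throw new Error(`No final answer after ${maxSteps} steps`);
}
```

Even in this toy form there are three things to get right: dispatch, state, and the stopping condition. Add streaming and Bedrock’s wire format, and it’s easy to see where ~350 lines went.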

The CDK Simplification

Perhaps the most satisfying change was in the infrastructure code. Our custom RustLambda CDK construct handled cargo-lambda invocation, musl target selection, binary path resolution, and ARM64 cross-compilation. It was ~80 lines of build configuration.

The replacement:

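import { Duration } from 'aws-cdk-lib';
import { Architecture, Runtime } from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

// Inside a Stack or Construct: the entry file is bundled with esbuild at synth time.
new NodejsFunction(this, 'EmailProcessor', {
  entry: '../lambdas/src/handlers/email-processor.ts',
  runtime: Runtime.NODEJS_22_X,
  architecture: Architecture.ARM_64,
  timeout: Duration.minutes(5),
  memorySize: 512,
});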

NodejsFunction uses esbuild under the hood. No special tooling. No binary compilation. No cross-compilation targets. esbuild ships as a single binary and runs everywhere Node.js runs. Our CI went from needing Rust toolchain + cargo-lambda + musl targets to needing… nothing extra. Just Node.js and pnpm, which we already had.

The Results

CI time: PR checks dropped from ~18 minutes (Rust typecheck + build + test + CDK synth) to ~6 minutes (TypeScript typecheck + test + CDK synth). The Rust build step alone was 8 minutes.

AMI complexity: removed entirely. No more Rust toolchain in the runner image. One fewer thing to version, bake, test, and roll out.

Lines of code: ~5,500 LOC of Rust became ~2,800 LOC of TypeScript. Not because we cut features - because the ecosystem did the heavy lifting. The Vercel AI SDK, ElectroDB, and the AWS SDK v3 are exceptionally well-designed libraries.

Cold starts: went from ~30ms to ~300ms. Nobody noticed or cared.

Developer velocity: one language across the entire stack. No context switching. Any engineer can debug any service. New hires need to know TypeScript, not TypeScript-and-also-Rust.

The Lesson

The Rust rewrite meme exists because Rust is genuinely excellent at what it does. If your Lambda is CPU-bound - image processing, data transformation, cryptographic operations, real-time computation - Rust is a fantastic choice and the cold start advantage matters.

But most Lambdas aren’t CPU-bound. Most Lambdas are glue. They receive an event, call some APIs, transform some data, and put the result somewhere. For that workload, the language runtime’s performance is noise compared to network latency. What matters is ecosystem quality, tooling simplicity, and team productivity.

I went against the trend, and the result was less code, faster CI, simpler infrastructure, and a team that could move faster. Sometimes the most pragmatic engineering decision is the unfashionable one.

We Rewrote Rust Code in TypeScript, and I Have No Regrets

2026-03-23 · 6 min read

Against the Trend

There’s a popular kind of engineering blog post with a very predictable headline: “We rewrote our system in Rust and it’s 47 times faster.” Those posts do very well online.

This is not that post.

We had six backend functions written in Rust. They worked fine. I decided to rewrite them all in TypeScript anyway. Not because Rust is bad, but because it was solving a problem we didn’t have, while creating problems we couldn’t ignore.

Why Rust in the First Place

These six functions handled our operations module: incoming email processing, searching through email history, and an AI-powered chat tool for internal staff. When I first built them, I chose Rust for the reasons people always choose Rust: it’s fast, memory-efficient, and type-safe. The plan was to run them as serverless functions on AWS with impressively low cold-start times.

On paper, a clean choice. In practice, a CI nightmare.

The CI Spiral

The trouble started with our build system. We use self-hosted CI runners on AWS, and getting the Rust build tool (cargo-lambda) to work reliably on those runners turned into a multi-day adventure in frustration.

The first attempt relied on a particular way of invoking the tool. It worked locally and broke on CI. The second attempt installed the tool differently. That broke because the installer put it somewhere the system couldn’t find. The third attempt downloaded the tool manually. That needed special permissions and still had path issues. The fourth attempt worked, until the next time we rebuilt our runner machines with a slightly different setup and everything broke again.

We were five “fix the fix” commits deep, each one a one-line change followed by a twelve-minute wait for CI feedback. We’d already burned a full day on this.

Then came the final insult: running the Rust tests caused our CI machines to run out of memory, because Rust compiles all its test programs simultaneously. We had to split the tests into smaller batches with tighter limits, and test feedback ballooned to eight minutes for code that ran in a fraction of a second.

The Uncomfortable Question

Somewhere between the fifth CI fix and the memory investigation, I stopped and asked the question I should have asked at the start:

What is Rust actually giving us here?

All six of these functions were doing the same basic thing: calling other services. Read from the database. Write to storage. Call the AI model. Publish an event. Wait for a network response. Repeat.

Rust is genuinely excellent at compute-intensive work: processing data in tight loops, handling binary protocols, doing calculations as fast as physically possible. None of that applied here. These functions spent 95% of their time waiting for other services to respond. Whether they waited in Rust or TypeScript was completely irrelevant to how fast they were.

The cold-start time advantage was real: Rust functions started up about ten times faster than TypeScript equivalents. But these functions processed incoming emails. Nobody cares if email classification takes an extra 300 milliseconds.

Meanwhile, the costs were very real:

CI complexity: every time we touched the runner infrastructure, we had to re-solve the Rust build tool setup problem
Language switch overhead: the entire rest of our system is TypeScript. Maintaining expertise in Rust for one module meant context-switching into a language we used maybe five percent of the time
Ecosystem gap: the TypeScript world has vastly better libraries for the things these functions actually did: calling AWS services, working with AI models, parsing emails

We were paying a real, recurring tax for a performance benefit that simply didn’t apply.

The Migration

I decided to migrate one function at a time rather than rewriting everything at once and hoping it all worked. Each function got its own pull request and its own deployment. Because the functions communicated through an event system, we could swap them out individually without breaking anything.

We worked through them bottom-up: the shared code library first, then the simplest function as a proof of concept, then the more involved ones, including the big one, the main email processor.

The final step was deleting all the Rust code. That single deletion removed nearly eight thousand lines from the dependency lock file alone.

What TypeScript Actually Looks Like Here

The thing that made this feel obvious in hindsight was how much simpler the TypeScript versions were. Not because TypeScript is a better language in general, but because our specific use case (calling APIs, handling responses, transforming data) is exactly what the TypeScript ecosystem is optimised for.

Our email classification function went from roughly 120 lines of Rust, all carefully handling request serialisation, response parsing, and error management, to around 15 lines of TypeScript using a library that did all of that automatically.

The AI chat function went from 350 lines of Rust, including hand-written code for managing multi-step AI tool calls, to a pattern where the library handled all of that automatically, including running the tool, feeding the result back to the AI, and continuing the conversation.

The infrastructure code simplified just as dramatically. Our custom Rust build setup was about 80 lines of configuration. The TypeScript replacement was five lines using a standard CDK construct that ships with no extra dependencies.

The Results

CI time for our pull request checks dropped from 18 minutes to 6 minutes. The Rust build step alone had been taking 8 minutes.

Lines of code went from roughly 5,500 (Rust) to roughly 2,800 (TypeScript), without removing any features. The libraries we used simply handled more for us.

Cold-start time increased from 30ms to 300ms. Nobody noticed or cared.

Every engineer on the team can now read and debug every part of the system. No more context-switching between languages.

The Lesson

Rust is excellent at what it does. If your code spends most of its time doing computation, Rust is a legitimate choice and the performance benefits are real. But most backend functions are not like that. Most backend functions are glue: receive something, call some services, put the result somewhere.

For that kind of work, what matters is not raw performance but ecosystem quality, tooling simplicity, and how quickly your team can ship and debug. I went against the trend, and ended up with less code, faster CI, simpler infrastructure, and more time to work on things that actually mattered.

Sometimes the most pragmatic decision is the unfashionable one.