Workers runtime mental model: lifecycle, context, limits

TL;DR

The Workers runtime is a V8 isolate executing handlers of the form fetch(request, env, ctx). A request goes through six steps: PoP receives → isolate lookup (warm or ~5ms cold start) → handler runs → subrequests fan out → response returns → waitUntil runs in the background.

The single biggest misconception when coming from Lambda:

CPU time is not wall time. await fetch() doesn’t count against CPU time, but a compute-heavy for loop does. The Free plan gives you 10ms CPU; the Paid plan defaults to 50ms. Exceed it and the request is killed.

This post covers: the detailed lifecycle, the three objects passed into the handler (Request, env, ctx), per-plan limits, waitUntil for background tasks, passThroughOnException for fallback, and six common misconceptions. Code samples come from the Worker currently running this blog.

Who this is for

Developers who’ve read Part 1 and are about to write their first handler.
Anyone building a Worker and hitting “Worker exceeded CPU time limit” or “Too many subrequests”.
Anyone who wants to know why console.log in a loop occasionally eats the CPU budget.

Prerequisites: Promises / async-await in JS, the Fetch API (Request, Response, Headers).

After this post you’ll:

Understand the 6-step lifecycle of a request.
Distinguish CPU time from wall time.
Know when to use ctx.waitUntil vs ctx.passThroughOnException.
Avoid six common misconceptions that cause handlers to leak state or hit limits.

What this post isn’t about

Specific bindings (D1, KV, R2): Parts 3-8.
Hono or other router frameworks: Part 9. We use vanilla fetch here.
Wrangler and local dev: Part 4.
Pricing details: Part 19.

Anatomy: the handler is just a function

A standard Worker module exports an object with one handler per event type. The three you’ll meet most often:

// src/index.ts
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    return new Response("Hello from the edge");
  },

  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext): Promise<void> {
    // runs on cron trigger from wrangler.jsonc
  },

  async queue(batch: MessageBatch, env: Env, ctx: ExecutionContext): Promise<void> {
    // consumer for Queues
  }
} satisfies ExportedHandler<Env>;

Three handlers, three different inputs, same pattern:

First argument is the event trigger (Request, ScheduledEvent, MessageBatch).
env holds every binding (DB, KV, R2, secrets).
ctx is the ExecutionContext, giving you waitUntil and passThroughOnException.

No app.listen(port). No framework-level Express middleware chain. No process.env. Just one function called for every request, and the platform handles the rest.

Request lifecycle: 6 steps

Figure: Workers request lifecycle. The request hits the nearest PoP via anycast, the router selects a Worker, a V8 isolate is either warm or goes through a cold start, the fetch handler runs, subrequests fan out in parallel to D1/KV/R2/origin/Workers AI, the response returns to the client, and waitUntil runs background tasks after the response has left.

① Request arrives at a PoP

DNS for example.com points to Cloudflare. Anycast routing delivers the request to the nearest PoP (usually < 50ms RTT).

② Isolate lookup

Cloudflare looks for an already-compiled isolate for this Worker:

Warm isolate: reused, handler runs after ~50µs. This is the majority of requests in production.
Cold start: if no isolate exists at this PoP yet, V8 compiles the script. That takes ~5ms.

Lambda scales to zero after an idle period (roughly 15 minutes) and then pays for another cold start. Workers don’t follow that pattern: Cloudflare keeps isolates warm for a long time, and cold starts are cheap enough that they aren’t a priority to optimise.

③ The fetch handler runs

The handler receives (request, env, ctx), does something, returns a Response.

async fetch(request, env, ctx) {
  const url = new URL(request.url);
  if (url.pathname === "/api/posts") {
    const rows = await env.DB.prepare("SELECT * FROM posts").all();
    return Response.json(rows);
  }
  return new Response("Not found", { status: 404 });
}

④ Subrequests fan out

Every binding call such as await env.DB.prepare(...).all() and every await fetch("https://api.com/...") is a subrequest. There’s a limit:

Free plan: 50 subrequests / request.
Paid plan: 1000 subrequests / request.

Bindings (D1, KV, R2, Queues, AI) all count as subrequests. If you loop over 100 D1 rows and call KV once per row, that is 100 subrequests from the loop alone, twice the Free plan’s budget.
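One way to notice a runaway loop during development is to count outbound calls yourself. The sketch below is a hypothetical helper, not a platform API: withSubrequestBudget and FetchLike are names invented here, and it only counts calls routed through the wrapper, so binding calls made elsewhere are not included.

```typescript
// Hypothetical helper (not part of the Workers runtime): wraps a fetch-like
// function and counts calls so a handler can fail fast with a clear message.
type FetchLike = (input: string) => Promise<unknown>;

interface CountedFetch {
  fetch: FetchLike;
  count: () => number;
}

function withSubrequestBudget(inner: FetchLike, budget: number): CountedFetch {
  let used = 0;
  return {
    fetch: async (input: string) => {
      // Refuse the call once the self-imposed budget is spent.
      if (used >= budget) {
        throw new Error(`subrequest budget of ${budget} exhausted`);
      }
      used += 1;
      return inner(input);
    },
    count: () => used,
  };
}
```

In a handler you would wrap the global fetch once per request and pass the wrapper down, setting the budget a little under the plan limit.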

⑤ Response returns

When the handler returns a Response, Cloudflare streams the body back to the client. After this point, the response has left the isolate.

⑥ waitUntil runs in the background

ctx.waitUntil(promise) registers a Promise the platform will await after the response has already gone out. Use it for logging, analytics, cache warmup — anything that doesn’t need to block the response.

async fetch(request, env, ctx) {
  const response = await handleRequest(request, env);
  const slug = new URL(request.url).pathname;

  // Log async, don't block the response
  ctx.waitUntil(
    env.DB.prepare("INSERT INTO views (slug, ts) VALUES (?, ?)")
      .bind(slug, Date.now())
      .run()
  );

  return response;
}

The user sees the response immediately; the log write happens after.
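A rejected promise inside waitUntil has nobody left to report to, since the response is already gone. A minimal sketch of one way to handle that, assuming a hypothetical runInBackground helper and a cut-down WaitUntilContext interface standing in for the real ExecutionContext type:

```typescript
// Minimal stand-in for the part of ExecutionContext used here (assumption:
// in a real Worker the type comes from @cloudflare/workers-types).
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical helper: registers a background task and logs a rejection
// instead of letting it fail silently after the response has gone out.
function runInBackground(
  ctx: WaitUntilContext,
  label: string,
  task: () => Promise<unknown>,
  log: (message: string) => void = console.error,
): void {
  ctx.waitUntil(
    task().catch((err: unknown) => {
      const reason = err instanceof Error ? err.message : String(err);
      log(`background task "${label}" failed: ${reason}`);
    }),
  );
}
```

Usage would look like runInBackground(ctx, "log-view", () => logView(request, env)), which keeps the handler body as short as the plain ctx.waitUntil call.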

The 3 handler objects

Request

Standard Fetch API, plus a cf property with edge metadata:

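async fetch(request: Request, env: Env, ctx: ExecutionContext) {
  const url = new URL(request.url);
  const method = request.method;
  // Only parse a body when one is expected; request.json() rejects on an empty body
  const body = method === "POST" ? await request.json() : null;

  // Cloudflare-specific
  const country = request.cf?.country;        // 'VN', 'US', ...
  const colo = request.cf?.colo;              // 'SIN', 'HKG': the PoP code
  const tlsVersion = request.cf?.tlsVersion;  // 'TLSv1.3'
  const botScore = request.cf?.botManagement?.score; // only set when Bot Management is enabled

  return Response.json({ country, colo, tlsVersion, botScore });
}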

For analytics and geo-routing, request.cf is enough without pulling in an external service.
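As an illustration of geo-routing from request.cf alone, here is a small sketch. The origins, the country-to-region mapping, and the pickOrigin name are all made up for this example; GeoInfo models only the two cf fields it reads.

```typescript
// Subset of request.cf used by this sketch (an assumption, not the full type).
interface GeoInfo {
  country?: string;
  colo?: string;
}

// Hypothetical regional origins; replace with your own.
const REGION_ORIGINS = new Map<string, string>([
  ["VN", "https://ap.example.com"],
  ["SG", "https://ap.example.com"],
  ["US", "https://us.example.com"],
]);
const DEFAULT_ORIGIN = "https://eu.example.com";

// Picks an origin from the visitor's country, falling back to a default
// when cf is missing (for example in local dev) or the country is unmapped.
function pickOrigin(cf: GeoInfo | undefined): string {
  const country = cf?.country;
  if (country === undefined) return DEFAULT_ORIGIN;
  return REGION_ORIGINS.get(country) ?? DEFAULT_ORIGIN;
}
```

Inside a handler this would be called as pickOrigin(request.cf), and the fallback matters because cf is not guaranteed to be populated everywhere.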

env

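Contains every binding, secret, and var. Bindings and plain vars are declared in wrangler.jsonc; secrets are set separately (for example with wrangler secret put):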

interface Env {
  DB: D1Database;
  KV: KVNamespace;
  BUCKET: R2Bucket;
  QUEUE: Queue;
  AI: Ai;
  VECTORIZE: VectorizeIndex;
  // secrets
  RESEND_API_KEY: string;
  // plain vars
  SITE_ORIGIN: string;
}

No process.env. No dotenv. Secrets are stored encrypted by Cloudflare and injected into env at runtime; they are not part of your bundle or of wrangler.jsonc.

ctx (ExecutionContext)

Two methods:

ctx.waitUntil(promise): extend the lifecycle past the response.
ctx.passThroughOnException(): if the handler throws, Cloudflare forwards the request to the origin (useful when the Worker sits in front of a zone with a real origin).

CPU time vs wall time

This is the single biggest source of confusion.

Wall time: real elapsed time from when the handler starts until it ends, including time spent awaiting I/O. Capped at 30s on every plan.

CPU time: only counts time the CPU is actually busy running your code. await fetch() is waiting on the network → CPU is idle → not counted.

async fetch(request, env) {
  await fetch("https://slow-api.com/data");       // 2s wall, ~0ms CPU
  await new Promise(r => setTimeout(r, 5000));    // 5s wall, ~0ms CPU
  let sum = 0;
  for (let i = 0; i < 1e7; i++) { sum += i; }     // 50ms wall, 50ms CPU
  return new Response(String(sum));
}

The two awaits are fine even though wall time is ~7s. The for loop is where danger lives: every millisecond it runs counts against the CPU budget.

Per-plan limits

Table (image): Workers runtime limits, comparing CPU time, wall time, memory, subrequest count, request body size, response size, and Worker script size across Free plan, Paid Bundled, and Paid Unbound.

Worth memorising:

Free plan 10ms CPU: enough for a simple, I/O-bound API. Parsing large JSON, running heavy regex, or verifying an RSA-signed JWT will blow through it.
Paid Bundled 50ms CPU (default), configurable up to 30s: fine for 99% of use cases.
Paid Unbound 30s CPU: for compute-heavy workloads, but billed per CPU-ms rather than bundled.

This blog runs on Paid Bundled with 50ms. I had to split one handler that was embedding 200 documents in one request because it exceeded 50ms. Now it fans out via a Queue, one message per document.

When you’ll hit the CPU limit

RSA signature JWT verification on every request.
Parsing large CSV or markdown.
Running heavy regex against request body.
Base64 encoding/decoding large binary blobs.
Running Pagefind-like indexing inside the Worker.

Fix: cache (KV, Cache API), offload to Queues, or use a specialised binding (AI, Stream).
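For the “offload to Queues” option, the fan-out can be sketched roughly like this. QueueLike is a cut-down stand-in for the Queues producer binding, enqueueInBatches is a hypothetical helper, and the default batch size of 100 is my understanding of the per-call sendBatch limit, so treat it as an assumption and check the current docs.

```typescript
// Minimal stand-in for a Queues producer binding (assumption: the real
// binding's sendBatch accepts objects with a body field).
interface QueueLike<T> {
  sendBatch(messages: { body: T }[]): Promise<void>;
}

// Hypothetical helper: splits items into batches and enqueues one message
// per item, so each consumer invocation handles a small amount of CPU work.
async function enqueueInBatches<T>(
  queue: QueueLike<T>,
  items: T[],
  batchSize = 100, // assumed sendBatch limit; adjust if the platform differs
): Promise<number> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  // Send the batches in parallel rather than awaiting inside the loop.
  await Promise.all(
    batches.map((batch) => queue.sendBatch(batch.map((body) => ({ body })))),
  );
  return batches.length;
}
```

With 200 document ids this issues two sendBatch calls from the request handler, and the expensive per-document work moves into the queue consumer.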

waitUntil: the most important pattern

waitUntil is the canonical way to run work “after the response has gone out”. Use it for anything that doesn’t block the user:

Log a page view

async fetch(request, env, ctx) {
  const response = await render(request, env);

  ctx.waitUntil(logView(request, env));

  return response;
}

async function logView(request: Request, env: Env) {
  const slug = new URL(request.url).pathname;
  // UTC calendar day as YYYY-MM-DD (assumed format for the day column)
  const day = new Date().toISOString().slice(0, 10);
  await env.DB.prepare(
    "INSERT INTO post_views_daily (slug, day, count) VALUES (?, ?, 1) " +
    "ON CONFLICT(slug, day) DO UPDATE SET count = count + 1"
  ).bind(slug, day).run();
}

Cache warmup after a miss

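async fetch(request, env, ctx) {
  // Cache key derived from the path (an assumption; use whatever identifies the result)
  const key = new URL(request.url).pathname;
  const cached = await env.KV.get(key);
  if (cached) return new Response(cached);

  const fresh = await computeExpensive();
  ctx.waitUntil(env.KV.put(key, fresh, { expirationTtl: 3600 }));

  return new Response(fresh);
}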

The user doesn’t wait for the KV write. The handler returns immediately; the KV write runs in the background.

Analytics event

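// writeDataPoint() returns immediately and the runtime flushes the write in
// the background, so it needs neither await nor a ctx.waitUntil wrapper.
env.ANALYTICS.writeDataPoint({
  blobs: [slug, userAgent],
  doubles: [readingTime],
  indexes: [country],
});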

A big gotcha

waitUntil does not bypass the 30s wall-time limit. If your background task takes a minute, it gets killed. For long tasks, use a Queue, not waitUntil.
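If a background task can hang (a slow third-party API, say), it helps to give it a deadline of your own well below the platform limit. A minimal sketch, assuming a hypothetical withTimeout helper; note that it only stops waiting and does not cancel the underlying work.

```typescript
// Hypothetical helper: rejects if the task has not settled within ms.
// It does not abort the task itself; it only stops waiting for it.
function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([task, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A call such as ctx.waitUntil(withTimeout(logView(request, env), 10000, "logView").catch(console.error)) then fails loudly at 10s instead of being cut off silently later.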

passThroughOnException: fallback to origin

When a Worker runs on a zone with a real origin (not workers.dev), you can fall back if the handler throws:

async fetch(request, env, ctx) {
  ctx.passThroughOnException();

  try {
    return await handleRequest(request, env);
  } catch (err) {
    // rethrow so CF falls through to origin
    throw err;
  }
}

Useful in the early days of adding a Worker in front of an origin — a handler crash doesn’t take the site down, the request just goes straight to the origin.

This blog doesn’t use it (no backing origin, everything is Workers Assets). But if you have nginx/Rails behind and the Worker is a CDN middleware, this is a mandatory safety net.

6 common misconceptions

① “A module-level variable is a cache”

// WRONG
let cache = new Map();

export default {
  async fetch(request, env) {
    if (cache.has(key)) return new Response(cache.get(key));
    // ...
  }
};

Isolates can be evicted at any time, and each PoP runs its own isolates, so this Map is neither durable nor shared. It isn’t guaranteed to survive between two requests: sometimes it hits, sometimes it misses. Use KV or the Cache API.

② “Cold starts are a problem”

Lambda developers are used to optimising cold starts via provisioned concurrency. Workers cold starts are ~5ms — don’t bother. Spend that effort on CPU time and subrequest count instead.

③ “`await` inside a `for` loop is fine”

// WRONG — runs sequentially, burns wall time
for (const id of ids) {
  const row = await env.DB.prepare("SELECT * FROM posts WHERE id = ?").bind(id).first();
  results.push(row);
}

// RIGHT — in parallel
const rows = await Promise.all(
  ids.map(id => env.DB.prepare("SELECT * FROM posts WHERE id = ?").bind(id).first())
);

With D1, a single query using an IN (...) clause is better still, since it costs one subrequest instead of one per id. But the general principle: parallel, not sequential.
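The IN variant needs one placeholder per id. A small sketch of building that query, where buildPostsByIdQuery and the InQuery shape are names invented for this example:

```typescript
interface InQuery {
  sql: string;
  params: number[];
}

// Hypothetical helper: builds a single parameterised query for a list of ids,
// with one "?" placeholder per id so values are never spliced into the SQL.
function buildPostsByIdQuery(ids: number[]): InQuery {
  if (ids.length === 0) {
    // "IN ()" is not valid SQL, so callers should short-circuit empty lists.
    throw new Error("ids must not be empty");
  }
  const placeholders = ids.map(() => "?").join(", ");
  return {
    sql: `SELECT * FROM posts WHERE id IN (${placeholders})`,
    params: ids,
  };
}
```

In the handler this would be used as env.DB.prepare(sql).bind(...params).all(). D1 limits how many parameters a single query can bind, so very long id lists still need to be chunked.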

④ “`waitUntil` runs forever”

The 30s wall-time limit still applies to waitUntil. Background tasks longer than 30s need to go through a Queue.

⑤ “`env` is global”

env is injected per request — it’s not a global. You can’t import { env } from "..." at module level. Pass it through function arguments.
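In practice that means helpers take env (or just the piece of it they need) as a parameter. A minimal sketch, where SiteEnv is a trimmed-down version of the Env interface above and canonicalUrl is a hypothetical helper:

```typescript
// Only the part of Env this helper needs (SITE_ORIGIN is a plain var).
interface SiteEnv {
  SITE_ORIGIN: string;
}

// Hypothetical helper: receives env as an argument instead of importing it.
function canonicalUrl(env: SiteEnv, pathname: string): string {
  return new URL(pathname, env.SITE_ORIGIN).toString();
}

const worker = {
  async fetch(request: Request, env: SiteEnv): Promise<Response> {
    const { pathname } = new URL(request.url);
    // env flows from the handler down into the helper.
    return new Response(canonicalUrl(env, pathname));
  },
};
```

A side benefit is that helpers written this way are trivial to unit test: pass a plain object and no Worker runtime is needed.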

⑥ “`console.log` is free”

console.log doesn’t cost significant CPU time, but each log line costs you something on Tail Workers if you’ve enabled them. Don’t spam logs in the hot path. Use structured logs on failure paths.

Production checklist

The handler doesn’t touch module-level mutable state (cache, counter, etc.).
Every task that doesn’t block the user is wrapped in ctx.waitUntil().
Independent lookups over a collection run through Promise.all, not await inside a for loop.
CPU-heavy work (crypto, large parses, heavy regex) is offloaded via Queues or cached.
Subrequest count is controlled (< 50 Free, < 1000 Paid).
Error handling doesn’t throw back to the platform (return a 5xx JSON instead of letting it crash).
If there’s a backing origin, ctx.passThroughOnException() is enabled.
No setTimeout longer than the wall-time limit.
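The “return a 5xx JSON” item can be sketched as a wrapper around the handler. withErrorBoundary and the error payload shape are assumptions made for this example. Catching everything like this also means passThroughOnException never triggers, so pick one strategy per route.

```typescript
type Handler<E> = (request: Request, env: E) => Promise<Response>;

// Hypothetical wrapper: turns any uncaught exception into a JSON 500
// instead of letting it propagate to the platform.
function withErrorBoundary<E>(handler: Handler<E>): Handler<E> {
  return async (request, env) => {
    try {
      return await handler(request, env);
    } catch (err) {
      const message = err instanceof Error ? err.message : "unknown error";
      // Structured log on the failure path only.
      console.error(JSON.stringify({ url: request.url, error: message }));
      return new Response(JSON.stringify({ error: "internal_error" }), {
        status: 500,
        headers: { "content-type": "application/json" },
      });
    }
  };
}
```

The exported fetch would then be withErrorBoundary(async (request, env) => ...), keeping the error path in one place.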

Wrap-up

The Workers runtime, at its core, is: a function running in a V8 isolate, receiving a request, returning a response, all within 30s wall time and 50ms CPU time (Paid default). Bindings and fetch() are how it talks to the outside world. ctx.waitUntil is the way to run short background work after the response; anything longer belongs in a Queue.

Accept this mental model, and every optimisation and design decision after it has somewhere to land.

Part 3 goes into the 3-binding mental model: Request → Identity → Storage. That’s the common frame for every Worker, from a simple API to a full-stack app.
