Durable Objects for realtime: chat, collab, game state

Durable Objects are Cloudflare's single-writer primitive: 1 roomId = 1 instance, WebSocket Hibernation, persistent storage. 6 patterns, the API, and when DOs are overkill.

· 8 min read · Đọc bản tiếng Việt
Six Durable Object realtime patterns: chat rooms, CRDT collab editor, game tick, presence, rate limiter, single-flight — plus WebSocket Hibernation, SQL storage beta, and alarm scheduling

TL;DR

A Durable Object (DO) is the only primitive on the Workers platform that guarantees single-writer semantics for a given ID. Each DO instance runs at one PoP, holds state in memory + persistent storage, and auto-migrates when a PoP fails.

Main thesis:

DOs are not a database replacement. They’re a state coordination layer. Use them right for: chat rooms (1 room = 1 DO), collab editors (1 doc = 1 DO), game matches (1 match = 1 DO), per-user rate limiters. Use them wrong (typical CRUD, storing a whole user table) = scaling bottleneck.

This post covers: DO mental model, the WebSocket Hibernation API, 6 real-world patterns with code, storage options (KV vs SQL beta), migrations + deploy, alarms for scheduled work, and when DOs are overkill.

Part 8 covered Queues + DO basics quickly. This post goes deep into the realtime side of DOs.


Who this is for

  • Developers building chat, collab, game, or presence features.
  • Anyone on Pusher/Ably/Socket.io who wants to self-host on the edge.
  • Teams needing stateful rate limiters with strict consistency (not eventual).

Recommended prerequisites: Part 2 (runtime), Part 8 (Queues + DO basics).

By the end of this post you will:

  • Build a WebSocket chat room with a DO in under 50 lines of code.
  • Understand the Hibernation API for a 10x cost cut.
  • Know when DO state fits and when you need an external database.
  • Implement a basic CRDT collab editor.

What this post isn’t about

  • Full CRDT implementation: Yjs / Automerge internals are complex, not covered here. Focus is on using DOs as a relay + snapshot server.
  • WebRTC: peer-to-peer video/audio — DOs can handle signaling, but the media plane is a separate topic.
  • Multi-region replication: DOs are single-region instances. Replication needs its own design pattern.

DO mental model

DO realtime architecture: clients A/B/C connect via WebSocket through the Worker, the Worker looks up the DO ID from roomId (env.ROOM.idFromName) and forwards the request. A single DO instance holds a Set of sessions and a messages log. Broadcast sends back to all clients. The Hibernation API avoids charging compute for idle connections. Persistent storage + SQL beta.

Three core properties:

1. Single-writer per ID

1 ID = 1 instance at a time. No races, no locks. The Worker routes to the right instance by ID.

// Worker
const id = env.ROOM.idFromName("general");  // hash ID → consistent
const stub = env.ROOM.get(id);
return stub.fetch(request);  // forward to DO

1000 clients connecting to "general" at the same time → all routed to the same DO instance. The DO processes sequentially (event loop) → no application-level locking needed.

2. In-memory state + persistent storage

The DO keeps state in memory (fast). storage.put/get persist to disk (Cloudflare-managed). When the DO restarts (deploy, hibernation, PoP move) → memory state is lost, storage survives.

export class RoomDO {
  sessions: Set<WebSocket> = new Set();  // memory only

  constructor(readonly state: DurableObjectState) {}

  async initialize() {
    // Load from storage if needed
    const history = await this.state.storage.get<Msg[]>("history") ?? [];
  }
}

3. Sticky location

A DO instance runs at the PoP nearest the first request. Subsequent requests for the same ID are routed back to that PoP. If the PoP fails, the DO migrates to another PoP (automatically).

Implication: a user in Vietnam creates a room → the DO lives in Singapore. Another Singapore user in the same room = fast. A US user in the same room = +~200ms RTT per message. For global apps you may need to shard rooms by region.


First DO setup

wrangler.jsonc

{
  "name": "my-chat",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  "durable_objects": {
    "bindings": [
      { "name": "ROOM", "class_name": "RoomDO" }
    ]
  },
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["RoomDO"] }
  ]
}

new_sqlite_classes — uses the SQLite storage backend (new, recommended). new_classes = KV backend (legacy).

src/index.ts

export { RoomDO } from "./room";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const roomId = url.pathname.split("/")[2];  // /room/general/ws

    if (!roomId) return new Response("Missing roomId", { status: 400 });

    const id = env.ROOM.idFromName(roomId);
    const stub = env.ROOM.get(id);
    return stub.fetch(request);
  },
} satisfies ExportedHandler<Env>;

src/room.ts

export class RoomDO {
  sessions: Set<WebSocket> = new Set();

  constructor(readonly state: DurableObjectState, readonly env: Env) {}

  async fetch(request: Request): Promise<Response> {
    const pair = new WebSocketPair();
    const [client, server] = [pair[0], pair[1]];

    await this.handleSession(server);

    return new Response(null, { status: 101, webSocket: client });
  }

  async handleSession(ws: WebSocket) {
    ws.accept();
    this.sessions.add(ws);

    ws.addEventListener("message", (msg) => {
      this.broadcast(msg.data as string);
    });

    ws.addEventListener("close", () => {
      this.sessions.delete(ws);
    });
  }

  broadcast(message: string) {
    for (const session of this.sessions) {
      try {
        session.send(message);
      } catch {
        this.sessions.delete(session);
      }
    }
  }
}

Deploy:

npx wrangler deploy

Connect from the browser:

const ws = new WebSocket("wss://my-chat.<subdomain>.workers.dev/room/general/ws");
ws.onmessage = (e) => console.log(e.data);
ws.send("hello");

Under 50 lines of code, full chat room. N clients connected to “general” all receive each other’s messages.


WebSocket Hibernation API

Standard WebSocket problem: every active connection keeps a DO instance running continuously, charging compute every second. 100 idle users = 100 DOs running.

Hibernation: the DO “sleeps” when there are no events, wakes on a message. Idle connections don’t get charged compute.

How to use it

Replace ws.accept() + addEventListener with:

async fetch(request: Request): Promise<Response> {
  const pair = new WebSocketPair();
  const [client, server] = [pair[0], pair[1]];

  // Instead of ws.accept():
  this.state.acceptWebSocket(server, ["user:123", "room:general"]);
  // tags let the DO find the connection after it wakes

  return new Response(null, { status: 101, webSocket: client });
}

// Handler at the class level instead of in a closure
async webSocketMessage(ws: WebSocket, message: string) {
  this.broadcast(message);
}

async webSocketClose(ws: WebSocket, code: number, reason: string) {
  // cleanup
}

async webSocketError(ws: WebSocket, error: unknown) {
  // log
}

Broadcast uses getWebSockets():

broadcast(message: string) {
  const sockets = this.state.getWebSockets();  // includes hibernated
  for (const ws of sockets) {
    try {
      ws.send(message);
    } catch {
      // socket closed
    }
  }
}

Trade-off

  • Pros: 100 idle connections are only charged storage (~$0.20/GB/month), not compute.
  • Cons: handlers are class methods, not closures. State has to be loaded from this.

Use Hibernation for 99% of cases. Standard WebSocket only when you need closure-captured private variables (rare).


Pattern 1: chat room

6 realtime patterns: chat (broadcast), collab editor (CRDT + snapshot), game state (authoritative tick), presence (heartbeat + expire), rate limiter (per-user counter), single-flight (dedupe expensive calls).

Full implementation with Hibernation + storage:

interface Msg {
  id: string;
  user: string;
  text: string;
  ts: number;
}

export class RoomDO {
  constructor(readonly state: DurableObjectState, readonly env: Env) {}

  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname.endsWith("/ws")) {
      // WebSocket upgrade
      const pair = new WebSocketPair();
      const [client, server] = [pair[0], pair[1]];

      const user = url.searchParams.get("user") ?? "anon";
      this.state.acceptWebSocket(server, [`user:${user}`]);

      // Send history
      const history = await this.state.storage.get<Msg[]>("history") ?? [];
      server.send(JSON.stringify({ type: "history", messages: history }));

      return new Response(null, { status: 101, webSocket: client });
    }

    return new Response("Not found", { status: 404 });
  }

  async webSocketMessage(ws: WebSocket, raw: string) {
    const { text } = JSON.parse(raw);
    const tags = this.state.getTags(ws);
    const user = tags.find((t) => t.startsWith("user:"))?.slice(5) ?? "anon";

    const msg: Msg = {
      id: crypto.randomUUID(),
      user,
      text,
      ts: Date.now(),
    };

    // Persist (keep last 100)
    const history = await this.state.storage.get<Msg[]>("history") ?? [];
    history.push(msg);
    if (history.length > 100) history.splice(0, history.length - 100);
    await this.state.storage.put("history", history);

    // Broadcast
    const payload = JSON.stringify({ type: "message", message: msg });
    for (const socket of this.state.getWebSockets()) {
      try {
        socket.send(payload);
      } catch {}
    }
  }

  async webSocketClose(ws: WebSocket) {
    // Hibernation auto-manages, no manual cleanup needed
  }
}

Pattern 2: collab editor (Yjs)

Yjs is a popular CRDT library. DOs act as the relay + snapshot server.

import * as Y from "yjs";

export class DocDO {
  ydoc: Y.Doc;
  snapshotTimer: number | null = null;

  constructor(readonly state: DurableObjectState) {
    this.ydoc = new Y.Doc();
    state.blockConcurrencyWhile(async () => {
      // Load snapshot from storage
      const snapshot = await state.storage.get<Uint8Array>("snapshot");
      if (snapshot) {
        Y.applyUpdate(this.ydoc, snapshot);
      }
    });
  }

  async fetch(request: Request): Promise<Response> {
    const pair = new WebSocketPair();
    const [client, server] = [pair[0], pair[1]];

    this.state.acceptWebSocket(server);

    // Send current state
    const state = Y.encodeStateAsUpdate(this.ydoc);
    server.send(state);

    return new Response(null, { status: 101, webSocket: client });
  }

  async webSocketMessage(ws: WebSocket, data: ArrayBuffer) {
    const update = new Uint8Array(data);
    Y.applyUpdate(this.ydoc, update);

    // Broadcast to other clients
    for (const socket of this.state.getWebSockets()) {
      if (socket !== ws) {
        try {
          socket.send(update);
        } catch {}
      }
    }

    // Schedule snapshot (debounce)
    this.scheduleSnapshot();
  }

  scheduleSnapshot() {
    if (this.snapshotTimer) return;
    this.snapshotTimer = setTimeout(async () => {
      const snapshot = Y.encodeStateAsUpdate(this.ydoc);
      await this.state.storage.put("snapshot", snapshot);
      this.snapshotTimer = null;
    }, 5000) as unknown as number;
  }
}

CRDT property: merge operations don’t conflict, order doesn’t matter. The DO holds the canonical version and broadcasts updates to every client.

Snapshots are debounced to avoid writing storage too often.


Pattern 3: game state (authoritative server)

Games need regular ticks. DO alarms let you schedule future work.

export class GameDO {
  state: GameState = { players: {}, ballPos: { x: 0, y: 0 } };

  constructor(readonly doState: DurableObjectState) {
    doState.blockConcurrencyWhile(async () => {
      const saved = await doState.storage.get<GameState>("state");
      if (saved) this.state = saved;
    });
  }

  async fetch(request: Request): Promise<Response> {
    const pair = new WebSocketPair();
    const [client, server] = [pair[0], pair[1]];

    const userId = new URL(request.url).searchParams.get("userId")!;
    this.doState.acceptWebSocket(server, [`player:${userId}`]);

    // Start tick loop (alarm)
    const alarm = await this.doState.storage.getAlarm();
    if (!alarm) {
      await this.doState.storage.setAlarm(Date.now() + 16);  // 60fps
    }

    return new Response(null, { status: 101, webSocket: client });
  }

  async webSocketMessage(ws: WebSocket, raw: string) {
    const input = JSON.parse(raw);
    const userId = this.doState.getTags(ws).find((t) => t.startsWith("player:"))?.slice(7);
    if (!userId) return;

    // Validate + apply input to game state
    this.applyInput(userId, input);
  }

  async alarm() {
    // Tick: update physics, check collisions
    this.tick();

    // Broadcast state
    const payload = JSON.stringify({ type: "state", state: this.state });
    for (const ws of this.doState.getWebSockets()) {
      try {
        ws.send(payload);
      } catch {}
    }

    // Schedule next tick
    if (this.doState.getWebSockets().length > 0) {
      await this.doState.storage.setAlarm(Date.now() + 16);
    }
  }

  tick() { /* physics update */ }
  applyInput(userId: string, input: any) { /* ... */ }
}

Alarms are the idiomatic way to schedule. Don’t use setInterval — DOs can hibernate, and intervals get dropped.


Pattern 4: presence

User online/offline, typing indicators.

export class PresenceDO {
  users: Map<string, { lastSeen: number; status: string }> = new Map();

  constructor(readonly state: DurableObjectState) {
    state.blockConcurrencyWhile(async () => {
      const saved = await state.storage.get<any>("users");
      if (saved) this.users = new Map(Object.entries(saved));
    });
  }

  async fetch(request: Request): Promise<Response> {
    const pair = new WebSocketPair();
    const [client, server] = [pair[0], pair[1]];

    const userId = new URL(request.url).searchParams.get("userId")!;
    this.state.acceptWebSocket(server, [`user:${userId}`]);

    this.users.set(userId, { lastSeen: Date.now(), status: "online" });
    this.broadcastPresence();

    // Schedule cleanup
    const alarm = await this.state.storage.getAlarm();
    if (!alarm) {
      await this.state.storage.setAlarm(Date.now() + 60_000);
    }

    return new Response(null, { status: 101, webSocket: client });
  }

  async webSocketMessage(ws: WebSocket, raw: string) {
    const userId = this.state.getTags(ws).find((t) => t.startsWith("user:"))?.slice(5);
    if (!userId) return;

    const { type } = JSON.parse(raw);
    if (type === "heartbeat") {
      const user = this.users.get(userId);
      if (user) user.lastSeen = Date.now();
    } else if (type === "typing") {
      const user = this.users.get(userId);
      if (user) user.status = "typing";
      this.broadcastPresence();
    }
  }

  async alarm() {
    // Mark stale users offline
    const now = Date.now();
    let changed = false;
    for (const [id, user] of this.users) {
      if (now - user.lastSeen > 60_000 && user.status !== "offline") {
        user.status = "offline";
        changed = true;
      }
    }
    if (changed) this.broadcastPresence();

    // Reschedule
    if (this.users.size > 0) {
      await this.state.storage.setAlarm(Date.now() + 60_000);
    }
  }

  broadcastPresence() {
    const payload = JSON.stringify({
      type: "presence",
      users: Object.fromEntries(this.users),
    });
    for (const ws of this.state.getWebSockets()) {
      try {
        ws.send(payload);
      } catch {}
    }
  }
}

Pattern 5: rate limiter

Strict per-user rate limiting (not eventual like KV).

export class RateLimiter {
  constructor(readonly state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const { limit, windowSec } = await request.json();

    const now = Date.now();
    const windowStart = await this.state.storage.get<number>("windowStart") ?? now;
    const count = await this.state.storage.get<number>("count") ?? 0;

    if (now - windowStart > windowSec * 1000) {
      // Reset window
      await this.state.storage.put("windowStart", now);
      await this.state.storage.put("count", 1);
      return Response.json({ allowed: true, remaining: limit - 1 });
    }

    if (count >= limit) {
      return Response.json({ allowed: false, remaining: 0 }, { status: 429 });
    }

    await this.state.storage.put("count", count + 1);
    return Response.json({ allowed: true, remaining: limit - count - 1 });
  }
}

Worker side:

const userId = getUserId(request);
const id = env.RATE_LIMITER.idFromName(`user:${userId}`);
const stub = env.RATE_LIMITER.get(id);
const result = await stub.fetch(new Request("https://x", {
  method: "POST",
  body: JSON.stringify({ limit: 100, windowSec: 60 }),
}));

const { allowed } = await result.json();
if (!allowed) return new Response("Rate limited", { status: 429 });

Note: Cloudflare has native WAF rate-limiting rules for HTTP traffic — faster, no DO cost. Use a DO rate limiter when you need complex logic (sliding window, per-endpoint, business logic).


Pattern 6: single-flight (dedupe)

Prevent concurrent duplicate work. Example: 100 requests hitting the same LLM prompt simultaneously → call once, share the result.

export class SingleFlight {
  inFlight: Map<string, Promise<any>> = new Map();

  constructor(readonly state: DurableObjectState, readonly env: Env) {}

  async fetch(request: Request): Promise<Response> {
    const { key, work } = await request.json();

    if (!this.inFlight.has(key)) {
      this.inFlight.set(key, this.doExpensiveWork(work).finally(() => {
        this.inFlight.delete(key);
      }));
    }

    const result = await this.inFlight.get(key)!;
    return Response.json(result);
  }

  async doExpensiveWork(work: any): Promise<any> {
    // LLM call, heavy computation, external API
    return await this.env.AI.run(work.model, { messages: work.messages });
  }
}

DO single-writer guarantees that 100 concurrent requests to the same DO ID → 99 await the promise, 1 does the work. Result returns to all.


Storage: KV vs SQL

DOs have 2 storage backends:

KV storage (original)

await state.storage.put("key", value);
const v = await state.storage.get("key");
await state.storage.delete("key");
await state.storage.list({ prefix: "foo:" });

Simple, fast for key-value. ~128KB/value limit.

state.storage.sql.exec(`
  CREATE TABLE IF NOT EXISTS messages (
    id TEXT PRIMARY KEY,
    user TEXT,
    text TEXT,
    ts INTEGER
  )
`);

state.storage.sql.exec(
  "INSERT INTO messages VALUES (?, ?, ?, ?)",
  id, user, text, ts
);

const results = state.storage.sql.exec(
  "SELECT * FROM messages ORDER BY ts DESC LIMIT 50"
).toArray();

SQL storage:

  • SQLite per-DO, ~10GB limit.
  • Richer queries (JOIN, aggregate, index).
  • Better for collab state or complex game state.
  • new_sqlite_classes in migrations.

Recommend SQL storage for every new DO unless the use case is simple key-value.


Migrations between versions

DOs have a migration system separate from D1. Changing the class name or backend → a migration entry:

{
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["RoomDO"] },
    { "tag": "v2", "new_sqlite_classes": ["GameDO"] },
    { "tag": "v3", "renamed_classes": [{ "from": "OldName", "to": "NewName" }] },
    { "tag": "v4", "deleted_classes": ["DeadDO"] }
  ]
}

Migrations apply at deploy. Existing DOs keep running, new DOs use the new class.

Careful: deleting a class = losing that DO’s data. There’s a warning at deploy time.


Alarm scheduling

setAlarm(timestamp) schedules alarm() to run at that timestamp. Single-flight (only one alarm pending). Persists across restarts.

await this.state.storage.setAlarm(Date.now() + 60_000);

async alarm() {
  // runs after 60s
  doCleanup();
  // reschedule if needed
  await this.state.storage.setAlarm(Date.now() + 60_000);
}

Use cases: cleanup stale data, game tick loop, send reminder emails, expire sessions.

Max 1 alarm per DO. Need multiple schedules → store a queue in storage, dispatch from alarm.


When NOT to use DOs

DOs are powerful but not for everything:

❌ User table with 1M+ rows

DO-per-user = 1M instances. Storage cost + management nightmare. Use a D1 table.

❌ Simple CRUD API

POST /posts, GET /posts — no coordination needed. D1 + a Worker handler is plenty.

❌ Global counter

Global state (app-wide view count) — a single DO instance becomes a bottleneck. Use Analytics Engine or a sharded D1.

❌ Queue processing

Fan-out background jobs — use Queues (Part 8).

❌ Read-heavy cache

Data read often, rarely written — KV is cheaper and faster.

✅ When DOs are right

  • Coordination: lock, serialize, dedupe.
  • Stateful WebSockets: chat, collab, game.
  • Per-entity state: per-user session, per-room, per-document.
  • Alarm scheduling: deadline, reminder, retry.

If you can’t see “coordination” or “WebSocket” or “per-entity stateful”, reconsider whether you need a DO.


Gotchas

① DO location is non-deterministic

DOs run at the PoP closest to the first request. A VN user creating a room → DO in Singapore. A US user joining → extra RTT. For global apps, consider:

  • Sharding IDs by region prefix (us-general, asia-general).
  • Accepting latency for rare cross-region use cases.

② Concurrent fetch vs event loop

DOs process requests sequentially (single-threaded event loop). 100 concurrent fetches to the same DO = queue. A slow fetch blocks others. Async I/O yields, but CPU-bound work blocks.

③ blockConcurrencyWhile for init

Loading state from storage in the constructor needs blockConcurrencyWhile so requests don’t execute before state is ready.

constructor(readonly state: DurableObjectState) {
  state.blockConcurrencyWhile(async () => {
    this.data = await state.storage.get("data");
  });
}

④ Hibernation loses closures

Standard WebSocket: closures capture variables. Hibernation: handlers are class methods, no capture. Refactor to use this.

⑤ Max 32768 connections per DO

A single DO instance caps around 32k WebSockets. Very large chat rooms need sharding across multiple DOs.

⑥ Storage transactions

SQL storage auto-commits per exec. Multiple execs that need atomicity:

state.storage.transaction(() => {
  state.storage.sql.exec("UPDATE ...");
  state.storage.sql.exec("INSERT ...");
});

⑦ Deleting a DO is hard

state.storage.deleteAll() clears storage, but the DO instance still exists. Truly delete = migrate with deleted_classes.

⑧ Local dev with DOs

wrangler dev emulates DOs through Miniflare. Hibernation API is supported from Miniflare v4+. A few behaviors (location, cross-PoP) can’t be replicated locally.


Observability

DO metrics:

  • Active DO count: how many DOs are running.
  • Duration (GB-s): compute time.
  • Storage (GB-month): storage size.
  • WebSocket connection count: active connections (hibernated don’t count compute).
  • Requests: fetches into DOs.
  • Error rate: % of DOs returning 5xx.

Dashboard: Workers & Pages → pick the Worker → Durable Objects tab.

Logs: tail on Workers logs includes DO logs. Details in Part 17.


Pricing snapshot

  • Requests: $0.15/1M requests (same as Workers).
  • Duration: $12.50/million GB-s (while DO is active).
  • Storage: $0.20/GB/month.
  • Hibernated WebSockets: no duration charge, storage only.

Example: 1000 chat rooms, avg 50 users/room, users active 1h/day, hibernation enabled:

  • Active duration: 1000 × 1h = 1000 DO-hours/day = ~3600 GB-s/day × $12.5/1M = $0.05/day = ~$1.5/month.
  • Storage: 1000 × 1MB = 1GB = $0.20/month.
  • Requests: negligible.

~$2/month for 1000 active chat rooms. Self-hosting Socket.io on the smallest AWS EC2 (t3.small) = $15/month, no auto-scaling.


Production checklist

  • new_sqlite_classes for every new DO.
  • Hibernation API (acceptWebSocket) for WebSocket use cases.
  • blockConcurrencyWhile for initializing state from storage.
  • Broadcast via getWebSockets(), don’t manually track sessions.
  • Error handling around send() (sockets may have closed).
  • Optimize storage writes (batch, debounce snapshots).
  • Migration entries for every class change.
  • Reschedule alarms when work remains, stop when empty.
  • Tag WebSockets to identify users after hibernation.
  • Monitor DO count + storage to detect DO leaks.
  • Don’t use DOs for use cases that don’t need coordination.

Wrap-up

Durable Objects are a unique primitive: single-writer coordination + WebSocket + persistent state + alarms. That combo lets you build chat, collab, and games at the edge when most other platforms need external infra (Redis + a cluster + Socket.io).

But DOs aren’t a database. The right tool for a coordination layer, the wrong tool for ordinary CRUD.

Part 16: Stream + Images — video streaming, on-the-fly image resizing, and CDN patterns for media. Closes Block 4.


References