cloudflare/agents on Workers + Durable Objects: production patterns

TL;DR

cloudflare/agents is a framework for AI agents that run on Workers + Durable Objects. Each agent instance is one DO. State survives hibernation, tools are called synchronously, and WebSockets handle streaming.

Unlike LangGraph or CrewAI running on a stateful server, the Agents framework hibernates when idle (no CPU billing) and wakes on a request, an alarm, or a WebSocket message. 1M agent-minutes/month works out to roughly $0.04/agent on the Workers Paid plan.

Core pattern: an Agent class with onMessage, onStateUpdate, and schedule(). State is persisted automatically when you call this.setState({...}). Tools are registered through a decorator or a registry, and Claude/OpenAI are called through a built-in adapter.

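Multi-agent coordination: agent A calls a method on agent B over RPC through its DO stub, or goes through a message-bus DO. Avoid fanning out to more than 32 agents concurrently for one task, because every call counts against the per-invocation subrequest limit (50 on the Workers Free plan; Paid plans allow more).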

Do not use the Agents framework for one-shot completions (a plain Worker is enough). It only pays off for work that is long-running, multi-turn, scheduled/cron-driven, or that needs persistent context for more than 5 minutes.

Pair it with Workers AI to avoid LLM egress cost. Pair it with Vectorize for RAG. Pair it with D1 for an audit log.

Why I chose cloudflare/agents over self-hosted LangGraph

The Platform team needed agents to monitor the health of 500 services: call a check API every 15 minutes, store the status, and send an alert on any change. The first option was LangGraph on ECS Fargate: 2 t3.small tasks, with RDS Postgres for state. Estimated cost was $180/month for 500 agents (one task cycling through them), but ECS cold-start latency was ~3-5s on every scale-up.

The Cloudflare Agents option: each service = 1 agent = 1 Durable Object. That is 500 DO instances, each idle 99% of the time (waking only every 15 minutes, for about 200ms). Using the Workers Paid formula:

500 agents × 4 wakes/hour × 24h × 30d = 1.44M invocations/month
Each invocation ~ 50ms CPU = 72,000 CPU-seconds
Workers Paid: $5/month (10M requests included) + $0.30 per additional 1M requests + $0.02 per additional 1M CPU-ms + $12.50 per 1M GB-seconds of DO wall-time duration

Total: under $15/month for 500 agents, a saving of roughly 90%. I am writing this after 4 months of running the 500 service-health agents.
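The workload arithmetic above can be checked with a few lines of TypeScript. This is only a sketch of the calculation using the figures from the text (500 agents, 4 wakes per hour, 30 days, 50 ms CPU per wake); the function names are illustrative and not part of any Cloudflare API.

```typescript
// Invocations per month for a fleet of agents that wake on a fixed cadence.
function monthlyInvocations(agents: number, wakesPerHour: number, days: number): number {
  return agents * wakesPerHour * 24 * days;
}

// Total CPU time per month, in seconds, given CPU milliseconds per invocation.
function monthlyCpuSeconds(invocations: number, cpuMsPerInvocation: number): number {
  return (invocations * cpuMsPerInvocation) / 1000;
}

const invocations = monthlyInvocations(500, 4, 30);
const cpuSeconds = monthlyCpuSeconds(invocations, 50);

console.log({ invocations, cpuSeconds }); // { invocations: 1440000, cpuSeconds: 72000 }
```

Changing the cadence or fleet size in the two calls shows how quickly the request count moves compared with the CPU total.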

What Cloudflare Agents is: not LangChain on Workers

cloudflare/agents is a TypeScript framework that Cloudflare published in 2025, after Durable Objects (DO) had matured. In essence:

Each agent = 1 DO with a stable ID. State is persisted to DO storage automatically.
Hibernation, built on the WebSocket Hibernation API: the DO is unloaded from memory when idle while the WebSocket connection stays open at the edge. It wakes when a message arrives.
Tool registry: TypeScript functions with a JSON schema, which the agent invokes via RPC to another Worker or a call to an external API.
Schedule: this.schedule(cronExpr | duration, callback) uses DO Alarms under the hood.
State sync: this.setState() triggers a persist plus a broadcast to WebSocket subscribers.

Compared with LangGraph: the Agents framework has no node/edge graph DSL. You write plain TypeScript. That is a feature, not a bug: a JS stack trace is easier to debug than a trace of a JSON state machine.

Compared with CrewAI: there is no "role-playing" abstraction. An agent is just a class with handlers. You build your own crew by having one orchestrator agent call N worker agents over RPC.

A first agent: minimal boilerplate

Here is the simplest possible agent, which monitors one HTTP URL:

// src/agents/health-monitor.ts
import { Agent } from "agents";

interface HealthState {
  url: string;
  lastStatus: number | null;
  lastChecked: number;
  consecutiveFailures: number;
}

export class HealthMonitor extends Agent<Env, HealthState> {
  initialState: HealthState = {
    url: "",
    lastStatus: null,
    lastChecked: 0,
    consecutiveFailures: 0,
  };

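  async onStart() {
    // onStart runs every time the DO starts, not only on first creation,
    // so register the cron schedule only if it is not already stored.
    const existing = this.getSchedules();
    if (!existing.some((s) => s.callback === "checkHealth")) {
      await this.schedule("*/15 * * * *", "checkHealth");
    }
  }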

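  async checkHealth() {
    if (!this.state.url) return;

    // A timeout or network error makes fetch() throw. Record that as a
    // failure (status null) instead of letting the whole check abort.
    let status: number | null = null;
    try {
      const res = await fetch(this.state.url, {
        cf: { cacheTtl: 0 },
        signal: AbortSignal.timeout(5000),
      });
      status = res.status;
    } catch {
      status = null;
    }

    const failed = status === null || status >= 500;
    const consecutiveFailures = failed
      ? this.state.consecutiveFailures + 1
      : 0;

    this.setState({
      ...this.state,
      lastStatus: status,
      lastChecked: Date.now(),
      consecutiveFailures,
    });

    // Alert only after three failed checks in a row
    if (consecutiveFailures >= 3) {
      await this.alertOps(status);
    }
  }

  async alertOps(status: number | null) {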
    await this.env.ALERT_QUEUE.send({
      agentId: this.name,
      url: this.state.url,
      status,
      failures: this.state.consecutiveFailures,
    });
  }
}
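The state transition inside checkHealth can also be pulled out into a pure function, which makes the failure-counting rule testable without a Durable Object. The sketch below is one way to do that, not the author's code: nextHealthState and shouldAlert are hypothetical names, and treating a request that never returns a response (status null) as a failure is an assumption of this sketch.

```typescript
interface HealthState {
  url: string;
  lastStatus: number | null;
  lastChecked: number;
  consecutiveFailures: number;
}

// Compute the next state from one check result.
// status === null models a request that produced no HTTP response at all.
function nextHealthState(prev: HealthState, status: number | null, now: number): HealthState {
  const failed = status === null || status >= 500;
  return {
    ...prev,
    lastStatus: status,
    lastChecked: now,
    consecutiveFailures: failed ? prev.consecutiveFailures + 1 : 0,
  };
}

// Alert once the failure streak reaches the threshold (3 in the agent above).
function shouldAlert(state: HealthState, threshold = 3): boolean {
  return state.consecutiveFailures >= threshold;
}

// Two 503s, then a timeout: the third failure in a row triggers the alert.
let monitorState: HealthState = {
  url: "https://example.com/health",
  lastStatus: null,
  lastChecked: 0,
  consecutiveFailures: 0,
};
for (const result of [503, 503, null]) {
  monitorState = nextHealthState(monitorState, result, Date.now());
}
console.log(shouldAlert(monitorState)); // true
```

Inside the agent, checkHealth would then reduce to a fetch, one call to nextHealthState, and this.setState with the result.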

In wrangler.jsonc:

{
  "durable_objects": {
    "bindings": [
      { "name": "HEALTH_MONITOR", "class_name": "HealthMonitor" }
    ]
  },
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["HealthMonitor"] }
  ]
}

new_sqlite_classes matters: DO SQL storage (GA since 2025) lets you run richer queries over state than the KV API does. The Agents framework uses the SQLite backend by default.

Hibernation: when an agent "sleeps"

This is the main cost-saving mechanism. A DO instance is unloaded from memory after ~30s without a request. State stays on disk. On the next wake:

Cold wake from an alarm: ~5-10ms
Wake from an HTTP fetch: ~10-15ms
Wake from a hibernated WebSocket: ~3-5ms (the DO is not fully unloaded, only its event loop is paused)

Compare with an ECS task: container cold start takes 3-5 seconds, not counting application start-up time. DO hibernation wins by 2-3 orders of magnitude on wake latency.

But there is a catch: JavaScript timers are lost during hibernation. setTimeout/setInterval do not survive it. You have to use this.schedule(), which is a DO Alarm underneath.

// WRONG: the timer is lost on hibernation
async onStart() {
  setInterval(() => this.checkHealth(), 15 * 60 * 1000);
}

// RIGHT: schedule through a DO alarm
async onStart() {
  await this.schedule("*/15 * * * *", "checkHealth");
}

Tool calling: calling an LLM with functions

The Agents framework has adapters for Anthropic and OpenAI. A typical pattern for an agent with "intelligence":

import { Agent, type Connection, type WSMessage } from "agents";
import Anthropic from "@anthropic-ai/sdk";

// Model ID as given in the original post; substitute one your account can use.
const MODEL = "claude-sonnet-4-7";

// Local helper type (defined here, not imported): an Anthropic tool plus its handler.
type ToolDefinition = Anthropic.Tool & {
  handler: (input: any) => Promise<unknown>;
};

interface ChatState {
  messages: Anthropic.MessageParam[];
  context: Record<string, unknown>;
}

export class TriageAgent extends Agent<Env, ChatState> {
  initialState: ChatState = { messages: [], context: {} };

  tools: ToolDefinition[] = [
    {
      name: "search_kb",
      description: "Search internal knowledge base",
      input_schema: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
      handler: async ({ query }: { query: string }) => {
        const vec = await this.env.AI.run("@cf/baai/bge-base-en-v1.5", {
          text: query,
        });
        // Vectorize omits metadata unless it is requested explicitly.
        const results = await this.env.VECTORIZE_KB.query(vec.data[0], {
          topK: 5,
          returnMetadata: "all",
        });
        return results.matches.map((m) => m.metadata);
      },
    },
    {
      name: "create_ticket",
      description: "Create a support ticket in Linear",
      input_schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          severity: { type: "string", enum: ["low", "medium", "high"] },
        },
        required: ["title", "severity"],
      },
      handler: async (args) => {
        const ticket = await this.env.LINEAR.createIssue(args);
        this.setState({
          ...this.state,
          context: { ...this.state.context, lastTicket: ticket.id },
        });
        return { ticketId: ticket.id };
      },
    },
  ];

  // One place for the API call, so the first request and the loop stay in sync.
  private ask(client: Anthropic, messages: Anthropic.MessageParam[]) {
    return client.messages.create({
      model: MODEL,
      max_tokens: 2048,
      // Strip the handlers; the API only accepts the tool specs.
      tools: this.tools.map(({ handler, ...spec }) => spec),
      messages,
    });
  }

  async onMessage(connection: Connection, message: WSMessage) {
    if (typeof message !== "string") return; // ignore binary frames

    const client = new Anthropic({ apiKey: this.env.ANTHROPIC_KEY });
    const history: Anthropic.MessageParam[] = [
      ...this.state.messages,
      { role: "user", content: message },
    ];

    let response = await this.ask(client, history);

    // Tool loop: run every requested tool, feed the results back, repeat.
    while (response.stop_reason === "tool_use") {
      const toolUses = response.content.filter(
        (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
      );
      const results: Anthropic.ToolResultBlockParam[] = await Promise.all(
        toolUses.map(async (tu) => {
          const tool = this.tools.find((t) => t.name === tu.name);
          if (!tool) {
            // Report an unknown tool to the model instead of crashing.
            return {
              type: "tool_result" as const,
              tool_use_id: tu.id,
              content: `Unknown tool: ${tu.name}`,
              is_error: true,
            };
          }
          const result = await tool.handler(tu.input);
          return {
            type: "tool_result" as const,
            tool_use_id: tu.id,
            content: JSON.stringify(result),
          };
        }),
      );

      history.push({ role: "assistant", content: response.content });
      history.push({ role: "user", content: results });

      response = await this.ask(client, history);
    }

    // Keep the final assistant turn so the next message has full context.
    history.push({ role: "assistant", content: response.content });
    this.setState({ ...this.state, messages: history });
    connection.send(JSON.stringify(response.content));
  }
}

A few things to note:

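Tool handlers return plain objects; the loop serializes them with JSON.stringify before handing them back to the LLM.
this.setState inside a handler persists automatically. There is no need for await this.ctx.storage.put().
You manage the tool loop yourself: whenever Claude returns stop_reason: tool_use, the loop runs another round.
Subrequest limit: budget against 50 subrequests per invocation (the Workers Free figure; Paid plans allow more). If one tool fans out to more than about 30 API calls, split the work across agents.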
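One detail the loop has to handle is a tool call that cannot be served, either because the model asked for a tool name that is not registered or because the handler threw. The sketch below shows one way to turn both cases into an error result for the model rather than an exception in the agent. dispatchTool, ToolHandler, and ToolResult are hypothetical names for this illustration; the is_error flag mirrors the field of the same name on Anthropic tool_result blocks.

```typescript
type ToolHandler = (input: unknown) => Promise<unknown>;

interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Run one requested tool and always return a result block, never throw.
async function dispatchTool(
  handlers: Map<string, ToolHandler>,
  toolUse: { id: string; name: string; input: unknown },
): Promise<ToolResult> {
  const base = { type: "tool_result" as const, tool_use_id: toolUse.id };
  const handler = handlers.get(toolUse.name);
  if (!handler) {
    return { ...base, content: `Unknown tool: ${toolUse.name}`, is_error: true };
  }
  try {
    const result = await handler(toolUse.input);
    // JSON.stringify(undefined) is not a string, so fall back to null.
    return { ...base, content: JSON.stringify(result ?? null) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ...base, content: message, is_error: true };
  }
}
```

Returning the error as a result lets the model see what went wrong and retry or pick another tool, while the conversation state stays consistent.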

Multi-agent coordination over WebSocket

A more involved pattern: agent A is the orchestrator, fans out to agents B/C/D, and gathers the results. Each agent exposes RPC endpoints through its DO stub:

import { Agent, getAgentByName } from "agents";

export class OrchestratorAgent extends Agent<Env, OrchestratorState> {
  async runPipeline(taskId: string) {
    const subtasks = await this.planSubtasks(taskId);

    // Address each child agent by name through the DO namespace
    const results = await Promise.all(
      subtasks.map(async (subtask) => {
        const worker = await getAgentByName(this.env.WORKER_AGENT, subtask.id);
        // run() is assumed to be a public method on the worker agent class
        return worker.run(subtask);
      }),
    );

    // Push the result to WebSocket subscribers; broadcast() sends strings,
    // so serialize the payload first.
    this.broadcast(
      JSON.stringify({ type: "pipeline.complete", taskId, results }),
    );
    return results;
  }
}

this.broadcast() pushes to every attached WebSocket. A client dashboard connects with:

const ws = new WebSocket(
  `wss://agent.example.com/agents/orchestrator/task-123/ws`,
);
ws.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === "pipeline.complete") render(msg.results);
};

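Practical limit: do not fan out to more than 32 agents concurrently for one task. I budget against a subrequest limit of 50 (the Workers Free figure; Paid plans allow more), and each RPC costs 1 subrequest. If you need a larger fan-out, use Cloudflare Queues as the bus.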
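For fan-outs that are only somewhat larger than the cap, processing the subtasks in fixed-size batches keeps concurrency bounded without introducing a queue. The helper below is a minimal sketch of that idea; mapInBatches is a hypothetical name, and the batch size of 32 simply reuses the limit from the text. Note that batching bounds concurrency only: the total number of calls made in one invocation still counts against the subrequest limit.

```typescript
// Apply fn to every item, running at most batchSize calls concurrently.
// Results come back in the same order as the input items.
async function mapInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Wait for the whole batch before starting the next one.
    results.push(...(await Promise.all(batch.map((item) => fn(item)))));
  }
  return results;
}

// Usage sketch: callWorker stands in for the per-subtask RPC call.
// const results = await mapInBatches(subtasks, 32, (s) => callWorker(s));
```

A batch waits for its slowest member, so one slow worker delays the batches behind it; a sliding-window pool would avoid that at the cost of more code.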

Schedule and cron: using DO Alarms correctly

this.schedule() has 3 forms:

// Cron expression
await this.schedule("0 */6 * * *", "syncFromAPI");

// Delay in seconds (relative): 300 s = 5 minutes
await this.schedule(300, "retryFailed");

// Absolute timestamp
await this.schedule(new Date("2026-01-01T00:00:00Z"), "yearEndJob");

Cron resolution is 1 minute. For anything under 1 minute, use a delay. A DO holds only 1 alarm per instance: setting a new alarm overwrites the old one. Underneath, the Agents framework uses a SQLite table to track multiple schedules.
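The way several schedules can share the single alarm slot can be sketched as follows: keep all tasks in a table, always point the alarm at the earliest one, and when it fires, run whatever is due and re-arm for the next. This illustrates the general technique under that assumption; it is not the framework's actual implementation, and ScheduledTask, nextAlarmTime, and takeDue are hypothetical names.

```typescript
interface ScheduledTask {
  id: string;
  callback: string; // name of the method to invoke
  runAt: number;    // epoch time when the task becomes due
}

// The one alarm should fire at the earliest pending task, or not at all.
function nextAlarmTime(tasks: readonly ScheduledTask[]): number | null {
  return tasks.reduce<number | null>(
    (earliest, t) => (earliest === null || t.runAt < earliest ? t.runAt : earliest),
    null,
  );
}

// When the alarm fires, split tasks into those due now and those still pending.
function takeDue(tasks: readonly ScheduledTask[], now: number) {
  return {
    due: tasks.filter((t) => t.runAt <= now),
    pending: tasks.filter((t) => t.runAt > now),
  };
}

const tasks: ScheduledTask[] = [
  { id: "a", callback: "syncFromAPI", runAt: 6000 },
  { id: "b", callback: "retryFailed", runAt: 300 },
  { id: "c", callback: "checkHealth", runAt: 900 },
];

const firstAlarm = nextAlarmTime(tasks);      // 300: task "b" is earliest
const { due, pending } = takeDue(tasks, 900); // "b" and "c" are due at t=900
const followUpAlarm = nextAlarmTime(pending); // 6000: re-arm for task "a"
console.log({ firstAlarm, due: due.map((t) => t.id), followUpAlarm });
```

A recurring (cron) task would be re-inserted with its next runAt after it runs, which is why one alarm slot is enough for any number of schedules.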

When an alarm fires, the DO wakes and runs the handler. If the handler throws, the DO retries with exponential backoff, up to 6 times. After 6 attempts the alarm is dropped, and the framework logs it to the agent_audit_log table.

Real cost: 1M agent-minutes ~ $0.04/agent

Workers Paid plan ($5/month base):

10M requests included, then $0.30 per additional 1M
30M CPU-milliseconds included, then $0.02 per additional 1M CPU-ms
DO requests: $0.15/M (1M included), storage $0.20/GB/month

For 500 health agents, each at 4 wakes/hour × 24h × 30d:

Requests: 1.44M (within the included 10M) → $0
CPU time: 72,000 CPU-seconds = 72M CPU-ms, 42M over the included 30M → $0.84
DO requests: 1.44M × $0.15 / 1M → $0.22 (an upper bound; less once the included 1M is subtracted)
Storage: 500 × 10KB → 5MB → ~$0

Total: $5 base + $0.84 + $0.22 = $6.06 for 500 agents. ~$0.01/agent/month.

With 1M agent-minutes (about 23 agents running continuously for a 30-day month), cost goes up but stays under $50/month if the agent logic is simple. The number in the title assumes agents are idle 99% of the time (hibernation working).

Comparison with LangGraph, CrewAI, OpenAI Assistants

Aspect	cloudflare/agents	LangGraph	CrewAI	OpenAI Assistants
Runtime	Workers + DO	Self-hosted Python	Self-hosted Python	OpenAI hosted
State	DO storage, automatic	Build your own (Postgres)	In-memory or Redis	OpenAI managed
Hibernation	Yes (0 CPU when idle)	No (process keeps running)	No	No
Cold start	5-15ms	1-3s (container)	1-3s	200-500ms API
Cost for 500 idle agents	~$5-15/month	$100-200/month infra	$100-200/month infra	$0.03/run × N
Tool calling	TypeScript + JSON schema	LangChain tool	CrewAI tool	OpenAI tool
Multi-agent	DO RPC + WebSocket	Graph DSL	Crew abstraction	Thread
Vendor lock-in	Cloudflare	Self-host or LangSmith	Self-host	OpenAI

My stance: choose cloudflare/agents if (a) the agents are long-lived and mostly idle, (b) you are already in the Cloudflare ecosystem, and (c) you want to debug plain TypeScript. Choose LangGraph if you need a complex graph DSL and the team is comfortable with Python. OpenAI Assistants only suits prototypes: vendor lock-in, and high cost at scale.

Observability: where the Agents framework is weak

The framework does not yet have good built-in tracing. Workarounds:

// Inside the agent class (so `this` is the agent):
// wrap each tool handler so that every call is logged
private wrapTool(tool: ToolDefinition): ToolDefinition {
  return {
    ...tool,
    handler: async (args) => {
      const start = Date.now();
      try {
        return await tool.handler(args);
      } finally {
        // Runs on success and on failure, so failed calls are logged too
        await this.env.AUDIT_DB.prepare(
          "INSERT INTO tool_calls (agent_id, tool, args, duration_ms, ts) VALUES (?,?,?,?,?)",
        )
          .bind(this.name, tool.name, JSON.stringify(args), Date.now() - start, Date.now())
          .run();
      }
    },
  };
}

Pair this with Workers Logpush or a Tail Worker to stream logs in real time. Workers Trace Worker (beta 2025) handles OpenTelemetry export.

Operations: a checklist after 4 months

Common pitfalls

1. setInterval is lost after hibernation. Covered above. It is a lesson everyone seems to have to learn once.

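2. State too large. Stored values have a size cap (128 KiB per value on the legacy KV backend, 2 MB per row on the SQLite backend). If you need to keep a long LLM message history, keep metadata in the DO and the bodies in R2/D1.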

3. Fan-out to more than 32 agents without a queue. Once the subrequest limit is reached (50 in my budget), the next call throws. Symptom: the orchestrator agent crashes mid-pipeline.

4. Tool handlers without a timeout. When the LLM calls a slow tool, the DO is blocked and every other message to that agent queues up behind it. Always use AbortSignal.timeout().

5. Wrong DO class name in a migration. Renaming the TypeScript class requires a renamed_classes migration; otherwise the old DO instances are orphaned.

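6. The Anthropic SDK's default timeout is 10 minutes. That is far longer than a DO invocation should wait on a single LLM call, and the platform will not cut a hung request short for you: the 30s default and 5-minute maximum on Workers Paid are CPU-time limits, not wall-time limits. Set an explicit client timeout well below the longest wait you are willing to accept.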
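Pitfalls 4 and 6 both come down to putting an upper bound on how long one awaited call may take. A small wrapper, sketched below, does that for any promise-returning function and hands it an AbortSignal so cooperative callers such as fetch can cancel the underlying work. withTimeout is a hypothetical helper written for this illustration, not part of the Agents framework or the Anthropic SDK.

```typescript
// Run `work`, rejecting if it has not settled within `ms` milliseconds.
// The signal passed to `work` is aborted when the timeout fires.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Timed out after ${ms} ms`));
    }, ms);
  });
  try {
    // Whichever settles first wins; the loser's outcome is ignored.
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Usage sketch inside a tool handler (the URL is a placeholder):
// const res = await withTimeout(
//   (signal) => fetch("https://api.example.com/search", { signal }),
//   5000,
// );
```

For a plain fetch, AbortSignal.timeout() alone is enough; the wrapper is useful when the awaited call is an SDK method or an RPC that does not take a signal directly.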

Bottom line

The Cloudflare Agents framework is the cheapest way to run long-lived, stateful agents if you are already in the Workers ecosystem. Hibernation cuts cost by ~90% compared with container-based agents. The trade-offs: vendor lock-in to Cloudflare, WebSocket Hibernation that feels odd to debug at first, and a framework less mature than LangGraph for complex graph workflows. For my 500-agent monitoring pipeline the trade-off is worth it: single-digit dollars per month versus $180/month, plus wake latency about 100× faster. When this scales to 5,000 agents or needs a complex graph DSL I will re-evaluate, but the agent-per-DO architecture stays.
