Worker observability: Logs, Tail Workers, Analytics

TL;DR

A Worker has no SSH, no /var/log. Observability lives in 4 layers:

Workers Logs — built-in dashboard, 3-day retention, zero config, $0.60/1M invocations. Enough for 90% of daily debugging.
Tail Workers — real-time stream via wrangler tail or a custom Tail Worker that forwards to Sentry/Datadog.
Logpush — batch export request logs to R2, S3, Splunk, Elastic. Enterprise, typically for compliance.
Analytics Engine — a custom event store. Write from the Worker, query via SQL API. 90-day retention. For app-specific custom metrics.

Main thesis:

Worker logs aren’t Linux logs. No files, no SSH. Debugging production means structured logs + request IDs + Tail Worker streaming + Analytics Engine metrics. Set it up right from day one = 80% of incidents resolved in 5 minutes instead of 5 hours.

This post covers: the 4 layers with real code, structured logging patterns, Analytics Engine schema + SQL queries, email/Slack/PagerDuty alerts, Sentry integration, and a real incident debug playbook.

This post opens Block 5 (Production). Part 18 goes into Security.

Who this is for

Developers who just deployed a Worker to production and want to know how it’s doing.
Teams debugging incidents: 5xx spikes, rising latency, missing data.
Anyone who needs custom metrics (feature usage, conversion funnels) but doesn’t want to set up Prometheus + Grafana.

Recommended prerequisites: Part 2 (runtime), Part 12 (CI/CD).

By the end of this post you will:

Implement structured logging with request IDs.
Set up a Tail Worker forwarding to Sentry in under 30 minutes.
Write custom metrics via Analytics Engine + query via SQL.
Alert when error rate > 1% or p95 latency > 500ms.

What this post isn’t about

Full-featured APM (Datadog, New Relic): integrations exist but aren’t native Cloudflare. Focus is on the native stack + Tail Worker bridges.
Compliance log retention: if you need it seriously, use Logpush → R2 and policy rules there. This post doesn’t cover GDPR/HIPAA details.
Complex distributed tracing (Jaeger, Zipkin): Workers are single-hop stateless, so full tracing isn’t first-class. Request ID patterns cover most edge-function needs.

The 4 layers at a glance

Observability stack: Workers Logs (dashboard, 3-day), Tail Workers (real-time stream), Logpush (R2 / SIEM), Analytics Engine (custom events), built-in dashboard (CPU/errors metrics), alerts (email/Slack/PagerDuty). The Worker sits in the center, forwarding data to each of the 4 layers as needed.

When to use which

Use case	Layer
Debug “why did this request 500?”	Workers Logs
Stream logs in real time during an incident	Tail Worker / `wrangler tail`
Forward every error to Sentry	Custom Tail Worker
Compliance — keep every request log for 1 year	Logpush → R2
Custom metrics (feature usage, conversion)	Analytics Engine
Alert when error rate > 1%	Cloudflare Notifications + Analytics Engine

You don’t need all 4. Most teams start with 2 (Workers Logs + Analytics Engine) and add Logpush when compliance requires it.

Layer 1: Workers Logs

console.log/warn/error inside a Worker is auto-captured and viewable in the dashboard.

Dashboard access

Dashboard → Workers & Pages → Select Worker → Logs tab

Filter by:

Time range (last 15 min, 1h, 6h, 24h, 3 days).
Status code (2xx, 4xx, 5xx).
Log level (info, warn, error).
Substring search in the message.

Enable in wrangler.jsonc

Observability is off by default on the Free plan and on by default on Paid plans. Enable it explicitly:

{
  "observability": {
    "enabled": true,
    "head_sampling_rate": 1.0  // 100% of requests are logged
  }
}

head_sampling_rate: 0.1 = 10% of requests logged (reduces cost for high-traffic sites).

Structured logging

console.log("user 123 logged in") is hard to query. Use JSON:

function log(level: string, message: string, context: Record<string, unknown> = {}) {
  console.log(JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    ...context,
  }));
}

// Usage
log("info", "user logged in", { userId: "abc-123", method: "oidc" });
log("error", "payment failed", { userId: "abc-123", orderId: "ord-1", reason: "card_declined" });

The dashboard can filter JSON fields (with Workers Logs v2). Searching for “userId:abc-123” finds every log for that user.

Request ID pattern

Every request gets an ID, which is included in every log and the response header.

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const requestId = crypto.randomUUID();

    // Wrap log to auto-include requestId (parameter named `extra` so it doesn't shadow `ctx`)
    const log = (level: string, msg: string, extra: Record<string, unknown> = {}) =>
      console.log(JSON.stringify({ level, requestId, msg, ...extra, ts: Date.now() }));

    log("info", "request start", { path: new URL(request.url).pathname });

    try {
      const upstream = await handleRequest(request, env, log);
      // Copy the response first: headers on a fetched Response are immutable
      const response = new Response(upstream.body, upstream);
      response.headers.set("x-request-id", requestId);
      log("info", "request done", { status: response.status });
      return response;
    } catch (err) {
      // `err` is `unknown` in strict TypeScript, so narrow it before reading fields
      const error = err instanceof Error ? err : new Error(String(err));
      log("error", "request failed", { error: error.message, stack: error.stack });
      return new Response("Internal error", {
        status: 500,
        headers: { "x-request-id": requestId },
      });
    }
  },
};

Users see x-request-id: abc-123 in the response header. Support tickets include the ID → faster debug.
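
If a request already carries an ID from an upstream proxy or client, it can be worth reusing it so the same ID shows up on both sides. A minimal sketch, assuming a hypothetical pickRequestId helper and an allowed format of 8 to 64 letters, digits, and hyphens (both are illustrative choices, not part of the handler above):

```typescript
// Hypothetical helper: reuse a well-formed inbound x-request-id, otherwise mint a new one.
// The allowed format is an assumption; tighten or loosen it to fit your clients.
const SAFE_ID = /^[A-Za-z0-9-]{8,64}$/;

function pickRequestId(incoming: string | null, generate: () => string): string {
  // The header is untrusted input that ends up in every log line,
  // so anything that doesn't match the expected shape is discarded.
  if (incoming !== null && SAFE_ID.test(incoming)) {
    return incoming;
  }
  return generate();
}
```

In the handler this would replace the first line, for example: pickRequestId(request.headers.get("x-request-id"), () => crypto.randomUUID()).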

Pricing

Workers Logs: $0.60/1M log invocations beyond the free tier. A high-traffic site at 1B req/month × 10% sampling = 100M logs × $0.60/1M = $60/month. Sampling rate matters.
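
To play with the numbers before picking a sampling rate, the arithmetic above can be wrapped in a small function. This is a rough sketch: estimateMonthlyLogCost and its field names are made up for illustration, the price default is the $0.60/1M figure quoted above, and the free allowance defaults to 0 to match the example, so check current pricing before relying on it.

```typescript
// Rough monthly Workers Logs cost estimate. Defaults mirror the figures quoted
// in this post and are assumptions, not authoritative pricing.
interface LogCostInput {
  requestsPerMonth: number;
  headSamplingRate: number;     // same value as head_sampling_rate (0..1)
  logsPerRequest?: number;      // assumed 1, as in the example above
  includedLogs?: number;        // free allowance; assumed 0 to match the example
  pricePerMillionUsd?: number;  // $0.60 per 1M, as quoted above
}

function estimateMonthlyLogCost(input: LogCostInput): number {
  const {
    requestsPerMonth,
    headSamplingRate,
    logsPerRequest = 1,
    includedLogs = 0,
    pricePerMillionUsd = 0.6,
  } = input;
  if (headSamplingRate < 0 || headSamplingRate > 1) {
    throw new RangeError("headSamplingRate must be between 0 and 1");
  }
  const loggedEvents = requestsPerMonth * logsPerRequest * headSamplingRate;
  const billable = Math.max(0, loggedEvents - includedLogs);
  return (billable / 1_000_000) * pricePerMillionUsd;
}
```

With requestsPerMonth: 1_000_000_000 and headSamplingRate: 0.1 this reproduces the roughly $60/month figure; at a sampling rate of 1 it gives roughly $600.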

Layer 2: Tail Workers

Real-time log stream while debugging live.

wrangler tail

npx wrangler tail my-worker

Streams every log in real time. Filters:

npx wrangler tail my-worker --status=error
npx wrangler tail my-worker --search="user-123"
npx wrangler tail my-worker --sampling-rate=0.1

Use it during active incidents. No persistence — Ctrl+C and everything is gone.

Custom Tail Worker

A Tail Worker is a special Worker that receives events from a production Worker. You forward logs wherever you want.

my-logger/src/index.ts:

export default {
  async tail(events: TraceItem[], env: Env): Promise<void> {
    for (const event of events) {
      // event.scriptName, event.outcome, event.logs, event.exceptions
      if (event.outcome === "exception" || event.exceptions.length > 0) {
        await sendToSentry(event, env);
      }

      // Forward all error-level logs to Datadog
      for (const log of event.logs) {
        if (log.level === "error") {
          await sendToDatadog(log, env);
        }
      }
    }
  },
} satisfies ExportedHandler<Env>;

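async function sendToSentry(event: TraceItem, env: Env) {
  // Assumes SENTRY_DSN holds an HTTP ingest endpoint that accepts a JSON POST;
  // a raw Sentry DSN has to be converted to its ingest URL (plus auth header) first.
  await fetch(env.SENTRY_DSN, {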
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message: event.exceptions[0]?.message,
      request: event.event,
      tags: { worker: event.scriptName },
    }),
  });
}

my-logger/wrangler.jsonc:

{
  "name": "my-logger",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01"
}

Deploy:

cd my-logger && npx wrangler deploy

Attach to the production Worker (my-app/wrangler.jsonc):

{
  "name": "my-app",
  "tail_consumers": [
    { "service": "my-logger" }
  ]
}

Deploy my-app. Now every my-app request emits an event to my-logger, which forwards exceptions to Sentry and error-level logs to Datadog.
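
One way to keep a Tail Worker testable is to separate "what should be forwarded" from "how it is sent". The sketch below is a pure function with no network I/O. The TailEvent types are simplified local stand-ins for the runtime's trace types, and collectForwardable is a hypothetical name, so treat it as an illustration rather than the handler's required shape.

```typescript
// Simplified local stand-ins for the tail event fields used here.
// Assumption: the real runtime types carry more fields than these.
interface TailLog { level: string; message: unknown[]; timestamp: number }
interface TailException { name: string; message: string; timestamp: number }
interface TailEvent {
  scriptName: string | null;
  outcome: string;
  logs: TailLog[];
  exceptions: TailException[];
}

interface ForwardBatch {
  exceptions: { worker: string; name: string; message: string }[];
  errorLogs: { worker: string; message: unknown[] }[];
}

// Decide what to forward for a whole batch of events in one pass.
function collectForwardable(events: TailEvent[]): ForwardBatch {
  const batch: ForwardBatch = { exceptions: [], errorLogs: [] };
  for (const event of events) {
    const worker = event.scriptName ?? "unknown";
    for (const ex of event.exceptions) {
      batch.exceptions.push({ worker, name: ex.name, message: ex.message });
    }
    for (const log of event.logs) {
      // Only error-level logs are forwarded; info/warn stay in Workers Logs.
      if (log.level === "error") {
        batch.errorLogs.push({ worker, message: log.message });
      }
    }
  }
  return batch;
}
```

The tail handler can then send one request per destination for the whole batch instead of awaiting one request per log line.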

Sentry integration

The toucan-js library is optimized for Workers:

import { Toucan } from "toucan-js"; // named export in toucan-js v3+; v2 used a default export

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const sentry = new Toucan({
      dsn: env.SENTRY_DSN,
      context: ctx,
      request,
      environment: env.ENVIRONMENT,
    });

    try {
      return await handleRequest(request, env);
    } catch (err) {
      sentry.captureException(err);
      return new Response("Internal error", { status: 500 });
    }
  },
};

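Inline capture differs from a Tail Worker: it runs inside the production Worker's own invocation, so the Sentry call shares that request's CPU and subrequest budget (passing context: ctx lets toucan-js finish sending via waitUntil after the response is returned). A Tail Worker runs as a separate invocation and stays off the request path entirely.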

Recommendation: Tail Worker for production, toucan-js only when you need per-request stack traces.

Layer 3: Logpush

Batch export request logs to R2, S3, Splunk, Elastic, Datadog.

Setup (Enterprise feature)

Dashboard → Analytics & Logs → Logpush → Create job.

Config:

Dataset: HTTP requests, Workers traces, Spectrum, DNS firewall, etc.
Destination: R2 bucket, S3 bucket, HTTP endpoint.
Fields: ClientIP, Datetime, EdgeResponseStatus, etc.
Sampling: 0.01 = 1% of requests.
Format: JSON, NDJSON, CSV.

R2 destination:

{
  "destination_conf": "r2://my-bucket?account-id=xxx&access-key-id=xxx&secret-access-key=xxx",
  "dataset": "workers_trace_events",
  "fields": "Event,EventTimestampMs,Outcome,ScriptName,Logs,Exceptions",
  "kind": "instant-logs"
}

Use cases

Compliance: keep logs for 1 year (PCI, HIPAA, SOC2).
Security analysis: WAF logs → SIEM (Splunk/Elastic).
Long-term trends: metrics beyond 90 days (Analytics Engine limit).
Cross-system correlation: Worker logs + AWS logs in a single Datadog.

Cost

Logpush is an Enterprise feature. Contact sales. Consider alternatives:

Workers Logs (3-day) + Analytics Engine (90-day) for 99% of cases.
Custom Tail Worker → R2 for budget-conscious teams.

Poor-person’s Logpush

// Tail Worker writes events to R2
export default {
  async tail(events: TraceItem[], env: Env): Promise<void> {
    const ndjson = events.map((e) => JSON.stringify(e)).join("\n");
    const key = `logs/${new Date().toISOString().slice(0, 13)}/${crypto.randomUUID()}.ndjson`;
    await env.R2.put(key, ndjson);
  },
};

One prefix per hour, one file per batch. A daily Scheduled Worker merges small files into bigger ones.

Cost: R2 storage is $0.015/GB-month. 100M events × ~500 bytes each = 50 GB ≈ $0.75/month in storage (R2 operation charges come on top). Much cheaper than Logpush.
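
The daily merge step can stay very small. Here is a sketch of just the string handling, assuming hypothetical helpers mergeNdjson and hourPrefix; reading the objects from R2 and writing the merged file back are left out, and a real job would likely stream rather than hold everything in memory.

```typescript
// Merge several NDJSON chunks into one payload with exactly one record per line.
function mergeNdjson(chunks: string[]): string {
  const lines: string[] = [];
  for (const chunk of chunks) {
    for (const line of chunk.split("\n")) {
      // Skip blank lines, e.g. from trailing newlines or empty files.
      if (line.trim() !== "") lines.push(line);
    }
  }
  return lines.length === 0 ? "" : lines.join("\n") + "\n";
}

// Rebuild the hourly prefix used by the Tail Worker above, e.g. "logs/2026-05-01T10/".
function hourPrefix(date: Date): string {
  return `logs/${date.toISOString().slice(0, 13)}/`;
}
```

A scheduled merge job would list objects under hourPrefix(...) for each hour of the previous day, pass their contents to mergeNdjson, and write a single larger object.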

Layer 4: Analytics Engine

A custom event store. The Worker writes datapoints, and you query via SQL API.

Analytics Engine flow: a Worker calls writeDataPoint (blobs + doubles), the time-series column store keeps 90 days of data, query via SQL API (count, quantileWeighted, groupBy), dashboards via Grafana or a custom Worker admin page.

Setup

wrangler.jsonc:

{
  "analytics_engine_datasets": [
    { "binding": "AE", "dataset": "my_app_events" }
  ]
}

Write

env.AE.writeDataPoint({
  indexes: ["user:abc-123"],
  blobs: [
    "/api/search",       // blob1: path
    "vn",                // blob2: country
    "claude-3.5",        // blob3: model used
  ],
  doubles: [
    250.5,               // double1: duration ms
    1024,                // double2: response size bytes
  ],
});

Schema:

indexes: up to 1, string, high-cardinality filter field.
blobs: up to 20, strings, low-cardinality filter + groupBy fields.
doubles: up to 20, numbers, aggregate fields.

Writes are fire-and-forget: writeDataPoint() returns immediately and needs no await, so it adds no latency to the request.
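
Because blobs and doubles are positional, it helps to keep the column order in a single place. A minimal sketch, assuming a hypothetical RequestMetric shape and toDataPoint helper; the limits checked are the ones listed in the schema above.

```typescript
// Shape passed to writeDataPoint, reduced to the three fields used in this post.
interface DataPoint { indexes: string[]; blobs: string[]; doubles: number[] }

// Hypothetical per-request metric; field names are illustrative.
interface RequestMetric {
  userId: string;
  path: string;
  country: string;
  model: string;
  durationMs: number;
  responseBytes: number;
}

const MAX_BLOBS = 20;
const MAX_DOUBLES = 20;

// Single source of truth for what blob1, blob2, double1, ... mean,
// so writers and SQL queries cannot drift apart.
function toDataPoint(m: RequestMetric): DataPoint {
  const point: DataPoint = {
    indexes: [`user:${m.userId}`],            // exactly one index
    blobs: [m.path, m.country, m.model],      // blob1..blob3
    doubles: [m.durationMs, m.responseBytes], // double1..double2
  };
  if (point.blobs.length > MAX_BLOBS || point.doubles.length > MAX_DOUBLES) {
    throw new RangeError("too many blobs or doubles for one data point");
  }
  if (point.doubles.some((d) => !Number.isFinite(d))) {
    throw new RangeError("doubles must be finite numbers");
  }
  return point;
}
```

The Worker would then call env.AE.writeDataPoint(toDataPoint(metric)) instead of building the arrays inline.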

Query via SQL API

POST to https://api.cloudflare.com/client/v4/accounts/<account-id>/analytics_engine/sql:

SELECT
  blob1 AS path,
  count() AS hits,
  quantileWeighted(0.5, double1, _sample_interval) AS p50,
  quantileWeighted(0.95, double1, _sample_interval) AS p95,
  quantileWeighted(0.99, double1, _sample_interval) AS p99
FROM my_app_events
WHERE timestamp > NOW() - INTERVAL '1' HOUR
GROUP BY path
ORDER BY hits DESC
LIMIT 20

Auth: Authorization: Bearer <scoped-api-token>.

Practical example: popular posts

// Worker: log each pageview
async function fetch(request: Request, env: Env) {
  const response = await handleRequest(request, env);

  if (response.ok && request.url.includes("/blog/")) {
    env.AE.writeDataPoint({
      indexes: [request.cf?.country ?? "unknown"],
      blobs: [
        new URL(request.url).pathname,
        request.headers.get("user-agent") ?? "",
        request.headers.get("referer") ?? "",
      ],
      doubles: [1],  // placeholder, use count() instead
    });
  }

  return response;
}

Query top posts for the past week:

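interface PopularPost {
  path: string;
  views: number;
}

async function getPopularPosts(env: Env): Promise<PopularPost[]> {
  const sql = `
    SELECT blob1 AS path, count() AS views
    FROM my_app_events
    WHERE timestamp > NOW() - INTERVAL '7' DAY
      AND blob1 LIKE '/blog/%'
    GROUP BY blob1
    ORDER BY views DESC
    LIMIT 10
  `;

  const response = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${env.CF_ACCOUNT_ID}/analytics_engine/sql`,
    {
      method: "POST",
      headers: { Authorization: `Bearer ${env.AE_API_TOKEN}` },
      body: sql, // the SQL API takes the raw query text as the request body
    }
  );
  if (!response.ok) {
    throw new Error(`Analytics Engine query failed: ${response.status}`);
  }

  // Counts may arrive as strings in the JSON, so normalise them to numbers
  const { data } = (await response.json()) as {
    data: { path: string; views: string | number }[];
  };
  return data.map((row) => ({ path: row.path, views: Number(row.views) }));
}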

This blog’s /api/popular endpoint uses exactly this pattern.

Pricing

~25M data points/month free tier.
$0.25 per 1M data points after that.
SQL queries: free (at reasonable usage).

Example: 1M page views × 2 writes per view (page + api) = 2M data points/month = free.

Server-side sampling

Large datasets (>1B) → Cloudflare auto-samples. The _sample_interval field on each row tells you how many real events that row represents.

SELECT sum(_sample_interval) AS real_count
FROM my_dataset

count() returns the row count (sampled). sum(_sample_interval) returns the estimated real event count.
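
The difference is easy to see with a handful of made-up rows. The sketch below mimics the two aggregates in plain TypeScript; SampledRow and the function names are illustrative, not part of any API.

```typescript
// One row as returned by a query that selects _sample_interval.
interface SampledRow { path: string; _sample_interval: number }

// What count() reports: the number of stored (sampled) rows.
function rowCount(rows: SampledRow[]): number {
  return rows.length;
}

// What sum(_sample_interval) reports: the estimated number of real events.
function estimatedEvents(rows: SampledRow[]): number {
  return rows.reduce((total, row) => total + row._sample_interval, 0);
}
```

Three stored rows with sample intervals 1, 10, and 10 stand for an estimated 21 real events, even though only 3 rows exist.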

Alert setup

Cloudflare Notifications

Dashboard → Notifications → Add. Notification types:

Worker Errors: error rate above a threshold.
Worker CPU: CPU time exceeded.
HTTP 5xx rate: zone-level.
Billing: cost > $X.

Destinations: Email, Webhook, PagerDuty, Slack.

Simple thresholds. No complex aggregation. Enough for 80% of needs.

Alerts with Analytics Engine + Scheduled Worker

More complex: a scheduled Worker queries AE every 5 minutes and posts to Slack when a threshold is breached.

// scheduled handler
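export default {
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    // querySQL: a helper that POSTs the query to the SQL API, like getPopularPosts above.
    // Assumes this dataset stores the HTTP status code in double1;
    // adjust the column to match your own writeDataPoint layout.
    const result = await querySQL(env, `
      SELECT
        countIf(double1 >= 500) AS errors,
        count() AS total
      FROM my_app_events
      WHERE timestamp > NOW() - INTERVAL '5' MINUTE
    `);

    // Counts may come back as strings; convert before doing arithmetic
    const errors = Number(result.data[0]?.errors ?? 0);
    const total = Number(result.data[0]?.total ?? 0);
    if (total === 0) return;  // no traffic in the window, nothing to evaluate
    const errorRate = errors / total;

    if (errorRate > 0.01) {  // > 1%
      await fetch(env.SLACK_WEBHOOK, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          text: `🚨 Error rate ${(errorRate * 100).toFixed(2)}% (${errors}/${total})`,
        }),
      });
    }
  },
};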

wrangler.jsonc:

{
  "triggers": {
    "crons": ["*/5 * * * *"]
  }
}

Runs every 5 minutes. More detailed than built-in alerts.
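
The threshold check itself is worth pulling out into a pure function so it can be unit-tested without hitting the SQL API. A sketch under assumptions: WindowStats, AlertRule, and evaluateWindow are hypothetical names, and the minimum-traffic guard is an addition to avoid alerting on a handful of requests.

```typescript
interface WindowStats { errors: number; total: number; p95Ms: number }
interface AlertRule { maxErrorRate: number; maxP95Ms: number; minRequests: number }

// Returns human-readable reasons for alerting; an empty array means "all clear".
function evaluateWindow(stats: WindowStats, rule: AlertRule): string[] {
  const reasons: string[] = [];
  // Too little traffic makes rates meaningless (and total = 0 would make the rate NaN).
  if (stats.total < rule.minRequests) return reasons;

  const errorRate = stats.errors / stats.total;
  if (errorRate > rule.maxErrorRate) {
    reasons.push(`error rate ${(errorRate * 100).toFixed(2)}% (${stats.errors}/${stats.total})`);
  }
  if (stats.p95Ms > rule.maxP95Ms) {
    reasons.push(`p95 latency ${stats.p95Ms.toFixed(0)}ms`);
  }
  return reasons;
}
```

With a rule of { maxErrorRate: 0.01, maxP95Ms: 500, minRequests: 100 } this matches the targets from the start of the post: error rate above 1% or p95 latency above 500ms. The scheduled handler would post to Slack only when the returned array is non-empty.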

Debug playbook: a real incident

Real scenario: a user reports “5xx when subscribing to the newsletter”. 10:30 AM, mid-meeting.

Minute 1: Workers Logs

Dashboard → Worker my-app → Logs. Filter status: 5xx, last 30 min.

See 15 error logs, all between 10:20-10:28. Every log has:

{
  "level": "error",
  "msg": "request failed",
  "requestId": "...",
  "error": "D1_ERROR: too many requests"
}

Root cause: D1 rate limit.

Minute 2: check D1 metrics

Dashboard → D1 → Metrics. Query count: spike from 50/s to 300/s at 10:15-10:28. Someone’s hitting /api/subscribe hard.

Minute 3: Tail Worker checks abuse

wrangler tail my-app --search="/api/subscribe"

200 requests/minute from the same IP. Bot attack.

Minute 4: mitigate

Deploy a rate-limit rule:

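// Add to Worker
const clientIP = request.headers.get("cf-connecting-ip") ?? "unknown";
const subscribeLimiter = env.RATE_LIMITER.get(env.RATE_LIMITER.idFromName(`ip:${clientIP}`));
// Await the fetch first, then parse: .json() doesn't exist on the pending promise
const limiterResponse = await subscribeLimiter.fetch(...);
const { allowed } = (await limiterResponse.json()) as { allowed: boolean };
if (!allowed) return new Response("Too many", { status: 429 });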

Push → CI → deploy.
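
The snippet above only shows the caller. The object behind RATE_LIMITER isn't shown in this post, so the following is a hypothetical sketch of the kind of fixed-window counter it might run per IP, written as plain in-memory TypeScript with the clock injected for testing.

```typescript
// Fixed-window counter: at most `limit` requests per `windowMs`.
// Hypothetical sketch, not the actual implementation behind RATE_LIMITER.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = Number.NEGATIVE_INFINITY; // forces a fresh window on first use
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is a millisecond timestamp, passed in so tests don't depend on real time.
  check(now: number): { allowed: boolean; remaining: number } {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now; // start a new window
      this.count = 0;
    }
    if (this.count >= this.limit) {
      return { allowed: false, remaining: 0 };
    }
    this.count += 1;
    return { allowed: true, remaining: this.limit - this.count };
  }
}
```

A fixed window is the simplest option and allows short bursts at window boundaries; a sliding window or token bucket is smoother if that matters for the endpoint.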

Minute 5: verify

wrangler tail shows 429s returning to the bot. D1 query count drops back to baseline.

Post-incident

Analytics Engine query:

SELECT blob1 AS ip, count() AS req
FROM subscribe_events
WHERE timestamp > NOW() - INTERVAL '1' HOUR
GROUP BY ip
ORDER BY req DESC
LIMIT 20

Confirms the attack scope, files abuse report. Permanent rule via WAF.

Total time: 5 minutes from report → mitigation. Thanks to having all 4 observability layers ready.

Gotchas

① console.log doesn’t format in the browser console

console.log(obj) in the Worker dashboard shows [object Object]. Use console.log(JSON.stringify(obj)). Workers Logs v2 auto-parses JSON.

② waitUntil for async logs

A log call to an external service (Sentry, Datadog) without await → the request returns before the log is sent. Use ctx.waitUntil():

ctx.waitUntil(sendToSentry(error));
return response;
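
A failed report shouldn't turn into an unhandled rejection either. A minimal sketch, assuming a hypothetical reportInBackground helper; WaitUntilContext is a local stand-in for the one method of the execution context that is used here.

```typescript
// Stand-in for the part of the execution context used here (assumption:
// the real object is supplied by the Workers runtime).
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hand the report to waitUntil and swallow reporting failures, logging them instead.
function reportInBackground(ctx: WaitUntilContext, send: () => Promise<unknown>): void {
  ctx.waitUntil(
    send().catch((err: unknown) => {
      console.error(JSON.stringify({ level: "error", msg: "report failed", error: String(err) }));
    }),
  );
}
```

In a handler this would look like reportInBackground(ctx, () => sendToSentry(error)) followed by return response.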

③ Analytics Engine `_sample_interval` is easy to forget

Datasets > 1B datapoints get sampled. Queries using count() underreport. Always use sum(_sample_interval) for totals:

-- Wrong: count() only counts rows
SELECT blob1, count() FROM ae GROUP BY blob1

-- Right: scale by _sample_interval
SELECT blob1, sum(_sample_interval) FROM ae GROUP BY blob1

④ Request logs blow up cost

1B requests/month × Workers Logs $0.60/1M = $600/month for logs alone. Sampling at 10% cuts that to $60. Set head_sampling_rate from day one.

⑤ Tail Worker infinite loops

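A Tail Worker runs for the events emitted by the production Worker, and its own console.log calls produce logs too. If those logs were fed back into the same Tail Worker, for example by attaching it as its own tail consumer, the result would be an infinite log loop. Don't log inside the Tail Worker unless necessary, and never attach it to itself.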

⑥ Sensitive data in logs

console.log(request.headers) can dump the Authorization token and cookies into your logs: credentials and PII that don't belong there. Redact:

function redact(headers: Headers): Record<string, string> {
  const obj = Object.fromEntries(headers);
  delete obj.authorization;
  delete obj.cookie;
  if (obj["x-api-key"]) obj["x-api-key"] = "***";
  return obj;
}

⑦ Log buffer limits

Workers cap at 128 log entries/request in the dashboard. For verbose, high-throughput services, ship logs through a Tail Worker → R2 instead of relying on the dashboard.

⑧ Timezones

Cloudflare log timestamps are UTC. The dashboard can convert to local, the API returns UTC. Document it clearly for your team.

Setup from scratch: 30 minutes

Minutes 0-5: enable observability in wrangler.jsonc, deploy. Logs appear in the dashboard immediately.

Minutes 5-15: structured logging helper + request IDs. Every log = JSON with requestId.

Minutes 15-25: Analytics Engine dataset + writeDataPoint per request. Key metrics: path, duration, status.

Minutes 25-30: Cloudflare Notifications alerts for error rate + Slack webhook.

30 minutes = full observability stack for a small/medium app.

Production checklist

Wrap-up

Observability isn’t optional. Production Workers have no SSH — if you don’t log, you know nothing. Cloudflare’s 4 layers cover: daily debug (Workers Logs), incident streaming (Tail Workers), compliance (Logpush), custom metrics (Analytics Engine).

Set it up right in 30 minutes = save dozens of hours of debugging. Skip it = fly blind in production.

Part 18: Security — secret management, CSP headers, Bot Management, Turnstile, Cloudflare Access, signed cookie patterns, and defense-in-depth for Workers.
