Pingora vs AWS ALB/NLB: when a self-hosted reverse proxy wins

TL;DR

Pingora is the Rust reverse-proxy framework Cloudflare uses to serve 40M+ requests per second worldwide. It was open-sourced in February 2024 and currently has ~26k stars.

Pingora is not a drop-in replacement for nginx/HAProxy. It is a framework: you write Rust code that implements the ProxyHttp trait instead of editing a config file. In exchange you get full control over proxy logic, graceful upgrades that do not drop connections, and native async on Tokio.

An AWS ALB costs about $16/month base ($0.0225/hour) plus $0.008 per LCU-hour. A small service at 100 req/s with a few MB of traffic runs ~$30-40/month per ALB once LCUs are included. With many services, the cost adds up quickly.

Self-hosting wins when you need complex custom auth/routing logic, need to swap the binary or the upstream pool without dropping connections (graceful restart < 100ms), want aggressive connection reuse (Pingora shares one upstream connection pool across all worker threads), or run ≥ 1k req/s where the ALB's LCU cost exceeds $200/month.

ALB/NLB still wins when you need ACM-managed certificates with auto-rotation, need AWS WAF integration, have a workload that sits idle 50% of the time (managed means you are not paying for idle proxy compute), or your team has no Rust ops capacity.

The Pingora workspace is split into many crates. The ones you will meet first: pingora-core (runtime), pingora-proxy (HTTP proxy logic), pingora-load-balancing (upstream pool + health checks), pingora-cache (HTTP cache), pingora-limits, pingora-timeout, and pingora-memory-cache. Most projects use pingora-proxy + pingora-load-balancing.

Don't use Pingora if all you need is simple hostname → service routing. Caddy/Traefik with managed TLS still wins on dev velocity. Pingora answers the question "I need a programmable proxy without sacrificing latency".

Why Cloudflare wrote Pingora instead of forking nginx

In 2022 Cloudflare published the blog post "How we built Pingora" explaining why it replaced nginx: nginx's worker-process model does not share connection pools, so each worker process opens its own connections to upstreams. At Cloudflare's scale that means millions of redundant connections, costing RAM and CPU context switches.

Pingora chose a multi-threaded async runtime (Tokio) in which all threads share one connection pool. Cloudflare reported making only about a third as many new connections per second to origins after the switch. Tail latency (p99) drops too, because far fewer requests pay for a fresh TCP/TLS handshake.

I don't operate at Cloudflare's scale, but the "shared connection pool" pattern applies to internal microservices as well: 50 services with 10 workers each means 500 connections to every upstream. A single Pingora instance with a shared pool brings that down to 10-20 connections. That is real RAM and real file descriptors.
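The fan-in arithmetic in that paragraph can be written down as a tiny sketch. The function names are mine, and the pool size you pass as `after` is an assumption about how many connections a shared pool really needs under your load.

```rust
/// Upstream connections when every worker of every client service
/// keeps its own connection (no sharing between workers).
fn per_worker_connections(services: u32, workers_per_service: u32) -> u32 {
    services * workers_per_service
}

/// Percentage of upstream connections saved by moving to a shared pool.
fn saved_percent(before: u32, after: u32) -> f64 {
    if before == 0 {
        return 0.0;
    }
    100.0 * f64::from(before.saturating_sub(after)) / f64::from(before)
}
```

With 50 services × 10 workers, `per_worker_connections(50, 10)` is 500, and shrinking that to a 20-connection shared pool saves 96% of the connections each upstream has to hold.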

Pingora is a framework, not a binary

This is the most commonly misunderstood point. cargo install pingora does not give you a runnable proxy, because the crate is a library. You have to write your own binary:

// Cargo.toml dependency
// pingora = { version = "0.4", features = ["lb"] }

use async_trait::async_trait;
use pingora::prelude::*;
use std::sync::Arc;

pub struct LB(Arc<LoadBalancer<RoundRobin>>);

#[async_trait]
impl ProxyHttp for LB {
    type CTX = ();
    fn new_ctx(&self) -> () {}

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut (),
    ) -> Result<Box<HttpPeer>> {
        let upstream = self
            .0
            .select(b"", 256)
            .ok_or_else(|| Error::new_str("no upstream"))?;
        Ok(Box::new(HttpPeer::new(
            upstream,
            true,
            "api.internal".to_string(),
        )))
    }
}

fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    let upstreams = LoadBalancer::try_from_iter([
        "10.0.1.10:443",
        "10.0.1.11:443",
        "10.0.1.12:443",
    ])
    .unwrap();

    let mut lb = http_proxy_service(&server.configuration, LB(Arc::new(upstreams)));
    lb.add_tcp("0.0.0.0:8080");
    server.add_service(lb);
    server.run_forever();
}

About 40 lines of code and you have a working round-robin proxy. But this is only the starting point. Production code typically adds:

Custom auth (the request_filter trait method): validate JWTs, check mTLS client certs
Custom routing (upstream_peer): pick the upstream based on the X-Tenant-ID header
Response rewriting (response_filter): strip debug headers, inject Strict-Transport-Security
Caching (pingora-cache): content-aware cache with a purge API
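To make the custom-routing item concrete, here is a minimal, std-only sketch of the decision you would make inside upstream_peer. `TenantRouter` and its methods are hypothetical helpers, not part of Pingora; in a real proxy you would read the X-Tenant-ID value from the request header and feed the chosen address into the peer you return.

```rust
use std::collections::HashMap;

/// Hypothetical routing table: tenant id -> upstream address.
struct TenantRouter {
    routes: HashMap<String, String>,
    default_upstream: String,
}

impl TenantRouter {
    fn new(default_upstream: &str) -> Self {
        Self {
            routes: HashMap::new(),
            default_upstream: default_upstream.to_string(),
        }
    }

    fn add_route(&mut self, tenant: &str, upstream: &str) {
        self.routes.insert(tenant.to_string(), upstream.to_string());
    }

    /// Pick an upstream from the raw `X-Tenant-ID` header value.
    /// Missing, empty, or unknown tenants fall back to the default pool.
    fn pick(&self, tenant_header: Option<&str>) -> &str {
        tenant_header
            .map(str::trim)
            .filter(|t| !t.is_empty())
            .and_then(|t| self.routes.get(t))
            .map(String::as_str)
            .unwrap_or(self.default_upstream.as_str())
    }
}
```

Keeping the decision in a plain function like this makes it unit-testable without starting a proxy, which matters once routing rules grow beyond what a rule engine can express.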

Compared with ALB: the ALB rule engine has a default quota of 100 rules per load balancer, and rule actions are limited to forward, redirect, fixed-response, and authenticate. Anything more complex (per-tenant rate limiting, custom JWT validation) has to move to another layer, such as Lambda@Edge or CloudFront Functions in front of the ALB, which adds latency and cost.

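Hot reload without dropping connections: the real weapon

This is why I chose Pingora for an internal edge proxy. When I update the binary or change the upstream pool:

# Send SIGQUIT to the old process so it hands over gracefully, then start
# the new binary as a daemon with --upgrade. Assumes the binary is named
# pingora-proxy and parses Pingora's standard command-line options.
pkill -SIGQUIT pingora-proxy && ./pingora-proxy -d --upgrade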

Pingora's graceful upgrade works like this:

The old process keeps serving existing connections until their responses finish
The new process receives the listening sockets from the old process and starts accepting new connections
The listening file descriptors are passed between the two processes over a Unix domain socket
The old process exits after draining its connections
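The drain half of those steps (stop taking new work, wait for in-flight requests, then exit) can be modelled with two atomics. This is a toy model under my own names (`DrainState`, `try_begin`, `can_exit`), not Pingora's internal implementation, and it leaves out the socket handover entirely.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Toy model of the old process during a graceful upgrade.
struct DrainState {
    accepting: AtomicBool,
    in_flight: AtomicUsize,
}

impl DrainState {
    fn new() -> Self {
        Self {
            accepting: AtomicBool::new(true),
            in_flight: AtomicUsize::new(0),
        }
    }

    /// Called when a request arrives. Returns false once draining has begun.
    fn try_begin(&self) -> bool {
        // Count first, then re-check the flag, so a drain that starts in
        // between still sees this request in `in_flight`.
        self.in_flight.fetch_add(1, Ordering::SeqCst);
        if self.accepting.load(Ordering::SeqCst) {
            true
        } else {
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            false
        }
    }

    /// Called when a request has been fully answered.
    fn end(&self) {
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }

    /// Called when the upgrade signal arrives: stop taking new requests.
    fn start_drain(&self) {
        self.accepting.store(false, Ordering::SeqCst);
    }

    /// The old process may exit only when draining and nothing is in flight.
    fn can_exit(&self) -> bool {
        !self.accepting.load(Ordering::SeqCst) && self.in_flight.load(Ordering::SeqCst) == 0
    }
}
```

The point of the model is the ordering: the old process never exits while `in_flight` is non-zero, which is why a long-running response survives a deploy.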

The result: zero dropped requests during a deploy. I ran wrk -d 60s -c 1000 and sent kill -SIGQUIT midway through, and saw no errors.

ALB also has "connection draining" (deregistration delay), but that applies to targets in a target group, not to the ALB itself. Changing the ALB itself (security policy, listeners) still needs a careful rollout, and since ALB does not let you deploy custom logic at all, the two are not really equivalent here.

Real-world performance

The Pingora team has published benchmarks against nginx in the repo's docs/. I reran them on a c6i.xlarge (4 vCPU, 8GB RAM):

Metric	Pingora 0.4	nginx 1.27	AWS ALB
RPS (small body)	~145k	~95k	depends on LCU
p50 latency	1.2ms	1.4ms	3-5ms
p99 latency	4.5ms	11ms	15-25ms
Memory at 10k conn	~180MB	~240MB	N/A

Note: ALB's p50 is higher because it adds a network hop (client → ALB → target). Running the proxy on the same VM as the app removes that hop, so if you can co-locate app and proxy, that is a real latency advantage.

Cost: ALB adds up faster than you think

AWS ALB pricing is $0.0225/hour ($16.20/month) plus LCUs at $0.008 per LCU-hour. An LCU is measured along 4 dimensions, and you are billed for whichever is highest:

New connections per second
Active connections
Processed bytes (per GB)
Rule evaluations

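A production service at 500 req/s with about 3 GB/hour of traffic and no more than 10 rules on the listener (the first 10 rule evaluations per request are free) comes to ~3 LCU = $17/month. Total per ALB = ~$33/month. With 20 services that is $660/month for the ALB layer.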
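Here is a rough sketch of the LCU arithmetic. It assumes the per-LCU allowances AWS publishes for ALB (25 new connections/s, 3,000 active connections/min, 1 GB/hour for EC2 targets, 1,000 rule evaluations/s with the first 10 rules per request free) and us-east-1 list prices. The struct and function names are mine; check the current pricing page before relying on the numbers.

```rust
/// One hour of averaged ALB traffic. Field names are illustrative.
struct AlbUsage {
    new_conns_per_sec: f64,
    active_conns_per_min: f64,
    processed_gb_per_hour: f64,
    requests_per_sec: f64,
    rules_evaluated_per_request: f64,
}

const ALB_HOURLY_USD: f64 = 0.0225;
const LCU_HOURLY_USD: f64 = 0.008;
const HOURS_PER_MONTH: f64 = 720.0;

/// LCUs consumed per hour: the maximum across the four dimensions.
fn lcus(u: &AlbUsage) -> f64 {
    let new_conns = u.new_conns_per_sec / 25.0;
    let active = u.active_conns_per_min / 3000.0;
    let bytes = u.processed_gb_per_hour / 1.0;
    // The first 10 rules evaluated per request are not billed.
    let billable_rules = (u.rules_evaluated_per_request - 10.0).max(0.0);
    let rule_evals = u.requests_per_sec * billable_rules / 1000.0;
    new_conns.max(active).max(bytes).max(rule_evals)
}

/// Monthly cost of one ALB: fixed hourly fee plus LCU charges.
fn monthly_cost_usd(u: &AlbUsage) -> f64 {
    (ALB_HOURLY_USD + lcus(u) * LCU_HOURLY_USD) * HOURS_PER_MONTH
}
```

For example, 500 req/s, 3 GB/hour, and 10 rules gives 3 LCU and about $33/month. Raise that to 5 GB/hour and 30 rules evaluated per request and the rule dimension dominates at 10 LCU, about $74/month, so rule count is worth watching.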

The self-hosted equivalent: 2 c6i.large instances (for HA) running Pingora behind an NLB, at ~$120/month on-demand (about $0.085/hour each in us-east-1; less with Savings Plans or Reserved Instances). The NLB is $16/month + NLCUs. Self-hosting scales better with the number of services: one Pingora tier handles many hostnames/upstreams at almost the same cost.

I stress "almost" because Pingora has no managed certificates of its own: you use cfssl, or Let's Encrypt with a cert-manager-style renewal setup. That is operational overhead. ALB integrates with ACM in one click.

NLB is a different comparison: Pingora is not a competitor

NLB is an L4 TCP/UDP load balancer and does not terminate TLS unless you configure a TLS listener. Pingora is L7: it terminates TLS, parses HTTP, and routes by header/path. The two do not replace each other.

NLB is typically used when you:

Need to preserve the client IP with TCP passthrough
Need consistently low latency (p99 under ~5ms as a target, not a guarantee)
Want to bypass the HTTP layer, for example to front a database or Redis proxy

A sensible pattern: NLB at the edge → Pingora cluster internally → services. NLB provides static IPs and cross-AZ HA; Pingora provides the programmable L7 logic. Cloudflare's setup has a similar shape: anycast IPs at the edge → Pingora cluster.

`pingora-load-balancing`: production-grade health checks

The pingora-load-balancing crate provides selection algorithms and background health checks. Setup:

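// Fragment: goes inside a function that returns a Result,
// after `server` has been created.
use pingora_core::services::background::background_service;
use pingora_load_balancing::{
    health_check::TcpHealthCheck,
    LoadBalancer,
    selection::RoundRobin,
};
use std::time::Duration;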

let mut upstreams: LoadBalancer<RoundRobin> =
    LoadBalancer::try_from_iter(["10.0.1.10:443", "10.0.1.11:443"])?;

let hc = TcpHealthCheck::new();
upstreams.set_health_check(hc);
upstreams.health_check_frequency = Some(Duration::from_secs(1));

let background = background_service("health check", upstreams);
let upstreams = background.task();
server.add_service(background);

The health check runs every second in the background. When an upstream fails its check, it is removed from selection automatically, and it is added back automatically once it recovers. I used to have to write this myself with custom HAProxy scripts; Pingora gives it to you out of the box.
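The "fail, drop out, recover, rejoin" behaviour is easier to reason about as a small state machine. The sketch below is a std-only model of flip thresholds (N consecutive failures to mark a backend down, M consecutive successes to bring it back). The field names echo the consecutive_success / consecutive_failure settings that TcpHealthCheck exposes, but the code is my own illustration, not the crate's logic.

```rust
/// Toy model of one backend's health state with flip thresholds.
struct Health {
    healthy: bool,
    /// Consecutive probe results that disagree with `healthy`.
    streak: u32,
    consecutive_success: u32,
    consecutive_failure: u32,
}

impl Health {
    fn new(consecutive_success: u32, consecutive_failure: u32) -> Self {
        Self {
            healthy: true,
            streak: 0,
            consecutive_success,
            consecutive_failure,
        }
    }

    /// Record one probe result. Returns true if the state flipped.
    fn observe(&mut self, probe_ok: bool) -> bool {
        if probe_ok == self.healthy {
            // The probe agrees with the current state: reset the streak.
            self.streak = 0;
            return false;
        }
        self.streak += 1;
        let threshold = if self.healthy {
            self.consecutive_failure
        } else {
            self.consecutive_success
        };
        if self.streak >= threshold {
            self.healthy = probe_ok;
            self.streak = 0;
            true
        } else {
            false
        }
    }
}
```

Thresholds above 1 trade detection speed for stability: with a 1-second probe interval and 3 consecutive failures required, a dead upstream keeps receiving traffic for up to about 3 seconds, but a single dropped probe no longer ejects a healthy one.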

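Built-in selection algorithms: RoundRobin, Random, FNV hashing (pick a backend by hashing a key), and Consistent (Ketama consistent hashing), all weight-aware. If you need something custom, implement the BackendSelection trait; you are not locked into the built-ins.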
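To show what hash-based selection by key means, here is a std-only sketch: a 64-bit FNV-1a hash and a modulo pick over a backend list. `select_by_key` is a hypothetical helper, it ignores weights, and it is not how the crate is implemented internally.

```rust
/// 64-bit FNV-1a over raw bytes.
fn fnv1a64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in data {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Map a key (for example a tenant id) onto one backend by hash modulo length.
/// The same key always lands on the same backend while the list is unchanged.
fn select_by_key<'a>(backends: &[&'a str], key: &[u8]) -> Option<&'a str> {
    if backends.is_empty() {
        return None;
    }
    let idx = (fnv1a64(key) % backends.len() as u64) as usize;
    Some(backends[idx])
}
```

Plain modulo hashing reshuffles most keys when a backend is added or removed. If cache locality or session affinity matters across pool changes, that is the case for consistent hashing instead.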

`pingora-cache`: content-aware cache

This is the feature that sets Pingora apart from HAProxy/Traefik. The cache follows HTTP caching semantics (Vary, Cache-Control, ETag):

use pingora_cache::{
    eviction::lru::Manager as LruEviction,
    predictor::Predictor,
    CacheMeta,
    MemCache,
};

// Inside impl ProxyHttp
fn request_cache_filter(
    &self,
    session: &mut Session,
    _ctx: &mut Self::CTX,
) -> Result<()> {
    session.cache.enable(
        &MEM_CACHE,            // static MemCache instance
        Some(&EVICT_MANAGER),  // LRU eviction
        Some(&PREDICTOR),      // Hit predictor
        None,
    );
    Ok(())
}

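Trade-off: MemCache is in-memory and per instance, so multiple instances do not share cache entries. The crate's documentation also describes MemCache as intended for testing, so for anything long-lived plan on implementing the Storage trait yourself. Cloudflare has its own distributed cache layer, which has not been open-sourced. For a small internal proxy, in-memory can be enough.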
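Whatever storage you pick, the cache is only as correct as the cacheability decision in front of it. The sketch below is a deliberately simplified, std-only version of that decision for a shared cache: it reads a Cache-Control header value and returns a TTL or nothing. `shared_cache_ttl` is my own helper name, it treats no-cache conservatively as "do not store", and it ignores Expires, Vary, and status codes, all of which a real filter has to consider.

```rust
/// Decide whether a shared cache may store a response, and for how long.
/// Returns Some(ttl_secs) if cacheable, None otherwise. Simplified on purpose.
fn shared_cache_ttl(cache_control: &str) -> Option<u64> {
    let mut max_age: Option<u64> = None;
    let mut s_maxage: Option<u64> = None;

    for directive in cache_control.split(',') {
        let d = directive.trim().to_ascii_lowercase();
        match d.as_str() {
            // Conservative: any of these means "do not store in a shared cache".
            "no-store" | "private" | "no-cache" => return None,
            _ => {}
        }
        if let Some(v) = d.strip_prefix("s-maxage=") {
            s_maxage = v.trim_matches('"').parse().ok();
        } else if let Some(v) = d.strip_prefix("max-age=") {
            max_age = v.trim_matches('"').parse().ok();
        }
    }

    // For shared caches, s-maxage takes precedence over max-age.
    s_maxage.or(max_age).filter(|&ttl| ttl > 0)
}
```

A response with no freshness information returns None here, which means "don't cache by default". That default is the safe one for a proxy sitting in front of authenticated APIs.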

When I don't choose Pingora

I'll list these bluntly, because the Rust + Cloudflare hype makes people over-pick it:

1. Simple hostname → service routing. Caddy with automatic HTTPS via Let's Encrypt is 5 lines of Caddyfile. Pingora is 200 lines of Rust plus certificate management. The effort is not justified.

2. The team has no Rust capacity. The Rust learning curve is real. When an incident at 2 a.m. needs a proxy fix, a junior engineer can SSH in and edit nginx.conf in 30 seconds. Fixing Pingora means editing Rust code, running cargo build --release, and redeploying. The operational risk is higher.

3. A simple AWS-native stack. ALB + ACM + WAF + Shield are 4 deeply integrated AWS services. Self-hosting Pingora means rebuilding a WAF (a mod_security port?), DDoS protection (rate limiting via pingora-limits?), and certificate rotation. That is a side project of its own.

4. Unpredictable traffic spikes. ALB scales by itself, with no capacity planning. A Pingora cluster has to be sized for a known peak. A 10× spike beyond the plan means downtime.

`pingora-limits`: rate limit pattern

For programmable rate limiting:

use pingora_limits::rate::Rate;
use std::time::Duration;

static LIMITER: once_cell::sync::Lazy<Rate> =
    once_cell::sync::Lazy::new(|| Rate::new(Duration::from_secs(60)));

// The request_filter hook, inside impl ProxyHttp
async fn request_filter(
    &self,
    session: &mut Session,
    _ctx: &mut Self::CTX,
) -> Result<bool> {
    let api_key = session
        .req_header()
        .headers
        .get("x-api-key")
        .and_then(|h| h.to_str().ok())
        .unwrap_or("anonymous");

    let count = LIMITER.observe(&api_key, 1);
    if count > 100 {
        // Block: 100 req/min per API key
        let _ = session.respond_error(429).await;
        return Ok(true);  // request handled, don't continue
    }
    Ok(false)
}

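Compared with AWS WAF rate-based rules: WAF originally counted by source IP only. It now supports custom aggregation keys such as a header value, but the rule can still only count and block. Once the limit depends on application state (per-tenant plans, different quotas per API key), you end up bolting on Lambda or application code, which gets complicated. In Pingora that logic is a few lines in request_filter.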
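The "observe, then compare against a limit" pattern is worth seeing without the library around it. This is a std-only fixed-window counter keyed by an arbitrary string (API key, tenant id). `FixedWindowLimiter` is a hypothetical type of mine, it takes the current time as a parameter so it can be tested deterministically, and it is single-threaded; it is not how pingora-limits is implemented.

```rust
use std::collections::HashMap;

/// Per-key fixed-window request counter.
struct FixedWindowLimiter {
    window_secs: u64,
    limit: u64,
    /// key -> (window index, requests counted in that window)
    counters: HashMap<String, (u64, u64)>,
}

impl FixedWindowLimiter {
    fn new(window_secs: u64, limit: u64) -> Self {
        Self {
            // Guard against a zero-length window (division by zero below).
            window_secs: window_secs.max(1),
            limit,
            counters: HashMap::new(),
        }
    }

    /// Count one request for `key` at `now_secs` (seconds since any fixed
    /// epoch) and report whether it is still within the limit.
    fn allow(&mut self, key: &str, now_secs: u64) -> bool {
        let window = now_secs / self.window_secs;
        let entry = self.counters.entry(key.to_string()).or_insert((window, 0));
        if entry.0 != window {
            // A new window started: reset the count for this key.
            *entry = (window, 0);
        }
        entry.1 += 1;
        entry.1 <= self.limit
    }
}
```

A fixed window allows short bursts around a window boundary (up to twice the limit across two adjacent windows). If that matters, a sliding window or token bucket is the usual next step.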

The production deployment pattern I use

Internet
   │
   ▼
NLB (cross-AZ, static IP, AWS managed)
   │
   ▼
Pingora cluster (2× c6i.large, hot-reload deploy)
   │
   ├─► Service A (10 pods on EKS)
   ├─► Service B (Lambda function URL)
   └─► Service C (RDS Aurora reader)

The NLB provides cross-AZ failover and static IPs for DNS A records. Pingora provides L7 routing + auth + cache. Cost: NLB ~$20/month + 2 Pingora instances ~$120/month on-demand = ~$140/month for the entire ingress layer, for 3 or more services. The same setup with one ALB per service: 3 × $33 = $99/month, and every service needs its own ALB (or you split by subpath and take on the rule complexity). On cost alone the shared tier only pulls ahead at around 5 services (5 × $33 = $165/month).

So savings are not the argument at small scale. The win is operational flexibility: I push new logic (header injection, A/B routing) into the Pingora binary and deploy it gracefully, without touching ALB config through Terraform every time.
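The cost comparison above boils down to one question: at how many services does a fixed-cost shared tier beat one ALB per service? A minimal sketch, with the function name and both dollar inputs being yours to supply:

```rust
/// Smallest number of services at which one shared proxy tier is strictly
/// cheaper than running one ALB per service.
/// Returns None if the per-ALB price is not positive.
fn break_even_services(alb_per_service_usd: f64, shared_tier_usd: f64) -> Option<u32> {
    if alb_per_service_usd <= 0.0 {
        return None;
    }
    // We want the smallest n with n * alb > shared.
    let n = (shared_tier_usd / alb_per_service_usd).floor() as u32 + 1;
    Some(n)
}
```

At $33 per ALB, an $80/month shared tier breaks even at 3 services and a $140/month tier at 5. The model ignores engineering time, which is usually the larger cost on the self-hosted side.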

Quick decision guide

Scenario	Choose
Single web app, one region, small team	ALB + ACM
Internal microservices, 5+ services, custom logic needed	Pingora
TCP/UDP passthrough (database, MQTT)	NLB
Need managed WAF + Shield Advanced	ALB
Edge proxy with complex custom auth	Pingora
Unpredictable traffic spikes	ALB
Latency-critical app, p99 < 5ms	Pingora (co-located)
Compliance requires a managed service (PCI)	ALB

Bottom line

Pingora is not a replacement for ALB in every use case. It is a tool for teams with Rust capacity, predictable traffic, and a need for programmable L7 logic that ALB does not cover (or covers expensively). Self-hosting wins when there are ≥ 5 services behind the same proxy, or when the edge logic is more complex than ALB's rule actions allow. For a team of < 5 engineers or traffic < 1k req/s, ALB is the right call: operational simplicity beats a few tens of dollars a month in savings.

Pre-production checklist

Common pitfalls

1. Assuming Pingora replaces nginx 1:1. It is a framework. You write code; you do not edit a config file.

2. Forgetting health checks. Pingora has no health check by default: you must explicitly call set_health_check() and set health_check_frequency. Forget it and traffic keeps going to dead upstreams.

3. Blocking a worker thread with sync code. The Tokio runtime does not tolerate blocking I/O on its worker threads. Reading a file in request_filter with std::fs::read stalls the thread and every request scheduled on it, which shows up as latency spikes. Use tokio::fs, or move the work to tokio::task::spawn_blocking.

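4. A cache that ignores Cache-Control: no-store. Whether a response is cached is decided by your own response_cache_filter, so a filter that marks everything cacheable will store responses it should not. Test carefully, including with Vary headers.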

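5. Not testing hot reload in advance. The graceful upgrade hands the listening sockets over a Unix domain socket (the upgrade_sock path in the server config), so the old and new processes must both be able to access that path, and binding ports below 1024 still needs CAP_NET_BIND_SERVICE. Test on staging first.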

6. Memory growing without bound. Pingora has no built-in per-upstream connection limit. You have to handle backpressure yourself, via pingora-limits or a circuit-breaker pattern. Forget it and you get an OOM when an upstream slows down.
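For pitfall 6, the simplest form of backpressure is a cap on in-flight requests per upstream: reject (for example with a 503) once the cap is reached instead of queueing without bound. Below is a std-only sketch using an atomic counter and an RAII permit. `InflightLimit` and `Permit` are hypothetical names of mine; where you acquire and hold the permit in the request lifecycle is up to your proxy code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Cap on concurrent requests to one upstream.
struct InflightLimit {
    current: Arc<AtomicUsize>,
    max: usize,
}

/// Held for the lifetime of one request; releases its slot on drop.
struct Permit {
    current: Arc<AtomicUsize>,
}

impl InflightLimit {
    fn new(max: usize) -> Self {
        Self {
            current: Arc::new(AtomicUsize::new(0)),
            max,
        }
    }

    /// Try to reserve a slot. Returns None when the upstream is saturated,
    /// in which case the caller should fail fast instead of queueing.
    fn try_acquire(&self) -> Option<Permit> {
        let prev = self.current.fetch_add(1, Ordering::SeqCst);
        if prev >= self.max {
            // Over the cap: undo the reservation.
            self.current.fetch_sub(1, Ordering::SeqCst);
            None
        } else {
            Some(Permit {
                current: Arc::clone(&self.current),
            })
        }
    }

    fn in_flight(&self) -> usize {
        self.current.load(Ordering::SeqCst)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.current.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Because the slot is released in Drop, it is returned even when the request handler exits early on an error, which is the failure mode that usually leaks counters.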
