API Gateway Patterns for Growing Teams
APISIX runs at 190% of Kong's performance on rate limiting benchmarks. KrakenD handles API composition without code. Traefik is your gateway if you are already on Kubernetes. This is the honest comparison — plus how to handle the AI-specific patterns no traditional gateway was designed for.

Every growing API product eventually outgrows a single reverse proxy and needs a proper API gateway. The question is which one, for how long, and when to switch from managed to self-hosted (or vice versa). The decision involves performance benchmarks, operational complexity, plugin ecosystems, and increasingly, whether the gateway can handle the AI-specific patterns that traditional HTTP traffic never required.
This is the honest comparison: performance numbers come from published benchmarks, operational observations come from real deployments, and there is no vendor bias.
The Four Self-Hosted Contenders
Kong Gateway
Kong is the market leader in open-source API gateways. It runs on top of NGINX and OpenResty, has the largest plugin ecosystem (over 300 plugins), and has the deepest enterprise feature set. The governance and authentication plugins (OAuth2, JWT, OIDC, API key management) are mature and production-proven.
The honest weakness: performance. Kong's Lua-based plugin architecture adds overhead per request. At low to moderate traffic (< 5K QPS), this is invisible. At high traffic, the overhead compounds. Kong's cloud-managed version (Konnect) is a reasonable option if you want the Kong plugin ecosystem without the operational overhead of running it yourself.
Apache APISIX
APISIX also runs on NGINX/OpenResty but was architected for higher throughput. Published benchmarks show APISIX at 190% of Kong's performance on rate limiting operations — processing 23,000+ QPS on a single node with 0.2ms average latency. The plugin ecosystem is smaller than Kong's but growing rapidly, and APISIX supports plugins in multiple languages (Lua, Python, Go, Java) via a sidecar model.
APISIX is the right choice when performance at high QPS is the primary concern. The operational model is similar to Kong (both use etcd for configuration), so migration between them is feasible if your requirements change.
Traefik
Traefik is the Kubernetes-native gateway. It reads Kubernetes Ingress and IngressRoute resources natively, auto-discovers services, and handles TLS certificate management through Let's Encrypt integration. If your workload is on Kubernetes, Traefik has the best operational experience of any gateway.
The limitations: the plugin ecosystem is smaller than Kong or APISIX, and Traefik's rate limiting and authentication capabilities are less mature for complex use cases. It is the right choice for teams that want gateway functionality without a separate configuration plane — not for teams with complex API management requirements.
KrakenD
KrakenD occupies a different niche: API composition. Where the other gateways proxy requests, KrakenD can aggregate multiple backend responses into a single client-facing API call. A single request to KrakenD can fan out to five services, merge the responses, filter fields, and return a composed response. For BFF patterns and API facades, this is a significant operational simplification.
KrakenD is stateless and requires no external configuration store. Configuration is a static JSON file, which makes it easy to version-control and deploy. The trade-off is that it is less dynamic: changing configuration requires a restart.
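To make the composition idea concrete, here is a minimal Python sketch of the fan-out-and-merge pattern KrakenD expresses declaratively in its JSON config. The backends, field names, and payloads are hypothetical stand-ins for real upstream services.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(source):
    # Stand-in for an HTTP call to one backend; returns (name, payload).
    name, fn = source
    return name, fn()

def compose(sources, allowed_fields):
    """Fan out to several backends in parallel, merge the responses,
    and keep only whitelisted fields - what KrakenD does declaratively."""
    with ThreadPoolExecutor() as pool:
        results = dict(pool.map(fetch, sources.items()))
    merged = {}
    for payload in results.values():
        merged.update(payload)
    return {k: v for k, v in merged.items() if k in allowed_fields}

# Hypothetical backends standing in for real services:
sources = {
    "user":   lambda: {"id": 42, "name": "Ada", "password_hash": "..."},
    "orders": lambda: {"orders": [101, 102], "internal_flags": 7},
}
response = compose(sources, allowed_fields={"id", "name", "orders"})
# response keeps only id, name, and orders; internal fields are filtered out
```

The field whitelist is the key detail: composition at the gateway is also where you prevent internal fields from leaking to clients.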
Performance Comparison
| Gateway | Max QPS (single node) | Latency (p99) | Memory footprint | Plugin ecosystem |
|---|---|---|---|---|
| APISIX | 23,000+ | < 2ms | Low (~50MB) | Growing, multi-language |
| Kong | ~12,000 | < 5ms | Medium (~100MB) | Large (300+ commercial/OSS) |
| KrakenD | 25,000+ | < 1ms | Very low (~30MB) | Limited, composition-focused |
| Traefik | ~20,000 | < 2ms | Low (~40MB) | Small, middleware-focused |
The Core Gateway Patterns
Authentication and Authorization
All four gateways support API key validation, JWT verification, and OAuth2. The implementation quality varies. Kong's JWT and OAuth2 plugins are the most battle-tested. APISIX has equivalent functionality and adds OIDC support through the openid-connect plugin. For enterprise identity integration (SAML, enterprise SSO), Kong's enterprise tier has the most complete support.
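What a gateway JWT plugin does per request can be shown in a short sketch. This is a minimal HS256 verifier using only the standard library; production plugins additionally check `exp`/`nbf`, issuer, and audience claims, and the secret and claims below are illustrative.

```python
import base64, hashlib, hmac, json

def b64url_decode(data: str) -> bytes:
    # JWT uses unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def verify_hs256(token: str, secret: bytes):
    """Verify an HS256-signed JWT and return its claims, or None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))

# Mint a token to exercise the verifier (hypothetical secret and claims):
secret = b"demo-secret"
head = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
body = b64url_encode(json.dumps({"sub": "consumer-1"}).encode())
sig = b64url_encode(hmac.new(secret, f"{head}.{body}".encode(), hashlib.sha256).digest())
token = f"{head}.{body}.{sig}"
claims = verify_hs256(token, secret)                               # valid token
bad = verify_hs256(f"{head}.{body}.{b64url_encode(b'wrong')}", secret)  # tampered
```

Note the constant-time comparison (`hmac.compare_digest`): a naive `==` on signatures opens a timing side channel.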
Rate Limiting
Covered separately in this research series. At the gateway level: all four support per-consumer rate limiting. APISIX and Kong both use Redis for distributed state. For simple use cases, Traefik's built-in rate limiting middleware is sufficient. For complex per-consumer, per-route rate limiting at high throughput, APISIX's performance advantage matters.
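The algorithm behind most per-consumer rate-limit plugins is a token bucket. A minimal single-node sketch follows; distributed deployments keep the bucket state in Redis, as APISIX and Kong do. The clock is injected so the refill behaviour is testable.

```python
class TokenBucket:
    """Per-consumer token-bucket rate limiter: refill at a steady rate,
    allow bursts up to the bucket size."""
    def __init__(self, rate: float, burst: float, now=lambda: 0.0):
        self.rate, self.burst, self.now = rate, burst, now
        self.tokens, self.last = burst, now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per consumer key; a fake clock stands in for time.time().
clock = [0.0]
bucket = TokenBucket(rate=5.0, burst=10.0, now=lambda: clock[0])
first = [bucket.allow() for _ in range(12)]   # burst of 12: 10 pass, 2 rejected
clock[0] += 1.0                               # one second later: 5 tokens refilled
second = [bucket.allow() for _ in range(6)]   # 5 pass, 1 rejected
```

In a gateway you would keep one bucket per API key (or per key-and-route pair) and map a rejection to HTTP 429.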
Circuit Breaking
Circuit breaking (stopping requests to a degraded upstream to allow recovery) is available in all four gateways. Kong's circuit breaker is plugin-based. APISIX has built-in circuit breaking on the proxy level. Traefik implements it as a middleware. The key configuration: the failure threshold percentage, the timeout before attempting recovery, and the half-open state behavior.
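The three configuration knobs named above map directly onto a small state machine. This sketch uses a failure count rather than the failure-rate percentage production breakers typically compute over a rolling window, and the clock is injectable for testing.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN after too many failures,
    OPEN -> HALF_OPEN once recovery_timeout elapses, then HALF_OPEN ->
    CLOSED on a successful probe (or back to OPEN on a failed one)."""
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, now=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.now = now
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self.now() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"   # let a single probe through
                return True
            return False
        return True

    def record_success(self):
        self.failures, self.state = 0, "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", self.now()

clock = [0.0]
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0, now=lambda: clock[0])
for _ in range(3):
    cb.record_failure()          # threshold reached -> OPEN
blocked = cb.allow_request()     # False while OPEN: upstream gets a rest
clock[0] += 30.0
probe = cb.allow_request()       # True: HALF_OPEN probe request
cb.record_success()              # upstream recovered -> CLOSED
```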
Request Transformation
Request transformation — modifying headers, query parameters, or request bodies before forwarding to the upstream — is where KrakenD shines. Its declarative transformation DSL can reshape, filter, and merge API responses without code. For simple header addition or parameter mapping, all four gateways handle it adequately.
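The common simple cases (stripping internal headers, stamping a gateway header, renaming a legacy parameter) look like this in sketch form. The header and parameter names are hypothetical, not any gateway's defaults.

```python
def transform_request(headers: dict, params: dict):
    """Header and query-parameter transformation of the kind all four
    gateways support declaratively."""
    # Strip headers that must never reach the upstream.
    out_headers = {k: v for k, v in headers.items()
                   if k.lower() not in {"x-internal-debug"}}
    out_headers["X-Gateway-Version"] = "v2"     # stamp a gateway header
    # Map a legacy client parameter name onto the upstream's name.
    out_params = {("customer_id" if k == "cust" else k): v
                  for k, v in params.items()}
    return out_headers, out_params

h, p = transform_request(
    {"Accept": "application/json", "X-Internal-Debug": "1"},
    {"cust": "42", "page": "2"},
)
```

Anything more elaborate than this (body reshaping, response merging) is where KrakenD's DSL earns its keep.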
AI-Specific Gateway Patterns
The patterns above were designed for traditional HTTP APIs. AI workloads require a different set of gateway capabilities that traditional gateways are adding retroactively.
Token Counting at the Gateway
LLM requests are priced per token, not per request. A gateway that can count input tokens before forwarding a request can enforce token-based rate limits, track per-consumer token spend, and generate usage reports that match the actual cost model. Standard request counting tells you nothing about actual AI API cost.
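A sketch of token-based admission control at the gateway. The four-characters-per-token heuristic is a rough approximation for English text; a real implementation would use the provider's tokenizer (e.g. tiktoken for OpenAI models). The consumer names and limit are illustrative.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Production gateways should use the provider's tokenizer instead.
    return max(1, len(text) // 4)

class TokenBudget:
    """Enforce a per-consumer token budget at the gateway, so spend
    tracks the LLM cost model rather than request counts."""
    def __init__(self, limit_tokens: int):
        self.limit = limit_tokens
        self.used = {}

    def admit(self, consumer: str, prompt: str) -> bool:
        cost = approx_tokens(prompt)
        if self.used.get(consumer, 0) + cost > self.limit:
            return False   # map to HTTP 429 with a token-budget message
        self.used[consumer] = self.used.get(consumer, 0) + cost
        return True

budget = TokenBudget(limit_tokens=100)
ok = budget.admit("team-a", "hello " * 40)       # ~60 tokens: admitted
blocked = budget.admit("team-a", "hello " * 40)  # would exceed 100: rejected
```

Note this only counts input tokens; output tokens are known after the response, so complete cost tracking needs the response-side count folded back into `used`.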
Kong has an AI Gateway product (released 2024) that adds token counting, model routing, and LLM-specific rate limiting as first-class features. APISIX can be extended with custom Lua plugins to add token counting, but this requires engineering effort to implement. Neither KrakenD nor Traefik has native AI gateway support.
Model Routing
Model routing sends requests to different LLM providers or model sizes based on request characteristics. Fast, cheap models for simple requests; slow, expensive models for complex ones. Load balancing across providers for reliability. Fallback routing when a provider is rate-limiting.
This pattern requires semantic understanding of the request — something traditional gateways cannot do without extension. Kong AI Gateway and LiteLLM (an AI-specific proxy) handle this natively. For teams building AI-heavy products, running a dedicated AI proxy (LiteLLM, Portkey) in front of or alongside the API gateway is often cleaner than extending a traditional gateway with AI logic.
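A minimal sketch of the routing decision, assuming a crude length-and-keyword heuristic for request complexity. The model names, the 500-character cutoff, and the preference order are illustrative; Kong AI Gateway and LiteLLM let you express the equivalent in configuration.

```python
def route_model(prompt: str, provider_down=frozenset()) -> str:
    """Heuristic model routing with cross-provider fallback."""
    complex_request = len(prompt) > 500 or "step by step" in prompt.lower()
    # Preference order: first choice, then fallbacks on other providers.
    candidates = (["gpt-4o", "claude-sonnet"] if complex_request
                  else ["gpt-4o-mini", "claude-haiku"])
    for model in candidates:
        if model not in provider_down:
            return model
    raise RuntimeError("all candidate models unavailable")

cheap = route_model("What is our refund policy?")
big = route_model("Explain step by step how to migrate the schema.")
fallback = route_model("short question", provider_down={"gpt-4o-mini"})
```

Production routers replace the heuristic with a classifier or an explicit client hint, but the shape (tiered candidate lists plus availability-aware fallback) stays the same.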
The AI Gateway Integration Pattern
1. Handle auth, standard rate limiting, circuit breaking, and routing at the traditional gateway layer.
2. Configure the gateway to forward /ai/* or /llm/* paths to LiteLLM or Portkey instead of directly to OpenAI/Anthropic. The AI proxy handles token counting, model routing, and AI-specific rate limiting.
3. LiteLLM exposes usage metrics via its database or webhook. Build the pipeline from AI proxy usage data to your billing system. This is the data you need for per-token billing or cost attribution.
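The billing pipeline reduces to an aggregation over usage records. The record shape and per-1K-token prices below are hypothetical; LiteLLM's actual export schema differs, so adapt the field names to what your proxy emits.

```python
from collections import defaultdict

def attribute_spend(usage_records, prices_per_1k):
    """Roll LLM-proxy usage records up into per-consumer dollar spend."""
    spend = defaultdict(float)
    for r in usage_records:
        rate = prices_per_1k[r["model"]]
        tokens = r["input_tokens"] + r["output_tokens"]
        spend[r["api_key"]] += tokens / 1000 * rate
    return dict(spend)

records = [
    {"api_key": "team-a", "model": "small", "input_tokens": 800, "output_tokens": 200},
    {"api_key": "team-a", "model": "large", "input_tokens": 1000, "output_tokens": 1000},
    {"api_key": "team-b", "model": "small", "input_tokens": 500, "output_tokens": 500},
]
spend = attribute_spend(records, prices_per_1k={"small": 0.01, "large": 0.10})
# team-a: 1000/1000*0.01 + 2000/1000*0.10 ≈ $0.21; team-b ≈ $0.01
```

Run it on a schedule (or off the webhook) and feed the result into whatever your billing system ingests.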
Managed vs Self-Hosted
| Option | Cost | Operational overhead | Customisation | SLA |
|---|---|---|---|---|
| AWS API Gateway | Pay per call (~$3.50/million) | None | Limited (Lambda integrations) | AWS SLA |
| Cloudflare API Gateway | Included with Cloudflare plans | None | Moderate (Workers) | Cloudflare SLA |
| Kong Konnect (managed) | $250-500/month | Low | Full Kong plugin ecosystem | Kong SLA |
| Self-hosted Kong/APISIX | Infrastructure only (~$50-200/month) | High | Full | Your responsibility |
The managed vs self-hosted decision is primarily an operational capacity question. If your team does not have dedicated infrastructure engineers, managed options (AWS API Gateway, Cloudflare, Kong Konnect) keep the gateway operational without overhead. If you need deep customisation, high throughput, or cost control at scale, self-hosted becomes viable when you have the operational capacity.
The common mistake: self-hosting Kong for a product with 500 QPS to save $200/month on Konnect while spending 10 engineering hours per month on gateway maintenance. The economics do not work until you are at a traffic volume where the self-hosting cost advantage is real (roughly 50K+ QPS, or very high plugin customisation requirements).
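The arithmetic behind that mistake is worth making explicit. This sketch prices engineering time into the comparison; the $100/hour loaded engineer rate is an assumption, not a figure from the text.

```python
def self_host_monthly_delta(managed_fee, infra_cost, maintenance_hours,
                            engineer_hourly_rate):
    """Monthly saving (positive) or loss (negative) from self-hosting
    versus a managed gateway, once engineering time is priced in."""
    return managed_fee - (infra_cost + maintenance_hours * engineer_hourly_rate)

# The mistake from the text: dropping a ~$250/month Konnect fee for
# ~$50/month of infrastructure while spending 10 engineering hours/month
# (at an assumed $100/hour) on gateway maintenance.
delta = self_host_monthly_delta(managed_fee=250, infra_cost=50,
                                maintenance_hours=10, engineer_hourly_rate=100)
# delta = 250 - (50 + 1000) = -800: self-hosting loses ~$800/month here
```

The sign only flips when traffic is high enough that managed per-call pricing dwarfs the fixed engineering cost, which is the 50K+ QPS intuition above.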
Decision Summary
- On Kubernetes, simple requirements: Traefik
- High QPS, performance-critical, self-hosted: APISIX
- Large plugin ecosystem, mature auth patterns: Kong (or Konnect managed)
- API composition, BFF patterns, no external state: KrakenD
- Serverless, AWS-native: AWS API Gateway
- Edge-first, Cloudflare infrastructure: Cloudflare API Gateway
- AI-heavy traffic: add LiteLLM or Kong AI Gateway alongside your main gateway
When Not to Use an API Gateway
The instinct to put an API gateway in front of every service is a form of over-engineering. An API gateway adds operational complexity, a network hop, and a failure point. For small teams with two or three internal services, direct service-to-service calls over a private network are simpler and faster. The gateway earns its place when:
- you have multiple client types (mobile, web, third-party) with different protocol or auth requirements
- you need central rate limiting or abuse prevention
- you are managing a public API that needs versioning and developer-facing documentation
The 10-service threshold is a useful heuristic: below 10 services, evaluate whether the operational overhead of a gateway is justified by the actual problems it solves for your team. Above 10 services, the cross-cutting concerns (auth, logging, rate limiting) become expensive to implement per-service and a gateway starts paying for itself.
Gateway Comparison: Kong vs APISIX vs Traefik vs KrakenD vs AWS API Gateway
Choosing a gateway is a decision that will be expensive to reverse — migration means updating client SDKs, auth configurations, and monitoring. The choice should be driven by your team's operational capabilities, not by feature lists. A team with strong Kubernetes expertise and no AWS lock-in will make a different choice than a team running entirely on AWS. The same decision framework applies to API design choices more broadly.
| Gateway | Architecture | Configuration | Best for | Limitations |
|---|---|---|---|---|
| Kong (OSS) | Nginx + Lua plugins | Declarative YAML or Admin API | Teams needing extensive plugin ecosystem (300+ plugins) | Postgres dependency in DB-backed mode; complex HA setup |
| Apache APISIX | Nginx + Lua/etcd | etcd-backed, hot-reload | High-throughput (handles 140K RPS per node in benchmarks) | Smaller community; steeper learning curve |
| Traefik v3 | Go, provider-based | Auto-discovers from Docker/K8s labels | Container-native teams; auto-SSL via Let's Encrypt | Plugin system less mature than Kong |
| KrakenD (OSS) | Go, stateless | Declarative JSON/YAML only | API aggregation/composition; no state = easy scaling | No identity provider of its own; relies on an external IdP for tokens |
| AWS API Gateway v2 | Managed | CloudFormation/CDK/Console | AWS-native teams; Lambda integration; pay-per-request | Vendor lock-in; expensive at scale; 29ms added latency |
Kong is the most feature-rich self-hosted option, but in its traditional database-backed mode it needs a PostgreSQL instance for plugin state, which becomes a high-availability concern (a DB-less declarative mode exists, though some plugins depend on the database). APISIX outperforms Kong in raw throughput benchmarks (internal tests at Tencent and Bilibili have demonstrated 140,000+ RPS per node), but the community is smaller and documentation quality is inconsistent. Traefik is the right answer for container-native teams who want automatic service discovery without writing gateway configuration: annotate your Docker or Kubernetes service, and Traefik picks it up. KrakenD is the gateway to evaluate if you are doing API composition (aggregating multiple upstream calls into a single client response): its stateless architecture makes it trivially scalable, and its declarative configuration prevents runtime configuration drift. AWS API Gateway adds roughly 29ms of overhead compared to self-hosted options and becomes expensive at high call volumes, but it eliminates all operational concerns for teams committed to the AWS ecosystem.
Service Mesh vs API Gateway: A Clear Distinction
These two tools are frequently confused because they both sit in the network path. The distinction is traffic direction and purpose. An API gateway handles north-south traffic: external clients talking to your services. A service mesh handles east-west traffic: your services talking to each other. They solve different problems and are not alternatives — they are complements.
A service mesh (Istio, Linkerd, Cilium) provides mutual TLS between services, distributed tracing for internal calls, and fine-grained traffic policies (circuit breaking, retries, canary routing) without code changes. It operates at the infrastructure level via sidecar proxies (Envoy in Istio; Linkerd uses its own Rust-based micro-proxy) or eBPF (Cilium). The operational overhead is substantial: Istio can add 50-100ms of latency overhead for the sidecar proxy in the worst case, and the control plane requires dedicated engineering attention.
For teams under 10 engineers, a service mesh is almost always over-engineering. For teams over 20 engineers with 10+ services and compliance requirements (mutual auth between services, audit logs for all internal calls), it starts to pay for itself. Linkerd is significantly simpler to operate than Istio and adds less overhead — if a service mesh is warranted, start there.
API Gateway for LLM APIs: A Specialised Use Case
LLM APIs expose a new class of gateway requirement: token-based rate limiting, model routing, prompt caching, and cost attribution. Generic gateways can handle some of this with custom plugins, but dedicated LLM gateway solutions (LiteLLM, Portkey, Helicone) are purpose-built for the LLM traffic pattern. They provide: automatic retries with model failover (if GPT-4o is unavailable, fall back to Claude 3.5 Sonnet), semantic caching (cache responses to semantically similar prompts, not just identical strings), and per-team cost attribution via API key tagging. For teams building products on top of multiple LLM providers, an LLM gateway is infrastructure, not optional complexity. Pair it with dual request/token rate limiting patterns to prevent runaway costs from a single misbehaving consumer.
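Semantic caching is the least intuitive of these features, so here is a toy sketch of the mechanism. Real implementations compare embedding vectors by cosine similarity; the Jaccard word-overlap score below is a deliberately crude stand-in so the example stays self-contained, and the threshold and prompts are illustrative.

```python
def similarity(a: str, b: str) -> float:
    # Toy stand-in for embedding cosine similarity: Jaccard over word sets.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

class SemanticCache:
    """Serve a cached LLM response when a new prompt is similar enough
    to a previously answered one, not just byte-identical."""
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []   # list of (prompt, response) pairs

    def get(self, prompt: str):
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((prompt, response))

cache = SemanticCache(threshold=0.8)
cache.put("what is the refund policy for pro plans", "Refunds within 30 days.")
hit = cache.get("what is the refund policy for pro plans please")  # near-duplicate
miss = cache.get("how do I rotate my api key")                     # unrelated
```

The linear scan is also a simplification: at scale the lookup runs against a vector index, which is why semantic caching is bundled into LLM gateways rather than bolted onto generic ones.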
Rules of Thumb
Every public API needs a gate. Even if your auth is initially just an API key check, centralise it in the gateway from the start. Retrofitting auth onto individual services is a multi-sprint effort.
Internal single-client APIs do not need rate limiting. The moment you have external clients or multiple internal consumers, add rate limiting to prevent one consumer from starving others.
Transforming requests at the gateway to shield services from API version changes is legitimate. Transforming because your service API is poorly designed is not — fix the design.
A gateway is the ideal place to generate a correlation ID for every request and attach it to all downstream calls. Without this, debugging production issues becomes reconstructing traffic flows from disparate logs.
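The correlation ID rule is two lines of logic: reuse the client's ID if one arrived, otherwise mint one at the gateway and attach it to every downstream call. `X-Correlation-ID` is a common convention rather than a standard header name.

```python
import uuid

def with_correlation_id(incoming_headers: dict) -> dict:
    """Ensure every request carries a correlation ID from the gateway on."""
    headers = dict(incoming_headers)
    # Keep a client-supplied ID (so traces span the client too),
    # otherwise mint a fresh one.
    headers.setdefault("X-Correlation-ID", str(uuid.uuid4()))
    return headers

fresh = with_correlation_id({"Accept": "application/json"})   # ID minted
kept = with_correlation_id({"X-Correlation-ID": "abc-123"})   # ID preserved
```

Whether to trust a client-supplied ID is a policy choice; public gateways often overwrite it to prevent log spoofing.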
Gateway Observability and Debugging
An API gateway that you cannot observe is a black box between your clients and your services. Every gateway should emit: request latency histograms (p50, p95, p99) broken down by route, upstream service response codes, rate limiting trigger counts, and authentication failure counts. Kong exposes these via its Prometheus plugin. Traefik provides built-in Prometheus metrics at the /metrics endpoint. APISIX integrates with Prometheus, Datadog, and SkyWalking out of the box.
The debugging anti-pattern: when a request fails, engineers bypass the gateway and hit the service directly to "eliminate the gateway as a variable." This works for diagnosis but masks gateway-specific issues (header stripping, body size limits, timeout configurations) that only manifest through the gateway. A better approach: gateway access logs with request correlation IDs that trace through to the upstream service. When a request fails, grep the correlation ID in both gateway and service logs to pinpoint where it broke.
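That grep-on-both-sides workflow amounts to a join on the correlation ID. A sketch, with hypothetical log record shapes (real logs would be parsed lines or structured log events):

```python
def trace_request(correlation_id, gateway_log, service_log):
    """Join gateway and service log lines on a correlation ID to see
    where a failed request broke."""
    seen_at_gateway = [l for l in gateway_log if l["cid"] == correlation_id]
    seen_at_service = [l for l in service_log if l["cid"] == correlation_id]
    if seen_at_gateway and not seen_at_service:
        return "failed inside the gateway (never reached the service)"
    if seen_at_service:
        return "reached the service; inspect the service-side error"
    return "never reached the gateway"

# Example: the gateway rejected the request (413 body-size limit),
# so the service log has no matching entry.
gateway_log = [{"cid": "abc-123", "status": 413}]
service_log = []
verdict = trace_request("abc-123", gateway_log, service_log)
```

This is exactly the class of failure (body limits, header stripping, timeouts) that bypassing the gateway during debugging hides.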
For teams building comprehensive observability stacks, the gateway is often the best single place to instrument — it sees every request, knows every response code, and can add trace context headers for distributed tracing.