
API Gateway Patterns for Growing Teams

APISIX runs at 190% of Kong's performance on rate limiting benchmarks. KrakenD handles API composition without code. Traefik is your gateway if you are already on Kubernetes. This is the honest comparison — plus how to handle the AI-specific patterns no traditional gateway was designed for.

Abhishek Sharma· Head of Engg @ Fordel Studios
13 min read

Every growing API product eventually outgrows a single reverse proxy and needs a proper API gateway. The question is which one, for how long, and when to switch from managed to self-hosted (or vice versa). The decision involves performance benchmarks, operational complexity, plugin ecosystems, and increasingly, whether the gateway can handle the AI-specific patterns that traditional HTTP traffic never required.

This is the honest comparison. Performance numbers come from published benchmarks; operational observations come from real deployments. No vendor bias.

···

The Four Self-Hosted Contenders

Kong Gateway

Kong is the market leader in open-source API gateways. It runs on top of NGINX and OpenResty, has the largest plugin ecosystem (over 300 plugins), and has the deepest enterprise feature set. The governance and authentication plugins (OAuth2, JWT, OIDC, API key management) are mature and production-proven.

The honest weakness: performance. Kong's Lua-based plugin architecture adds overhead per request. At low to moderate traffic (< 5K QPS), this is invisible. At high traffic, the overhead compounds. Kong's cloud-managed version (Konnect) is a reasonable option if you want the Kong plugin ecosystem without the operational overhead of running it yourself.

Apache APISIX

APISIX also runs on NGINX/OpenResty but was architected for higher throughput. Published benchmarks show APISIX at 190% of Kong's performance on rate limiting operations — processing 23,000+ QPS on a single node with 0.2ms average latency. The plugin ecosystem is smaller than Kong's but growing rapidly, and APISIX supports plugins in multiple languages (Lua, Python, Go, Java) via a sidecar model.

APISIX is the right choice when performance at high QPS is the primary concern. The operational model is similar to Kong (both use etcd for configuration), so migration between them is feasible if your requirements change.

  • 190%: APISIX performance vs Kong on rate limiting benchmarks (single-node rate limiting operations)
  • 23K+ QPS: APISIX on a single node, with 0.2ms average latency on rate limiting operations

Traefik

Traefik is the Kubernetes-native gateway. It reads Kubernetes Ingress and IngressRoute resources natively, auto-discovers services, and handles TLS certificate management through Let's Encrypt integration. If your workload is on Kubernetes, Traefik has the best operational experience of any gateway.

The limitations: the plugin ecosystem is smaller than Kong or APISIX, and Traefik's rate limiting and authentication capabilities are less mature for complex use cases. It is the right choice for teams that want gateway functionality without a separate configuration plane — not for teams with complex API management requirements.

KrakenD

KrakenD occupies a different niche: API composition. Where the other gateways proxy requests, KrakenD can aggregate multiple backend responses into a single client-facing API call. A single request to KrakenD can fan out to five services, merge the responses, filter fields, and return a composed response. For BFF patterns and API facades, this is a significant operational simplification.

KrakenD is stateless and requires no external configuration store. Configuration is a static JSON file, which makes it easy to version-control and deploy. The trade-off: less dynamic; changing configuration requires a restart.
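The fan-out, merge, and field-filter behaviour that KrakenD declares in JSON can be sketched imperatively. A minimal illustration (the backend functions are stand-ins for real upstream HTTP calls, and the field names are invented for the example):

```python
import asyncio

# Stand-ins for upstream HTTP calls; a real composition layer would use
# an async HTTP client against real services.
async def fetch_user(user_id):
    return {"id": user_id, "name": "Ada", "internal_flag": True}

async def fetch_orders(user_id):
    return {"orders": [{"id": 1, "total": 42.0}]}

# Field filtering, which KrakenD expresses declaratively in its config.
ALLOWED_FIELDS = {"id", "name", "orders"}

async def composed_endpoint(user_id):
    # Fan out to both backends concurrently, then merge and filter.
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    merged = {**user, **orders}
    return {k: v for k, v in merged.items() if k in ALLOWED_FIELDS}
```

One client request becomes two concurrent backend calls and a single filtered response; KrakenD does exactly this, but from static configuration rather than code.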

···

Performance Comparison

| Gateway | Max QPS (single node) | Latency (p99) | Memory footprint | Plugin ecosystem |
|---------|----------------------|---------------|------------------|------------------|
| APISIX  | 23,000+ | < 2ms | Low (~50MB) | Growing (300+) |
| Kong    | ~12,000 | < 5ms | Medium (~100MB) | Large (300+ commercial/OSS) |
| KrakenD | 25,000+ | < 1ms | Very low (~30MB) | Limited, composition-focused |
| Traefik | ~20,000 | < 2ms | Low (~40MB) | Small, middleware-focused |
···

The Core Gateway Patterns

Authentication and Authorization

All four gateways support API key validation, JWT verification, and OAuth2. The implementation quality varies. Kong's JWT and OAuth2 plugins are the most battle-tested. APISIX has equivalent functionality and adds OIDC support through the openid-connect plugin. For enterprise identity integration (SAML, enterprise SSO), Kong's enterprise tier has the most complete support.
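To make the JWT verification step concrete, here is a stdlib-only sketch of HS256 validation, roughly what a gateway's JWT plugin does before forwarding a request. This is illustrative only; a production gateway plugin also checks `exp`, `iss`, and the declared algorithm, and would typically support RS256 with a JWKS endpoint:

```python
import base64
import hashlib
import hmac
import json

def _b64url_decode(s):
    # JWTs strip base64url padding; restore it before decoding.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_hs256(token, secret):
    """Return the claims dict if the HS256 signature checks out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None  # malformed token
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        return None  # signature mismatch: reject at the gateway
    return json.loads(_b64url_decode(payload_b64))

def sign_hs256(claims, secret):
    """Mint a token, for demonstrating the round trip."""
    def enc(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    header_b64, payload_b64 = enc({"alg": "HS256", "typ": "JWT"}), enc(claims)
    sig = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                   hashlib.sha256).digest()
    return f"{header_b64}.{payload_b64}." + \
        base64.urlsafe_b64encode(sig).rstrip(b"=").decode()
```

The gateway rejects the request with a 401 when `verify_hs256` returns None, so invalid tokens never reach an upstream service.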

Rate Limiting

Covered separately in this research series. At the gateway level: all four support per-consumer rate limiting. APISIX and Kong both use Redis for distributed state. For simple use cases, Traefik's built-in rate limiting middleware is sufficient. For complex per-consumer, per-route rate limiting at high throughput, APISIX's performance advantage matters.
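The per-consumer limiting all four gateways offer is typically some variant of a token bucket. A minimal in-process sketch (a distributed gateway keeps this state in Redis, as Kong and APISIX do):

```python
import time

class TokenBucket:
    """Per-consumer token bucket. One instance per consumer key;
    a gateway would keep the equivalent counters in Redis for
    distributed enforcement."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost=1.0):
        # Refill based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # gateway would respond 429 here
```

A bucket with `rate=100, capacity=200` allows sustained 100 requests/second with bursts up to 200, which is the shape of limit most gateway plugins expose as "rate" plus "burst".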

Circuit Breaking

Circuit breaking (stopping requests to a degraded upstream to allow recovery) is available in all four gateways. Kong's circuit breaker is plugin-based. APISIX has built-in circuit breaking on the proxy level. Traefik implements it as a middleware. The key configuration: the failure threshold percentage, the timeout before attempting recovery, and the half-open state behavior.
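The three knobs named above (failure threshold, recovery timeout, half-open behaviour) can be sketched as a small state machine; this is a generic illustration, not any one gateway's implementation:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: closed -> open after repeated failures,
    open -> half-open after the recovery timeout, half-open -> closed
    on a successful probe."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold  # consecutive failures to trip
        self.recovery_timeout = recovery_timeout    # seconds before probing again
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"  # let one probe request through
                return True
            return False  # fail fast while the upstream recovers
        return True

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
```

The gateway calls `allow_request` before proxying and `record_success`/`record_failure` after each upstream response, so a degraded service sees only one probe per recovery window instead of the full request load.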

Request Transformation

Request transformation — modifying headers, query parameters, or request bodies before forwarding to the upstream — is where KrakenD shines. Its declarative transformation DSL can reshape, filter, and merge API responses without code. For simple header addition or parameter mapping, all four gateways handle it adequately.
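The "simple header addition or parameter mapping" case looks the same in every gateway; a sketch of the logic, with invented header and parameter names for illustration:

```python
def transform_request(headers, params):
    """Gateway-style transformation: strip an internal header, inject a
    gateway header, and rename a legacy query parameter before the
    request is forwarded upstream. All names here are illustrative."""
    # Strip headers that must never reach the upstream.
    out_headers = {k: v for k, v in headers.items()
                   if k.lower() != "x-internal-debug"}
    # Inject a header the upstream expects from the gateway.
    out_headers["X-Gateway-Version"] = "v2"
    # Map a legacy client parameter name to the upstream's name.
    out_params = {("page_size" if k == "limit" else k): v
                  for k, v in params.items()}
    return out_headers, out_params
```

In Kong or APISIX this is a few lines of declarative plugin configuration rather than code; the sketch just shows what that configuration does to each request.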

···

AI-Specific Gateway Patterns

The patterns above were designed for traditional HTTP APIs. AI workloads require a different set of gateway capabilities that traditional gateways are adding retroactively.

Token Counting at the Gateway

LLM requests are priced per token, not per request. A gateway that can count input tokens before forwarding a request can enforce token-based rate limits, track per-consumer token spend, and generate usage reports that match the actual cost model. Standard request counting tells you nothing about actual AI API cost.
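What token-based admission looks like, as a rough sketch. The ~4-characters-per-token heuristic is a common approximation for English text, not a real tokenizer; a production gateway would use the provider's tokenizer (e.g. tiktoken for OpenAI models) before forwarding:

```python
def estimate_tokens(text):
    """Rough heuristic: ~4 characters per token for English text.
    A real gateway would use the provider's actual tokenizer."""
    return max(1, len(text) // 4)

class TokenBudget:
    """Per-consumer token budget for a billing window. A gateway would
    keep `used` in shared storage (e.g. Redis) keyed by consumer."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def admit(self, prompt):
        cost = estimate_tokens(prompt)
        if self.used + cost > self.limit:
            return False  # reject before spending money upstream
        self.used += cost
        return True
```

The important property is that the check happens before the upstream call: a request-count limiter admits a 100K-token prompt and a 10-token prompt identically, while a token budget rejects the expensive one up front.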

Kong has an AI Gateway product (released 2024) that adds token counting, model routing, and LLM-specific rate limiting as first-class features. APISIX can be extended with custom Lua plugins to add token counting, but requires engineering to implement. Neither KrakenD nor Traefik has native AI gateway support.

Model Routing

Model routing sends requests to different LLM providers or model sizes based on request characteristics. Fast, cheap models for simple requests; slow, expensive models for complex ones. Load balancing across providers for reliability. Fallback routing when a provider is rate-limiting.

This pattern requires semantic understanding of the request — something traditional gateways cannot do without extension. Kong AI Gateway and LiteLLM (an AI-specific proxy) handle this natively. For teams building AI-heavy products, running a dedicated AI proxy (LiteLLM, Portkey) in front of or alongside the API gateway is often cleaner than extending a traditional gateway with AI logic.
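A stripped-down sketch of the routing decision, using prompt size as the (crude) complexity signal. The model names, thresholds, and health-check hook are all invented for illustration; real routers like LiteLLM also weigh cost, latency history, and provider rate-limit state:

```python
# Illustrative routing table: cheapest model first, ordered by the
# largest prompt it should handle.
ROUTES = [
    {"max_prompt_tokens": 500, "model": "small-fast-model"},
    {"max_prompt_tokens": 4000, "model": "mid-tier-model"},
]
FALLBACK_MODEL = "large-expensive-model"

def route_request(prompt_tokens, provider_healthy=lambda model: True):
    """Pick the cheapest model that fits the prompt and whose provider
    is currently healthy; otherwise fall through to the big model."""
    for route in ROUTES:
        if (prompt_tokens <= route["max_prompt_tokens"]
                and provider_healthy(route["model"])):
            return route["model"]
    return FALLBACK_MODEL
```

The `provider_healthy` hook is where fallback routing plugs in: when a provider starts rate-limiting, its models report unhealthy and traffic shifts down the table automatically.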

AI gateway integration pattern

01
Run a traditional gateway (Kong or APISIX) for all traffic

Handle auth, standard rate limiting, circuit breaking, and routing at the traditional gateway layer.

02
Route AI-bound traffic to a dedicated AI proxy

Configure the gateway to forward /ai/* or /llm/* paths to LiteLLM or Portkey instead of directly to OpenAI/Anthropic. The AI proxy handles token counting, model routing, and AI-specific rate limiting.

03
Aggregate cost data from the AI proxy into your billing system

LiteLLM exposes usage metrics via its database or webhook. Build the pipeline from AI proxy usage data to your billing system. This is the data you need for per-token billing or cost attribution.
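The routing split in step 02 is ordinary longest-prefix path matching at the traditional gateway. A sketch of the route table (the hostnames and ports are hypothetical; in Kong or APISIX this is route configuration, not code):

```python
# Hypothetical upstream map following the /ai/* convention above.
UPSTREAMS = {
    "/ai/": "http://litellm.internal:4000",  # dedicated AI proxy
    "/":    "http://services.internal:8080", # everything else
}

def pick_upstream(path):
    """Longest-prefix match, as a gateway route table resolves it."""
    for prefix in sorted(UPSTREAMS, key=len, reverse=True):
        if path.startswith(prefix):
            return UPSTREAMS[prefix]
    raise LookupError(path)
```

Auth and standard rate limiting still run at the gateway for both branches; only AI-bound paths pick up the extra token-aware hop.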

···

Managed vs Self-Hosted

| Option | Cost | Operational overhead | Customisation | SLA |
|--------|------|---------------------|---------------|-----|
| AWS API Gateway | Pay per call (~$3.50/million) | None | Limited (Lambda integrations) | AWS SLA |
| Cloudflare API Gateway | Included with Cloudflare plans | None | Moderate (Workers) | Cloudflare SLA |
| Kong Konnect (managed) | $250-500/month | Low | Full Kong plugin ecosystem | Kong SLA |
| Self-hosted Kong/APISIX | Infrastructure only (~$50-200/month) | High | Full | Your responsibility |

The managed vs self-hosted decision is primarily an operational capacity question. If your team does not have dedicated infrastructure engineers, managed options (AWS API Gateway, Cloudflare, Kong Konnect) keep the gateway operational without overhead. If you need deep customisation, high throughput, or cost control at scale, self-hosted becomes viable when you have the operational capacity.

The common mistake: self-hosting Kong for a product with 500 QPS to save $200/month on Konnect while spending 10 engineering hours per month on gateway maintenance. The economics do not work until you are at a traffic volume where the self-hosting cost advantage is real (roughly 50K+ QPS, or very high plugin customisation requirements).
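The arithmetic behind that mistake is worth making explicit. A sketch, where the $100/hour loaded engineering rate is an assumption for illustration:

```python
def monthly_self_hosted(infra=200.0, maintenance_hours=10.0,
                        loaded_hourly_rate=100.0):
    """True monthly cost of self-hosting: infrastructure plus the
    engineering time maintenance actually consumes. The $100/hour
    loaded rate is an illustrative assumption."""
    return infra + maintenance_hours * loaded_hourly_rate

# The example from the text: self-hosting to "save" on a ~$500/month
# managed plan while spending 10 engineering hours/month on upkeep.
managed_plan = 500.0
true_self_hosted = monthly_self_hosted()  # 200 + 10 * 100 = 1200.0
```

At $1,200/month of true cost against a $500/month managed plan, the "saving" is negative; the sign only flips once traffic or customisation needs grow enough that the managed tier's price scales past the engineering cost.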

Gateway selection cheat sheet
  • On Kubernetes, simple requirements: Traefik
  • High QPS, performance-critical, self-hosted: APISIX
  • Large plugin ecosystem, mature auth patterns: Kong (or Konnect managed)
  • API composition, BFF patterns, no external state: KrakenD
  • Serverless, AWS-native: AWS API Gateway
  • Edge-first, Cloudflare infrastructure: Cloudflare API Gateway
  • AI-heavy traffic: add LiteLLM or Kong AI Gateway alongside your main gateway
···

When Not to Use an API Gateway

The instinct to put an API gateway in front of every service is a form of over-engineering. An API gateway adds operational complexity, a network hop, and a failure point. For small teams with two or three internal services, direct service-to-service calls over a private network are simpler and faster. The gateway earns its place when: you have multiple client types (mobile, web, third-party) with different protocol or auth requirements, you need central rate limiting or abuse prevention, or you are managing a public API that needs versioning and developer-facing documentation.

The 10-service threshold is a useful heuristic: below 10 services, evaluate whether the operational overhead of a gateway is justified by the actual problems it solves for your team. Above 10 services, the cross-cutting concerns (auth, logging, rate limiting) become expensive to implement per-service and a gateway starts paying for itself.

···

Gateway Comparison: Kong vs APISIX vs Traefik vs KrakenD vs AWS API Gateway

Choosing a gateway is a decision that will be expensive to reverse — migration means updating client SDKs, auth configurations, and monitoring. The choice should be driven by your team's operational capabilities, not by feature lists. A team with strong Kubernetes expertise and no AWS lock-in will make a different choice than a team running entirely on AWS. The same decision framework applies to API design choices more broadly.

| Gateway | Architecture | Configuration | Best for | Limitations |
|---------|--------------|---------------|----------|-------------|
| Kong (OSS) | Nginx + Lua plugins | Declarative YAML or Admin API | Teams needing extensive plugin ecosystem (250+ plugins) | Postgres dependency; complex HA setup |
| Apache APISIX | Nginx + Lua/etcd | etcd-backed, hot-reload | High-throughput (handles 140K RPS per node in benchmarks) | Smaller community; steeper learning curve |
| Traefik v3 | Go, provider-based | Auto-discovers from Docker/K8s labels | Container-native teams; auto-SSL via Let's Encrypt | Plugin system less mature than Kong |
| KrakenD (OSS) | Go, stateless | Declarative JSON/YAML only | API aggregation/composition; no state = easy scaling | No built-in auth; requires external IdP |
| AWS API Gateway v2 | Managed | CloudFormation/CDK/Console | AWS-native teams; Lambda integration; pay-per-request | Vendor lock-in; expensive at scale; 29ms added latency |

Kong is the most feature-rich self-hosted option but requires a PostgreSQL instance for plugin state, which becomes a high-availability concern. APISIX outperforms Kong in raw throughput benchmarks — internal tests at Tencent and Bilibili have demonstrated 140,000+ RPS per node — but the community is smaller and documentation quality is inconsistent. Traefik is the right answer for container-native teams who want automatic service discovery without writing gateway configuration: annotate your Docker or Kubernetes service, and Traefik picks it up. KrakenD is the gateway to evaluate if you are doing API composition (aggregating multiple upstream calls into a single client response) — its stateless architecture makes it trivially scalable, and its declarative configuration prevents runtime configuration drift. AWS API Gateway adds roughly 29ms of overhead compared to self-hosted options and becomes expensive at high call volumes, but it eliminates all operational concerns for teams committed to the AWS ecosystem.

···

Service Mesh vs API Gateway: A Clear Distinction

These two tools are frequently confused because they both sit in the network path. The distinction is traffic direction and purpose. An API gateway handles north-south traffic: external clients talking to your services. A service mesh handles east-west traffic: your services talking to each other. They solve different problems and are not alternatives — they are complements.

A service mesh (Istio, Linkerd, Cilium) provides mutual TLS between services, distributed tracing for internal calls, and fine-grained traffic policies (circuit breaking, retries, canary routing) without code changes. It operates at the infrastructure level via sidecar proxies (Envoy in Istio/Linkerd) or eBPF (Cilium). The operational overhead is substantial — Istio adds 50-100ms of latency overhead for the sidecar proxy in the worst case, and the control plane requires dedicated engineering attention.

For teams under 10 engineers, a service mesh is almost always over-engineering. For teams over 20 engineers with 10+ services and compliance requirements (mutual auth between services, audit logs for all internal calls), it starts to pay for itself. Linkerd is significantly simpler to operate than Istio and adds less overhead — if a service mesh is warranted, start there.

···

API Gateway for LLM APIs: A Specialised Use Case

LLM APIs expose a new class of gateway requirement: token-based rate limiting, model routing, prompt caching, and cost attribution. Generic gateways can handle some of this with custom plugins, but dedicated LLM gateway solutions (LiteLLM, Portkey, Helicone) are purpose-built for the LLM traffic pattern. They provide: automatic retries with model failover (if GPT-4o is unavailable, fall back to Claude 3.5 Sonnet), semantic caching (cache responses to semantically similar prompts, not just identical strings), and per-team cost attribution via API key tagging. For teams building products on top of multiple LLM providers, an LLM gateway is infrastructure, not optional complexity. Pair it with dual request/token rate limiting patterns to prevent runaway costs from a single misbehaving consumer.
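The retry-then-failover behaviour those LLM gateways provide can be sketched generically. This is not LiteLLM's API, just the control flow, with a hypothetical error type standing in for provider-specific rate-limit and outage errors:

```python
class ProviderError(Exception):
    """Stand-in for a provider's transient failure (rate limit, outage)."""

def call_with_failover(prompt, providers, max_attempts_each=2):
    """Try each provider callable in fallback order, retrying transient
    failures a few times before moving down the chain."""
    last_error = None
    for call in providers:
        for _ in range(max_attempts_each):
            try:
                return call(prompt)
            except ProviderError as err:
                last_error = err  # retry, then fall through to next provider
    raise RuntimeError("all providers in the fallback chain failed") from last_error
```

Ordering the `providers` list by preference (e.g. primary model first, fallback model second) gives exactly the "if the primary is unavailable, fall back" behaviour described above, without any client-side changes.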

01
Authentication/authorisation — add at day 1

Every public API needs a gate. Even if your auth is initially just an API key check, centralise it in the gateway from the start. Retrofitting auth onto individual services is a multi-sprint effort.

02
Rate limiting — add when you go public or have more than one client

Internal single-client APIs do not need rate limiting. The moment you have external clients or multiple internal consumers, add rate limiting to prevent one consumer from starving others.

03
Request/response transformation — add only when you have backward compatibility debt

Transforming requests at the gateway to shield services from API version changes is legitimate. Transforming because your service API is poorly designed is not — fix the design.

04
Observability (request logging, tracing) — add at day 1 in production

A gateway is the ideal place to generate a correlation ID for every request and attach it to all downstream calls. Without this, debugging production issues becomes reconstructing traffic flows from disparate logs.
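The correlation-ID step in 04 is a one-function middleware. A sketch; the `X-Correlation-ID` header name is a common convention rather than a standard, and some stacks use `X-Request-ID` instead:

```python
import uuid

def ensure_correlation_id(headers):
    """Gateway middleware: reuse the client's correlation ID if one was
    sent, otherwise mint a fresh one, and attach it to the request that
    gets forwarded downstream (and logged at every hop)."""
    out = dict(headers)  # don't mutate the caller's headers
    out.setdefault("X-Correlation-ID", str(uuid.uuid4()))
    return out
```

Every downstream service then logs this header, which is what makes the "grep one ID across gateway and service logs" debugging workflow possible.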

···

Gateway Observability and Debugging

An API gateway that you cannot observe is a black box between your clients and your services. Every gateway should emit: request latency histograms (p50, p95, p99) broken down by route, upstream service response codes, rate limiting trigger counts, and authentication failure counts. Kong exposes these via its Prometheus plugin. Traefik provides built-in Prometheus metrics at the /metrics endpoint. APISIX integrates with Prometheus, Datadog, and SkyWalking out of the box.

The debugging anti-pattern: when a request fails, engineers bypass the gateway and hit the service directly to "eliminate the gateway as a variable." This works for diagnosis but masks gateway-specific issues (header stripping, body size limits, timeout configurations) that only manifest through the gateway. A better approach: gateway access logs with request correlation IDs that trace through to the upstream service. When a request fails, grep the correlation ID in both gateway and service logs to pinpoint where it broke.

For teams building comprehensive observability stacks, the gateway is often the best single place to instrument — it sees every request, knows every response code, and can add trace context headers for distributed tracing.
