API Gateway Patterns for Growing Teams
APISIX runs at 190% of Kong's performance on rate limiting benchmarks. KrakenD handles API composition without code. Traefik is your gateway if you are already on Kubernetes. This is the honest comparison — plus how to handle the AI-specific patterns no traditional gateway was designed for.

Every growing API product eventually outgrows a single reverse proxy and needs a proper API gateway. The question is which one, for how long, and when to switch from managed to self-hosted (or vice versa). The decision involves performance benchmarks, operational complexity, plugin ecosystems, and increasingly, whether the gateway can handle the AI-specific patterns that traditional HTTP traffic never required.
This is the honest comparison: performance numbers come from published benchmarks, operational observations come from real deployments, and there is no vendor bias.
The Four Self-Hosted Contenders
Kong Gateway
Kong is the market leader in open-source API gateways. It runs on top of NGINX and OpenResty, has the largest plugin ecosystem (over 300 plugins), and has the deepest enterprise feature set. The governance and authentication plugins (OAuth2, JWT, OIDC, API key management) are mature and production-proven.
The honest weakness: performance. Kong's Lua-based plugin architecture adds overhead per request. At low to moderate traffic (< 5K QPS), this is invisible. At high traffic, the overhead compounds. Kong's cloud-managed version (Konnect) is a reasonable option if you want the Kong plugin ecosystem without the operational overhead of running it yourself.
Apache APISIX
APISIX also runs on NGINX/OpenResty but was architected for higher throughput. Published benchmarks show APISIX at 190% of Kong's performance on rate limiting operations — processing 23,000+ QPS on a single node with 0.2ms average latency. The plugin ecosystem is smaller than Kong's but growing rapidly, and APISIX supports plugins in multiple languages (Lua, Python, Go, Java) via a sidecar model.
APISIX is the right choice when performance at high QPS is the primary concern. The operational model is similar to Kong (both use etcd for configuration), so migration between them is feasible if your requirements change.
Traefik
Traefik is the Kubernetes-native gateway. It reads Kubernetes Ingress and IngressRoute resources natively, auto-discovers services, and handles TLS certificate management through Let's Encrypt integration. If your workload is on Kubernetes, Traefik has the best operational experience of any gateway.
The limitations: the plugin ecosystem is smaller than Kong or APISIX, and Traefik's rate limiting and authentication capabilities are less mature for complex use cases. It is the right choice for teams that want gateway functionality without a separate configuration plane — not for teams with complex API management requirements.
KrakenD
KrakenD occupies a different niche: API composition. Where the other gateways proxy requests, KrakenD can aggregate multiple backend responses into a single client-facing API call. A single request to KrakenD can fan out to five services, merge the responses, filter fields, and return a composed response. For BFF patterns and API facades, this is a significant operational simplification.
KrakenD is stateless and requires no external configuration store. Configuration is a static JSON file, which makes it easy to version-control and deploy. The trade-off is that it is less dynamic: changing configuration requires a restart.
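To make the composition idea concrete, here is a minimal Python sketch of the fan-out-and-merge pattern KrakenD expresses declaratively in its JSON config. The backends, field names, and payloads are hypothetical stand-ins for real upstream services.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(source):
    # Stand-in for an HTTP call to one backend; returns (name, payload).
    name, fn = source
    return name, fn()

def compose(sources, allowed_fields):
    """Fan out to several backends in parallel, merge the responses,
    and keep only whitelisted fields - what KrakenD does declaratively."""
    with ThreadPoolExecutor() as pool:
        results = dict(pool.map(fetch, sources.items()))
    merged = {}
    for payload in results.values():
        merged.update(payload)
    return {k: v for k, v in merged.items() if k in allowed_fields}

# Hypothetical backends standing in for real services:
sources = {
    "user":   lambda: {"id": 42, "name": "Ada", "password_hash": "..."},
    "orders": lambda: {"orders": [101, 102], "internal_flags": 7},
}
response = compose(sources, allowed_fields={"id", "name", "orders"})
# response keeps only id, name, and orders; internal fields are filtered out
```

The field whitelist is the key detail: composition at the gateway is also where you prevent internal fields from leaking to clients.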
Performance Comparison
| Gateway | Max QPS (single node) | Latency (p99) | Memory footprint | Plugin ecosystem |
|---|---|---|---|---|
| APISIX | 23,000+ | < 2ms | Low (~50MB) | Growing, multi-language |
| Kong | ~12,000 | < 5ms | Medium (~100MB) | Large (300+ commercial/OSS) |
| KrakenD | 25,000+ | < 1ms | Very low (~30MB) | Limited, composition-focused |
| Traefik | ~20,000 | < 2ms | Low (~40MB) | Small, middleware-focused |
The Core Gateway Patterns
Authentication and Authorization
All four gateways support API key validation, JWT verification, and OAuth2. The implementation quality varies. Kong's JWT and OAuth2 plugins are the most battle-tested. APISIX has equivalent functionality and adds OIDC support through the openid-connect plugin. For enterprise identity integration (SAML, enterprise SSO), Kong's enterprise tier has the most complete support.
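What a gateway JWT plugin does per request can be shown in a short sketch. This is a minimal HS256 verifier using only the standard library; production plugins additionally check `exp`/`nbf`, issuer, and audience claims, and the secret and claims below are illustrative.

```python
import base64, hashlib, hmac, json

def b64url_decode(data: str) -> bytes:
    # JWT uses unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def verify_hs256(token: str, secret: bytes):
    """Verify an HS256-signed JWT and return its claims, or None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))

# Mint a token to exercise the verifier (hypothetical secret and claims):
secret = b"demo-secret"
head = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
body = b64url_encode(json.dumps({"sub": "consumer-1"}).encode())
sig = b64url_encode(hmac.new(secret, f"{head}.{body}".encode(), hashlib.sha256).digest())
token = f"{head}.{body}.{sig}"
claims = verify_hs256(token, secret)                               # valid token
bad = verify_hs256(f"{head}.{body}.{b64url_encode(b'wrong')}", secret)  # tampered
```

Note the constant-time comparison (`hmac.compare_digest`): a naive `==` on signatures opens a timing side channel.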
Rate Limiting
Covered separately in this research series. At the gateway level: all four support per-consumer rate limiting. APISIX and Kong both use Redis for distributed state. For simple use cases, Traefik's built-in rate limiting middleware is sufficient. For complex per-consumer, per-route rate limiting at high throughput, APISIX's performance advantage matters.
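The algorithm behind most per-consumer rate-limit plugins is a token bucket. A minimal single-node sketch follows; distributed deployments keep the bucket state in Redis, as APISIX and Kong do. The clock is injected so the refill behaviour is testable.

```python
class TokenBucket:
    """Per-consumer token-bucket rate limiter: refill at a steady rate,
    allow bursts up to the bucket size."""
    def __init__(self, rate: float, burst: float, now=lambda: 0.0):
        self.rate, self.burst, self.now = rate, burst, now
        self.tokens, self.last = burst, now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per consumer key; a fake clock stands in for time.time().
clock = [0.0]
bucket = TokenBucket(rate=5.0, burst=10.0, now=lambda: clock[0])
first = [bucket.allow() for _ in range(12)]   # burst of 12: 10 pass, 2 rejected
clock[0] += 1.0                               # one second later: 5 tokens refilled
second = [bucket.allow() for _ in range(6)]   # 5 pass, 1 rejected
```

In a gateway you would keep one bucket per API key (or per key-and-route pair) and map a rejection to HTTP 429.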
Circuit Breaking
Circuit breaking (stopping requests to a degraded upstream to allow recovery) is available in all four gateways. Kong's circuit breaker is plugin-based. APISIX has built-in circuit breaking on the proxy level. Traefik implements it as a middleware. The key configuration: the failure threshold percentage, the timeout before attempting recovery, and the half-open state behavior.
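The three configuration knobs named above map directly onto a small state machine. This sketch uses a failure count rather than the failure-rate percentage production breakers typically compute over a rolling window, and the clock is injectable for testing.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN after too many failures,
    OPEN -> HALF_OPEN once recovery_timeout elapses, then HALF_OPEN ->
    CLOSED on a successful probe (or back to OPEN on a failed one)."""
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, now=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.now = now
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self.now() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"   # let a single probe through
                return True
            return False
        return True

    def record_success(self):
        self.failures, self.state = 0, "CLOSED"

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", self.now()

clock = [0.0]
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0, now=lambda: clock[0])
for _ in range(3):
    cb.record_failure()          # threshold reached -> OPEN
blocked = cb.allow_request()     # False while OPEN: upstream gets a rest
clock[0] += 30.0
probe = cb.allow_request()       # True: HALF_OPEN probe request
cb.record_success()              # upstream recovered -> CLOSED
```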
Request Transformation
Request transformation — modifying headers, query parameters, or request bodies before forwarding to the upstream — is where KrakenD shines. Its declarative transformation DSL can reshape, filter, and merge API responses without code. For simple header addition or parameter mapping, all four gateways handle it adequately.
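The common simple cases (stripping internal headers, stamping a gateway header, renaming a legacy parameter) look like this in sketch form. The header and parameter names are hypothetical, not any gateway's defaults.

```python
def transform_request(headers: dict, params: dict):
    """Header and query-parameter transformation of the kind all four
    gateways support declaratively."""
    # Strip headers that must never reach the upstream.
    out_headers = {k: v for k, v in headers.items()
                   if k.lower() not in {"x-internal-debug"}}
    out_headers["X-Gateway-Version"] = "v2"     # stamp a gateway header
    # Map a legacy client parameter name onto the upstream's name.
    out_params = {("customer_id" if k == "cust" else k): v
                  for k, v in params.items()}
    return out_headers, out_params

h, p = transform_request(
    {"Accept": "application/json", "X-Internal-Debug": "1"},
    {"cust": "42", "page": "2"},
)
```

Anything more elaborate than this (body reshaping, response merging) is where KrakenD's DSL earns its keep.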
AI-Specific Gateway Patterns
The patterns above were designed for traditional HTTP APIs. AI workloads require a different set of gateway capabilities that traditional gateways are adding retroactively.
Token Counting at the Gateway
LLM requests are priced per token, not per request. A gateway that can count input tokens before forwarding a request can enforce token-based rate limits, track per-consumer token spend, and generate usage reports that match the actual cost model. Standard request counting tells you nothing about actual AI API cost.
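A sketch of token-based admission control at the gateway. The four-characters-per-token heuristic is a rough approximation for English text; a real implementation would use the provider's tokenizer (e.g. tiktoken for OpenAI models). The consumer names and limit are illustrative.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Production gateways should use the provider's tokenizer instead.
    return max(1, len(text) // 4)

class TokenBudget:
    """Enforce a per-consumer token budget at the gateway, so spend
    tracks the LLM cost model rather than request counts."""
    def __init__(self, limit_tokens: int):
        self.limit = limit_tokens
        self.used = {}

    def admit(self, consumer: str, prompt: str) -> bool:
        cost = approx_tokens(prompt)
        if self.used.get(consumer, 0) + cost > self.limit:
            return False   # map to HTTP 429 with a token-budget message
        self.used[consumer] = self.used.get(consumer, 0) + cost
        return True

budget = TokenBudget(limit_tokens=100)
ok = budget.admit("team-a", "hello " * 40)       # ~60 tokens: admitted
blocked = budget.admit("team-a", "hello " * 40)  # would exceed 100: rejected
```

Note this only counts input tokens; output tokens are known after the response, so complete cost tracking needs the response-side count folded back into `used`.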
Kong has an AI Gateway product (released 2024) that adds token counting, model routing, and LLM-specific rate limiting as first-class features. APISIX can be extended with custom Lua plugins to add token counting, but this requires engineering effort to implement. Neither KrakenD nor Traefik has native AI gateway support.
Model Routing
Model routing sends requests to different LLM providers or model sizes based on request characteristics. Fast, cheap models for simple requests; slow, expensive models for complex ones. Load balancing across providers for reliability. Fallback routing when a provider is rate-limiting.
This pattern requires semantic understanding of the request — something traditional gateways cannot do without extension. Kong AI Gateway and LiteLLM (an AI-specific proxy) handle this natively. For teams building AI-heavy products, running a dedicated AI proxy (LiteLLM, Portkey) in front of or alongside the API gateway is often cleaner than extending a traditional gateway with AI logic.
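A minimal sketch of the routing decision, assuming a crude length-and-keyword heuristic for request complexity. The model names, the 500-character cutoff, and the preference order are illustrative; Kong AI Gateway and LiteLLM let you express the equivalent in configuration.

```python
def route_model(prompt: str, provider_down=frozenset()) -> str:
    """Heuristic model routing with cross-provider fallback."""
    complex_request = len(prompt) > 500 or "step by step" in prompt.lower()
    # Preference order: first choice, then fallbacks on other providers.
    candidates = (["gpt-4o", "claude-sonnet"] if complex_request
                  else ["gpt-4o-mini", "claude-haiku"])
    for model in candidates:
        if model not in provider_down:
            return model
    raise RuntimeError("all candidate models unavailable")

cheap = route_model("What is our refund policy?")
big = route_model("Explain step by step how to migrate the schema.")
fallback = route_model("short question", provider_down={"gpt-4o-mini"})
```

Production routers replace the heuristic with a classifier or an explicit client hint, but the shape (tiered candidate lists plus availability-aware fallback) stays the same.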
The AI Gateway Integration Pattern
1. Handle auth, standard rate limiting, circuit breaking, and routing at the traditional gateway layer.
2. Configure the gateway to forward /ai/* or /llm/* paths to LiteLLM or Portkey instead of directly to OpenAI/Anthropic. The AI proxy handles token counting, model routing, and AI-specific rate limiting.
3. LiteLLM exposes usage metrics via its database or webhook. Build the pipeline from AI proxy usage data to your billing system. This is the data you need for per-token billing or cost attribution.
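The billing pipeline reduces to an aggregation over usage records. The record shape and per-1K-token prices below are hypothetical; LiteLLM's actual export schema differs, so adapt the field names to what your proxy emits.

```python
from collections import defaultdict

def attribute_spend(usage_records, prices_per_1k):
    """Roll LLM-proxy usage records up into per-consumer dollar spend."""
    spend = defaultdict(float)
    for r in usage_records:
        rate = prices_per_1k[r["model"]]
        tokens = r["input_tokens"] + r["output_tokens"]
        spend[r["api_key"]] += tokens / 1000 * rate
    return dict(spend)

records = [
    {"api_key": "team-a", "model": "small", "input_tokens": 800, "output_tokens": 200},
    {"api_key": "team-a", "model": "large", "input_tokens": 1000, "output_tokens": 1000},
    {"api_key": "team-b", "model": "small", "input_tokens": 500, "output_tokens": 500},
]
spend = attribute_spend(records, prices_per_1k={"small": 0.01, "large": 0.10})
# team-a: 1000/1000*0.01 + 2000/1000*0.10 ≈ $0.21; team-b ≈ $0.01
```

Run it on a schedule (or off the webhook) and feed the result into whatever your billing system ingests.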
Managed vs Self-Hosted
| Option | Cost | Operational overhead | Customisation | SLA |
|---|---|---|---|---|
| AWS API Gateway | Pay per call (~$3.50/million) | None | Limited (Lambda integrations) | AWS SLA |
| Cloudflare API Gateway | Included with Cloudflare plans | None | Moderate (Workers) | Cloudflare SLA |
| Kong Konnect (managed) | $250-500/month | Low | Full Kong plugin ecosystem | Kong SLA |
| Self-hosted Kong/APISIX | Infrastructure only (~$50-200/month) | High | Full | Your responsibility |
The managed vs self-hosted decision is primarily an operational capacity question. If your team does not have dedicated infrastructure engineers, managed options (AWS API Gateway, Cloudflare, Kong Konnect) keep the gateway operational without overhead. If you need deep customisation, high throughput, or cost control at scale, self-hosted becomes viable when you have the operational capacity.
The common mistake: self-hosting Kong for a product with 500 QPS to save $200/month on Konnect while spending 10 engineering hours per month on gateway maintenance. The economics do not work until you are at a traffic volume where the self-hosting cost advantage is real (roughly 50K+ QPS, or very high plugin customisation requirements).
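The arithmetic behind that mistake is worth making explicit. This sketch prices engineering time into the comparison; the $100/hour loaded engineer rate is an assumption, not a figure from the text.

```python
def self_host_monthly_delta(managed_fee, infra_cost, maintenance_hours,
                            engineer_hourly_rate):
    """Monthly saving (positive) or loss (negative) from self-hosting
    versus a managed gateway, once engineering time is priced in."""
    return managed_fee - (infra_cost + maintenance_hours * engineer_hourly_rate)

# The mistake from the text: dropping a ~$250/month Konnect fee for
# ~$50/month of infrastructure while spending 10 engineering hours/month
# (at an assumed $100/hour) on gateway maintenance.
delta = self_host_monthly_delta(managed_fee=250, infra_cost=50,
                                maintenance_hours=10, engineer_hourly_rate=100)
# delta = 250 - (50 + 1000) = -800: self-hosting loses ~$800/month here
```

The sign only flips when traffic is high enough that managed per-call pricing dwarfs the fixed engineering cost, which is the 50K+ QPS intuition above.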
Decision Summary
- On Kubernetes, simple requirements: Traefik
- High QPS, performance-critical, self-hosted: APISIX
- Large plugin ecosystem, mature auth patterns: Kong (or Konnect managed)
- API composition, BFF patterns, no external state: KrakenD
- Serverless, AWS-native: AWS API Gateway
- Edge-first, Cloudflare infrastructure: Cloudflare API Gateway
- AI-heavy traffic: add LiteLLM or Kong AI Gateway alongside your main gateway
When Not to Use an API Gateway
The instinct to put an API gateway in front of every service is a form of over-engineering. An API gateway adds operational complexity, a network hop, and a failure point. For small teams with two or three internal services, direct service-to-service calls over a private network are simpler and faster. The gateway earns its place when:
- you have multiple client types (mobile, web, third-party) with different protocol or auth requirements
- you need central rate limiting or abuse prevention
- you are managing a public API that needs versioning and developer-facing documentation
The 10-service threshold is a useful heuristic: below 10 services, evaluate whether the operational overhead of a gateway is justified by the actual problems it solves for your team. Above 10 services, the cross-cutting concerns (auth, logging, rate limiting) become expensive to implement per-service and a gateway starts paying for itself.
Gateway Comparison: Kong vs APISIX vs Traefik vs KrakenD vs AWS API Gateway
Choosing a gateway is a decision that will be expensive to reverse — migration means updating client SDKs, auth configurations, and monitoring. The choice should be driven by your team's operational capabilities, not by feature lists. A team with strong Kubernetes expertise and no AWS lock-in will make a different choice than a team running entirely on AWS. The same decision framework applies to API design choices more broadly.
| Gateway | Architecture | Configuration | Best for | Limitations |
|---|---|---|---|---|
| Kong (OSS) | Nginx + Lua plugins | Declarative YAML or Admin API | Teams needing extensive plugin ecosystem (300+ plugins) | Postgres dependency in DB-backed mode; complex HA setup |
| Apache APISIX | Nginx + Lua/etcd | etcd-backed, hot-reload | High-throughput (handles 140K RPS per node in benchmarks) | Smaller community; steeper learning curve |
| Traefik v3 | Go, provider-based | Auto-discovers from Docker/K8s labels | Container-native teams; auto-SSL via Let's Encrypt | Plugin system less mature than Kong |
| KrakenD (OSS) | Go, stateless | Declarative JSON/YAML only | API aggregation/composition; no state = easy scaling | No identity provider of its own; relies on an external IdP for tokens |
| AWS API Gateway v2 | Managed | CloudFormation/CDK/Console | AWS-native teams; Lambda integration; pay-per-request | Vendor lock-in; expensive at scale; 29ms added latency |
Kong is the most feature-rich self-hosted option, but in its traditional database-backed mode it needs a PostgreSQL instance for plugin state, which becomes a high-availability concern (a DB-less declarative mode exists, though some plugins depend on the database). APISIX outperforms Kong in raw throughput benchmarks (internal tests at Tencent and Bilibili have demonstrated 140,000+ RPS per node), but the community is smaller and documentation quality is inconsistent. Traefik is the right answer for container-native teams who want automatic service discovery without writing gateway configuration: annotate your Docker or Kubernetes service, and Traefik picks it up. KrakenD is the gateway to evaluate if you are doing API composition (aggregating multiple upstream calls into a single client response): its stateless architecture makes it trivially scalable, and its declarative configuration prevents runtime configuration drift. AWS API Gateway adds roughly 29ms of overhead compared to self-hosted options and becomes expensive at high call volumes, but it eliminates all operational concerns for teams committed to the AWS ecosystem.
Service Mesh vs API Gateway: A Clear Distinction
These two tools are frequently confused because they both sit in the network path. The distinction is traffic direction and purpose. An API gateway handles north-south traffic: external clients talking to your services. A service mesh handles east-west traffic: your services talking to each other. They solve different problems and are not alternatives — they are complements.
A service mesh (Istio, Linkerd, Cilium) provides mutual TLS between services, distributed tracing for internal calls, and fine-grained traffic policies (circuit breaking, retries, canary routing) without code changes. It operates at the infrastructure level via sidecar proxies (Envoy in Istio; Linkerd uses its own Rust-based micro-proxy) or eBPF (Cilium). The operational overhead is substantial: Istio can add 50-100ms of latency overhead for the sidecar proxy in the worst case, and the control plane requires dedicated engineering attention.
For teams under 10 engineers, a service mesh is almost always over-engineering. For teams over 20 engineers with 10+ services and compliance requirements (mutual auth between services, audit logs for all internal calls), it starts to pay for itself. Linkerd is significantly simpler to operate than Istio and adds less overhead — if a service mesh is warranted, start there.
API Gateway for LLM APIs: A Specialised Use Case
LLM APIs expose a new class of gateway requirement: token-based rate limiting, model routing, prompt caching, and cost attribution. Generic gateways can handle some of this with custom plugins, but dedicated LLM gateway solutions (LiteLLM, Portkey, Helicone) are purpose-built for the LLM traffic pattern. They provide: automatic retries with model failover (if GPT-4o is unavailable, fall back to Claude 3.5 Sonnet), semantic caching (cache responses to semantically similar prompts, not just identical strings), and per-team cost attribution via API key tagging. For teams building products on top of multiple LLM providers, an LLM gateway is infrastructure, not optional complexity. Pair it with dual request/token rate limiting patterns to prevent runaway costs from a single misbehaving consumer.
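Semantic caching is the least intuitive of these features, so here is a toy sketch of the mechanism. Real implementations compare embedding vectors by cosine similarity; the Jaccard word-overlap score below is a deliberately crude stand-in so the example stays self-contained, and the threshold and prompts are illustrative.

```python
def similarity(a: str, b: str) -> float:
    # Toy stand-in for embedding cosine similarity: Jaccard over word sets.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

class SemanticCache:
    """Serve a cached LLM response when a new prompt is similar enough
    to a previously answered one, not just byte-identical."""
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []   # list of (prompt, response) pairs

    def get(self, prompt: str):
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((prompt, response))

cache = SemanticCache(threshold=0.8)
cache.put("what is the refund policy for pro plans", "Refunds within 30 days.")
hit = cache.get("what is the refund policy for pro plans please")  # near-duplicate
miss = cache.get("how do I rotate my api key")                     # unrelated
```

The linear scan is also a simplification: at scale the lookup runs against a vector index, which is why semantic caching is bundled into LLM gateways rather than bolted onto generic ones.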
Rules of Thumb
Every public API needs a gate. Even if your auth is initially just an API key check, centralise it in the gateway from the start. Retrofitting auth onto individual services is a multi-sprint effort.
Internal single-client APIs do not need rate limiting. The moment you have external clients or multiple internal consumers, add rate limiting to prevent one consumer from starving others.
Transforming requests at the gateway to shield services from API version changes is legitimate. Transforming because your service API is poorly designed is not — fix the design.
A gateway is the ideal place to generate a correlation ID for every request and attach it to all downstream calls. Without this, debugging production issues becomes reconstructing traffic flows from disparate logs.
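The correlation ID rule is two lines of logic: reuse the client's ID if one arrived, otherwise mint one at the gateway and attach it to every downstream call. `X-Correlation-ID` is a common convention rather than a standard header name.

```python
import uuid

def with_correlation_id(incoming_headers: dict) -> dict:
    """Ensure every request carries a correlation ID from the gateway on."""
    headers = dict(incoming_headers)
    # Keep a client-supplied ID (so traces span the client too),
    # otherwise mint a fresh one.
    headers.setdefault("X-Correlation-ID", str(uuid.uuid4()))
    return headers

fresh = with_correlation_id({"Accept": "application/json"})   # ID minted
kept = with_correlation_id({"X-Correlation-ID": "abc-123"})   # ID preserved
```

Whether to trust a client-supplied ID is a policy choice; public gateways often overwrite it to prevent log spoofing.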
Gateway Observability and Debugging
An API gateway that you cannot observe is a black box between your clients and your services. Every gateway should emit: request latency histograms (p50, p95, p99) broken down by route, upstream service response codes, rate limiting trigger counts, and authentication failure counts. Kong exposes these via its Prometheus plugin. Traefik provides built-in Prometheus metrics at the /metrics endpoint. APISIX integrates with Prometheus, Datadog, and SkyWalking out of the box.
The debugging anti-pattern: when a request fails, engineers bypass the gateway and hit the service directly to "eliminate the gateway as a variable." This works for diagnosis but masks gateway-specific issues (header stripping, body size limits, timeout configurations) that only manifest through the gateway. A better approach: gateway access logs with request correlation IDs that trace through to the upstream service. When a request fails, grep the correlation ID in both gateway and service logs to pinpoint where it broke.
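That grep-on-both-sides workflow amounts to a join on the correlation ID. A sketch, with hypothetical log record shapes (real logs would be parsed lines or structured log events):

```python
def trace_request(correlation_id, gateway_log, service_log):
    """Join gateway and service log lines on a correlation ID to see
    where a failed request broke."""
    seen_at_gateway = [l for l in gateway_log if l["cid"] == correlation_id]
    seen_at_service = [l for l in service_log if l["cid"] == correlation_id]
    if seen_at_gateway and not seen_at_service:
        return "failed inside the gateway (never reached the service)"
    if seen_at_service:
        return "reached the service; inspect the service-side error"
    return "never reached the gateway"

# Example: the gateway rejected the request (413 body-size limit),
# so the service log has no matching entry.
gateway_log = [{"cid": "abc-123", "status": 413}]
service_log = []
verdict = trace_request("abc-123", gateway_log, service_log)
```

This is exactly the class of failure (body limits, header stripping, timeouts) that bypassing the gateway during debugging hides.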
For teams building comprehensive observability stacks, the gateway is often the best single place to instrument — it sees every request, knows every response code, and can add trace context headers for distributed tracing.