The API Gateway Pattern in Microservices (and When It Bites)

A few years back I was chasing a bug where product pages loaded fine for some users and timed out for others. Same code, same request, different result. Turned out one instance of our API gateway had a stale routing config and was happily forwarding traffic to a service that had been scaled down to zero. The gateway didn’t know. It just kept knocking on a door nobody was behind.

That was the moment the gateway stopped being “infrastructure I don’t think about” and became the thing I think about first.

what the gateway actually does for you

Picture a small shop backend. You’ve got a users service, an orders service, and an inventory service, each its own deployment, each with its own database. A phone app needs to render a “your order” screen: customer name, the line items, whether each item is still in stock.

Without a gateway, the app calls three services directly. It needs to know three hostnames, handle three auth handshakes, and stitch three responses together on a device that’s probably on spotty mobile data. Gross.

The gateway sits in front and gives the client one door. It does the boring-but-critical stuff:

Routing - /orders/* goes to orders, /inventory/* goes to inventory. The client never learns the internal topology.
Auth - validate the token once, at the edge, then pass a trusted identity inward.
Aggregation - fan out to three services, merge, return one payload.
Rate limiting - one place to say “this API key gets 100 requests a minute.”
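
To make the aggregation bullet concrete, here’s a minimal sketch in Python of the merge step for the “your order” screen. The field names (name, id, items, sku, qty) and the stock_by_sku mapping are my own assumptions for illustration, not any real service’s schema.

```python
def build_order_screen(user, order, stock_by_sku):
    """Merge three downstream responses into the one payload the app renders.

    user:          response from the users service (assumed shape)
    order:         response from the orders service (assumed shape)
    stock_by_sku:  sku -> units on hand, from the inventory service (assumed shape)
    """
    return {
        "customer_name": user["name"],
        "order_id": order["id"],
        "items": [
            {
                "sku": item["sku"],
                "qty": item["qty"],
                # A SKU that inventory did not report is treated as out of stock.
                "in_stock": stock_by_sku.get(item["sku"], 0) > 0,
            }
            for item in order["items"]
        ],
    }
```

The client gets one response with exactly the fields the screen needs, and never learns that three services were involved.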

Here’s the opinion I’ll die on: the gateway is where cross-cutting concerns go. Auth, rate limiting, request logging, header normalization. The second I see that logic copy-pasted into every service, I know someone skipped the gateway and is now maintaining the same middleware in five repos with three subtle behavioral differences between them.

A stripped-down routing config looks about like this:

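# illustrative schema, not any particular gateway's format
routes:
  - path: /orders           # prefix match: also covers /orders/42
    service: http://orders:8080
    auth: required
    rate_limit: 100/min   # writes hit a database and a payment flow
  - path: /inventory        # prefix match: also covers /inventory/sku-1
    service: http://inventory:8080
    auth: required
    rate_limit: 500/min   # reads are cheap, let 'em through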
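
For what it’s worth, here’s roughly how a gateway might resolve a request path against a table like that. This is a sketch under my own assumptions: the Route type, the match_route name, and the longest-prefix rule are illustrative, and real gateways each have their own matching semantics.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Route:
    prefix: str          # e.g. "/orders"
    upstream: str        # e.g. "http://orders:8080"
    auth_required: bool
    limit_per_min: int


ROUTES = [
    Route("/orders", "http://orders:8080", True, 100),
    Route("/inventory", "http://inventory:8080", True, 500),
]


def match_route(path: str, routes: Sequence[Route] = ROUTES) -> Optional[Route]:
    """Return the route with the longest prefix matching on a segment boundary."""
    best = None
    for route in routes:
        # "/orders" matches "/orders" and "/orders/42", but not "/ordersfoo".
        on_boundary = path == route.prefix or path.startswith(route.prefix + "/")
        if on_boundary and (best is None or len(route.prefix) > len(best.prefix)):
            best = route
    return best
```

Matching on a segment boundary is the design choice worth copying: a naive startswith check would quietly route /ordersfoo to the orders service.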

The one non-obvious thing here: those per-route limits are a feature, not decoration. Inventory reads are cheap and hammered constantly by the catalog. Order writes touch a database and a payment flow. Put both under one shared budget and a burst of window-shopping can starve out actual paying customers.
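
Here’s a minimal sketch of separate per-route budgets, using a fixed one-minute window per (API key, route) pair. The class name and the injectable clock are my assumptions, and a production limiter would more likely be a token bucket backed by shared storage so that every gateway instance sees the same counts.

```python
import time


class FixedWindowLimiter:
    """Count requests per (api_key, route) inside one-minute windows."""

    def __init__(self, limits_per_min, clock=time.monotonic):
        self.limits = limits_per_min   # route -> allowed requests per minute
        self.clock = clock             # injectable so the window can be tested
        self.state = {}                # (api_key, route) -> (window_index, count)

    def allow(self, api_key, route):
        window = int(self.clock() // 60)
        seen_window, count = self.state.get((api_key, route), (window, 0))
        if seen_window != window:
            count = 0                  # a new minute started, so reset the budget
        if count >= self.limits[route]:
            return False
        self.state[(api_key, route)] = (window, count + 1)
        return True


limiter = FixedWindowLimiter({"/orders": 100, "/inventory": 500})
```

Because the key includes the route, exhausting the /inventory budget never touches what is left for /orders.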

validate the token once, trust it inward

The auth pattern that made everything click for me: the gateway is the only thing that handles the outside world’s messy tokens. It verifies the JWT, then rewrites the request with clean internal headers before it ever reaches a service.

# what comes in from the internet
Authorization: Bearer eyJhbGci...   (untrusted, could be forged)

# what the gateway forwards internally
X-User-Id: 4412
X-User-Roles: customer

Now the orders service never parses a JWT. It reads X-User-Id and trusts it, because the gateway drops any client-supplied copy of that header and sets its own, so the only way the header can exist is if the gateway put it there. Your internal services get dramatically simpler.
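
A rough sketch of that rewrite step, with the actual JWT check left as a function you pass in. Everything named here is an assumption for illustration: verify_token stands in for whatever JWT library you use, sub is the standard subject claim, and roles is a claim I made up. The part that matters is that inbound X-User-* headers are thrown away before the gateway sets its own.

```python
from typing import Callable, Mapping, Optional

INTERNAL_PREFIX = "x-user-"


class Unauthorized(Exception):
    pass


def to_internal_headers(
    incoming: Mapping[str, str],
    verify_token: Callable[[str], Optional[dict]],  # returns claims, or None if invalid
) -> dict:
    # Drop the raw token and anything the client sent that looks like an
    # internal identity header. Without this, X-User-Id would be forgeable.
    forwarded = {
        name: value
        for name, value in incoming.items()
        if name.lower() != "authorization"
        and not name.lower().startswith(INTERNAL_PREFIX)
    }

    auth = next((v for k, v in incoming.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise Unauthorized("missing bearer token")

    claims = verify_token(token)
    if claims is None:
        raise Unauthorized("invalid token")

    # Only now add the trusted identity, taken from the verified claims.
    forwarded["X-User-Id"] = str(claims["sub"])
    forwarded["X-User-Roles"] = ",".join(claims.get("roles", []))
    return forwarded
```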

The trap - and I’ve watched this go wrong - is when a service is also reachable directly, bypassing the gateway. Now that “trusted” header is trivially forgeable by anyone who can reach the pod. If you do the trust-the-header thing, you have to actually lock the services down to gateway-only traffic. Network policy, mTLS, something. Otherwise you’ve built a lovely front door and left the back wide open.

the part where it bites

Everything above is the sales pitch. Here’s the stuff that cost me sleep.

It’s a single point of failure, and it’s the one that matters most. When the gateway is down, everything is down. Not one feature - everything. Individual services could be perfectly healthy and users still see nothing, because nobody can reach them. You run multiple gateway instances behind a load balancer, sure, but you’ve now concentrated all your risk into one tier and you’d better treat its config changes with the same fear you’d treat a database migration. My stale-config outage from the intro? That was one bad instance. Imagine all of them.

Aggregation makes the gateway slow, and its slowness is everyone’s slowness. Even with the three calls running in parallel, that merge-three-services endpoint is only as fast as the slowest of the three; run them one after another and it’s the sum. If inventory has a bad day and takes 900ms, the whole order screen takes at least 900ms, even though users and orders answered in 20. And if you’re not careful with timeouts, one sluggish downstream service ties up gateway connections until the gateway itself falls over. Slap aggressive timeouts and circuit breakers on those fan-out calls or the failure propagates upward and outward.
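
Here’s a small sketch of what “timeouts and circuit breakers on the fan-out” can look like with asyncio. The names (CircuitBreaker, guarded_call, fan_out) and the thresholds are mine, the downstream calls are passed in as plain async functions, and only timeouts count as failures here; a real gateway would also handle connection errors and bad status codes.

```python
import asyncio
import time


class CircuitBreaker:
    """Stop calling a downstream for a while after repeated failures."""

    def __init__(self, max_failures=3, cool_down_s=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cool_down_s = cool_down_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None          # None means closed (traffic flows)

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, let calls through again as a trial;
        # another failure re-opens the breaker.
        return self.clock() - self.opened_at >= self.cool_down_s

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


async def guarded_call(name, make_call, breaker, timeout_s):
    """Return (name, payload), or (name, None) if skipped or timed out."""
    if not breaker.allow():
        return name, None              # fail fast, no connection tied up
    try:
        payload = await asyncio.wait_for(make_call(), timeout_s)
    except asyncio.TimeoutError:
        breaker.record(ok=False)
        return name, None
    breaker.record(ok=True)
    return name, payload


async def fan_out(calls, breakers, timeout_s=0.3):
    """Run all downstream calls in parallel, each with its own timeout."""
    pairs = await asyncio.gather(
        *(
            guarded_call(name, make_call, breakers[name], timeout_s)
            for name, make_call in calls.items()
        )
    )
    return dict(pairs)
```

A None for inventory means the order screen can still render the name and line items with stock shown as “unknown”, instead of the whole request hanging for as long as inventory does.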

It quietly becomes a monolith. This is the sneaky one. Every new feature needs “just a little” routing tweak, a new aggregation, a special header rule. A year in, your gateway config has business logic in it - which product categories are visible, how discounts get computed - and it’s owned by everyone, which means it’s owned by no one. Changing it is scary. You’ve rebuilt the coupling you split services up to escape, except now it lives in a YAML file nobody wants to touch.

gateway vs BFF, and where federation fits

Two things I get asked about a lot.

A BFF - backend-for-frontend - is a gateway that stops pretending to be generic. Instead of one gateway for everyone, you build one per client: a BFF for the mobile app, another for the web app. Why bother? Because the mobile app wants a lean payload and the web dashboard wants the kitchen sink, and cramming both into one endpoint means it serves neither well. I reach for a BFF the moment two clients start fighting over the shape of the same response. Before that, one gateway is plenty - don’t build three things when one does the job.
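
In code, the difference can be as small as two shaping functions over the same aggregate. This is a sketch with made-up field names (customer, items, title, in_stock, warehouse), and in a real BFF setup each function would live in its own deployment, owned by the team that owns that client.

```python
def mobile_view(order_screen):
    """Lean payload: only what a small screen actually renders."""
    return {
        "customer_name": order_screen["customer"]["name"],
        "items": [
            {"title": item["title"], "in_stock": item["in_stock"]}
            for item in order_screen["items"]
        ],
    }


def web_view(order_screen):
    """Kitchen sink: the dashboard gets the full aggregate, passed through as-is."""
    return order_screen
```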

GraphQL federation is another take on aggregation. Instead of hand-writing “call these three, merge like so” in the gateway, each service exposes a slice of one graph, and a router figures out the fan-out from the query. The client asks for exactly the fields it wants and the router splits the work. When it clicks, it’s genuinely lovely - no more bespoke aggregation endpoints piling up.

But it’s not free. You’ve traded hand-written glue for a federation layer that’s its own thing to learn, operate, and debug. When a federated query is slow, tracing which subgraph dragged it down is a special kind of hell compared to reading three plain HTTP calls in a log. I’d only reach for federation once the number of hand-rolled aggregation endpoints has genuinely gotten out of hand. On a small shop backend with three services? Way overkill. A plain gateway with a couple of aggregation routes will serve you for years.

that’s basically it

A gateway is one of those patterns that’s almost always worth it - the routing, the single auth point, the one door for clients. I add one early and rarely regret it. Just go in clear-eyed:

It’s your most important single point of failure. Run more than one, and fear its config changes.
Put a timeout and a circuit breaker on every fan-out, or one slow service takes the whole thing down.
Keep business logic out of it, or you’ve quietly rebuilt the monolith in YAML.

Reach for a BFF when clients start fighting over payload shape. Reach for federation when hand-written aggregation genuinely stops scaling - and not one day before. Most days, the boring gateway is exactly the right amount of clever.