AI & Tools · 9 min read

Your AI Gateway Got Hacked and You Didn't Notice for Hours

LiteLLM got hit with a malware attack this week. If you're running an open-source LLM gateway in production, the question isn't whether your dependencies are safe — it's whether you'd even know if they weren't.

Abhishek Sharma · Head of Engineering @ Fordel Studios

This week, someone published a detailed minute-by-minute account of responding to a malware attack on LiteLLM — one of the most popular open-source LLM proxy and gateway tools in the ecosystem. Around the same time, Kaspersky flagged infostealers disguised as AI developer tools, including packages masquerading as Claude Code. Two data points that, taken together, paint a picture I find genuinely unsettling.

We are building production AI infrastructure on top of a dependency graph that nobody is seriously auditing. And the attackers have figured that out.

···

The AI Infrastructure Trust Problem

Let me be direct about what happened. LiteLLM is a gateway layer that sits between your application and every LLM API you call. It handles routing, load balancing, fallback logic, rate limiting, cost tracking. If you're running multi-model in production — and in 2026, who isn't — there's a decent chance you're running LiteLLM or something like it.

That means this one dependency has access to every API key for every model provider you use. It sees every prompt you send. It sees every response you get back. It handles your cost data, your usage patterns, your entire AI interaction history.

When your gateway layer gets compromised, the attacker doesn't just get one key — they get the skeleton key to your entire AI stack.

This is not a theoretical risk. This is what actually happened. Someone got malicious code into the supply chain, and the response — while apparently handled well — required the kind of minute-by-minute incident response that most teams running LiteLLM in production are simply not prepared for.

···

Why AI Tooling Is a Uniquely Attractive Target

I've been writing about supply chain security for years, long before AI made it sexy. But AI infrastructure has a property that makes it qualitatively different from your average npm package getting typosquatted.

Traditional supply chain attacks target code execution. You get RCE on a build server, maybe you exfiltrate some environment variables, perhaps you inject a backdoor. Bad, but bounded.

AI infrastructure attacks target data flow. Your LLM gateway sees everything. Your prompt templates, your system instructions, your RAG context, your user queries, your model responses. For companies building AI products, this is the crown jewels. Not the code — the data that flows through the code.

And here is the part that keeps me up at night: the Kaspersky findings about infostealers disguised as AI developer tools are a separate, parallel attack vector targeting the same ecosystem. This is not one incident. This is a pattern. Attackers are systematically targeting AI tooling because that is where the high-value data flows.

···

The Open Source Gateway Dilemma

I recommended LiteLLM in an article I wrote three days ago about LLM gateways. I stand by that recommendation — it is a genuinely useful tool. But I need to be honest about the tradeoff we're all making.

Open-source LLM gateways give you flexibility, cost savings, and freedom from vendor lock-in. They also give you a dependency that handles your most sensitive data, maintained by a team you don't control, funded in ways you might not fully understand, with a contributor base you haven't audited.

This is not unique to LiteLLM. It is true of every open-source tool in the AI gateway space. Portkey has an open-source option. There are smaller projects. They all have the same fundamental trust problem.

The managed alternatives — Cloudflare AI Gateway, AWS Bedrock's routing, Azure AI Gateway — solve the trust problem by shifting it. You trust the cloud provider instead. Whether that is actually better depends on your threat model, but at least those providers have dedicated security teams, SOC 2 compliance, and contractual obligations.

The gateway trust spectrum
  • Self-hosted open source: Maximum control, maximum responsibility. You own every CVE, every dependency audit, every incident response.
  • Managed open source (vendor-hosted): Shared responsibility. The vendor handles infrastructure security, you handle configuration and access.
  • Cloud provider native: Minimum responsibility, minimum flexibility. The provider's security posture is your security posture.
  • Build your own: Maximum control, maximum engineering cost. Only makes sense at serious scale.
···

What Most Teams Actually Get Wrong

Here is what I see when I audit AI infrastructure for clients at Fordel. Almost nobody treats their LLM gateway as a security-critical component. They treat it as a convenience layer — a nice-to-have that saves some boilerplate code for multi-model routing.

The gateway runs in the same network as everything else. It has the same access controls as your internal services. Nobody is monitoring its outbound connections. Nobody has pinned its dependency versions. Nobody has a runbook for "what happens if the gateway is compromised."

Compare this to how the same teams treat their database. Separate network segment. Encrypted at rest and in transit. Access logging. Regular backups. Tested restore procedures. Penetration tested annually.

Your LLM gateway handles data that is arguably more sensitive than what sits in your database — because it includes the prompts and reasoning that reveal your business logic, not just the static data. And yet it gets the security treatment of a utility library.

We treat our LLM gateways like logging libraries and our databases like Fort Knox. The data sensitivity is backwards.
···

The Detection Gap Is the Real Problem

The LiteLLM incident was caught and responded to. That is good. But I want to ask a harder question: how long would it have taken your team to notice?

Most organizations running LLM gateways do not have observability on the gateway's own behavior — as opposed to the AI traffic flowing through it. They monitor latency, error rates, token usage. They do not monitor whether the gateway itself is making unexpected network connections, loading unexpected modules, or exfiltrating data to unexpected endpoints.

This is the detection gap. Your AI observability stack watches what the models do. Nobody watches what the infrastructure between you and the models does.

This is not a tooling problem. The tools exist. Container network policies, dependency scanning, SBOM generation, runtime anomaly detection — all of these are solved problems for traditional infrastructure. We just have not applied them to AI infrastructure because we are still treating AI components as experimental rather than production-critical.

···

What You Should Actually Do

I am not going to tell you to stop using open-source LLM gateways. That would be hypocritical — we use them at Fordel, and they solve real problems. But I am going to tell you to treat them like what they are: the most privileged component in your AI stack.

Isolate the gateway

Your LLM gateway should run in its own network segment with explicit egress rules. It should only be able to talk to your LLM providers and your application. Every other outbound connection should be blocked and alerted on. This is basic network segmentation that we do for databases but somehow forget for AI infrastructure.
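In practice the enforcement belongs in your network layer (security groups, Kubernetes NetworkPolicies, firewall rules), but the idea can be sketched in a few lines. A minimal sketch, assuming hypothetical provider hostnames — substitute your own:

```python
# Egress allowlist for the gateway's network segment: only known
# LLM provider endpoints are permitted; everything else is flagged.
ALLOWED_EGRESS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def egress_allowed(host: str) -> bool:
    """Return True only for destinations on the provider allowlist."""
    return host.lower() in ALLOWED_EGRESS

def audit_connections(observed_hosts):
    """Return the observed destinations that should be blocked and alerted on."""
    return sorted(h for h in observed_hosts if not egress_allowed(h))
```

The same allowlist that drives your firewall rules can drive an alerting check, so the two never drift apart.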

Pin and verify dependencies

Do not run latest. Pin your gateway version. Pin its dependencies. Use hash verification if your package manager supports it. Generate an SBOM for the gateway container and diff it on every update. Yes, this means you fall behind on features. That is an acceptable tradeoff when the alternative is silently ingesting malicious code.
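The verification step can be as simple as refusing to deploy any artifact whose digest doesn't match the pin you recorded. A minimal sketch using only the standard library:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Refuse to deploy an artifact whose digest does not match the pin."""
    return sha256_of(path) == expected_sha256
```

For Python dependencies specifically, pip's `--require-hashes` mode does this for you across an entire pinned requirements file.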

Monitor the gateway, not just the traffic

You already monitor AI traffic for latency and cost. Add monitoring for the gateway process itself. Unexpected network connections, file system access, CPU spikes that do not correlate with traffic — these are your signals. Runtime security tools like Falco exist for exactly this purpose.
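The core of this kind of monitoring is a baseline-and-diff: learn what destinations the gateway normally talks to, then alert on anything new. A toy sketch of that idea (real deployments would feed this from Falco events or flow logs, and the hostnames here are illustrative):

```python
from collections import Counter

class GatewayWatch:
    """Track the gateway's own outbound destinations against a learned baseline."""

    def __init__(self, baseline):
        self.baseline = set(baseline)
        self.counts = Counter()

    def observe(self, host: str) -> None:
        """Record one outbound connection from the gateway process."""
        self.counts[host] += 1

    def anomalies(self):
        """Destinations never seen during the baseline window — your alert signal."""
        return sorted(set(self.counts) - self.baseline)
```

The point is not the data structure; it is that the gateway's own connections are a first-class metric, next to latency and token counts.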

Have an incident response plan

If your gateway is compromised, what do you do? At minimum: rotate every API key the gateway had access to, audit logs for data exfiltration, notify downstream consumers that prompts and responses may have been exposed. If you do not have this written down and practiced, you are not ready to run a gateway in production.
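A runbook that lives only in a wiki tends to rot. One way to keep it honest is to encode the steps as data so an incident channel can track what is still open. A hypothetical sketch, with step names invented for illustration:

```python
# Incident-response runbook for a compromised gateway, encoded as data
# so it can be checked programmatically, not just read. Step names are
# illustrative placeholders.
RUNBOOK = (
    "rotate-provider-keys",   # every API key the gateway could read
    "audit-egress-logs",      # look for evidence of exfiltration
    "notify-consumers",       # prompts/responses may have been exposed
)

def remaining_steps(completed):
    """Return the runbook steps still open, in execution order."""
    done = set(completed)
    return [step for step in RUNBOOK if step not in done]
```

Practicing the runbook means running it against a staged incident until `remaining_steps` comes back empty without anyone improvising.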

Evaluate whether managed is actually cheaper

When you factor in the engineering time for dependency auditing, security monitoring, incident response planning, and the risk cost of a breach — managed gateway services start looking less expensive than they appear on the pricing page. Run the actual numbers for your situation.
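Running the numbers can be a two-line model. All figures below are illustrative assumptions, not real pricing; the point is the shape of the comparison:

```python
def self_hosted_monthly(eng_hours: float, hourly_rate: float) -> float:
    """Recurring engineering overhead: audits, monitoring, IR drills."""
    return eng_hours * hourly_rate

def managed_monthly(base_fee: float, requests: int, per_request: float) -> float:
    """Managed gateway: flat fee plus a per-request surcharge."""
    return base_fee + requests * per_request

# Illustrative inputs only — plug in your own.
overhead = self_hosted_monthly(20, 120.0)            # 20 h/mo at $120/h -> 2400.0
managed = managed_monthly(500.0, 2_000_000, 0.0005)  # $500 + 2M req -> 1500.0
```

In this made-up scenario the managed option wins before you even price in breach risk; with different traffic or staffing assumptions it flips. That is why you run your numbers, not someone else's.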

···

The Bigger Pattern

Step back from LiteLLM specifically and look at the trend. In the past week alone: malware in an LLM gateway, infostealers disguised as AI coding tools, ongoing concerns about prompt injection in MCP servers, and a steady drumbeat of AI supply chain vulnerabilities.

The AI developer tooling ecosystem is now a high-value target. We collectively poured billions into building this stack, and the security maturity is roughly where web application security was in 2005. We know the attack vectors. We know the mitigations. We are just not applying them because we are moving too fast and security slows things down.

I get it. I run a company that builds AI systems. Speed matters. Shipping matters. But I also remember what happened to web security when we collectively decided to bolt it on later. It took a decade of breaches, regulations, and lawsuits to get to where we are now. We do not need to repeat that cycle with AI infrastructure.

AI infrastructure security is where web application security was in 2005. We know better this time. The question is whether we'll act like it.

The LiteLLM incident is a warning shot. A relatively well-handled one, from what I can tell. The next one might not be. And if your AI infrastructure is not prepared for it, the blast radius is not a compromised build server or a leaked API key — it is every prompt, every response, and every piece of context your AI system has ever processed.

Treat your AI gateway like what it is: the most dangerous single point of compromise in your entire stack. Because the attackers already do.
