Research · Engineering & AI · 14 min read

AI-Generated Code Is Poisoning Your Software Supply Chain

Veracode found security flaws in 45% of AI-generated code. Endor Labs reports 80% of AI-suggested dependencies contain risks. A new attack vector called slopsquatting exploits hallucinated package names to inject malware into your build pipeline. This article breaks down the real supply chain risks of AI-assisted development, what the data actually shows, and what engineering teams need to do before the next npm install runs.

Abhishek Sharma · Fordel Studios

In July 2025, Veracode published its GenAI Code Security Report after testing more than 100 large language models across 80 curated coding tasks. The headline finding: 45% of AI-generated code samples failed security tests and introduced OWASP Top 10 vulnerabilities. When given a choice between a secure and insecure method, the models chose the insecure option nearly half the time.

Four months later, Endor Labs released its fourth annual State of Dependency Management report. It found that only 1 in 5 dependency versions recommended by AI coding assistants were both safe and free from hallucination. The other 80% introduced risk — known vulnerabilities, non-existent packages, or third-party modules with unclear provenance.

These are not edge cases from contrived benchmarks. These are the tools that 95% of software engineers now use at least weekly, generating code that ships to production. The supply chain attack surface has not just expanded — it has been automated.

  • 45% of AI-generated code fails security tests. Source: Veracode GenAI Code Security Report, 100+ LLMs tested across 80 coding tasks.
  • 80% of AI-suggested dependencies contain risks. Source: Endor Labs State of Dependency Management 2025, covering 10,663 GitHub repos.
  • 742% average annual increase in OSS supply chain attacks over three years. Source: Sonatype, with 120,612 malware attacks blocked in a single quarter of 2025.
···

The Anatomy of AI-Assisted Supply Chain Compromise

Traditional supply chain attacks required a threat actor to compromise an existing package, typosquat a popular library name, or infiltrate a maintainer account. These attacks were manual, targeted, and relatively slow. AI code generation has introduced three new attack vectors that are faster, broader, and harder to detect.

Slopsquatting: When the Model Hallucinates Your Next Dependency

Slopsquatting is a supply chain attack that exploits a quirk of large language models: they hallucinate package names. An LLM asked to generate code that parses CSV files might recommend importing a package called csv-parser-utils — a package that does not exist in any registry. The name sounds plausible. It follows naming conventions. But it was invented by the model.

A research team studying 576,000 AI-generated code samples found that nearly 20% recommended non-existent packages. When the same prompts were repeated across 10 queries, 43% of the hallucinated package names reappeared every time. The hallucinations are not random; they are reproducible.

This reproducibility is what makes slopsquatting viable. An attacker monitors which package names LLMs consistently hallucinate, registers those names on npm or PyPI, and publishes packages containing malicious code. The next developer who accepts the AI suggestion runs npm install, and the malicious package enters their dependency tree.
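
The existence check itself is mechanical. Below is a minimal sketch, assuming nothing beyond Python's standard library: both the npm and PyPI registries expose public metadata endpoints that return HTTP 404 for names that do not resolve, so a pure hallucination such as the article's hypothetical csv-parser-utils can be caught before any install command runs. Existence alone is not a defence, since slopsquatters register the names precisely so that they do exist, but it separates pure hallucinations from potentially poisoned registrations that need closer scrutiny.

```python
import urllib.request
import urllib.error

# Public registry metadata endpoints; both answer HTTP 404 for unknown names.
REGISTRIES = {
    "npm": "https://registry.npmjs.org/{name}",
    "pypi": "https://pypi.org/pypi/{name}/json",
}

def package_exists(registry: str, name: str) -> bool:
    """Return True if `name` resolves in the given public registry."""
    url = REGISTRIES[registry].format(name=name)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # Rate limits or outages need a human, not a silent pass.

# Verify an AI-suggested import before any install command touches it.
print(package_exists("npm", "csv-parser-utils"))  # the article's hypothetical name
```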

Insecure Code Patterns at Scale

Beyond dependency hallucination, AI models generate insecure code patterns with alarming consistency. The Veracode study found that Java was the riskiest language for AI code generation, with a security failure rate exceeding 70%. Python, C#, and JavaScript followed with failure rates between 38% and 45%.

The specific vulnerability that AI models handle worst is Cross-Site Scripting (CWE-80). AI tools failed to defend against XSS in 86% of relevant code samples. This is not a subtle, hard-to-detect vulnerability. XSS is one of the oldest and most well-documented web security issues, and AI models still generate vulnerable code for it the vast majority of the time.
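
For concreteness, here is a minimal, hypothetical Flask handler pair (not drawn from the Veracode corpus) showing the vulnerable shape these tools keep producing, and the one-line fix:

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

# The shape AI assistants commonly emit: user input interpolated straight
# into markup. Requesting /greet-unsafe?name=<script>alert(1)</script>
# returns executable script to the browser.
@app.route("/greet-unsafe")
def greet_unsafe():
    return f"<h1>Hello, {request.args.get('name', '')}</h1>"

# The fix is a single call: escape() neutralises HTML metacharacters,
# so injected tags render as inert text instead of executing.
@app.route("/greet")
def greet():
    return f"<h1>Hello, {escape(request.args.get('name', ''))}</h1>"
```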

Georgetown University’s Center for Security and Emerging Technology (CSET) independently confirmed these findings. Their evaluation of five LLMs found that almost half of all generated code snippets contained bugs that were “often impactful and could potentially lead to malicious exploitation.” Earlier research on GitHub Copilot specifically found that approximately 40% of its 1,689 generated programs were vulnerable to MITRE’s CWE Top 25 Most Dangerous Software Weaknesses.

When given a choice between a secure and insecure method to write code, generative AI models chose the insecure option 45% of the time. This rate has remained largely unchanged even as models have dramatically improved in generating syntactically correct code.
Veracode GenAI Code Security Report

Dependency Version Roulette

Even when AI models recommend real packages, they frequently recommend the wrong version. Endor Labs found that between 44% and 49% of AI-imported dependency versions had known vulnerabilities. The model does not check the CVE database before suggesting a version. It recommends whatever version appeared most frequently in its training data — which, given the age distribution of open source code, is often an outdated version with known security issues.

This creates a perverse dynamic: the more popular a package was at a particular version, the more likely the model is to recommend that version, regardless of whether it has since been patched. Developers who trust the AI’s version recommendation without checking are importing yesterday’s vulnerabilities into today’s code.
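
Checking a suggested version against the advisory databases is cheap to automate. A minimal sketch against the free OSV.dev query API, which aggregates advisories across npm, PyPI, Maven, and other ecosystems; lodash 4.17.15 is used as the example because it predates several well-known patched advisories:

```python
import json
import urllib.request

OSV_QUERY = "https://api.osv.dev/v1/query"

def known_vulns(ecosystem: str, name: str, version: str) -> list[str]:
    """Return OSV advisory IDs affecting one exact package version."""
    body = json.dumps({
        "package": {"ecosystem": ecosystem, "name": name},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        OSV_QUERY, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return [v["id"] for v in json.load(resp).get("vulns", [])]

# lodash 4.17.15 predates several patched prototype-pollution advisories.
print(known_vulns("npm", "lodash", "4.17.15"))
```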

Why This Is Getting Worse, Not Better

The intuitive assumption is that newer, larger models should generate more secure code. The data shows otherwise. Veracode’s key finding is that security performance has remained largely unchanged over time, even as models have dramatically improved at generating syntactically correct and functionally complete code. The models are getting better at writing code that works. They are not getting better at writing code that is safe.

Factor by factor, why the risk is amplified:
  • Scale of adoption: 95% of engineers use AI tools weekly; 56% report doing 70%+ of their work with AI. Every insecure pattern propagates faster.
  • Speed of generation: A developer using AI generates code 3–10x faster. Security review processes designed for human-speed development cannot keep up.
  • Trust calibration: Developers treat AI suggestions like senior developer recommendations. But unlike a senior developer, the model has no concept of security posture.
  • Training data lag: Models are trained on historical code that includes years of unpatched vulnerabilities, deprecated APIs, and pre-disclosure CVEs.
  • Feedback loops: AI-generated code enters public repositories, becomes training data for the next model generation, and reinforces insecure patterns.
  • MCP server proliferation: Endor Labs analysed 10,663 MCP server repositories and found they frequently use AI-suggested dependencies, concentrating supply chain risk at integration points.

The feedback loop is especially dangerous. When AI-generated code — including its insecure patterns — gets committed to public repositories on GitHub, it becomes training data for the next generation of models. Georgetown’s CSET report identified this as a systemic risk: models training on their own insecure outputs creates a degenerative cycle where insecure code patterns become statistically dominant in the training distribution.

···

The OWASP Perspective

The OWASP Top 10 for LLM Applications (2025 edition) addresses AI-generated code risks under two categories: Supply Chain Vulnerabilities and Improper Output Handling.

Supply Chain Vulnerabilities cover the full dependency lifecycle — from AI models recommending malicious or hallucinated packages to compromised model weights and training data poisoning. Improper Output Handling covers the case where LLM-generated code runs unsanitized in downstream systems, which is precisely what happens when a developer accepts an AI code suggestion and ships it without security review.

OWASP LLM Top 10 Categories Relevant to AI Code Generation
  • Supply Chain Vulnerabilities: hallucinated packages, outdated dependencies, compromised third-party models, unvetted MCP server integrations
  • Improper Output Handling: AI-generated code executed without sanitisation, leading to injection, XSS, or privilege escalation
  • Data and Model Poisoning: adversarial training data that causes models to consistently recommend specific malicious packages or insecure patterns
  • Excessive Agency: AI coding agents with write access to filesystems, package managers, and CI/CD pipelines operating without adequate guardrails

The 2025 edition added Vector and Embedding Weaknesses as a new category, acknowledging that RAG-based coding assistants that retrieve code snippets from vector stores inherit the security posture of whatever code was indexed. If your vector store contains insecure code patterns, your AI assistant will recommend them with high confidence.

···

What the Malware Numbers Actually Show

The open source supply chain was under attack before AI code generation existed. But the scale has changed dramatically.

Snyk identified over 3,000 malicious npm packages in 2024, with more than 3,600 malicious packages total across npm and PyPI. By Q4 2025, Sonatype was blocking 120,612 malware attacks in a single quarter. Socket’s mid-year 2025 threat report documented a steady rise in destructive malware using delayed execution and remotely controlled kill switches to evade early detection.

The intersection of this existing malware landscape with AI-generated code hallucinations creates a multiplier effect. Attackers no longer need to guess which package names developers might mistype. They can query the same AI models developers use, identify which non-existent packages the models consistently recommend, and register those names. The AI model becomes an unwitting accomplice in the supply chain attack.

  • 3,600+ malicious packages identified across npm and PyPI in 2024. Source: Snyk, with npm the most targeted ecosystem.
  • 120K+ malware attacks blocked by Sonatype in Q4 2025 alone, representing a 742% average annual increase in OSS supply chain attacks.
  • 34% of AI-suggested dependencies do not exist in any public registry. Source: Endor Labs, testing AI coding assistants across PyPI, npm, Maven, and NuGet.

The SBOM Question

Software Bills of Materials have been a federal requirement for software sold to the US government since Executive Order 14028 in 2021. CISA updated its SBOM guidance in 2025, requiring machine-readable formats like SPDX or CycloneDX. In January 2026, the OMB shifted from prescriptive mandates to a risk-based approach, but agencies can still require SBOMs as part of their risk assessment.

For AI-generated code, SBOMs face a fundamental challenge: they document what is in the software, but they cannot document what should not be there. An SBOM will faithfully record that your application depends on csv-parser-utils version 1.0.0. It will not tell you that csv-parser-utils was hallucinated by an AI model, registered by a threat actor two weeks ago, and contains a reverse shell.

This is not a failure of SBOMs. It is a limitation of post-hoc documentation when the code generation process itself is compromised. SBOMs remain essential for transparency and incident response. But they are a detection mechanism, not a prevention mechanism. The prevention has to happen earlier in the pipeline — before the dependency is installed, before the code is committed, before the AI suggestion is accepted.

···

What Engineering Teams Need to Do

The response to AI-generated code security risks is not to stop using AI tools. That ship has sailed — 95% adoption means these tools are infrastructure, not optional. The response is to treat AI-generated code with the same rigour you would apply to code from an untrusted contributor.

Securing Your AI-Assisted Development Pipeline

01. Validate every dependency before installation

Never run npm install or pip install on an AI-suggested package without first confirming that it exists in the public registry and checking its publish date, download count, and maintainer history. Tools like Socket, Snyk, and Endor Labs provide automated checks for package provenance and known malicious indicators. If a package was published in the last 30 days with minimal downloads, treat it as suspicious regardless of how confidently the AI recommended it.
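
A hedged sketch of those heuristics against npm's public endpoints (the registry metadata document records the package's creation date; a separate downloads API reports weekly installs). The 30-day and 500-download thresholds are illustrative policy choices, not industry standards:

```python
import json
import urllib.request
from datetime import datetime, timezone

def npm_risk_signals(name: str, max_age_days: int = 30,
                     min_weekly_downloads: int = 500) -> list[str]:
    """Collect heuristic signals that an npm package may be slopsquatted."""
    signals = []

    # Creation date lives in the registry metadata document's "time" map.
    with urllib.request.urlopen(f"https://registry.npmjs.org/{name}", timeout=10) as r:
        meta = json.load(r)
    created = datetime.fromisoformat(meta["time"]["created"].replace("Z", "+00:00"))
    age = (datetime.now(timezone.utc) - created).days
    if age < max_age_days:
        signals.append(f"published only {age} days ago")

    # Weekly install counts come from npm's separate downloads API.
    url = f"https://api.npmjs.org/downloads/point/last-week/{name}"
    with urllib.request.urlopen(url, timeout=10) as r:
        downloads = json.load(r).get("downloads", 0)
    if downloads < min_weekly_downloads:
        signals.append(f"only {downloads} downloads last week")

    return signals  # An empty list means no red flags, not proof of safety.
```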

02. Pin dependency versions and audit AI suggestions against CVE databases

AI models default to whatever version was most common in training data, which is rarely the most current or secure version. Use lockfiles (package-lock.json, poetry.lock) and audit tools (npm audit, pip-audit, Snyk) to catch known vulnerabilities in AI-suggested versions before they enter your dependency tree. Consider running automated version checks as a pre-commit hook.
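
One way to wire this in is a pre-commit hook that shells out to the stock audit tools. A minimal sketch, assuming npm audit's --audit-level flag and the pip-audit CLI, both of which exit nonzero when they find qualifying advisories:

```python
#!/usr/bin/env python3
"""Pre-commit hook: block commits whose dependencies carry known advisories."""
import pathlib
import subprocess
import sys

checks = []
if pathlib.Path("package-lock.json").exists():
    # npm audit exits nonzero when findings meet or exceed the given level.
    checks.append(["npm", "audit", "--audit-level=high"])
if pathlib.Path("requirements.txt").exists():
    # pip-audit exits nonzero if any pinned dependency has a known advisory.
    checks.append(["pip-audit", "-r", "requirements.txt"])

if any(subprocess.run(cmd).returncode != 0 for cmd in checks):
    print("Dependency audit failed; review before committing.", file=sys.stderr)
    sys.exit(1)
```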

03. Run static analysis on all AI-generated code

Integrate SAST tools (Semgrep, CodeQL, Veracode) into your CI pipeline and run them on every commit, not just periodic scans. Given that 45% of AI-generated code introduces OWASP Top 10 vulnerabilities, static analysis is no longer optional — it is the minimum viable security practice for AI-assisted development. Pay particular attention to XSS, injection, and authentication bypass patterns.
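
As a starting point, the CI gate can be a thin wrapper around the scanner. A minimal sketch assuming Semgrep's registry ruleset naming (p/security-audit here) and its --error flag, which turns findings into a nonzero exit code that fails the pipeline:

```python
import subprocess
import sys

# Run Semgrep with a security-focused registry ruleset. The --error flag
# converts findings into a nonzero exit code, which fails the CI job.
result = subprocess.run(
    ["semgrep", "scan", "--config", "p/security-audit", "--error"]
)
sys.exit(result.returncode)
```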

04. Isolate AI coding agents from sensitive infrastructure

AI coding agents with MCP integrations, filesystem access, and terminal execution capabilities should run in sandboxed environments with minimal permissions. An AI agent that can write files, install packages, and execute code has the same attack surface as an untrusted script. Apply the principle of least privilege: read access to the codebase, write access only to designated directories, no direct access to production credentials or deployment pipelines.

05. Generate and maintain SBOMs with AI provenance tracking

Use CycloneDX or SPDX to generate SBOMs for every release. Where possible, annotate which dependencies were AI-suggested versus developer-chosen. This provenance information accelerates incident response when a supply chain compromise is discovered — you can immediately identify which AI-suggested dependencies need review rather than auditing the entire dependency tree.
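
CycloneDX has no dedicated field for AI provenance, but component-level properties accept arbitrary name/value pairs. The sketch below uses a hypothetical fordel:ai-suggested property name as one possible convention; the payoff is that incident response can filter straight to the AI-suggested subset:

```python
import json

# A minimal CycloneDX-shaped document. "fordel:ai-suggested" is a
# hypothetical property name illustrating one tagging convention,
# not part of the CycloneDX specification.
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [{
        "type": "library",
        "name": "lodash",
        "version": "4.17.21",
        "purl": "pkg:npm/lodash@4.17.21",
        "properties": [
            {"name": "fordel:ai-suggested", "value": "true"},
            {"name": "fordel:suggesting-model", "value": "assistant-x"},
        ],
    }],
}

# Incident response can then jump straight to the AI-suggested subset
# instead of auditing the entire dependency tree.
ai_suggested = [
    c["purl"] for c in sbom["components"]
    if any(p["name"] == "fordel:ai-suggested" and p["value"] == "true"
           for p in c.get("properties", []))
]
print(json.dumps(ai_suggested, indent=2))
```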

06. Establish an AI code review policy

Treat AI-generated code as untrusted contributor code in your review process. This means no auto-merging of AI-generated PRs; mandatory security-focused review for any code that handles authentication, authorization, data access, or external API calls; and explicit sign-off that dependencies have been validated. The goal is not to slow development down — it is to ensure the 45% of insecure suggestions get caught before they ship.

···

The Tooling Landscape for AI Code Security

Tool by tool, what it does and when to use it:
  • Socket: Deep package inspection that detects supply chain attacks via behavioural analysis rather than just CVE matching. Use it for pre-install validation of every new dependency, especially AI-suggested ones.
  • Snyk: Vulnerability scanning across dependencies, containers, and IaC, with AI-specific package risk scoring. Use it for continuous monitoring of your dependency tree and CI/CD integration.
  • Endor Labs: Dependency risk scoring that accounts for AI hallucination, version safety, and provenance. Use it to evaluate AI-suggested dependencies before committing them to your lockfile.
  • Semgrep / CodeQL: Static analysis for security anti-patterns in AI-generated code. Run it on every commit in CI, with rules tuned for the vulnerabilities AI models most commonly introduce.
  • Veracode: Comprehensive application security testing, including SAST, DAST, and SCA. Suited to enterprise security programmes that require compliance-grade scanning of AI-generated code.
  • Sonatype Nexus: A repository firewall that blocks known-malicious packages before they enter your build. Use it to stop your package manager from installing slopsquatted or typosquatted dependencies.

No single tool covers the full attack surface. The production pattern is layered: a repository firewall (Sonatype) blocks known-bad packages at the registry level, a dependency scanner (Snyk, Endor Labs, Socket) validates packages at install time, a SAST tool (Semgrep, CodeQL) catches insecure patterns in generated code, and an SBOM generator documents everything for audit and incident response.

What Happens Next

The AI code generation security problem has three possible trajectories:

Three Possible Futures
  • Models improve their security awareness: Possible but not happening yet. Veracode’s data shows no meaningful improvement in security outcomes across model generations. The models are optimised for functionality, not safety. Until security metrics are weighted equally with correctness in model training, this trajectory is unlikely.
  • Tooling catches up and provides guardrails: This is the most likely near-term outcome. Socket, Snyk, Endor Labs, and others are building AI-specific security capabilities. The challenge is adoption — most teams have not yet updated their security tooling to account for AI-generated code patterns.
  • A major incident forces industry-wide change: The most probable catalyst for systemic improvement. When an AI-hallucinated dependency leads to a significant breach at a well-known company, the resulting regulatory and reputational pressure will accelerate adoption of the practices described in this article. The question is not whether this will happen, but when.

The US Department of Defense published an AI/ML Supply Chain Risks and Mitigations advisory in March 2026, explicitly acknowledging that AI-generated code and AI-suggested dependencies create novel supply chain risks that existing frameworks do not adequately address. ENISA, the EU’s cybersecurity agency, published a package manager advisory in March 2026 addressing similar concerns for European organisations.

The burden of ensuring that AI-generated code outputs are secure should not rest solely on individual users, but also on AI developers, organizations producing code at scale, and those who can improve security at large, such as policymaking bodies or industry leaders.
Georgetown CSET
···

Where Fordel Builds

We build production software for clients in finance, healthcare, insurance, and SaaS — industries where a supply chain compromise is not an inconvenience but a regulatory incident. Every project we deliver includes dependency auditing, SAST integration, SBOM generation, and security-focused code review as standard practice, not premium add-ons.

If you are using AI coding tools in production and have not updated your security pipeline to account for AI-specific risks — hallucinated dependencies, insecure code patterns, version roulette — you have a gap. We can audit your current pipeline, identify where AI-generated code is introducing risk, and implement the tooling and processes to close it. That conversation costs nothing. The alternative costs more.
