In July 2025, Veracode published its GenAI Code Security Report after testing more than 100 large language models across 80 curated coding tasks. The headline finding: 45% of AI-generated code samples failed security tests and introduced OWASP Top 10 vulnerabilities. When given a choice between a secure and insecure method, the models chose the insecure option nearly half the time.
Four months later, Endor Labs released its fourth annual State of Dependency Management report. It found that only 1 in 5 dependency versions recommended by AI coding assistants were both safe and free from hallucination. The other 80% introduced risk — known vulnerabilities, non-existent packages, or third-party modules with unclear provenance.
These are not edge cases from contrived benchmarks. These are the tools that 95% of software engineers now use at least weekly, generating code that ships to production. The supply chain attack surface has not just expanded — it has been automated.
The Anatomy of AI-Assisted Supply Chain Compromise
Traditional supply chain attacks required a threat actor to compromise an existing package, typosquat a popular library name, or infiltrate a maintainer account. These attacks were manual, targeted, and relatively slow. AI code generation has introduced three new attack vectors that are faster, broader, and harder to detect.
Slopsquatting: When the Model Hallucinates Your Next Dependency
Slopsquatting is a supply chain attack that exploits a quirk of large language models: they hallucinate package names. An LLM asked to generate code that parses CSV files might recommend importing a package called csv-parser-utils — a package that does not exist in any registry. The name sounds plausible. It follows naming conventions. But it was invented by the model.
A research team studying 576,000 AI-generated code samples found that nearly 20% recommended non-existent packages. When the same prompts were repeated across 10 queries, 43% of the hallucinated package names reappeared every time. The hallucinations are not random; they are reproducible.
This reproducibility is what makes slopsquatting viable. An attacker monitors which package names LLMs consistently hallucinate, registers those names on npm or PyPI, and publishes packages containing malicious code. The next developer who accepts the AI suggestion runs npm install, and the malicious package enters their dependency tree.
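The first line of defence is mechanical: before installing, confirm that the suggested name actually resolves in the registry. A minimal sketch against PyPI's public JSON API (the helper names are ours, and note the caveat in the comments — existence is necessary but not sufficient, because a slopsquatted package will exist):

```python
import re
import urllib.error
import urllib.request


def normalize(name: str) -> str:
    """Canonicalize a package name per PEP 503, so an AI-suggested
    name can be compared against the registry's canonical form."""
    return re.sub(r"[-_.]+", "-", name).lower()


def exists_on_pypi(name: str) -> bool:
    """True if the name resolves on PyPI. A hallucinated package
    returns 404. The converse does not hold: a slopsquatted package
    registered by an attacker WILL exist, so treat existence as a
    necessary condition, not a safety signal."""
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise
```

A check like this belongs in the install path (a wrapper script or pre-commit hook), not in the developer's memory.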
Insecure Code Patterns at Scale
Beyond dependency hallucination, AI models generate insecure code patterns with alarming consistency. The Veracode study found that Java was the riskiest language for AI code generation, with a security failure rate exceeding 70%. Python, C#, and JavaScript followed with failure rates between 38% and 45%.
The specific vulnerability that AI models handle worst is Cross-Site Scripting (CWE-80). AI tools failed to defend against XSS in 86% of relevant code samples. This is not a subtle, hard-to-detect vulnerability. XSS is one of the oldest and most well-documented web security issues, and AI models still generate vulnerable code for it the vast majority of the time.
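The failure mode is concrete. In Python terms, the pattern models most often emit interpolates user input straight into markup; the fix is a single escaping call, or a templating engine with autoescaping enabled. A minimal illustration (function names are ours):

```python
import html


def render_unsafe(user_input: str) -> str:
    # The pattern AI models commonly emit: raw interpolation into HTML.
    # A <script> payload in user_input executes in the victim's browser.
    return f"<p>Hello, {user_input}!</p>"


def render_safe(user_input: str) -> str:
    # CWE-80 mitigation: escape HTML metacharacters before interpolation
    # (or use a templating engine with autoescaping, e.g. Jinja2 defaults).
    return f"<p>Hello, {html.escape(user_input)}!</p>"
```

The difference is one function call, which is precisely why its absence in 86% of relevant AI-generated samples is so striking.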
Georgetown University’s Center for Security and Emerging Technology (CSET) independently confirmed these findings. Their evaluation of five LLMs found that almost half of all generated code snippets contained bugs that were “often impactful and could potentially lead to malicious exploitation.” Earlier research on GitHub Copilot specifically found that approximately 40% of its 1,689 generated programs were vulnerable to MITRE’s CWE Top 25 Most Dangerous Software Weaknesses.
> “When given a choice between a secure and insecure method to write code, generative AI models chose the insecure option 45% of the time. This rate has remained largely unchanged even as models have dramatically improved in generating syntactically correct code.” (Veracode, GenAI Code Security Report, 2025)
Dependency Version Roulette
Even when AI models recommend real packages, they frequently recommend the wrong version. Endor Labs found that between 44% and 49% of AI-imported dependency versions had known vulnerabilities. The model does not check the CVE database before suggesting a version. It recommends whatever version appeared most frequently in its training data — which, given the age distribution of open source code, is often an outdated version with known security issues.
This creates a perverse dynamic: the more popular a package was at a particular version, the more likely the model is to recommend that version, regardless of whether it has since been patched. Developers who trust the AI’s version recommendation without checking are importing yesterday’s vulnerabilities into today’s code.
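Checking a suggested version against a vulnerability database is a one-request operation. A sketch using OSV.dev's public query API — the helper names are ours, and in practice tools like pip-audit automate exactly this lookup:

```python
import json
import urllib.request


def osv_query(name: str, version: str, ecosystem: str = "PyPI") -> dict:
    """Build an OSV.dev /v1/query payload for one package version."""
    return {"package": {"name": name, "ecosystem": ecosystem},
            "version": version}


def known_vulnerabilities(name: str, version: str) -> list:
    """POST the query to OSV.dev and return matching advisories.
    An AI-suggested version that yields a non-empty list has known
    vulnerabilities and should not enter the lockfile."""
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=json.dumps(osv_query(name, version)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("vulns", [])
```

Because the AI optimises for training-data frequency rather than CVE status, this check has to happen after the suggestion and before the install.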
Why This Is Getting Worse, Not Better
The intuitive assumption is that newer, larger models should generate more secure code. The data shows otherwise. Veracode’s key finding is that security performance has remained largely unchanged over time, even as models have dramatically improved at generating syntactically correct and functionally complete code. The models are getting better at writing code that works. They are not getting better at writing code that is safe.
| Factor | Why It Amplifies Risk |
|---|---|
| Scale of adoption | 95% of engineers use AI tools weekly; 56% report doing 70%+ of their work with AI. Every insecure pattern propagates faster. |
| Speed of generation | A developer using AI generates code 3–10x faster. Security review processes designed for human-speed development cannot keep up. |
| Trust calibration | Developers treat AI suggestions like senior developer recommendations. But unlike a senior developer, the model has no concept of security posture. |
| Training data lag | Models are trained on historical code that includes years of unpatched vulnerabilities, deprecated APIs, and pre-disclosure CVEs. |
| Feedback loops | AI-generated code enters public repositories, becomes training data for the next model generation, and reinforces insecure patterns. |
| MCP server proliferation | Endor Labs catalogued 10,663 MCP server repositories, many of which rely on AI-suggested dependencies, concentrating supply chain risk at integration points. |
The feedback loop is especially dangerous. When AI-generated code — including its insecure patterns — gets committed to public repositories on GitHub, it becomes training data for the next generation of models. Georgetown’s CSET report identified this as a systemic risk: models training on their own insecure outputs creates a degenerative cycle where insecure code patterns become statistically dominant in the training distribution.
The OWASP Perspective
The OWASP Top 10 for LLM Applications (2025 edition) addresses AI-generated code risks under two categories: Supply Chain Vulnerabilities and Improper Output Handling.
Supply Chain Vulnerabilities cover the full dependency lifecycle — from AI models recommending malicious or hallucinated packages to compromised model weights and training data poisoning. Improper Output Handling covers the case where LLM-generated code runs unsanitized in downstream systems, which is precisely what happens when a developer accepts an AI code suggestion and ships it without security review.
- Supply Chain Vulnerabilities: hallucinated packages, outdated dependencies, compromised third-party models, unvetted MCP server integrations
- Improper Output Handling: AI-generated code executed without sanitisation, leading to injection, XSS, or privilege escalation
- Data and Model Poisoning: adversarial training data that causes models to consistently recommend specific malicious packages or insecure patterns
- Excessive Agency: AI coding agents with write access to filesystems, package managers, and CI/CD pipelines operating without adequate guardrails
The 2025 edition added Vector and Embedding Weaknesses as a new category, acknowledging that RAG-based coding assistants that retrieve code snippets from vector stores inherit the security posture of whatever code was indexed. If your vector store contains insecure code patterns, your AI assistant will recommend them with high confidence.
What the Malware Numbers Actually Show
The open source supply chain was under attack before AI code generation existed. But the scale has changed dramatically.
Snyk identified over 3,000 malicious npm packages in 2024, with more than 3,600 malicious packages total across npm and PyPI. By Q4 2025, Sonatype was blocking 120,612 malware attacks in a single quarter. Socket’s mid-year 2025 threat report documented a steady rise in destructive malware using delayed execution and remotely controlled kill switches to evade early detection.
The intersection of this existing malware landscape with AI-generated code hallucinations creates a multiplier effect. Attackers no longer need to guess which package names developers might mistype. They can query the same AI models developers use, identify which non-existent packages the models consistently recommend, and register those names. The AI model becomes an unwitting accomplice in the supply chain attack.
The SBOM Question
Software Bills of Materials have been a federal requirement for software sold to the US government since Executive Order 14028 in 2021. CISA updated its SBOM guidance in 2025, requiring machine-readable formats like SPDX or CycloneDX. In January 2026, the OMB shifted from prescriptive mandates to a risk-based approach, but agencies can still require SBOMs as part of their risk assessment.
For AI-generated code, SBOMs face a fundamental challenge: they document what is in the software, but they cannot document what should not be there. An SBOM will faithfully record that your application depends on csv-parser-utils version 1.0.0. It will not tell you that csv-parser-utils was hallucinated by an AI model, registered by a threat actor two weeks ago, and contains a reverse shell.
This is not a failure of SBOMs. It is a limitation of post-hoc documentation when the code generation process itself is compromised. SBOMs remain essential for transparency and incident response. But they are a detection mechanism, not a prevention mechanism. The prevention has to happen earlier in the pipeline — before the dependency is installed, before the code is committed, before the AI suggestion is accepted.
What Engineering Teams Need to Do
The response to AI-generated code security risks is not to stop using AI tools. That ship has sailed — 95% adoption means these tools are infrastructure, not optional. The response is to treat AI-generated code with the same rigour you would apply to code from an untrusted contributor.
Securing Your AI-Assisted Development Pipeline
Never run npm install or pip install on an AI-suggested package without first confirming that the package exists in the public registry and checking its publish date, download count, and maintainer history. Tools like Socket, Snyk, and Endor Labs provide automated checks for package provenance and known malicious indicators. If a package was published in the last 30 days with minimal downloads, treat it as suspicious regardless of how confidently the AI recommended it.
AI models default to whatever version was most common in training data, which is rarely the most current or secure version. Use lockfiles (package-lock.json, poetry.lock) and audit tools (npm audit, pip-audit, Snyk) to catch known vulnerabilities in AI-suggested versions before they enter your dependency tree. Consider running automated version checks as a pre-commit hook.
Integrate SAST tools (Semgrep, CodeQL, Veracode) into your CI pipeline and run them on every commit, not just periodic scans. Given that 45% of AI-generated code introduces OWASP Top 10 vulnerabilities, static analysis is no longer optional — it is the minimum viable security practice for AI-assisted development. Pay particular attention to XSS, injection, and authentication bypass patterns.
AI coding agents with MCP integrations, filesystem access, and terminal execution capabilities should run in sandboxed environments with minimal permissions. An AI agent that can write files, install packages, and execute code has the same attack surface as an untrusted script. Apply the principle of least privilege: read access to the codebase, write access only to designated directories, no direct access to production credentials or deployment pipelines.
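Least privilege for a filesystem-writing agent can start as a path allowlist enforced at the tool layer. A minimal sketch — the directory names are hypothetical, and a real sandbox would add OS-level isolation (containers, seccomp) underneath this check:

```python
from pathlib import Path

# Hypothetical sandbox layout: the agent may write here and nowhere else.
ALLOWED_WRITE_ROOTS = (Path("/workspace/src"), Path("/workspace/tests"))


def write_permitted(target: str) -> bool:
    """Resolve the path first so `..` traversal cannot escape the
    allowlist, then require it to sit under an approved root."""
    resolved = Path(target).resolve()
    return any(resolved.is_relative_to(root) for root in ALLOWED_WRITE_ROOTS)
```

The same gate pattern applies to the agent's other capabilities: package installs go through a validation wrapper, and shell execution gets a command allowlist rather than a raw terminal.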
Use CycloneDX or SPDX to generate SBOMs for every release. Where possible, annotate which dependencies were AI-suggested versus developer-chosen. This provenance information accelerates incident response when a supply chain compromise is discovered — you can immediately identify which AI-suggested dependencies need review rather than auditing the entire dependency tree.
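CycloneDX has no standard field for "AI-suggested", but its `properties` extension point carries arbitrary name/value pairs. A sketch of the annotation — the property name is a project convention of our own, not part of the spec, and the purl here assumes a PyPI package:

```python
def sbom_component(name: str, version: str, ai_suggested: bool) -> dict:
    """Minimal CycloneDX-style component entry. When the dependency came
    from an AI suggestion, record that provenance as a custom property
    so incident response can filter on it later."""
    component = {
        "type": "library",
        "name": name,
        "version": version,
        "purl": f"pkg:pypi/{name}@{version}",
    }
    if ai_suggested:
        component["properties"] = [
            {"name": "internal:dependency-provenance", "value": "ai-suggested"}
        ]
    return component
```

When the next slopsquatting advisory lands, a query over these properties turns "audit everything" into "audit the AI-suggested subset".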
Treat AI-generated code as untrusted contributor code in your review process: no auto-merging of AI-generated PRs; mandatory security-focused review for any code that handles authentication, authorization, data access, or external API calls; and explicit sign-off that dependencies have been validated. The goal is not to slow development down. It is to ensure the 45% of insecure suggestions get caught before they ship.
The Tooling Landscape for AI Code Security
| Tool / Platform | What It Does | When to Use It |
|---|---|---|
| Socket | Deep package inspection, detecting supply chain attacks via behavioural analysis rather than just CVE matching | Pre-install validation of every new dependency, especially AI-suggested ones |
| Snyk | Vulnerability scanning across dependencies, containers, and IaC with AI-specific package risk scoring | Continuous monitoring of your dependency tree and CI/CD integration |
| Endor Labs | Dependency risk scoring that accounts for AI hallucination, version safety, and provenance | Evaluating AI-suggested dependencies before committing them to your lockfile |
| Semgrep / CodeQL | Static analysis for security anti-patterns in AI-generated code | Every commit in CI, with rules tuned for the vulnerabilities AI models most commonly introduce |
| Veracode | Comprehensive application security testing including SAST, DAST, and SCA | Enterprise security programmes requiring compliance-grade scanning of AI-generated code |
| Sonatype Nexus | Repository firewall that blocks known-malicious packages before they enter your build | Protecting your package manager from installing slopsquatted or typosquatted dependencies |
No single tool covers the full attack surface. The production pattern is layered: a repository firewall (Sonatype) blocks known-bad packages at the registry level, a dependency scanner (Snyk, Endor Labs, Socket) validates packages at install time, a SAST tool (Semgrep, CodeQL) catches insecure patterns in generated code, and an SBOM generator documents everything for audit and incident response.
What Happens Next
The AI code generation security problem has three possible trajectories:
- Models improve their security awareness: Possible but not happening yet. Veracode’s data shows no meaningful improvement in security outcomes across model generations. The models are optimised for functionality, not safety. Until security metrics are weighted equally with correctness in model training, this trajectory is unlikely.
- Tooling catches up and provides guardrails: This is the most likely near-term outcome. Socket, Snyk, Endor Labs, and others are building AI-specific security capabilities. The challenge is adoption — most teams have not yet updated their security tooling to account for AI-generated code patterns.
- A major incident forces industry-wide change: The most probable catalyst for systemic improvement. When an AI-hallucinated dependency leads to a significant breach at a well-known company, the resulting regulatory and reputational pressure will accelerate adoption of the practices described in this article. The question is not whether this will happen, but when.
The US Department of Defense published an AI/ML Supply Chain Risks and Mitigations advisory in March 2026, explicitly acknowledging that AI-generated code and AI-suggested dependencies create novel supply chain risks that existing frameworks do not adequately address. ENISA, the EU’s cybersecurity agency, published a package manager advisory in March 2026 addressing similar concerns for European organisations.
> “The burden of ensuring that AI-generated code outputs are secure should not rest solely on individual users, but also on AI developers, organizations producing code at scale, and those who can improve security at large, such as policymaking bodies or industry leaders.”
Where Fordel Builds
We build production software for clients in finance, healthcare, insurance, and SaaS — industries where a supply chain compromise is not an inconvenience but a regulatory incident. Every project we deliver includes dependency auditing, SAST integration, SBOM generation, and security-focused code review as standard practice, not premium add-ons.
If you are using AI coding tools in production and have not updated your security pipeline to account for AI-specific risks — hallucinated dependencies, insecure code patterns, version roulette — you have a gap. We can audit your current pipeline, identify where AI-generated code is introducing risk, and implement the tooling and processes to close it. That conversation costs nothing. The alternative costs more.