Is AI-generated code insecure?

More often than many assume. In the Veracode GenAI Code Security Report 2025, 45% of AI-generated code samples contained at least one OWASP Top 10 vulnerability, with even higher failure rates on tasks like XSS or log-injection defence. The point is not to avoid AI but to review, scan and test every AI contribution like any other code.

What is slopsquatting?

Slopsquatting is a supply-chain attack that exploits AI hallucinations. A 2025 study found that roughly 19.7% of the package references in the AI-generated code it examined pointed to packages that do not exist, and about 43% of these invented names reappeared on every re-run of the same prompt. If an attacker registers such a name in advance and publishes a malicious package under it, a team that installs AI-suggested dependencies without checking them pulls in the attacker's code.

Should an agency use AI for code generation at all?

Yes, but with discipline. AI measurably accelerates experienced teams and is part of the craft in 2026. What matters is that the agency knows the risks and mitigates them: peer code review, automated security scans, dependency control, tests and clear data boundaries. An agency that uses no AI leaves speed on the table; one that uses it naively is dangerous.

Does AI code increase technical debt?

There are signs that it does. GitClear's 2025 analysis shows that code churn (lines revised or reverted within two weeks of being written) rose from 3.1% in 2020 to 5.7% in 2024, alongside markedly more copied rather than refactored code. Without review and test discipline, that extra effort resurfaces later as maintenance load.

Can I feed proprietary code into an AI tool?

Only with clear data boundaries. Sending source code or customer data to an external model means accounting for contract terms, data protection (GDPR) and the obligations of the EU AI Act. In practice that means a defined tool list, no training on your data and sensitive repositories kept out — points an agency secures contractually and technically.

How do I recognise a disciplined AI agency?

It can explain how AI is embedded in its workflow: a review requirement for every AI line, SAST/security scans in CI, a dependency allowlist against hallucinated packages, tests as a gate and documented data boundaries. Anyone who instead sells speed and "vibe coding" without talking about safeguards is shifting the risk onto you.


AI · June 29, 2026 · 7 min read

The risks of AI code generation — and how an agency keeps them in check

AI generates code in seconds, but 45% of samples contain security flaws, nearly 20% of the packages it references don't exist, and code churn is rising. The danger isn't AI, it's naive AI use. We lay out the new risk classes with numbers, and the discipline an agency uses to turn AI into a safe accelerator.

Marius Gill

Managing Director & Software Engineer, 10+ years


AI writes code in seconds today — and that is both the temptation and the risk. The Veracode GenAI Code Security Report 2025 tested over 100 models across more than 80 coding tasks: 45% of the generated code samples contained at least one OWASP Top 10 vulnerability. Newer or larger models did not do better — the problem is structural, not a question of the next release.

That is not an argument against AI. It is an argument against uncontrolled AI. Use AI naively in software development and you import new risk classes straight into production. Use it with discipline and you gain speed without giving up control. That difference — discipline — is the actual work of an agency.

The new risk classes — with numbers

AI doesn't only shift the pace, it shifts the risk profile. Three effects are well documented and matter to anyone shipping AI-generated code to production: insecure code, hallucinated dependencies and rising code churn. A fourth, contractual class joins them: data and IP leakage.

Figure: The three measurable risk classes of AI code generation: 45% of AI code samples with a security flaw, 19.7% of referenced packages invented, code churn up from 3.1% to 5.7%. Sources: Veracode 2025, slopsquatting study 2025, GitClear 2025.

Insecure code is the most direct effect. A 45% vulnerability rate doesn't mean every other line is broken; it means that in nearly half of the generated samples an exploitable gap remained.

Hallucinated packages are subtler: models invent library names that sound plausible but don't exist. Per a study summarised by BleepingComputer, around 19.7% of the package references in the generated code pointed to such non-existent packages. Because the same names recur, an attacker can register them in advance and slip in malicious code ("slopsquatting").

Code churn, finally, is the quietest signal. Per GitClear, more code is revised or reverted shortly after being written, and code is copied more often than cleanly refactored: an early indicator of maintenance load.
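To make the churn metric concrete, here is a minimal sketch of how a two-week churn rate could be computed from line-level history. It assumes a simplified, hypothetical record per added line (when it was written and when, if ever, it was next changed or deleted); GitClear's actual methodology is more involved and is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical, simplified record: one entry per added line of code.
@dataclass
class LineRecord:
    authored_at: datetime
    # When the line was next modified or deleted; None if never touched again.
    changed_at: Optional[datetime] = None


# Two-week window, matching the churn definition cited in this post.
CHURN_WINDOW = timedelta(days=14)


def churn_rate(lines: list[LineRecord], window: timedelta = CHURN_WINDOW) -> float:
    """Share of added lines that were revised or deleted within `window`."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.changed_at is not None
        and line.changed_at - line.authored_at <= window
    )
    return churned / len(lines)


# Usage: 1 of 4 lines is changed within 14 days.
t0 = datetime(2024, 3, 1)
sample = [
    LineRecord(t0, t0 + timedelta(days=3)),   # rewritten after 3 days: churn
    LineRecord(t0, t0 + timedelta(days=30)),  # changed after a month: not churn
    LineRecord(t0),                           # untouched
    LineRecord(t0),                           # untouched
]
print(f"{churn_rate(sample):.1%}")  # 25.0%
```

Applied to the GitClear figures, a rise from 3.1% to 5.7% means (5.7 - 3.1) / 3.1 ≈ 84% more churned code in relative terms.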

Risk class	What happens	Finding	Countermeasure
Insecure code	OWASP flaws in the output	45% of samples (Veracode 2025)	review + SAST/security scan
Hallucinated packages	invented dependencies	19.7% of package references (slopsquatting study 2025)	dependency allowlist + pinning
Code churn / tech debt	copied, not refactored	3.1% → 5.7% (GitClear 2025)	tests, CI gates, architecture
Data / IP leakage	code to external models	GDPR & EU AI Act	clear data boundaries

Why naive AI use bites in production

The most dangerous moment isn't the prototype, it's the day unreviewed AI code goes live. AI is excellent at producing something that is almost right, and "almost right" is the most expensive category in production. In the Stack Overflow Developer Survey 2025, 66% of developers cite exactly that as their biggest frustration, and around 45% say debugging AI-generated code is more time-consuming. The speed at the start hides the cost at the end.

This dynamic is measurable at team level too. On one hand, the DORA report 2025 confirms that AI amplifies productivity: more tasks completed, more pull requests merged. On the other, it shows a negative relationship between AI adoption and delivery stability as long as there is no strong foundation of automated tests, version control and fast feedback. AI amplifies what's already there: where discipline is missing, it amplifies instability; where discipline exists, it amplifies quality. We describe the same logic in more depth in "Risks in AI software projects and governance".

The safeguard: the stack an agency puts in between

Every single risk class has an established countermeasure — the craft is running them as a mandatory layer, not a good intention. An agency that takes AI seriously treats AI output like a new team member's code: useful, but never merged unreviewed. Above it sits a governance layer enforcing one simple rule — AI is a tool, the human decides and is accountable.
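The governance rule "the human decides" can be enforced mechanically as a merge gate rather than left as a good intention. The sketch below assumes a hypothetical PullRequest summary that CI assembles from review and scan results; it is not tied to any particular Git hosting API, and the field names are illustrative.

```python
from dataclasses import dataclass, field


# Hypothetical summary of a pull request, assembled by CI from review and scan results.
@dataclass
class PullRequest:
    author: str
    approvals: set[str] = field(default_factory=set)  # users who approved
    open_sast_findings: int = 0  # unresolved findings from the security scan
    tests_passed: bool = False


def merge_blockers(pr: PullRequest) -> list[str]:
    """Reasons the PR may not be merged; an empty list means it may be merged.

    The same rules apply to every change, AI-assisted or not.
    """
    blockers = []
    # Human sign-off from someone other than the author; self-approval doesn't count.
    if not (pr.approvals - {pr.author}):
        blockers.append("missing human review")
    if pr.open_sast_findings > 0:
        blockers.append(f"{pr.open_sast_findings} open SAST finding(s)")
    if not pr.tests_passed:
        blockers.append("tests failing")
    return blockers


# Usage: the author approved their own AI-assisted change.
pr = PullRequest(author="dev-a", approvals={"dev-a"}, tests_passed=True)
print(merge_blockers(pr))  # ['missing human review']
```

In a CI job, a non-empty list would translate into a failing status check, which branch protection (for example, required status checks) can then use to block the merge.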

Figure: The safeguard stack: a governance layer over six concrete measures (code review, SAST scan, dependency allowlist, tests, data boundaries, IP checks) makes AI code production-safe.

Concretely, that is six measures working together:

Peer code review: no AI-generated line enters the main branch without human sign-off. Review catches both the kind of security flaws behind the 45% figure and subtle logic errors.
SAST & security scan in CI: automated OWASP checks on every merge, so security doesn't depend on someone remembering to run them.
Dependency allowlist & pinning: only approved, version-pinned packages — the direct answer to slopsquatting. Snyk recommends verified sources and lockfiles as the standard here.
Tests & CI gates: automated tests are part of the foundation that, per DORA, decides delivery stability; they turn speed into reliable releases.
Data & secret boundaries: a defined tool list, no proprietary code to external models, no secrets in prompts.
License & IP checks: the provenance and licensing of the output are clarified before it ships.
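The dependency allowlist with pinning can be enforced in CI with very little code. The following sketch assumes a Python project with a requirements.txt-style list and a hypothetical, team-maintained allowlist; it accepts only exact name==version pins for approved names and deliberately ignores extras, environment markers and URLs. In practice it would complement lockfiles and pip's hash-checking mode (--require-hashes), not replace them.

```python
import re

# Hypothetical team allowlist of approved PyPI package names.
ALLOWLIST = {"requests", "Django", "pydantic"}

# Accepts only exact pins ("name==version"), optionally followed by a comment.
# Simplified: extras, environment markers and URLs are not supported.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.+!_-]+)\s*(#.*)?$")


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_requirements(lines: list[str], allowlist: set[str] = ALLOWLIST) -> list[str]:
    """Return problems found; an empty list means every dependency is approved and pinned."""
    allowed = {normalize(n) for n in allowlist}
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = PIN_RE.match(line)
        if match is None:
            problems.append(f"line {lineno}: not an exact pin: {line!r}")
            continue
        name = normalize(match.group(1))
        if name not in allowed:
            # Unknown name: possibly hallucinated or slopsquatted; needs human approval.
            problems.append(f"line {lineno}: {name} is not on the allowlist")
    return problems


# Usage: "acme-made-up-pkg" stands in for an invented, unapproved package.
reqs = ["requests==2.32.3", "django>=5.0", "acme-made-up-pkg==0.1.0"]
for problem in check_requirements(reqs):
    print(problem)
# line 2: not an exact pin: 'django>=5.0'
# line 3: acme-made-up-pkg is not on the allowlist
```

Failing on unknown names rather than trying to detect malicious ones keeps the check simple: any new dependency becomes a deliberate, reviewed decision.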

How these practices come together in a structured review is shown in our software audit & code review.

Data, IP and the EU AI Act

The moment proprietary code leaves for an external model, a technical question becomes a legal one. Two things need clarifying: what happens to the data you send the model — and who owns the output. In practice that means drawing a clear line: which repositories AI tools may see and which they may not; whether the provider trains on your data; and whether secrets or customer data come anywhere near a prompt at all.
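That line can also be checked in tooling before anything reaches a model. The sketch below assumes a hypothetical internal wrapper that every AI tool call passes through: it refuses content from repositories not on an approved list and content matching a few secret-like patterns. The repository names and patterns are illustrative assumptions, not a complete secret scanner; dedicated scanners cover far more formats.

```python
import re
from dataclasses import dataclass

# Hypothetical policy: the only repositories AI tools may see.
AI_VISIBLE_REPOS = {"marketing-site", "internal-tools"}

# A few illustrative secret patterns; a real setup would use a dedicated secret scanner.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic credential": re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*\S+"),
}


@dataclass
class PromptContext:
    repo: str     # repository the content comes from
    path: str     # file path, for the report
    content: str  # text that would be sent to the model


def boundary_violations(ctx: PromptContext) -> list[str]:
    """Reasons this content must not be sent to an external model; empty means OK."""
    violations = []
    if ctx.repo not in AI_VISIBLE_REPOS:
        violations.append(f"repository {ctx.repo!r} is not approved for AI tools")
    for label, pattern in SECRET_PATTERNS.items():
        if pattern.search(ctx.content):
            violations.append(f"{label} found in {ctx.path}")
    return violations


# Usage: a file from a restricted repository that also contains a credential.
ctx = PromptContext(repo="billing-core", path="config.py", content='API_KEY = "abc123"')
print(boundary_violations(ctx))
# ["repository 'billing-core' is not approved for AI tools", 'generic credential found in config.py']
```

Checking the repository first encodes the allowlist principle: code is off-limits unless it has been explicitly approved, rather than allowed unless someone remembered to exclude it.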

Then there is the regulatory frame. The EU AI Act has been in force since 1 August 2024 and applies in staggered phases; depending on use, documentation and transparency obligations arise. For code generation itself this rarely means dramatic hurdles, but it does set a clear expectation: traceable processes, documented tool use, clean data boundaries. An agency that uses AI professionally has drawn those boundaries — contractually and technically — anyway.

How to recognise a disciplined AI agency

The difference between a professional and a "vibe coder" shows not in speed, but in how they answer the question about safeguards. Ask specifically — the answers are a reliable filter:

Does every AI-generated line go through review, or just "the important ones"?
Do security scans run automatically in CI, or occasionally by hand?
Is there a dependency allowlist, and are lockfiles in place against hallucinated packages?
Are tests a mandatory gate that blocks merges?
Are there documented data boundaries — which code may see which tool?

Anyone who answers "of course, that's how we work" with concrete examples uses AI for what it is: an accelerator within solid engineering. Anyone who deflects or talks only about speed is shifting the risk onto your product. For more on how AI embeds cleanly into day-to-day development, see "AI coding with Codex and Claude" as well as our sibling posts "How an agency ships faster with AI" and "Will AI replace your software agency?"

Next steps

Three questions quickly show whether your AI code is safeguarded:

Review: does every AI-generated line pass a human code review before it goes live?
Supply chain: does a dependency allowlist with pinning protect against hallucinated or slipped-in packages?
Data boundaries: is it clearly regulated which code each AI tool may see — and what happens to your data?

If any of these stays unanswered, an outside look pays off. We use AI productively in projects — with exactly these safeguards. Take a look at our AI integration and development, or book an intro call directly.

Conclusion

AI code generation introduces new risk classes: insecure code, hallucinated dependencies, more tech debt and data leakage. None of these is an argument against AI — they are arguments against uncontrolled AI. The discipline a good agency already lives (review, security scans, dependency control, tests, clear data boundaries) is what turns AI from a liability into a safe accelerator. That discipline is exactly what you pay an agency for.

