The risks of AI code generation — and how an agency keeps them in check

AI generates code in seconds — but 45% of samples contain security flaws, nearly 20% reference invented packages, and code churn is rising. The danger isn't AI, it's naive AI use. We lay out the new risk classes with numbers — and the discipline an agency uses to turn AI into a safe accelerator.

Marius Gill

Managing Director & Software Engineer, 10+ years

AI writes code in seconds today — and that is both the temptation and the risk. The Veracode GenAI Code Security Report 2025 tested over 100 models across more than 80 coding tasks: 45% of the generated code samples contained at least one OWASP Top 10 vulnerability. Newer or larger models did not do better — the problem is structural, not a question of the next release.

That is not an argument against AI. It is an argument against uncontrolled AI. Use AI naively in software development and you import new risk classes straight into production. Use it with discipline and you gain speed without giving up control. That difference — discipline — is the actual work of an agency.

The new risk classes — with numbers

AI doesn't only shift the pace, it shifts the risk profile. Three effects are well documented and matter to anyone shipping AI-generated code to production: insecure code, hallucinated dependencies and rising code churn. A fourth class is contractual rather than technical: data and IP leakage.

The three measurable risk classes of AI code generation. Sources: Veracode 2025, slopsquatting study 2025, GitClear 2025.

Insecure code is the most direct effect: a 45% vulnerability rate doesn't mean every other line is broken; it means that nearly half of the generated samples still contained at least one exploitable flaw. Hallucinated packages are subtler: models invent library names that sound plausible but don't exist. Per a study summarised by BleepingComputer, around 19.7% of samples referenced such a package, and because the same invented names recur, an attacker can register them and slip in malicious code ("slopsquatting"). Code churn, finally, is the quietest signal: per GitClear, a growing share of code is revised or reverted shortly after it is written, and code is copied more often than it is cleanly refactored, which is an early indicator of maintenance load.
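One way to catch an invented dependency before anyone installs it is to scan the generated source for imports that are not on an approved list. The following is a minimal sketch, not a complete tool: the allowlist contents and the package name `fastjsonx` are made up for illustration, and it compares import names only (an import name can differ from the name a package is published under).

```python
import ast

# Hypothetical allowlist: top-level module names the team has approved.
APPROVED_MODULES = {"json", "re", "requests"}


def unapproved_imports(source: str, approved: set[str]) -> set[str]:
    """Return top-level module names imported by `source` that are not approved."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) point at local code, so skip them.
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found - approved


# Stand-in for AI-generated code; "fastjsonx" is an invented package name.
generated = """
import json
import requests
from fastjsonx.parser import loads
"""

print(sorted(unapproved_imports(generated, APPROVED_MODULES)))
```

Anything the function returns is a prompt for a human to check whether the package exists, who publishes it, and whether it should be approved at all.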

Risk class | What happens | Finding | Countermeasure
--- | --- | --- | ---
Insecure code | OWASP flaws in the output | 45% of samples (Veracode 2025) | review + SAST/security scan
Hallucinated packages | invented dependencies | 19.7% of samples (slopsquatting) | dependency allowlist + pinning
Code churn / tech debt | copied, not refactored | 3.1% → 5.7% (GitClear) | tests, CI gates, architecture
Data / IP leakage | code sent to external models | GDPR & EU AI Act | clear data boundaries

Why naive AI use bites in production

The most dangerous moment isn't the prototype, it's the day unreviewed AI code goes live. AI is excellent at producing something that is almost right — and "almost right" is the most expensive category in production. In the Stack Overflow Developer Survey 2025, 66% of developers cite exactly that as their biggest frustration; for around 45%, debugging AI code takes longer than writing it themselves. The speed at the start hides the cost at the end.

This dynamic is measurable at team level too. The DORA report 2025 confirms, on one hand, that AI amplifies productivity — more tasks completed, more pull requests merged. On the other, it shows a negative relationship between AI adoption and delivery stability as long as there is no strong foundation of automated tests, version control and fast feedback. AI amplifies what's already there: where discipline is missing, it amplifies instability; where discipline exists, it amplifies quality. We described that same logic in more depth in Risks in AI software projects and governance.

The safeguard: the stack an agency puts in between

Every single risk class has an established countermeasure — the craft is running them as a mandatory layer, not a good intention. An agency that takes AI seriously treats AI output like a new team member's code: useful, but never merged unreviewed. Above it sits a governance layer enforcing one simple rule — AI is a tool, the human decides and is accountable.

The safeguard stack: a governance layer over six concrete measures makes AI code production-safe.

Concretely, that is six measures working together:

  • Peer code review: no AI-generated line enters the main branch without human sign-off. Review is where both the security flaws behind the 45% figure and subtle logic errors get caught.
  • SAST & security scan in CI: automated OWASP checks on every merge, so security doesn't depend on who is reviewing or how attentive they are that day.
  • Dependency allowlist & pinning: only approved, version-pinned packages — the direct answer to slopsquatting. Snyk recommends verified sources and lockfiles as the standard here.
  • Tests & CI gates: automated tests are the foundation that, per DORA, decides delivery stability; they turn speed into reliable releases.
  • Data & secret boundaries: a defined tool list, no proprietary code to external models, no secrets in prompts.
  • License & IP checks: the provenance and licensing of the output are clarified before it ships.
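The allowlist and pinning measure from the list above can be sketched as a small check that a CI job runs against a requirements file. This is an illustration under stated assumptions: the allowlist is hypothetical, `fastjsonx` is an invented name, and the parser only understands plain `name==version` lines (no extras, environment markers, or hashes), so a real pipeline would lean on the package manager's lockfile instead.

```python
import re

# Hypothetical allowlist of approved package names (lowercase).
ALLOWED_PACKAGES = {"requests", "flask"}

# Matches lines of the form "name==version"; anything looser counts as unpinned.
PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9][A-Za-z0-9.!+_-]*)$")


def check_requirements(text: str, allowed: set[str]) -> list[str]:
    """Return one human-readable problem per offending requirements line."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = PINNED.match(line)
        if match is None:
            problems.append(f"not pinned to an exact version: {line}")
            continue
        name = match.group(1).lower()
        if name not in allowed:
            problems.append(f"not on the allowlist: {name}")
    return problems


requirements = """
requests==2.32.3
flask>=3.0
fastjsonx==1.0.0  # invented name, as a model might hallucinate it
"""

for problem in check_requirements(requirements, ALLOWED_PACKAGES):
    print(problem)
```

A CI gate would fail the merge whenever the returned list is non-empty, which makes approving a new dependency an explicit decision rather than a side effect of accepting generated code.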

How these practices come together in a structured review is shown in our software audit & code review.

Data, IP and the EU AI Act

The moment proprietary code leaves for an external model, a technical question becomes a legal one. Two things need clarifying: what happens to the data you send the model — and who owns the output. In practice that means drawing a clear line: which repositories AI tools may see and which they may not; whether the provider trains on your data; and whether secrets or customer data come anywhere near a prompt at all.
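The "no secrets in prompts" part of that line can be enforced mechanically, at least in part. Below is a minimal sketch of a pre-send check, assuming prompts pass through a function you control. The three patterns are illustrative only and will miss many secret formats; a maintained secret scanner would be the sturdier choice in practice.

```python
import re

# Illustrative patterns only; they cover a few common shapes, not all secrets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password_assignment": re.compile(
        r"(?i)\b(?:password|passwd|secret|api_key)\s*[:=]\s*\S+"
    ),
}


def find_secrets(prompt: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]


def safe_to_send(prompt: str) -> bool:
    """A prompt may leave the machine only if no pattern matched."""
    return not find_secrets(prompt)


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
print(find_secrets('client = connect(key="AKIAIOSFODNN7EXAMPLE")'))
print(safe_to_send("def add(a, b): return a + b"))
```

A check like this complements the contractual side; it does not replace it. It stops the obvious accidents, while the question of which repositories a tool may see at all still needs an explicit policy.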

Then there is the regulatory frame. The EU AI Act has been in force since 1 August 2024 and applies in staggered phases; depending on use, documentation and transparency obligations arise. For code generation itself this rarely means dramatic hurdles, but it does set a clear expectation: traceable processes, documented tool use, clean data boundaries. An agency that uses AI professionally has already drawn those boundaries, both contractually and technically.

How to recognise a disciplined AI agency

The difference between a professional and a "vibe coder" shows not in speed, but in how they answer the question about safeguards. Ask specifically — the answers are a reliable filter:

  • Does every AI-generated line go through review, or just "the important ones"?
  • Do security scans run automatically in CI, or occasionally by hand?
  • Is there a dependency allowlist and lockfiles against hallucinated packages?
  • Are tests a mandatory gate that blocks merges?
  • Are there documented data boundaries that state which tool may see which code?

Anyone who answers "of course, that's how we work" with concrete examples uses AI for what it is: an accelerator within solid engineering. Anyone who deflects or talks only about speed is shifting the risk onto your product. For more on how AI embeds cleanly into day-to-day development, see AI coding with Codex and Claude, as well as our sibling posts How an agency ships faster with AI and Will AI replace your software agency?.

Next steps

Three questions quickly show whether your AI code is safeguarded:

  1. Review: does every AI-generated line pass a human code review before it goes live?
  2. Supply chain: does a dependency allowlist with pinning protect against hallucinated or slipped-in packages?
  3. Data boundaries: is it clearly regulated which code each AI tool may see — and what happens to your data?

If any of these stays unanswered, an outside look pays off. We use AI productively in projects — with exactly these safeguards. Take a look at our AI integration and development, or book an intro call directly.

Conclusion

AI code generation introduces new risk classes: insecure code, hallucinated dependencies, more tech debt and data leakage. None of these is an argument against AI; they are arguments against uncontrolled AI. The discipline a good agency already practices (review, security scans, dependency control, tests, clear data boundaries) is what turns AI from a liability into a safe accelerator. That discipline is exactly what you pay an agency for.
