Paula Livingstone

When the Model Becomes the Attack Surface

AI is not just another tool in the cybersecurity stack. It is becoming part of the system being defended, part of the system doing the defending, and increasingly part of the system being attacked. This piece separates cybersecurity with AI (models that detect threats, triage alerts, and accelerate response) from cybersecurity of AI, where the model itself, its data, prompts, outputs, permissions, and training pipeline become the attack surface. It walks through adversarial manipulation, poisoned training data, inference and privacy leaks, and the model as a weapon, then argues for governance without theatre: discipline across the whole chain rather than one framework or control. As models move from tool to participant, the old security boundary does not disappear; part of it moves inside the model.

Artificial intelligence is not just another tool in the cybersecurity stack. It is becoming part of the system being defended, part of the system doing the defending, and increasingly part of the system being attacked.

That distinction matters. There is cybersecurity with AI, where models are used to detect threats, summarise telemetry, triage alerts, classify behaviour, and accelerate response. Then there is cybersecurity of AI, where the model itself, its data, prompts, outputs, permissions, training pipeline, inference path, and deployment context become security concerns in their own right.

The promise is real. AI can help defenders process volumes of data that no human team could sensibly inspect by hand. It can find patterns, correlate weak signals, reduce toil, and surface anomalies that would otherwise disappear into operational noise. In complex environments, especially critical infrastructure, that capability is not trivial.

But every promise carries a condition. AI can improve cybersecurity provided the model is trustworthy, the data is clean, the outputs are validated, the deployment boundary is understood, and the humans around it know what they are looking at. The hard part is not the slogan. The hard part is the condition.

This is where the model becomes the attack surface.

The Promise Is Real

The strongest case for AI in cybersecurity is easy to make, because much of it is true.

Modern defenders are overloaded. Security teams face too much telemetry, too many alerts, too many assets, too many dependencies, and too many changeable systems. In operational technology and critical infrastructure, the problem is worse, because the estate is often old, mixed, fragile, poorly inventoried, and governed by availability constraints that do not exist in normal enterprise IT.

AI can help. It can assist with log analysis, anomaly detection, malware classification, vulnerability prioritisation, phishing detection, incident summarisation, and threat hunting. It can reduce the time between signal and interpretation, and let a small team operate with a reach that would previously have required far more people. Used properly, it shortens response cycles, increases visibility, and makes weak signals easier to see. This is not marketing nonsense; it is a genuine capability.

But the fact that AI can help security does not mean that AI is automatically secure. A model that supports cyber defence also becomes a new dependency. It has inputs, outputs, assumptions, data sources, permissions, interfaces, hidden behaviours, and failure modes. It may sit beside critical decisions, enrich alerts, drive automation, or influence human judgement. Once that happens, it is no longer just a tool. It is part of the security boundary.

The defender has not merely gained a capability. The defender has inherited a new class of risk.

With AI and Of AI

Cybersecurity with AI is the more comfortable conversation.

It lets organisations talk about efficiency, automation, smarter detection, reduced analyst burden, faster response, and improved situational awareness. It is the language of capability. It is attractive because it sounds like a force multiplier, and sometimes it is.

Cybersecurity of AI is the less comfortable conversation, and it asks different questions. What trained the model? Who labelled the data? What can the model access? What can it call? What happens when it is wrong? Can its output be trusted? Can its behaviour be reproduced? Can it leak sensitive information? Can it be manipulated by crafted inputs? Can it be rolled back? Can anyone explain why it produced a particular answer?

Those questions are less glamorous, but they are the engineering questions that matter.

A conventional system has code, configuration, identities, interfaces, logs, dependencies, and infrastructure. An AI-enabled system still has all of those, but it also has training data, model weights, embeddings, prompts, context windows, retrieval pipelines, guardrails, evaluation sets, inference endpoints, and human trust wrapped around probabilistic output.

The attack surface has not just grown. It has changed shape.

The New Attack Surface

Generative AI has expanded the attack surface because it changes how systems interact with information.

Traditional software follows explicit logic. That does not make it safe, but it gives defenders familiar objects to inspect: code paths, services, ports, permissions, libraries, configurations, and logs. AI systems add something stranger. They produce behaviour from learned statistical structure, and their outputs are shaped by data, context, prompts, embeddings, weights, and runtime constraints.

That flexibility is exactly what makes them useful. A generative model can summarise a report, explain an event, draft a response, classify an incident, translate technical language, generate code, map relationships, and assist decision-making. It can operate across messy human and machine information in a way that rigid software cannot.

The same flexibility creates new opportunities for manipulation. The prompt becomes an interface. The context window becomes an input channel. The retrieval system becomes a dependency. The plugin or tool layer becomes a possible route to action. The model's output becomes something humans may over-trust, because it is fluent, plausible, and confidently presented.

This matters because AI often fails in ways that look superficially competent. A bad output may not crash. It may not throw an error. It may not trip an obvious alarm. It may simply be wrong in a way that sounds right.

That is dangerous in normal business. It is more dangerous when the model is involved in security, safety, finance, healthcare, industrial control, or national infrastructure.

The old question was often whether a system was exposed. The AI-era question is twofold: what can influence this model, and what can this model influence? That is a much wider problem.

Adversarial Manipulation

Machine learning models can generalise impressively across complex inputs. A model can classify images, interpret language, detect unusual patterns, infer relationships, and make predictions from noisy data. It can spot things that would be hard to capture with hand-written rules, which is why AI is valuable in fraud detection, malware analysis, intrusion detection, medical imaging, language processing, and many other fields.

Adversarial attacks expose the fragility inside that strength.

Unlike a conventional vulnerability, an adversarial attack does not necessarily exploit a coding error or a misconfigured service. It exploits the way the model interprets data. Carefully crafted inputs can cause a model to misclassify an image, misunderstand text, ignore malicious activity, or produce misleading output.

A stop sign modified in a way that seems insignificant to a human may be read differently by a machine vision system. Network traffic shaped with adversarial noise may be less visible to a threat detection model. Text prompts may be crafted to bypass safety instructions, override context, or draw a model into producing information it should not reveal.
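To make the mechanism concrete, here is a minimal sketch, assuming a toy linear detector whose weights, bias, and feature values are all invented for illustration. For a linear model the gradient of the score with respect to the input is simply the weight vector, so nudging each feature a small step against the sign of its weight (the idea behind the fast gradient sign method) can be enough to flip the verdict. Real detectors are not this simple, but the principle carries over.

```python
# Toy linear detector: a positive score means "malicious".
# WEIGHTS, BIAS and the sample features are invented for illustration.
WEIGHTS = [0.9, -0.4, 0.7, 0.2]
BIAS = -0.5


def score(x):
    """Weighted sum of the features plus a bias."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def classify(x):
    return "malicious" if score(x) > 0 else "benign"


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def evade(x, epsilon):
    """Move each feature by at most epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is WEIGHTS,
    so stepping against sign(WEIGHTS) is the steepest change within that bound.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(WEIGHTS, x)]


sample = [0.6, 0.3, 0.4, 0.5]
adversarial = evade(sample, epsilon=0.2)

print(classify(sample), round(score(sample), 2))            # malicious 0.3
print(classify(adversarial), round(score(adversarial), 2))  # benign -0.14
```

No feature moves by more than 0.2, yet the classification flips. Nothing was broken or misconfigured; the attacker only needed to know which way the model leans.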

The problem is not merely that the model is wrong. The problem is that it can be made wrong by someone who understands its behaviour.

That is a profound shift. In conventional cybersecurity, defenders ask whether an attacker can exploit a vulnerability. In AI security, defenders must also ask whether an attacker can shape the input environment so the model participates in its own compromise.

Adversarial robustness therefore cannot be treated as an academic curiosity. Models need to be stress-tested, probed, evaluated, monitored, and attacked before hostile actors do it for real. Adversarial training, input validation, anomaly detection, constrained deployment, red-teaming, and continuous evaluation all matter.

Even that is not enough. The organisation also needs to understand where the model sits in the decision chain. A flawed model that only advises a skilled human is one thing. A flawed model that triggers automated action is another thing entirely.

Poisoned Foundations

AI systems are shaped by data, and that is both their strength and their weakness.

Better data can improve performance. More representative data can improve generalisation. Well-labelled data can make a model more useful. A model trained on relevant examples can detect, classify, and predict things that would be difficult to express through fixed rules.

But if the data is poisoned, the model may learn the poison.

Data poisoning attacks target the foundation of the system. Instead of attacking the model at runtime, the attacker corrupts the material from which the model learns. The result may be subtle. The model may behave normally most of the time, then fail in specific cases, misclassify certain inputs, ignore certain threat patterns, or embed a bias that only becomes visible under particular conditions.
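A small sketch shows how quiet this can be. It assumes a deliberately simple nearest-centroid classifier and made-up two-dimensional data; the function names and the numbers are illustrative, not drawn from any real system. The attacker adds a handful of malicious-looking points labelled "benign", which drags the benign centroid towards one chosen target.

```python
def centroid(points):
    """Mean position of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(dataset):
    """dataset is a list of (features, label); returns {label: centroid}."""
    by_label = {}
    for features, label in dataset:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def predict(model, x):
    """Pick the label whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist2(model[label]))


clean = [
    ((1.0, 1.0), "benign"), ((1.5, 0.5), "benign"), ((0.5, 1.5), "benign"),
    ((5.0, 5.0), "malicious"), ((5.5, 4.5), "malicious"), ((4.5, 5.5), "malicious"),
]

# Hypothetical poison: malicious-looking points slipped in with a "benign" label.
poison = [((6.0, 6.0), "benign")] * 4

target = (4.0, 4.0)
print(predict(train(clean), target))           # malicious
print(predict(train(clean + poison), target))  # benign
```

The poisoned model still labels `(1.0, 1.0)` as benign and `(5.0, 5.0)` as malicious, so a casual accuracy check on typical examples would not notice anything wrong. Only the region the attacker cared about has moved.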

That is why training data is not just "data". In an AI system, it becomes part of the trusted computing base.

This creates a security problem that many organisations are not yet mature enough to handle. Traditional cybersecurity pays attention to code, credentials, networks, endpoints, patches, and configuration. AI security must also protect training sets, labelling processes, feature pipelines, embeddings, evaluation data, model registries, provenance records, and deployment artefacts.

If an organisation cannot say where its data came from, who touched it, how it was transformed, what assumptions were embedded, how it was validated, and whether it has changed, then the model's output rests on sand.
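One way to start answering those questions is a tamper-evident lineage log. The sketch below is an assumption about how such a log might look, not a description of any particular tool: each record stores who did what, a SHA-256 digest of the dataset at that step, and the hash of the previous record, so a later edit to any entry breaks the chain. The field names and the `append_record` and `verify_chain` helpers are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record
BODY_KEYS = ("actor", "step", "dataset_sha256", "prev")


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_hash(body: dict) -> str:
    """Hash a record body in a stable, key-sorted JSON form."""
    return digest(json.dumps(body, sort_keys=True).encode("utf-8"))


def append_record(log, actor, step, dataset_bytes):
    """Add a lineage record that is bound to the one before it."""
    prev = log[-1]["record_hash"] if log else GENESIS
    body = {"actor": actor, "step": step,
            "dataset_sha256": digest(dataset_bytes), "prev": prev}
    log.append({**body, "record_hash": record_hash(body)})


def verify_chain(log):
    """Return True only if every record is intact and correctly linked."""
    prev = GENESIS
    for rec in log:
        body = {k: rec[k] for k in BODY_KEYS}
        if rec["prev"] != prev or record_hash(body) != rec["record_hash"]:
            return False
        prev = rec["record_hash"]
    return True


log = []
append_record(log, "ingest-job", "raw export", b"id,label\n1,benign\n")
append_record(log, "labelling-team", "relabel", b"id,label\n1,malicious\n")
print(verify_chain(log))   # True

log[0]["actor"] = "someone-else"   # quiet after-the-fact edit
print(verify_chain(log))   # False
```

A chain like this only detects edits to individual records. Someone able to rewrite the whole log could recompute every hash, so in practice the latest hash would need to be signed or stored somewhere the editor cannot reach.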

This is especially serious in critical infrastructure. Industrial environments already struggle with asset data quality, incomplete inventories, inconsistent naming, legacy dependencies, and fragmented operational context. If AI systems are layered on top of poor data discipline, they may not solve the visibility problem. They may launder it into a more confident form.

Bad data plus AI does not equal intelligence. It equals automated uncertainty with a polished interface.

Inference and Privacy

AI models are often described as generalising from data rather than memorising it. The ideal is that the model learns patterns, not secrets; that it extracts structure, not individual records; that it becomes useful without exposing the specific sensitive information used to train or tune it. In many cases, that is broadly how machine learning is meant to work.

But "meant to" carries too much weight, because models can leak.

They can reveal information through outputs, behaviour, probabilities, embeddings, retrieval systems, logs, prompts, and poorly constrained interfaces. Privacy risk is not limited to someone stealing a database. It can also arise when a model exposes traces of the data it has seen.

Membership inference attacks try to determine whether a particular data point was included in training. Model inversion attacks attempt to reconstruct sensitive features from outputs. Prompt extraction can expose hidden instructions. Retrieval-augmented systems can return information from documents that should not have been surfaced. Generative models may regurgitate fragments of training data or proprietary material.
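The first of these can be illustrated with a toy. The sketch assumes a deliberately overfit "model" whose confidence is highest on the exact records it was trained on; the data, the confidence formula, and the 0.9 threshold are all invented for illustration. The attacker never sees the training set, only the confidence returned for each query.

```python
import math

# "Private" training records the attacker is not supposed to learn about.
TRAIN = [(0.10, 0.20), (0.80, 0.75), (0.40, 0.90), (0.95, 0.15)]


def confidence(x):
    """Toy overfit model: confidence decays with distance to the nearest
    training record, so memorised points score exactly 1.0."""
    nearest = min(math.dist(x, t) for t in TRAIN)
    return 1.0 / (1.0 + 10.0 * nearest)


def guess_member(x, threshold=0.9):
    """Attacker's rule: unusually high confidence suggests x was in training."""
    return confidence(x) >= threshold


candidates = [(0.80, 0.75), (0.50, 0.50), (0.10, 0.20), (0.30, 0.30)]
for c in candidates:
    print(c, round(confidence(c), 2), guess_member(c))
```

Real models leak far less cleanly than this, and real attacks calibrate the threshold with more care, but the shape is the same: the gap between how a model treats seen and unseen data is itself a signal.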

In normal data security, the sensitive object is often clear: a file, a table, a record, a credential, a document, a database. In AI systems, sensitive information can become smeared across more places. It may exist in source data, transformed data, embeddings, vector stores, prompts, logs, fine-tuning sets, model outputs, and downstream summaries. That makes privacy harder to reason about. Abstraction is not erasure.

This is where privacy-preserving techniques become important. Differential privacy, federated learning, secure multi-party computation, access-controlled retrieval, data minimisation, encryption, and provenance tracking all have roles to play. But none of them removes the need for basic discipline.

Do not train on what you should not retain. Do not expose what you cannot govern. Do not connect a model to data you cannot classify. Do not assume fluent output is safe output.
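Of the techniques listed above, differential privacy is the easiest to show in a few lines. The sketch below is a minimal illustration of the Laplace mechanism for a counting query, assuming a sensitivity of 1 (adding or removing one record changes the count by at most one). The record layout, the `private_count` helper, and the choice of epsilon are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials
    that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of matching records.

    A counting query has sensitivity 1, so the Laplace mechanism uses
    noise with scale 1 / epsilon. Smaller epsilon means more noise.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only so the example is repeatable
records = [{"flagged": i % 4 == 0} for i in range(100)]  # 25 flagged records

for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(records, lambda r: r["flagged"], epsilon, rng)
    print(epsilon, round(noisy, 2))
```

Each answer is individually blurred, which limits what can be learned about any single record, while the average over many queries still centres on the true count. That trade between accuracy and privacy is a design decision, not something the mechanism makes for you.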

AI as a Weapon

AI is not only something to defend. It is also something attackers can use.

It gives defenders powerful new tools: it can automate analysis, enrich threat intelligence, detect anomalies, classify malware, accelerate reverse engineering, and help responders understand incidents quickly. But attackers get the same class of advantage.

AI can help generate phishing emails, write convincing impersonation messages, automate reconnaissance, produce malicious code variants, summarise stolen material, generate deepfake audio or video, assist fraud, and scale social engineering. It can make mediocre attackers more productive and skilled attackers faster.

The most obvious example is phishing. Traditional phishing often failed because it was generic, badly written, or contextually weak. AI changes that. It can produce messages with better grammar, more plausible tone, more relevant context, and more convincing personalisation, adapted for language, role, culture, organisation, and timing.

Deepfakes add another layer. Voice and video impersonation weaken old assumptions about identity. A familiar voice is no longer strong evidence. A realistic image is no longer strong evidence. A plausible message in the right style is no longer strong evidence.

This does not mean trust is impossible. It means trust has to move down into stronger mechanisms. Cryptographic identity, transaction verification, out-of-band confirmation, hardware-backed credentials, signing, provenance, and strong process controls become more important. Human recognition alone is not enough in a world where synthetic media can imitate the signals humans evolved to trust.

AI also lowers the cost of scale. A criminal does not need to craft every lure by hand. A hostile actor does not need to manually tailor every message. A fraud campaign can be generated, tested, varied, and optimised. The defender may gain leverage from AI; so may the attacker. That is the arms race.

Governance Without Theatre

Governance is necessary. AI systems need policy, accountability, audit, assurance, standards, risk assessment, legal review, ethical scrutiny, and operational control. Regulation has a role. Internal governance has a role. Security architecture has a role. AI cannot be left as a playground for enthusiastic teams connecting models to business data and operational workflows without adult supervision.

But governance can easily become theatre. Governance that cannot inspect the real system is just paperwork around a black box.

If an organisation cannot explain what data the model uses, where that data came from, what the model is allowed to access, what actions it can trigger, how its output is validated, how drift is detected, how failure is handled, who owns the risk, and how the system is rolled back, then the governance is not doing enough.

Risk registers do not secure models. Policy decks do not validate training data. Committees do not detect adversarial prompts. A compliance statement does not prove that an AI system behaves safely in deployment.

Good governance must be connected to engineering reality. It needs inventories, ownership, data lineage, model registries, evaluation evidence, approval gates, monitoring, incident response, change control, and retirement processes. It needs to know where AI is being used, not merely where people admit it is being used.

Shadow AI is therefore a serious concern. If users paste sensitive data into external tools, build unofficial automations, connect models to documents, or use AI-generated code without review, then the organisation may have an AI risk estate it cannot even see. The governance challenge is not only to regulate grand AI systems. It is to find and control the small, useful, unofficial ones before they become embedded in business process.
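One way to connect governance to engineering reality is an approval gate that refuses to pass a model whose registry record leaves any of the earlier questions unanswered. The sketch below is hypothetical throughout: the `ModelRecord` fields simply mirror the questions raised in this section, and a real registry would hold evidence rather than free text.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ModelRecord:
    """Hypothetical registry entry; None means nobody has written it down."""
    name: str
    risk_owner: Optional[str] = None
    data_lineage: Optional[str] = None
    access_scope: Optional[str] = None
    triggerable_actions: Optional[str] = None
    output_validation: Optional[str] = None
    drift_detection: Optional[str] = None
    rollback_plan: Optional[str] = None


def gaps(record):
    """Names of every field that is still undocumented (None or blank)."""
    return [f.name for f in fields(record)
            if not (getattr(record, f.name) or "").strip()]


def approve(record):
    """Pass the gate only when nothing is left unanswered."""
    return not gaps(record)


draft = ModelRecord(
    name="alert-triage-assistant",
    risk_owner="SOC lead",
    data_lineage="SIEM export, last 90 days",
)
print(approve(draft))  # False
print(gaps(draft))
```

A gate like this proves only that the questions were answered, not that the answers are true. Its value is in making the gaps visible before deployment rather than after an incident.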

Cryptography and Architecture

Cryptography gives AI security some serious tools. Signing can help prove origin. Hashing can help detect tampering. Encryption can protect confidentiality. Hardware-backed attestation can strengthen trust in runtime environments. Secure multi-party computation can allow collaborative processing without exposing all underlying data. Homomorphic encryption can, in some cases, allow computation on encrypted data. Zero-knowledge proofs may help prove claims without disclosing the underlying information.

These are not toys. In AI systems, cryptography can support model provenance, dataset integrity, identity, access control, auditability, and secure collaboration. It can help answer questions such as:

  • Was this model altered?
  • Did this dataset change?
  • Who signed this artefact?
  • Where did this output come from?
  • Was this inference performed in an approved environment?
  • Can this party prove a fact without revealing the source data?
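The first three questions can be sketched with Python's standard library. This is an illustration under stated assumptions, not a recommended design: the standard library has no public-key signatures, so an HMAC (a keyed tag that both parties must share a secret to produce or check) stands in for signing here, and the key and artefact bytes are placeholders. A real deployment would more likely use asymmetric signatures so that verifiers cannot forge tags.

```python
import hashlib
import hmac


def fingerprint(artefact: bytes) -> str:
    """SHA-256 digest: any change to the artefact changes this value."""
    return hashlib.sha256(artefact).hexdigest()


def make_tag(artefact: bytes, key: bytes) -> str:
    """Keyed tag over the artefact (HMAC-SHA256), standing in for a signature."""
    return hmac.new(key, artefact, hashlib.sha256).hexdigest()


def verify_tag(artefact: bytes, key: bytes, tag: str) -> bool:
    """Constant-time comparison of the expected tag against the one supplied."""
    return hmac.compare_digest(make_tag(artefact, key), tag)


weights = b"\x00\x01 placeholder model weights v1"
key = b"demo-key-not-for-production"

published_digest = fingerprint(weights)
published_tag = make_tag(weights, key)

tampered = weights + b"\x00"
print(fingerprint(weights) == published_digest)    # True
print(fingerprint(tampered) == published_digest)   # False
print(verify_tag(weights, key, published_tag))     # True
print(verify_tag(tampered, key, published_tag))    # False
```

The digest answers whether the artefact changed; the keyed tag answers whether someone holding the key vouched for it.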

That is powerful, but it has limits. Cryptography does not make a bad model good. It does not make poisoned data clean. It does not make a flawed objective safe. It does not make an ungoverned deployment trustworthy. It does not prove that an output is correct merely because the system that produced it was signed.

Cryptography can prove integrity, origin, and identity, and it can protect confidentiality. It cannot supply judgement.

That means secure AI architecture has to be broader than cryptography. It must include data governance, model evaluation, access control, runtime isolation, least privilege, logging, monitoring, human review, rollback, incident response, and clear operational ownership.

In critical infrastructure, this matters even more. AI should not be allowed to blur the line between advisory intelligence and operational control. A model that summarises maintenance logs is not the same class of risk as a model that recommends control actions, adjusts process parameters, prioritises alarms, or influences safety-critical decisions. Architecture must make those distinctions explicit.

The question is not simply whether the model is secure. The better question is what the model is allowed to know, decide, influence, and change. That is where architecture becomes security.
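A minimal sketch of making those distinctions explicit, assuming a hypothetical tool catalogue and three invented tiers: each tool the model may call is tagged with the level of influence it carries, unknown tools are denied by default, and anything that changes operational state also requires a human approval flag.

```python
from enum import Enum


class Tier(Enum):
    ADVISORY = 1    # read-only: summarise, explain
    RECOMMEND = 2   # proposes actions or priorities; a human decides
    CONTROL = 3     # changes operational state


# Hypothetical catalogue mapping each callable tool to the tier it needs.
TOOLS = {
    "summarise_maintenance_log": Tier.ADVISORY,
    "prioritise_alarms": Tier.RECOMMEND,
    "adjust_setpoint": Tier.CONTROL,
}


def authorise(tool, granted, human_approved=False):
    """Decide whether a model deployment granted `granted` may call `tool`."""
    required = TOOLS.get(tool)
    if required is None:
        return False                      # deny by default: unlisted tools
    if required.value > granted.value:
        return False                      # least privilege: tier too low
    if required is Tier.CONTROL and not human_approved:
        return False                      # control actions need a human
    return True


print(authorise("summarise_maintenance_log", Tier.ADVISORY))            # True
print(authorise("adjust_setpoint", Tier.RECOMMEND))                     # False
print(authorise("adjust_setpoint", Tier.CONTROL))                       # False
print(authorise("adjust_setpoint", Tier.CONTROL, human_approved=True))  # True
```

The point of the design is that the boundary lives outside the model. Whatever the model is persuaded to ask for, the check that decides what it may change does not depend on the model's own output.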

The Discipline Ahead

AI will become part of cybersecurity. That is inevitable.

It will help defenders classify, correlate, summarise, prioritise, detect, and respond. It will be built into tools, platforms, workflows, and operating models. It will assist analysts, engineers, auditors, and developers, and it will assist attackers, fraudsters, and propagandists. It will be everywhere, because the economic pressure is too strong for it not to be.

But AI will not save weak security programmes from themselves. It will amplify whatever discipline already exists.

A mature organisation with strong identity, clean asset data, good monitoring, controlled change, tested recovery, clear ownership, and solid engineering judgement may use AI to become faster and sharper. A messy organisation may use AI to become faster and more confidently wrong.

That is the real risk. Not that AI is useless, not that AI is magic, not that AI is automatically dangerous in every context. The risk is that organisations treat it as ordinary software, ordinary automation, or ordinary analytics, when it is something more awkward.

AI systems are data-shaped decision systems. Their behaviour depends on training material, prompts, context, constraints, architecture, access, evaluation, and human interpretation. They can be useful, but they can also be manipulated through channels that conventional controls were not designed to inspect.

The future of AI security will not be defined by one control, one framework, one regulation, or one breakthrough. It will be defined by discipline across the whole chain:

  • model integrity
  • data provenance
  • adversarial robustness
  • identity and access control
  • secure deployment
  • continuous monitoring
  • privacy preservation
  • human accountability
  • cryptographic assurance
  • operational recovery

The old security boundary was already hard enough to defend. AI does not remove that burden. It moves part of the boundary inside the model.