Data Loss Prevention for AI-First Businesses

Data loss prevention (DLP) used to mean stopping sensitive information from leaving through email, endpoints, cloud storage or the network. Those controls still matter. But artificial intelligence (AI) has changed the shape of the problem.

At one end of the risk spectrum, incidents such as Samsung’s early ChatGPT leaks showed how easily employees can copy sensitive code, notes or documents into public AI tools. At the other, the EchoLeak vulnerability in Microsoft 365 Copilot showed that AI systems can be manipulated through content the user may not even actively engage with.

Those two examples are useful bookends. One is a familiar behaviour problem: copy and paste. The other is a newer architecture problem: AI systems combining prompts, retrieval, permissions and external content in ways that traditional controls were not designed to inspect.

For an AI-first business, DLP is no longer only about stopping files leaving the organisation. It is about governing what AI can see, what users can send to AI, what AI can retrieve, and what AI-powered systems are allowed to do.

Why AI changes the DLP problem

AI introduces two new leakage channels.

The first is the prompt and response channel. Employees may paste confidential data into consumer AI tools, upload documents for summarisation or ask an assistant to analyse sensitive commercial information. Even when the document itself does not leave through email or file transfer, the prompt may reveal customer data, intellectual property, strategy, pricing, legal positions or operational weaknesses.

The second is the AI build channel. Organisations building their own AI systems need to protect retrieval augmented generation (RAG) pipelines, vector databases, embeddings, fine-tuning data, logs, agents, connectors and system prompts. Sensitive data can leak through retrieval, model responses, over-permissioned tools, prompt injection or poorly governed training data.

Traditional DLP categories still apply, but each now needs to adapt:

Endpoint DLP must detect and control paste, upload and file transfer into AI tools from managed devices and browsers.
Network DLP must understand AI destinations, sanctioned and unsanctioned tools, tenant boundaries and encrypted traffic patterns.
Email DLP must account for AI-generated content, AI-assisted attachments and prompt-injection risks hidden in messages.
Cloud DLP must govern Microsoft 365, software-as-a-service (SaaS) applications, third-party AI tools and the permissions that determine what AI assistants can retrieve.

The result is not a replacement for existing DLP. It is an extension of DLP into AI usage, AI development and AI governance.

Risk dimension one: employees using AI

The most immediate risk for many organisations is not a sophisticated attack. It is ordinary work happening in an uncontrolled tool.

Staff want to summarise, rewrite, translate, compare and analyse. If the approved route is unclear, slow or unavailable, they may use public AI services through personal accounts. A ban rarely solves this. It usually pushes behaviour out of sight.

The main DLP risks from employee AI use are:

Copying sensitive data into public AI tools. This may include personal data, source code, contracts, board papers, customer records, incident reports or commercially sensitive plans.
Uploading documents for analysis. A document upload can bypass established email and sharing controls if the AI tool is outside the organisation’s governed environment.
Using personal accounts. Personal AI accounts make activity harder to monitor, govern, retain or audit.
Prompt leakage. The question itself may reveal information, even if the user does not upload a file.
RAG oversharing. AI assistants grounded in Microsoft 365 or other content stores can surface information that a user technically has permission to access but does not need for their role.

Microsoft 365 Copilot is a good example of why information governance matters before AI is scaled. Copilot works with content available to the signed-in user. If SharePoint, Teams and OneDrive permissions are too broad, AI can make that overexposure more visible. virtco®’s AI readiness material makes the same point: organisations should prepare Microsoft 365 content for search and Copilot by improving information quality, reducing redundant, outdated and trivial content, and putting access controls in place before deployment.

In practice, this means reviewing oversharing, anonymous links, external access, broken inheritance, dormant sites, sensitivity labels and retention before rolling Copilot out widely.

Risk dimension two: organisations building AI

The second risk area is less visible but often more complex: the systems an organisation builds itself.

RAG and agentic AI can be powerful, but they change the security boundary. Data that used to sit behind an application permission model may now be chunked, embedded, indexed, retrieved, summarised and passed to a model at runtime. If the controls are weak, AI can expose information faster and at greater scale than a human user searching manually.

Key build-side DLP risks include:

Retrieval augmented generation

In a RAG system, retrieved content becomes prompt context. If the retrieval layer does not enforce source-document permissions, the model may receive information the user should not see. Filtering after generation is too late. Access control must happen before content is added to the prompt.

Embeddings and vector stores

Embeddings are not harmless metadata. They can carry information about the source material and should be protected as sensitive data. Vector databases need encryption, access control, tenant separation, retention rules and deletion processes that align with the underlying documents.

Training data and fine-tuning

Personal data, customer records, confidential decisions and intellectual property should not be added to training or fine-tuning data without a clear lawful basis, minimisation, approval and retention model. Prompt and response logs need the same attention.

Prompt injection

Prompt injection can cause an AI system to ignore instructions, reveal information or misuse tools. The risk increases when the system processes untrusted content such as emails, web pages, uploaded documents or external knowledge sources.

Agents and connectors

Agents can read, write, send, approve and trigger actions. That makes least privilege essential. Tokens should be scoped, time-bound and monitored. Tools should be allow-listed. High-risk actions should require human approval.

System prompt leakage

System prompts often contain internal rules, integration details or business logic. They should not contain secrets, keys or sensitive configuration. Treat them as controlled assets, not casual documentation.

The tooling landscape

There is no single AI DLP product that solves the whole problem. The market is developing in layers, and most organisations will need a combination of controls.

Traditional DLP, extended for AI

Endpoint, email, network and cloud DLP remain important. The difference is that policies now need to cover AI websites, browser activity, document upload, paste events and sanctioned versus unsanctioned AI services.

Cloud access security broker, security service edge and secure access service edge

Cloud access security broker (CASB), security service edge (SSE) and secure access service edge (SASE) platforms can help discover AI use, distinguish corporate from personal accounts, apply inline controls and coach users towards approved tools.

Data security posture management

Data security posture management (DSPM) tools help organisations discover sensitive data, classify it, assess exposure and prioritise remediation. In Microsoft-heavy environments, Microsoft Purview is particularly relevant because it connects information protection, DLP, audit, insider risk and Microsoft 365 data governance.

AI security gateways and guardrails

AI gateways and large language model (LLM) firewalls inspect prompts, responses and tool calls. They can help detect sensitive data, prompt injection, jailbreak attempts, policy violations and unauthorised tool use. They are useful, but they should not be treated as a substitute for data minimisation and strong architecture.

Microsoft 365 governance controls

For Microsoft 365 environments, the core toolkit usually includes:

Microsoft Purview Information Protection;
sensitivity labels and auto-labelling;
Microsoft Purview DLP;
endpoint DLP and browser controls;
Microsoft Purview audit and activity monitoring;
insider risk management where appropriate;
Microsoft Entra conditional access and privileged identity management;
SharePoint Advanced Management for oversharing, inactive sites, access reviews and restricted discovery.

virtco®’s Microsoft 365 AI adoption guidance emphasises the same foundations: content, permissions, Teams and SharePoint structure, OneDrive usage, external sharing, retention, sensitivity labels and user adoption should be reviewed before AI is scaled.

Techniques that actually work

Effective AI DLP is not only a tooling decision. It is a set of repeatable practices.

Classify and label data

You cannot reliably protect data you have not identified. Start with a pragmatic classification model and apply sensitivity labels to documents, emails, sites, groups and workspaces where appropriate.

Reduce the data AI can see

The safest data is data the model never receives. Remove redundant, outdated and trivial content. Archive stale sites. Tighten permissions. Separate high-risk repositories from general knowledge stores.

Enforce access at retrieval time

For RAG systems, apply role-based access control (RBAC) or attribute-based access control (ABAC) before content is retrieved into the prompt. Store source permissions and metadata with each indexed item. Filter by identity, role, region, matter, project, client or other relevant attributes.

Redact and minimise

Use redaction, masking, tokenisation or anonymisation where the model does not need the original data. Microsoft Presidio and similar tools can support detection and anonymisation of personal information, but they should be used as part of a wider control model because detection is never perfect.

Separate trusted and untrusted content

Do not mix internal knowledge, external web content, user uploads and system instructions without boundaries. Prompt partitioning, content sanitisation and strict tool permissions reduce the impact of prompt injection.

Control tools and actions

Agentic systems should have explicit tool allow-lists, narrow scopes, just-in-time access and approval gates for high-impact actions such as sending emails, changing records, making payments or publishing content.

Log and monitor AI activity

AI interactions should produce evidence. Log prompts, retrieval events, documents used, tool calls, policy decisions, overrides and outputs where lawful and proportionate. These logs support audit, investigation, tuning and incident response.

Governance should be the backbone

AI DLP is strongest when it sits inside a governance model rather than a collection of disconnected technical controls.

A practical governance backbone should cover:

Ownership: who approves AI use cases, data sources, tools and exceptions.
Policy: what staff can use AI for, what data is prohibited, and when human review is required.
Risk assessment: how use cases are assessed before launch and reviewed during operation.
Information governance: classification, retention, access, sharing, records and deletion.
Supplier and connector governance: which models, plug-ins, connectors and data processors are approved.
Monitoring and assurance: how the organisation proves controls are working.
Incident response: how AI-related leakage, prompt injection or misuse will be detected and handled.

Several frameworks can help structure this:

the National Institute of Standards and Technology Artificial Intelligence Risk Management Framework (NIST AI RMF) for risk language and lifecycle thinking;
International Organization for Standardization / International Electrotechnical Commission (ISO/IEC) 42001 for an AI management system;
the Open Worldwide Application Security Project (OWASP) Top 10 for Large Language Model Applications for technical AI risks;
the European Union Artificial Intelligence Act for organisations operating in or selling into the EU;
the General Data Protection Regulation (GDPR) and UK GDPR for personal data, purpose limitation, minimisation, transparency, deletion and lawful processing.

virtco®’s AI Risk Framework™ is designed as a living approach to AI adoption, with governance, identification, assessment, management and monitoring layers. That is the right direction for AI DLP: not a one-off control project, but an operating model that adapts as use cases, data and regulation change.

Architecture-first AI DLP

The strongest AI DLP posture is designed into the architecture.

That means applying zero-trust principles to AI:

verify user identity and device posture;
apply least privilege to content, tools and connectors;
assume prompts and external content may be hostile;
minimise the data sent to models;
isolate sensitive workloads;
monitor continuously;
require approval for high-risk actions.

For sensitive data, organisations should favour private or dedicated AI endpoints, strong tenant controls, region and residency choices, customer-managed keys where required, and network isolation. Highly sensitive workloads may need local or private inference rather than consumer AI services.

The practical principle is simple: keep personal data, regulated data and crown-jewel intellectual property out of third-party consumer AI tools. Where AI does need sensitive context, use governed enterprise services, minimise what is shared, and make sure controls can be evidenced.

Three practical how-tos

1. Reduce sensitive paste and upload into AI tools

A practical Microsoft 365 pattern is to combine endpoint DLP, browser controls and user coaching.

Start by identifying the AI tools staff are using. Then separate approved enterprise tools from consumer or personal-account use. Use managed browsers where possible, block or restrict unmanaged browsers for high-risk AI sites, and apply DLP policies to sensitive information types and labelled content.

The aim is not only to block. Blocking without an approved route often creates workarounds. A better pattern is:

discover AI usage;
provide sanctioned tools;
warn users when they are about to share sensitive data;
block the highest-risk actions;
allow justified overrides where policy permits;
review activity and tune controls.

For source code, product designs, regulated data or client-confidential material, use stricter controls and require enterprise-approved AI environments only.

2. Remediate SharePoint before scaling Copilot

Before scaling Microsoft 365 Copilot, review SharePoint, Teams and OneDrive as if they were becoming a searchable AI knowledge base — because they are.

A practical remediation sequence is:

identify overshared sites, broad links and external access;
find inactive, ownerless or unmanaged sites;
remove redundant, outdated and trivial content;
apply sensitivity labels to sensitive sites and documents;
restrict access to high-risk sites;
use restricted discovery where content should not appear in organisation-wide search or Copilot experiences;
validate with a pilot group before wider deployment;
move to continuous monitoring and access reviews.

This aligns with virtco®’s wider view of Microsoft 365 AI adoption: AI performs best when the information estate is organised, permissioned and trusted.

3. Build permission-aware RAG

For organisations building their own AI assistants, permission-aware retrieval is non-negotiable.

The reference pattern is:

capture source permissions and metadata at ingestion;
store those permissions with the indexed content and vectors;
authenticate the user at query time;
filter retrieval results before the prompt is assembled;
separate internal, external and restricted corpora;
redact or minimise personal data before embedding where possible;
encrypt the vector store and logs;
record every retrieval event for audit and deletion handling;
test with adversarial prompts and over-permissioned user scenarios.

If a user could not open the source document, the RAG system should not retrieve it for that user.

A staged roadmap

AI DLP can feel broad, but the work becomes manageable when it is staged.

Stage 0: govern first

Define the AI policy, approval process, ownership model and risk assessment route. Decide which tools are approved, which data is prohibited, and which use cases require legal, security or compliance review.

Stage 1: make AI use visible

Discover where staff are using AI today. Identify personal-account use, high-risk tools, sensitive prompts, file uploads and unmanaged browser activity. Provide a sanctioned route so people can work productively without moving data into uncontrolled services.

Stage 2: fix Microsoft 365 oversharing

Review SharePoint, Teams, OneDrive, external sharing, anonymous links, inactive sites, ownerless workspaces, retention and sensitivity labels. Do this before scaling Copilot or other AI assistants grounded in Microsoft 365 content.

Stage 3: secure AI systems you build

Apply secure-by-design patterns for RAG, embeddings, agents, connectors, logs and model endpoints. Enforce retrieval-time access control. Keep sensitive data out of training and fine-tuning unless there is a clear governance basis.

Stage 4: operate and prove

Monitor AI activity, review exceptions, update policies, test prompt-injection defences, audit access and maintain evidence. AI governance should become part of business-as-usual risk management.

Where virtco® can help

For many organisations, the hardest part is not choosing an AI tool. It is preparing the information estate, defining the controls and helping people adopt AI safely.

virtco® works across cloud adoption, Microsoft 365, SharePoint, Teams, Power Platform, automation, security, data migration, digital workplace adoption and business change. That combination matters because AI DLP spans people, process, information architecture, permissions, compliance and technical configuration.

If your organisation is Microsoft 365-heavy, a governance-led approach can help you move faster without weakening control. Start with the content, permissions and risk model. Then scale the AI use cases that are safe, useful and measurable.

Talk to virtco® if you want a practical view of how to prepare Microsoft 365, Copilot and AI-enabled workflows for secure adoption.

Data Loss Prevention for the AI-First Business