Andrew Tollinton, Co-Founder of SIRV, explores when you should and shouldn’t rely on ChatGPT.
ChatGPT is easy and convenient. You open a window, ask a question and within seconds you get an answer.
For security, risk and compliance teams, working under time pressure and with limited capacity, that speed is tempting. Ask it to draft an audit report, correct notes and produce minutes.
Used this way, on small, bounded tasks, it can be helpful.
The trouble begins when we ask it to produce much more than a discrete bundle of work; when a department comes to rely on it.
Ask ChatGPT to stand in for a library, a filing cabinet or a controlled document system and you get into trouble.
Security, risk and compliance departments hold millions of words across policies, contracts, technical specifications, incident reports and regulatory guidance.
Much of that material changes, and some of it is sensitive. ChatGPT is very good at reasoning and writing, but it cannot see most of your information, prove exactly where a claim came from or show that the version it used is the one your auditors will accept.
That is not a technical quibble; it’s the difference between a useful assistant and an unreliable authority.
We should think of ChatGPT as a gifted editor with a small desk. You can place a handful of documents in front of it and it will read them and respond.
But if you stack too many papers on the desk, the ones at the bottom are simply invisible and the ones in the middle go unseen. This “desk size” is the context window, and every chat has its own.
When the pile exceeds the desk space, relevant passages drop out of view. In practice, that means the model might miss the appendix that holds a crucial definition, or a newer policy that replaces last year’s rule.
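For readers who want to see the limit concretely, here is a minimal sketch in Python. The 120,000-token budget, the file names and the use of the open-source tiktoken tokeniser are illustrative assumptions, not a description of any particular deployment:

```python
# A rough illustration of the "desk size" problem: count tokens before
# sending documents to a model, and flag anything that will not fit.
# The 120,000-token budget is an assumed figure for illustration only.
import tiktoken

CONTEXT_BUDGET = 120_000                      # assumed context window, in tokens
encoding = tiktoken.get_encoding("cl100k_base")

documents = {
    "fire_panel_sop.txt": "Press the silence button before investigating the zone...",
    "evacuation_policy_v3.txt": "On hearing a continuous alarm, all staff must leave...",
}

used = 0
for name, text in documents.items():
    tokens = len(encoding.encode(text))
    used += tokens
    status = "fits on the desk" if used <= CONTEXT_BUDGET else "drops off the desk"
    print(f"{name}: {tokens} tokens (running total {used}) -> {status}")
```

Anything past the budget is the paper at the bottom of the pile: still in your filing system, invisible to the model.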
Connectivity is another problem. The model does not roam the internet or your file shares unless you specifically feed it those materials. If a correct answer sits outside the material you provide or beyond what the model learned during training, it will generalise.
Sometimes that is fine; in high-stakes environments it is not.
In risk work we care about the exact clause, the page number, the wording that tells us which button to press on the fire panel. We need to point a colleague or an auditor to the source.
An answer without a clear citation is, at best, a draft that must be checked.
Repeatability matters too. Ask the same question twice and you may get two different phrasings, or even two different recommendations, unless both answers are anchored to the same evidence.
If the client wants to know why a decision was taken, “because ChatGPT said so” is not a satisfactory answer.
We need provenance and a trail: who asked whether we should evacuate, which document was consulted, which version, and what changed since we last looked.
Access control is an obvious concern: Security information is sensitive and shared on a need-to-know basis. A general chat interface does not, by itself, enforce who can see which paragraph in which policy, or who is allowed to combine information across departments.
Nor does it keep an authoritative log of “who saw what and when”.
You can run ChatGPT in safer configurations, but the point remains: If you require fine-grained permissions and auditable access, you need more than a clever writer.
None of this means we should avoid ChatGPT.
It means we should place it in the right role. For small, low-stakes work, say up to a few hundred pages of material, you can curate the excerpts, give the model a focused brief and ask it to answer in simple terms. It will save time and improve clarity.
The problems arise as soon as the scale increases, the facts change frequently or the stakes rise. In those cases we need an information layer between questions and the model’s words.
A helpful analogy is a well-run library.
A good librarian does not make you read every book; they help you find the right passages quickly. In the AI world, this is often called retrieval.
Instead of dumping piles of documents on the model’s desk, you ask a separate system to search your corpus, pick the few excerpts most likely to answer your question and pass only those to the model.
The model is then asked to write from those excerpts and to cite them. The page number and document ID are recorded and, on click, the relevant section of the original document pops up. When the source document is updated, the search index is updated too, so freshness improves without changing the model itself.
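A minimal sketch of that retrieval step might look like the Python below; the document IDs are invented, and a deliberately crude keyword-overlap score stands in for a real search index:

```python
# Minimal sketch of the "library service" retrieval step: score stored
# excerpts against a question, keep the best few, and build a prompt that
# asks the model to answer only from those excerpts and to cite them.
# The chunk data and the scoring method are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    page: int
    heading: str
    text: str

CHUNKS = [
    Chunk("FIRE-SOP-007", 4, "Fire panel operation",
          "Press the 'silence' button, then investigate the zone shown on the panel."),
    Chunk("EVAC-POL-2024", 2, "Evacuation triggers",
          "A continuous alarm of more than 30 seconds requires full evacuation."),
    Chunk("HR-POL-002", 9, "Leave requests",
          "Annual leave must be requested at least two weeks in advance."),
]

def score(question: str, chunk: Chunk) -> int:
    """Crude relevance score: how many question words appear in the chunk."""
    q_words = set(question.lower().split())
    c_words = set((chunk.heading + " " + chunk.text).lower().split())
    return len(q_words & c_words)

def build_prompt(question: str, top_k: int = 2) -> str:
    """Pick the most relevant excerpts and wrap them in a citation-demanding brief."""
    best = sorted(CHUNKS, key=lambda c: score(question, c), reverse=True)[:top_k]
    excerpts = "\n".join(
        f"[{c.doc_id} p.{c.page}] {c.heading}: {c.text}" for c in best
    )
    return (
        "Answer using ONLY the excerpts below and cite the [doc_id p.page] "
        f"for every claim.\n\nExcerpts:\n{excerpts}\n\nQuestion: {question}"
    )

print(build_prompt("Which button do I press on the fire panel?"))
```

A production system would swap the scoring function for proper keyword or vector search, but the shape is the same: select a few excerpts, cite everything.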
This approach, pairing ChatGPT’s writing ability with your own “library service”, solves several of the problems above. Because the model writes from selected passages, factual accuracy improves and claims become checkable.
Because you store the document IDs and page numbers, audit becomes easier. Because you only pass a small number of targeted excerpts, the system stays fast and costs less to run.
And, because your library service can respect permissions, you can filter sources by user and role, which keeps sensitive material in the right hands.
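The permission check can be as simple as tagging each excerpt with the roles allowed to read it and filtering before retrieval ever runs; the role names and documents below are invented for illustration:

```python
# Sketch of permission-aware retrieval: each excerpt carries the roles that
# may see it, and the library service filters before anything reaches the
# model. Role names and documents are invented for illustration.
EXCERPTS = [
    {"doc_id": "GUARD-ROTA-01", "roles": {"security"}, "text": "Night patrol routes..."},
    {"doc_id": "INCIDENT-0142", "roles": {"security", "compliance"}, "text": "Acid attack response..."},
    {"doc_id": "BOARD-RISK-REG", "roles": {"executive"}, "text": "Top corporate risks..."},
]

def visible_to(role: str) -> list[dict]:
    """Return only the excerpts this role is cleared to see."""
    return [e for e in EXCERPTS if role in e["roles"]]

for excerpt in visible_to("compliance"):
    print(excerpt["doc_id"])          # prints INCIDENT-0142 only
```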
There are trade-offs. Building a capable “library service” is a lot of work.
You need to prepare your documents, scan the ones that are images, split them into sensible chunks with their headings attached and keep an index up to date as files change.
Some formats are easier to digest than others: Simple, structured email text is easy; PDFs with diagrams, images and sketches are harder. You must monitor search quality and avoid over-tight filters that hide useful context.
You should also measure how the system performs: Can it find an answer at all, does the answer include citations, how long does it take, how often is the information out of date?
These are not insurmountable problems, but they are real work and someone has to pay for it.
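As a sketch of what “measuring the system” can mean in practice, each query can be logged with whether it was answered, whether it was cited, how long it took and how old the source was. The field names and the example entry are illustrative, not a standard:

```python
# Sketch of per-query measurement: answer rate, citation rate, latency and
# staleness, logged so the team can review them over time. Field names and
# the example data are assumptions for illustration only.
import time
from datetime import date
from typing import Optional

query_log: list[dict] = []

def record_query(question: str, answer: Optional[str],
                 has_citation: bool, source_updated: date, started: float) -> None:
    query_log.append({
        "question": question,
        "answered": answer is not None,
        "cited": has_citation,
        "latency_s": round(time.monotonic() - started, 2),
        "source_age_days": (date.today() - source_updated).days,
    })

# Example entry: an answered, cited query against a policy last updated in late 2023.
t0 = time.monotonic()
record_query("When do we evacuate?", "On a continuous alarm [EVAC-POL-2024 p.2].",
             has_citation=True, source_updated=date(2023, 11, 1), started=t0)

answered = sum(q["answered"] for q in query_log)
cited = sum(q["cited"] for q in query_log)
print(f"answer rate {answered}/{len(query_log)}, citation rate {cited}/{len(query_log)}")
```

Reviewed regularly, those few numbers tell you whether the library service is earning its keep.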
For a non-technical audience, it helps to think in terms of questions you already ask.
If the task is small and bounded, such as “summarise this intel report for a briefing”, “compare two supplier clauses” or “turn yesterday’s incident record into a clean first draft for a crime report”, ChatGPT with a carefully chosen set of excerpts is likely to be good enough.
If the task is large, dynamic or regulated, for example “what to do if someone has acid thrown in their face?”, “tell me about drone usage outside our building” or “evidence our compliance for an external review”, then you want the library service in the loop.
ChatGPT alone is excellent at reasoning but blind to most of a large, evolving corpus found in security, risk and compliance departments.
It cannot reliably ground, cite or stay current without help.
Adding your own “library service” gives you recall, freshness, attribution and policy control.
You pay for that with some infrastructure and ongoing care, but you gain trust. In security and risk, that is the point. Safe AI is trusted AI. Trust is earned with governance and evidence, not speed and flattery.
| Situation | Use ChatGPT | Don’t use ChatGPT alone | Why |
| --- | --- | --- | --- |
| Scope of material | A few pages to a few hundred pages, well-curated excerpts. | Large or sprawling corpora (policies, contracts, incident logs, regs). | Context window limits; recall gaps without retrieval. | 
| Stakes | Low-stakes tasks (briefings, drafts, note-tidying). | High-consequence decisions or anything safety/operations-critical. | Needs provenance, repeatability, and approvals. | 
| Freshness | Static or rarely changing content. | Frequently updated sources; “what changed since last review?”. | Risk of stale answers without indexed updates. | 
| Citations/provenance | Nice-to-have references. | Must show clause, page, version, URL/ID. | Audits and reviews require traceability. | 
| Access control | Open or non-sensitive material. | Need-to-know content; row-/field-level permissions. | Chat alone cannot enforce fine-grained ACLs. | 
| Repeatability | Acceptable if wording varies across runs. | Same query must yield the same, sourced answer. | Determinism requires evidence anchoring. | 
| Data tasks | Light synthesis and explanation. | Cross-doc joins, sums, filters, tables at scale. | Needs structured tools (SQL/dataframes) with retrieval. | 
| Latency/cost | Short prompts, quick drafts. | Long pastes, heavy context, many iterations. | Token bloat slows responses, raises cost. | 
| Compliance | Internal comms, exploratory analysis. | Regulated outputs, external submissions. | Controlled documents beat free-form prose. | 
| Human approval | Reversible edits and suggestions. | Irreversible actions affecting people/assets/posture. | Keep a named approver in the loop. | 
| Training/learning | Table-top scenarios, quizzes, rewriting for clarity. | Live SOPs, command guidance or authoritative checklists. | Source of truth must be controlled and versioned. | 

Andrew Tollinton is Co-Founder of SIRV, the UK’s enterprise resilience platform.
A leader in risk management technology, he chairs the Institute of Strategic Risk Management’s AI in Risk Management group and regularly speaks on AI and resilience at global conferences.
A London Business School alumnus, Andrew brings 20+ years’ experience at the intersection of technology, compliance and security.
To find out more about SIRV, click here.