PAVEL GLUKHIKH

What is AI integrity? An engineering framework

AI integrity as an engineering discipline: verifiable behavior, governed data paths, resistance to manipulation, and a maturity model to build against.

Executive summary

AI integrity is the engineering discipline of making an AI system behave verifiably as intended in production: outputs you can test against defined expectations, data paths you govern, manipulation you resist by design, and a named human accountable for every consequential decision the system touches. That is a property you build, not a value you declare. This article defines the discipline in engineering terms, breaks it into four properties you can hang controls on, lays out a five-level maturity model most teams can locate themselves on in an afternoon, and lists the artifacts that separate a system you can defend from one you merely hope behaves.

AI integrity is the engineering discipline of making an AI system’s behavior in production verifiable, governed, resistant to manipulation, and owned by an accountable human. Not a research agenda. Not an ethics statement. It is the set of controls you can point to when someone asks why anyone should trust what the system just did.

I use the title “AI integrity engineer” deliberately. In every other corner of engineering, integrity is a property you design for and then test: a bridge that holds under load, a database that stays consistent under concurrent writes. Nobody certifies a bridge by interviewing the architect about her values. Yet that is roughly how most organizations are currently assuring their AI systems, and the gap shows the first time something goes wrong at 2 AM and nobody can say what the system was doing, with what data, on whose authority.

AI is software. Software requires engineering. That is the whole framework, and the rest of this article is what it looks like in practice.

The four properties of AI integrity

Every control I have found useful maps to one of four properties. If you are building a program, build it against these rather than against whatever a vendor’s compliance checklist happens to contain.

Verifiable behavior. You can state what the system is supposed to do and demonstrate, with tests that run automatically, that it does it. For an LLM application that means eval suites against golden sets, not vibes and a demo that went well. The test suite is the specification. If a behavior matters and no eval covers it, that behavior is unspecified, and unspecified behavior is what you will be explaining to an incident review later. The mechanics are in evaluating AI systems in production.
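
As a minimal sketch of what "the test suite is the specification" can look like, the snippet below gates on a tiny golden set. Everything in it is illustrative: `GoldenCase`, `run_golden_set`, and `fake_system` are hypothetical names, and the substring check stands in for whatever scorer a real suite would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    prompt: str
    must_contain: str  # simplest possible check; real suites use richer scorers


def run_golden_set(
    system: Callable[[str], str],
    cases: list[GoldenCase],
    min_pass_rate: float = 1.0,
) -> tuple[bool, list[str]]:
    """Return (gate_passed, ids_of_failed_cases)."""
    if not cases:
        # An empty golden set specifies nothing, so it must not pass the gate.
        return False, []
    failed = [
        c.case_id
        for c in cases
        if c.must_contain.lower() not in system(c.prompt).lower()
    ]
    passed = len(cases) - len(failed)
    return passed / len(cases) >= min_pass_rate, failed


# Stand-in for the real application entry point (hypothetical).
def fake_system(prompt: str) -> str:
    if "refund" in prompt.lower():
        return "Refunds are accepted within 30 days."
    return "I don't know."


GOLDEN = [
    GoldenCase("refund-window", "What is the refund window?", "30 days"),
    GoldenCase("unknown-topic", "What is the CEO's home address?", "don't know"),
]
```

In CI, a failing gate would fail the build and print the failed case ids, so that a prompt or model change cannot ship without someone looking at what regressed.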

Governed data paths. You know what data reaches the model, what the model emits, where both are stored, and who can see them. Training and fine-tuning lineage, retrieval corpora and their permissions, prompt and completion logging, retention. This one deserves more respect than it gets. Most real-world AI incidents I have seen discussed candidly were not model failures at all. The model did exactly what it was designed to do, with data it should never have had.

Security against manipulation. The system resists adversarial steering: prompt injection, poisoned retrieval content, tool abuse, tampered model artifacts. A system that can be talked into misbehaving by anyone able to write text into its context window has no integrity, whatever its eval scores say. The threat model gets its own treatment in securing LLM applications.

Human accountability. For every consequential action the system can take or inform, a named role owns the outcome, and the decision trail shows what the system contributed. “The model decided” is never an acceptable root cause. This property sounds organizational, but it has engineering teeth: audit-grade logging, human-in-the-loop gates on irreversible actions, and escalation paths that have actually been exercised rather than merely documented.

These properties are deliberately model-agnostic. They apply whether you are calling a frontier API, running open weights on your own GPUs, or operating a classical ML classifier that predates the LLM era and has quietly been making decisions for years.

Why “integrity” and not “safety” or “trust”

Terminology matters because it determines who owns the problem.

“AI safety” points at model developers and research labs. “Responsible AI” usually lands in legal or policy. Both matter. Neither ships controls into your production stack, and a problem that belongs to everyone in general belongs to no one on call.

Integrity points at engineering. It is the same territory NIST covers in the AI Risk Management Framework (Govern, Map, Measure, Manage), but expressed as system properties an architect can design toward. I have also found that the framing survives contact with skeptical delivery teams far better. Nobody argues against “we should be able to prove the system does what we claim.” Announce a “responsible AI initiative” instead and you can watch the room’s immune response kick in. Same controls, different reception. I map the RMF to concrete artifacts in AI governance for engineers.

A maturity model you can locate yourself on

I use five levels. Most enterprises I encounter are at Level 1 and believe they are at Level 3, which is itself a finding.

| Level | Name | What it looks like |
|---|---|---|
| 0 | Ad hoc | AI features ship like demos. No evals, no inventory, prompt changes go straight to prod. |
| 1 | Inventoried | You know which systems use AI, which models, which data. An owner is named for each. Logging exists. |
| 2 | Tested | Golden eval sets gate releases. Model and prompt versions are pinned and reproducible. Injection testing happens before launch. |
| 3 | Monitored | Production behavior is continuously sampled and scored. Drift and regression alerts page a human. Incident thresholds are defined and rehearsed. |
| 4 | Governed | Policy is enforced in the pipeline, not in documents. Every consequential output traces to model version, prompt, context, and approver. Integrity evidence falls out of normal operation. |

Two observations from walking real teams through this.

Level 1 is embarrassingly valuable and embarrassingly rare. A complete inventory (every AI touchpoint with its model, its data sources, and its owner) takes weeks rather than quarters, and everything else depends on it. It is also the only level at which shadow AI gets found: the team quietly calling a public API from a Lambda turns up during inventory work or it turns up during an incident. There is no third option.

And do not skip levels. Buying an AI observability platform while you sit at Level 0 produces dashboards over systems nobody owns. I have watched the identical failure in security programs: a microsegmentation platform bought for a flat network that nobody had mapped. The tooling outran the operating model, and the shelf got heavier. The lesson from security architecture work transfers directly: controls you cannot operate are worse than no controls, because they manufacture false confidence.
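
The "do not skip levels" rule can be made mechanical. The sketch below is one possible encoding, not a standard: the evidence keys are my shorthand for the rows of the table above, and `locate_level` is a hypothetical helper. A level counts only if every level beneath it is also satisfied.

```python
# Evidence required per level, paraphrased from the maturity table.
# The key names are illustrative labels, not an established vocabulary.
LEVELS = [
    (1, "Inventoried", {"inventory_complete", "owner_named", "logging_exists"}),
    (2, "Tested", {"golden_evals_gate_releases", "versions_pinned", "injection_tested"}),
    (3, "Monitored", {"production_sampled", "regression_alerts_page",
                      "incident_thresholds_rehearsed"}),
    (4, "Governed", {"policy_enforced_in_pipeline", "outputs_traceable"}),
]


def locate_level(evidence: set[str]) -> tuple[int, str]:
    """Return the highest level whose requirements, and all lower ones, are met."""
    level, name = 0, "Ad hoc"
    for number, label, required in LEVELS:
        if not required <= evidence:
            break  # a gap at this level caps the result, whatever exists above it
        level, name = number, label
    return level, name
```

Under this encoding, a team with monitoring dashboards but no inventory still lands at Level 0, which is the point of the rule.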

The control set, concretely

Properties and maturity levels are scaffolding. These are the actual engineering controls, grouped by the property they serve.

Verifiable behavior

  • Golden eval sets, version-controlled next to the application code, run in CI on every prompt, model, or retrieval change.
  • Pinned versions everywhere: model identifier, prompt template hash, retrieval index snapshot. If you cannot reproduce last Tuesday’s behavior, you cannot debug last Tuesday’s incident. It really is that mechanical.
  • A behavioral changelog. Prompt and policy changes are reviewed like code, because they are code.
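
One way to make pinning concrete is a release manifest whose fingerprint is stored with every logged output. This is a sketch under assumptions: the field names, the example model identifier, and the snapshot label are all hypothetical, and a real manifest would likely carry more (decoding parameters, tool versions).

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseManifest:
    model_id: str
    prompt_template_sha256: str
    retrieval_index_snapshot: str
    golden_set_version: str


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def manifest_fingerprint(manifest: ReleaseManifest) -> str:
    """Hash a canonical JSON form so the same pins always give the same id."""
    canonical = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return sha256_text(canonical)


PROMPT_TEMPLATE = "Answer using only the provided context.\n\n{context}\n\nQ: {question}"

manifest = ReleaseManifest(
    model_id="example-model-2024-01-01",          # hypothetical identifier
    prompt_template_sha256=sha256_text(PROMPT_TEMPLATE),
    retrieval_index_snapshot="index-snapshot-0001",  # hypothetical label
    golden_set_version="golden-v1",
)
```

Any change to the prompt text, model, or index produces a different fingerprint, which gives the behavioral changelog something unambiguous to refer to.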

Governed data paths

  • A data flow diagram per AI system, kept current: sources, retrieval corpora, model provider, logging sinks, retention. One page. Auditors and incident commanders both start there, which tells you something about its value.
  • Permission-aware retrieval for any RAG system. The retriever enforces the caller’s entitlements; the model’s judgment is never the enforcement point. Details in RAG architecture for the enterprise.
  • Prompt and completion logging protected with the same access controls as the most sensitive data that can appear in them. The log store is a copy of your secrets, whether you meant it to be or not.
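
To illustrate permission-aware retrieval, here is a toy retriever that filters by the caller's entitlements before it ranks anything. The `Document` shape, the group names, and the word-overlap scoring are assumptions for the example; a production retriever would use a real index and the organization's actual entitlement model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


def retrieve(
    query: str,
    corpus: list[Document],
    caller_groups: set[str],
    k: int = 3,
) -> list[Document]:
    # Entitlement filter runs first: documents the caller may not see are
    # never scored and never reach the model's context window.
    visible = [d for d in corpus if d.allowed_groups & caller_groups]

    # Toy relevance score: number of query words that appear in the document.
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


CORPUS = [
    Document("hr-1", "salary bands for engineering staff", frozenset({"hr"})),
    Document("kb-1", "engineering onboarding guide", frozenset({"hr", "engineering"})),
]
```

The design choice worth noting is where the check lives: in the retriever, keyed on the caller, so that no prompt wording can widen what the model is shown.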

Security against manipulation

  • Treat all model input as untrusted, including retrieved documents and tool outputs, not just the user’s message.
  • Least-privilege tool scopes: narrowly scoped credentials per task, never a service account with standing access.
  • Artifact integrity for self-hosted models: checksums, signed registries, and a supply-chain policy for weights and inference code, per the NCSC secure AI development guidelines.
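
The checksum part of artifact integrity is small enough to sketch. The helper names below are hypothetical, and this covers only digest comparison against a pinned value; signature verification and registry policy are separate controls that this example does not attempt.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large weight files fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(stream: BinaryIO, expected_sha256: str) -> bool:
    """Compare the stream's digest with the pinned value (case-insensitive hex)."""
    return sha256_of_stream(stream) == expected_sha256.lower()


# Illustrative data only: in practice the pinned digest comes from a trusted
# source recorded at release time, not from the artifact being checked.
weights = b"pretend these bytes are model weights"
pinned = hashlib.sha256(weights).hexdigest()
```

For a file on disk, the same function works with `open(path, "rb")`, and the load step would refuse to proceed when `verify_artifact` returns `False`.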

Human accountability

  • A named owner per system, in the inventory, with the authority to turn the system off. Ownership without a kill switch is spectating.
  • Human-in-the-loop gates on actions that are irreversible or externally visible: payments, customer communications, records changes.
  • An audit trail sufficient to reconstruct any consequential decision: input, context, model version, output, and the human who acted on it.
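
A human-in-the-loop gate and the audit record can share one code path. The sketch below assumes a hypothetical `ProposedAction` shape and an in-memory list as the audit sink; the action kinds mirror the examples in the list above, and the record fields follow the audit-trail bullet (input and context references, model version, output, approver).

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Action kinds that must not run without a named human approver (illustrative).
IRREVERSIBLE_ACTIONS = {"payment", "customer_email", "record_change"}


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    payload: dict
    model_id: str
    prompt_sha256: str
    context_ids: tuple[str, ...]
    model_output: str


class ApprovalRequired(Exception):
    """Raised when a gated action is attempted without an approver."""


def execute(action: ProposedAction, approver: Optional[str], audit_log: list[str]) -> str:
    blocked = action.kind in IRREVERSIBLE_ACTIONS and not approver
    status = "blocked" if blocked else "executed"
    record = {
        **asdict(action),
        "approver": approver,
        "status": status,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Blocked attempts are logged too, so the trail shows what the system tried.
    audit_log.append(json.dumps(record, sort_keys=True))
    if blocked:
        raise ApprovalRequired(f"{action.kind} needs a named approver")
    return status
```

In a real system the side effect itself (the payment call, the email send) would happen between the gate and the return, and the log would go to an append-only store with restricted access.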

Tradeoffs worth being honest about

Integrity costs something, and pretending otherwise is the fastest way to discredit the program.

It costs latency and money. Eval pipelines, logging, and human gates all add both. The honest argument is blast-radius-based: spend integrity budget where a bad output is expensive, not uniformly across everything with a model in it.

It costs velocity. Gating releases on evals slows shipping, exactly as much as test suites slow software shipping, and for the same reason. Teams that have lived through one silent behavioral regression in production tend to stop raising this objection.

And it can curdle into false precision. A 94% pass rate on a stale golden set measures the past. The maturity model puts continuous production sampling at Level 3 precisely because it is the only eval that cannot go stale underneath you.
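
Continuous production sampling needs two small pieces: a way to pick which requests get scored, and a rolling view of the results. The sketch below shows one possible shape, with hypothetical names. Hash-based selection is my assumption, chosen because it is deterministic per request id; the scorer that decides pass or fail is out of scope here.

```python
import hashlib
from collections import deque


def should_sample(request_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of requests by hashing the id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < rate * 2**64


class RollingPassRate:
    """Track pass/fail over the last `window` scored samples."""

    def __init__(self, window: int, alert_below: float) -> None:
        self.results: deque[bool] = deque(maxlen=window)
        self.alert_below = alert_below

    def pass_rate(self) -> float:
        return sum(self.results) / len(self.results)

    def record(self, passed: bool) -> bool:
        """Record one scored sample; return True if an alert should fire."""
        self.results.append(passed)
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.pass_rate() < self.alert_below
```

Waiting for a full window before alerting is a deliberate simplification that avoids paging someone over the first failed sample after a deploy; a real monitor would tune that tradeoff.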

Where to start

Build the Level 1 inventory this month. Every AI system, its model, its data paths, its owner. Then take the single system with the largest blast radius to Level 2 (golden set, pinned versions, injection test) before touching anything else. Breadth first for visibility, depth first for risk.

Write three things down as decision records: what correct behavior means for each system, which actions require a human, and what threshold of bad outputs constitutes an incident.
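
Those three decisions can live as a structured record next to the code, so they are checked rather than remembered. The sketch below is one hypothetical shape: the field names, the example system, and the choice to express the incident threshold as a rate over a minimum sample count are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    system: str
    owner: str
    correct_behavior: str                   # what "correct" means, in plain words
    human_required_actions: frozenset[str]  # actions that need a human
    incident_bad_output_rate: float         # fraction of sampled outputs
    min_samples: int                        # do not judge on too little data

    def requires_human(self, action: str) -> bool:
        return action in self.human_required_actions

    def is_incident(self, bad: int, sampled: int) -> bool:
        if sampled < self.min_samples:
            return False
        return bad / sampled >= self.incident_bad_output_rate


# Illustrative values for a made-up system.
record = DecisionRecord(
    system="support-assistant",
    owner="Head of Customer Support",
    correct_behavior="Answers cite a current policy document or decline.",
    human_required_actions=frozenset({"refund", "account_closure"}),
    incident_bad_output_rate=0.05,
    min_samples=50,
)
```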

None of this is novel. It is the same discipline engineering has always applied to components it depends on but does not fully trust, which describes databases, networks, vendors, and now models. The technology at the center changed. The obligation to prove your system does what you claim never did.

Frequently asked questions

What does AI integrity mean in practice?
It means you can answer four questions about a production AI system with evidence rather than optimism. Does it behave the way you claim? Do you know what data flows in and out? Can an attacker steer it? Who owns the outcome when it is wrong? If any of those answers is a shrug, the system lacks integrity, no matter how well the underlying model benchmarks.

How is AI integrity different from AI safety or responsible AI?
AI safety research targets model-level behavior, mostly at the frontier. Responsible AI usually lives in policy and ethics. AI integrity is the engineering layer between them: the controls, tests, and audit artifacts that make one specific deployed system trustworthy. It takes the mindset of site reliability and security engineering and points it at AI behavior. Different owner, different deliverables, different failure modes.

Do small teams need an AI integrity program?
They need the properties, not the program. Two people can maintain a golden eval set, log every model interaction, pin model versions, and write down who approves consequential outputs. That is Level 2 maturity on a single page. The formal apparatus (review boards, drift dashboards, policy-as-code) earns its keep only when the blast radius of a bad output justifies the overhead.

Can you measure AI integrity?
You can measure its components: eval pass rates against golden sets, grounding-faithfulness scores for RAG systems, injection-resistance results, data lineage coverage, time to detect a behavioral regression. No single number captures the whole property, and that is exactly why a maturity model beats a score. A score invites gaming; a maturity level tells you which measurement to build next.
