Whitepaper
Building an ICS Security Program: A Blueprint
A blueprint for an industrial control system security program: governance, asset inventory, Purdue-informed architecture, OT monitoring, and incident response.
Abstract
Industrial control system environments present a security problem that enterprise IT programs are structurally unprepared for: systems where availability and safety outrank confidentiality, where patching windows are measured in quarters, and where a wrong active scan can trip a physical process. This paper presents a blueprint for building an ICS security program from the ground up, organized into five pillars in deliberate build order: governance that respects the engineering organization's authority over process safety; asset inventory built passively before any control is deployed; network architecture informed by the Purdue reference model and IEC 62443 zone-and-conduit concepts; monitoring designed for OT protocols and constraints; and incident response that pre-negotiates the authority to act. Guidance is grounded in NIST SP 800-82r3 and informed by direct experience administering process-control networks in a petrochemical environment.
The starting condition
Most ICS security programs begin the same way: an audit finding, a scary advisory, or an executive reading about a peer’s incident, followed by the discovery that the “air gap” everyone assumed has a dozen crossings — a historian replicating to the business network, a vendor VPN from 2014, an HMI with a second NIC someone added during a turnaround.
I administered both office and process-control networks at a petrochemical plant for four years, and the most important thing that experience taught me is that ICS security is not IT security with older Windows. It is a different discipline with an inverted priority order — safety, then availability, then integrity, then confidentiality — and a different power structure, because plant engineering, not IT, answers for the physical process. A program that ignores either fact produces binders, not security.
This blueprint has five pillars, presented in build order: governance, asset inventory, network architecture, monitoring, and response.
The order is the point. Each pillar depends on the ones before it, and most failed programs I have seen failed by buying Pillar 4 before earning Pillar 1.
Pillar 1 — Governance: settle authority before touching anything
The first deliverable of an ICS security program is not technical. It is a written agreement on who decides what, because the default answer — nobody knows — paralyzes every later step.
Minimum governance set:
- A charter naming the accountable executive for OT cyber risk, with plant management as co-signer. Security programs imposed on plants from corporate IT fail; programs co-owned with operations survive.
- A decision-rights matrix. Who approves a firewall change on the IT/OT boundary? Who can disconnect a cell network during an incident? Who accepts the risk of an unpatchable controller? Write down names, not departments.
- An OT-specific risk acceptance process. In OT, “patch it” is often not an available answer; the honest alternative is documented compensating controls with an owner and a review date. IEC 62443’s target security levels (SL-T) per zone give this process a vocabulary: you are deciding, zone by zone, how much protection the consequence justifies.
- Change management that routes through plant engineering. Every network or system change in the OT estate rides the plant’s management-of-change process, not IT’s CAB alone.
Pillar 2 — Asset inventory: passive first, always
You cannot defend what you have not enumerated, and in OT you cannot enumerate the way IT does. Active scanning has crashed PLCs and locked up HMIs in enough documented cases that the rule is simple: passive discovery first, active interrogation only with engineering approval, per device class, in a maintenance window.
The practical sequence:
- SPAN/TAP the control network chokepoints and let a passive OT discovery tool (or, in a small plant, disciplined use of Zeek and protocol dissectors) build the initial inventory from observed traffic.
- Reconcile against engineering documentation — P&IDs, panel schedules, the integrator’s as-builts. The deltas are the interesting part: the undocumented device is either forgotten or unauthorized, and both matter.
- Record the attributes that drive decisions: make/model/firmware, role in the process, consequence of loss, network location, remote access paths, and vendor support status. A CSV that is accurate beats a CMDB that is aspirational.
- Assign every asset to a zone (next pillar) and every zone a consequence rating. This is the join point between inventory and architecture.
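The reconciliation step above can be sketched in a few lines. This is a minimal illustration, not a tool recommendation: the CSV columns, IP addresses, tags, and zone names are all hypothetical, and keying on IP address is a simplification (a real plant would likely match on MAC address or serial number as well).

```python
import csv
import io

# Hypothetical export from a passive discovery tool: devices seen on the wire.
OBSERVED_CSV = """ip,vendor,protocols
10.10.1.5,Rockwell,EtherNet/IP
10.10.1.9,Schneider,Modbus/TCP
10.10.1.77,unknown,SSH
"""

# Hypothetical extract from engineering as-builts.
DOCUMENTED_CSV = """ip,tag,zone,consequence
10.10.1.5,PLC-101,cell-a,high
10.10.1.9,PLC-102,cell-a,medium
10.10.1.12,HMI-03,supervisory,medium
"""


def load_by_ip(text):
    """Index CSV rows by their 'ip' column."""
    return {row["ip"]: row for row in csv.DictReader(io.StringIO(text))}


def reconcile(observed, documented):
    """Split addresses into matched, undocumented, and documented-but-silent."""
    seen, known = set(observed), set(documented)
    return {
        "matched": sorted(seen & known),
        "undocumented": sorted(seen - known),
        "not_observed": sorted(known - seen),
    }


observed = load_by_ip(OBSERVED_CSV)
documented = load_by_ip(DOCUMENTED_CSV)
deltas = reconcile(observed, documented)

for ip in deltas["undocumented"]:
    print(f"UNDOCUMENTED  {ip}  {observed[ip]['vendor']}  {observed[ip]['protocols']}")
for ip in deltas["not_observed"]:
    print(f"NOT OBSERVED  {ip}  {documented[ip]['tag']}")
```

Both delta lists deserve follow-up with plant engineering: an undocumented address is forgotten or unauthorized, and a documented device that never appears on the wire may be decommissioned, powered down, or sitting outside the monitored chokepoints.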
Expect the inventory to run 20–40% larger than anyone predicted. Every plant I have seen instrumented found devices nobody could name on the first pass.
That surprise is not a failure of the plant. It is the accumulated residue of twenty years of turnarounds, integrators, and urgent fixes — and finding it passively, without tripping anything, is how the security program earns its first credibility with the people who run the process.
Pillar 3 — Network architecture: Purdue-informed zones and conduits
The Purdue model remains the shared vocabulary, even though modern OT traffic — cloud analytics, IIoT sensors, remote vendor access — refuses to respect its levels. Use it the way IEC 62443 does: as input to a zone and conduit design, where a zone groups assets of similar function and consequence, and a conduit is the controlled, monitored path between zones.
The non-negotiable structural elements:
- A real IT/OT boundary with an industrial DMZ (Level 3.5). No direct flows from the business network to control systems, ever. Historians, patch staging, jump hosts, and file transfer all terminate in the DMZ; data crosses by being re-originated, not routed through.
- Zones by consequence, not convenience. Safety systems (SIS) get their own zone, isolated from basic process control; this is the difference between a bad day and a catastrophic one, and NIST SP 800-82r3 is explicit about it. Cell/area zones follow physical process boundaries so an incident can be contained to one unit.
- Conduits that are enforced, not drawn. Use industrial firewalls or, at minimum, ACL-enforced routing between zones, with a deny-by-default posture and protocol-aware rules where the gear supports deep packet inspection (DPI) for Modbus/TCP, EtherNet/IP, or OPC UA.
- Remote access as a designed system. Vendor and engineer access goes through the DMZ jump architecture with MFA, session recording, and time-boxed accounts — the full pattern is in the OT remote access architecture piece. The persistent vendor VPN into Level 2 is the single most common finding in OT assessments, and it is the first thing to kill.
The OT network reference architecture in the library shows a full worked topology; this blueprint’s concern is that the program builds toward it deliberately, zone by zone, starting with the IT/OT boundary because that is where enterprise ransomware enters.
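To make the deny-by-default conduit idea concrete, here is a small sketch of a conduit allowlist plus a sanity check that no rule connects the business network directly to a control zone. The zone names and the rule set are assumptions made for illustration; the ports are common defaults for the named protocols, and a real rule set would live in the firewall, not in a script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_zone: str
    dst_zone: str
    protocol: str
    port: int


# Hypothetical zone names. Ports are common defaults, used here for illustration.
BUSINESS_ZONES = {"business"}
CONTROL_ZONES = {"supervisory", "cell-a", "sis"}

ALLOWED_CONDUITS = {
    Flow("business", "dmz", "https", 443),       # users reach the DMZ replica
    Flow("supervisory", "dmz", "opc-ua", 4840),  # data pushed up to the DMZ
    Flow("dmz", "supervisory", "rdp", 3389),     # jump host sessions
    Flow("supervisory", "cell-a", "modbus-tcp", 502),
}


def is_permitted(flow, allowed=ALLOWED_CONDUITS):
    """Deny by default: a flow passes only if it matches a rule exactly."""
    return flow in allowed


def direct_it_to_ot_rules(allowed=ALLOWED_CONDUITS):
    """Return rules that would let business and control zones talk directly."""
    return [
        rule for rule in allowed
        if (rule.src_zone in BUSINESS_ZONES and rule.dst_zone in CONTROL_ZONES)
        or (rule.src_zone in CONTROL_ZONES and rule.dst_zone in BUSINESS_ZONES)
    ]


print(is_permitted(Flow("business", "cell-a", "modbus-tcp", 502)))
print(is_permitted(Flow("supervisory", "cell-a", "modbus-tcp", 502)))
print(direct_it_to_ot_rules())
```

In this sketch the `sis` zone appears in no rule at all, which is one way to express its isolation. Running a check like `direct_it_to_ot_rules` against an exported rule set is a cheap way to test the Phase 2 exit condition before trusting the diagram.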
Pillar 4 — Monitoring: visibility tuned to OT reality
OT monitoring succeeds where IT-style monitoring fails by respecting two constraints: you often cannot put agents on endpoints, and the traffic itself is the richest signal you have, because control networks are blessedly predictable. A PLC talks to the same HMIs, the same historian, on the same protocols, at the same cadence, for years.
The monitoring stack in priority order:
- Passive network monitoring in every zone that matters — the same SPAN/TAP infrastructure from Pillar 2, now feeding continuous baseline and anomaly detection. New device, new protocol, new conduit crossing, controller mode change, firmware write: each of these is a high-fidelity alert in OT in a way it never is in IT.
- Boundary telemetry. Firewall logs from every conduit, DMZ authentication logs, and remote-access session records, shipped to the SIEM the SOC actually watches.
- Windows telemetry where it exists. HMIs, engineering workstations, and historians are ordinary Windows machines and deserve ordinary logging — they are also the most commonly compromised OT assets.
- Alert routing that includes the control room. An OT alert that only reaches a corporate SOC that does not know what a PLC is will be mis-triaged. Pair SOC analysts with named plant engineers per site.
Resist the urge to buy detection before finishing Pillars 2 and 3. A detection platform pointed at an unsegmented network with an unknown asset base produces alerts nobody can act on.
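The reason predictable traffic makes such a good signal can be shown with a toy baseline. The sketch below learns the set of observed conversations and explains why any later flow falls outside it. Host names and protocols are hypothetical, and this is an illustration of the principle rather than a substitute for an OT monitoring platform, which would also track cadence, function codes, and controller state.

```python
from collections import namedtuple

Flow = namedtuple("Flow", "src dst protocol")


def learn_baseline(flows):
    """The baseline is simply the set of conversations seen while learning."""
    return set(flows)


def explain(flow, baseline):
    """Return the reasons a flow deviates from the baseline (empty if it does not)."""
    if flow in baseline:
        return []
    hosts = {f.src for f in baseline} | {f.dst for f in baseline}
    reasons = [f"new device {h}" for h in (flow.src, flow.dst) if h not in hosts]
    pair_protocols = {
        f.protocol for f in baseline if (f.src, f.dst) == (flow.src, flow.dst)
    }
    if pair_protocols and flow.protocol not in pair_protocols:
        reasons.append(f"new protocol {flow.protocol} between known peers")
    if not reasons:
        reasons.append("new conversation between known devices")
    return reasons


# Hypothetical learning-period traffic.
baseline = learn_baseline([
    Flow("hmi-1", "plc-101", "modbus-tcp"),
    Flow("plc-101", "historian", "opc-ua"),
])

for candidate in [
    Flow("hmi-1", "plc-101", "modbus-tcp"),
    Flow("laptop-x", "plc-101", "modbus-tcp"),
    Flow("hmi-1", "plc-101", "ssh"),
    Flow("hmi-1", "historian", "opc-ua"),
]:
    print(candidate, explain(candidate, baseline) or "baseline")
```

The same logic on an unsegmented network with an unknown asset base would flag nearly everything, which is the practical argument for finishing inventory and zoning first.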
Pillar 5 — Response: negotiate authority before the incident
OT incident response planning is 20% technical and 80% pre-negotiated authority. The questions that consume the first hour of a real incident must be answered in writing beforehand:
- Who can order isolation of a zone or the IT/OT boundary, and what is the process-safety impact of each isolation action? Some units can ride through disconnection; some cannot shed their control system without a shutdown sequence measured in hours.
- What is the manual-operation fallback, and when was it last exercised? A plant that has not run a unit in manual since commissioning does not actually have that fallback.
- What gets preserved? Forensics in OT means capturing historian data, controller logic, and network captures without stopping the process — decide in advance what evidence matters and who collects it.
- When does safety trump investigation? Always — write it down anyway, so nobody hesitates.
Exercise at least annually: a tabletop plus one live drill of a boundary isolation in a maintenance window. The incident response practices from the IT side transfer in structure but not in detail; the OT annex to the IR plan is its own document with plant engineering as co-author.
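One way to keep those pre-negotiated answers from going stale is to hold them as structured records and check them periodically. The sketch below is an assumption about how such a record might look: the field names, action names, people, and dates are hypothetical, and the 365-day threshold simply encodes the annual exercise minimum.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IsolationAction:
    """One pre-agreed isolation action (hypothetical schema)."""
    name: str
    authorizer: str        # a named person, not a department
    safety_impact: str     # what isolation does to the process
    fallback_last_exercised: Optional[date]


def readiness_gaps(action, today, max_age_days=365):
    """List what is missing before this action can be relied on in an incident."""
    gaps = []
    if not action.authorizer.strip():
        gaps.append("no named authorizer")
    if not action.safety_impact.strip():
        gaps.append("process-safety impact not documented")
    if action.fallback_last_exercised is None:
        gaps.append("manual fallback never exercised")
    elif (today - action.fallback_last_exercised).days > max_age_days:
        gaps.append("manual fallback exercise is stale")
    return gaps


# Hypothetical playbook entries.
playbook = [
    IsolationAction("isolate IT/OT boundary", "A. Okafor",
                    "units ride through disconnection", date(2024, 9, 1)),
    IsolationAction("isolate cell-a", "", "", None),
    IsolationAction("isolate supervisory zone", "M. Chen",
                    "requires controlled shutdown sequence", date(2023, 1, 1)),
]

today = date(2025, 3, 1)
for action in playbook:
    gaps = readiness_gaps(action, today)
    print(action.name, "->", "ready" if not gaps else "; ".join(gaps))
```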
Build order and honest timelines
| Phase | Pillars | Typical duration | Exit condition |
|---|---|---|---|
| 1 | 1–2: governance, inventory started | 3–6 months | Charter signed, passive monitoring live, first inventory cut |
| 2 | 3: IT/OT boundary + DMZ | 6–12 months | No direct business-to-control flows remain |
| 3 | 3–4: internal zoning, monitoring | 12–24 months | Priority zones enforced, SOC receiving OT telemetry |
| 4 | 5: response + exercises | ongoing | Annual exercise complete, authorities documented |
Multi-year, deliberately. The programs that fail are the ones that try to compress this into a fiscal year, deploy tooling ahead of governance, and burn plant engineering’s trust with a scan-induced outage in month two. Trust in a plant is spent in seconds and rebuilt in years, and no security budget line replaces it.
The programs that succeed treat the plant as the customer. In OT security the process is the mission; the program exists to keep it running safely, and every architectural decision should be traceable back to that sentence.
Frequently asked questions
- Where should an ICS security program start?
  Governance and asset inventory, in that order, not tooling. Until roles are agreed between security and plant engineering, and until you know what is actually on the control network, any control you deploy is guesswork. Passive inventory via network monitoring is the standard first technical step because it observes traffic without sending anything to control devices.
- Is the Purdue model still relevant for modern OT environments?
  Yes, as a reference vocabulary and a zoning discipline rather than a literal network diagram. Cloud analytics, remote access, and IIoT devices cross its levels routinely, but the core idea survives: group assets by function and consequence, and force traffic between groups through controlled, monitored conduits. IEC 62443 formalizes this as zones and conduits.
- Can I just extend my IT security stack into the plant?
  Mostly no. Active vulnerability scanning can crash PLCs and older Windows HMIs, agent deployment is often impossible or voids vendor support, and IT patch cadences do not survive contact with production schedules. The program needs OT-specific monitoring, compensating controls in place of patching, and change processes that route through plant engineering.
- What does NIST SP 800-82r3 add over generic security frameworks?
  It translates generic controls into OT reality: it defines OT-specific threat and vulnerability considerations, adjusts control baselines for environments where availability and safety dominate, and gives concrete architecture guidance including segmentation between IT and OT. It is the reference document a mixed IT/OT team can align on without relitigating first principles.