PAVEL GLUKHIKH

Industrial Systems

SCADA Network Design: Polling, Protocols, and Redundancy

SCADA network design from the wire up: polling vs report-by-exception, Modbus and DNP3 security realities, redundancy, time sync, and remote site links.

Executive summary

SCADA network design is the engineering of the communications layer between control centers and field devices: polling architectures, protocol selection, redundant paths, time synchronization, and remote site connectivity. The defining constraints are protocols that were designed without authentication, links that may be decades old and bandwidth-poor, and an availability requirement that tolerates neither single points of failure nor clever fragility. This article covers how polling actually behaves on the wire, what Modbus and DNP3 do and do not protect, redundancy patterns that avoid common-mode failures, why time sync deserves design attention, and how to connect remote sites without putting outstations on the public internet.

The constraint set you are designing against

A SCADA network moves small amounts of critical data over links you often do not fully control, using protocols that predate the concept of an attacker, to devices that may outlive the engineers who installed them. Design for those constraints honestly and the rest follows. Design for a greenfield datacenter fantasy and the field will correct you.

The traffic profile is unlike anything in enterprise networking: kilobits per second per site, but with hard expectations about latency consistency, ordering, and availability. When a SCADA master misses three poll cycles to a pipeline valve site, that is an operational event.

The design goal is not throughput. It is predictable degradation and unattended recovery.

Polling architectures: what actually happens on the wire

Strict poll-response is the classic model and all that Modbus offers. The master interrogates each outstation in turn; devices never speak unsolicited. It is beautifully deterministic — you can compute worst-case scan time from device count, register count, and line rate — and it wastes most of its bandwidth confirming that nothing changed. On legacy serial loops at 9600 baud with a dozen drops, scan time arithmetic still decides what is possible: adding points to the poll table slows every other point on the loop, and someone will notice at the HMI.
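The scan time arithmetic can be sketched in a few lines. This is a rough estimate under stated assumptions, not a vendor formula: Modbus RTU framing with 11 bits per character, one Read Holding Registers request per drop (8 request bytes, 5 + 2N response bytes), 3.5 character times of silence after each frame, and a hypothetical 20 ms device turnaround. The function names and the turnaround figure are illustrative.

```python
def modbus_rtu_poll_seconds(registers, baud=9600, bits_per_char=11,
                            turnaround_s=0.02):
    """Estimate one request/response exchange on a Modbus RTU serial loop.

    Assumptions (illustrative): a single Read Holding Registers request,
    11-bit characters, and a fixed device turnaround delay.
    """
    request_bytes = 8                    # addr + func + start(2) + qty(2) + CRC(2)
    response_bytes = 5 + 2 * registers   # addr + func + byte count + data + CRC(2)
    char_time = bits_per_char / baud     # seconds per character on the wire
    wire_time = (request_bytes + response_bytes) * char_time
    silence = 2 * 3.5 * char_time        # inter-frame gap after each frame
    return wire_time + silence + turnaround_s


def loop_scan_seconds(drops, registers_per_drop, **kwargs):
    """Worst-case time to poll every drop on the loop once, in sequence."""
    return drops * modbus_rtu_poll_seconds(registers_per_drop, **kwargs)


if __name__ == "__main__":
    for regs in (50, 100):
        print(regs, "registers per drop:",
              round(loop_scan_seconds(12, regs), 3), "s per scan")
```

With these assumptions, a dozen drops at 50 registers each takes roughly 1.9 seconds per scan, and doubling the register count pushes that to roughly 3.3 seconds. Real loops add retries and timeouts for silent devices, so treat the result as a floor rather than a guarantee.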

Report-by-exception is why DNP3 (standardized as IEEE 1815) won North American utility SCADA. Outstations timestamp and buffer events locally, grouped into classes; the master polls frequently for event classes (cheap) and occasionally for a full “integrity poll” (expensive) to resynchronize static state. Outstations can also send unsolicited responses when configured. Two design consequences matter. First, bandwidth scales with change, not point count, which is what makes low-rate radio and cellular links viable. Second, event buffering means a communications outage loses no history as long as the outstation’s event buffer does not overflow: when the link returns, the buffered sequence of events arrives with device timestamps. That property is worth real money during incident reconstruction, and it is why time sync (below) is not optional.
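The buffering behavior can be illustrated with a toy model. This is not a DNP3 implementation and uses no DNP3 library; `ToyOutstation`, `Event`, and the buffer size are hypothetical names and values chosen only to show how event polls, integrity polls, and device timestamps relate.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    point: str
    value: float
    device_time: float   # timestamp applied by the outstation, not the master
    event_class: int


@dataclass
class ToyOutstation:
    point_class: dict                      # point name -> event class (1, 2, 3)
    static: dict = field(default_factory=dict)
    # Bounded buffer: once full, the oldest events are silently dropped.
    buffer: deque = field(default_factory=lambda: deque(maxlen=1000))

    def change(self, point, value, device_time):
        """Record a change locally, whether or not the link is up."""
        self.static[point] = value
        self.buffer.append(
            Event(point, value, device_time, self.point_class[point]))

    def event_poll(self, classes=(1, 2, 3)):
        """Return buffered events for the requested classes and clear them."""
        wanted = [e for e in self.buffer if e.event_class in classes]
        kept = (e for e in self.buffer if e.event_class not in classes)
        self.buffer = deque(kept, maxlen=self.buffer.maxlen)
        return wanted

    def integrity_poll(self):
        """Return all pending events plus a snapshot of static state."""
        return self.event_poll(), dict(self.static)


if __name__ == "__main__":
    rtu = ToyOutstation({"valve_pos": 1, "tank_level": 2})
    # Changes that occur while the link is down are still timestamped.
    rtu.change("valve_pos", 1.0, 100.0)
    rtu.change("tank_level", 3.2, 101.5)
    rtu.change("valve_pos", 0.0, 102.0)
    for event in rtu.event_poll(classes=(1,)):
        print(event)
```

The bounded `deque` is deliberate: it shows why buffer depth per site belongs in the design, since an outage longer than the buffer can absorb does lose the oldest events.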

Size the polling design deliberately: define scan classes (fast scan for control-critical points, slow scan for the rest), compute worst-case cycle times on the slowest link, and document the integrity-poll schedule. These numbers become your baseline for “the system feels slow” tickets forever after.

Protocol realities: assume no protocol will protect you

Modbus has no authentication of any kind. Anything that can open TCP 502 to a device can write coils and holding registers — there is no concept of an unauthorized master. DNP3 is richer (timestamps, quality flags, event classes) but classic deployments are equally unauthenticated; DNP3 Secure Authentication exists in the standard and is genuinely good engineering, but it requires support at the master and every outstation plus key management across a mixed-vendor fleet, and real-world uptake reflects that. IEC 60870-5-104, common outside North America, shares the same underlying posture.

So the network carries the security burden:

  • Segment ruthlessly. SCADA masters, front-end processors, and field networks live in their own zones per the Purdue-model logic, and nothing in the enterprise can reach an outstation directly.
  • Filter with protocol awareness. Modern OT firewalls can enforce “these three masters may read; only this one may write; write function codes 0x05/0x06/0x0F/0x10 are blocked from everywhere else.” That converts an unauthenticated protocol into one with network-enforced authorization. Deploy in alert-only mode first; you will discover undocumented masters.
  • Encrypt in transit across anything you don’t own. IPsec tunnels or TLS wrappers on WAN links; bump-in-the-wire encryptors where endpoints cannot do it themselves. The goal is protecting the WAN path — the last serial hop inside a locked RTU cabinet is a different risk conversation.
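The function-code rule in the list above can be sketched as a decision function. This is an illustration of the logic only, not how any particular firewall is configured: the `POLICY` table, the IP addresses, and the `decide` function are hypothetical. It assumes Modbus TCP framing, where a 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) precedes the function code.

```python
import struct

READ_CODES = {0x01, 0x02, 0x03, 0x04}
# Write single coil/register, write multiple coils/registers,
# mask write register, read/write multiple registers.
WRITE_CODES = {0x05, 0x06, 0x0F, 0x10, 0x16, 0x17}

# Hypothetical authorization matrix: source address -> permitted role.
POLICY = {
    "10.10.1.10": "write",
    "10.10.1.11": "read",
    "10.10.1.12": "read",
}


def decide(src_ip, frame):
    """Return 'allow' or 'deny' for one Modbus TCP request frame."""
    if len(frame) < 8:
        return "deny"                      # too short to hold MBAP + function
    _tid, proto, _length, _unit, func = struct.unpack(">HHHBB", frame[:8])
    if proto != 0:
        return "deny"                      # Modbus uses protocol ID 0
    role = POLICY.get(src_ip)
    if role is None:
        return "deny"                      # unknown master
    if func in READ_CODES:
        return "allow"
    if func in WRITE_CODES and role == "write":
        return "allow"
    return "deny"                          # default deny, including diagnostics


if __name__ == "__main__":
    # Write Single Register (0x06): address 1, value 0x00FF, unit 1.
    write_req = struct.pack(">HHHBBHH", 1, 0, 6, 1, 0x06, 1, 0x00FF)
    print(decide("10.10.1.10", write_req))
    print(decide("10.10.1.11", write_req))
```

In alert-only mode the same decision would be logged instead of enforced, which is how the undocumented masters surface.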

Redundancy without common-mode failure

The standard pattern is dual SCADA servers (hot-standby with database replication and arbitrated failover), dual front-end processors, and dual communication paths per critical site. The pattern is well understood; the failures come from correlated dependencies that the diagram hides:

  • Both “diverse” WAN circuits riding the same last-mile fiber or the same carrier’s regional infrastructure. Ask the carrier for the actual path, in writing.
  • Primary and standby servers on the same UPS, same hypervisor cluster, or same broadcast domain, so one failure or one storm takes both.
  • Cellular backup for dozens of sites all homed to one carrier, discovered during that carrier’s national outage.
  • Failover logic that has never run. Test it on schedule, during the day, with operations informed — an untested failover path is a rumor, not a design.

Between control centers, ICCP (IEC 60870-6/TASE.2) links get the same treatment: redundant associations across diverse paths, and firewall rules scoped to the specific bilateral tables, not the whole subnet.
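The correlated dependencies described in this section can be checked mechanically once they are written down. A minimal sketch, assuming each path is recorded as a mapping from dependency kind to an identifier; the field names and values here are made up for illustration.

```python
def shared_fate(primary, backup):
    """Return the dependencies that both paths rely on.

    Each path is a dict of dependency kind -> identifier. Any kind with the
    same identifier on both paths is a candidate common-mode failure.
    """
    common = {}
    for kind in sorted(primary.keys() & backup.keys()):
        if primary[kind] == backup[kind]:
            common[kind] = primary[kind]
    return common


if __name__ == "__main__":
    # Hypothetical site record; the identifiers are illustrative only.
    site_paths = {
        "primary": {"carrier": "CarrierA", "last_mile": "conduit-17",
                    "power": "UPS-1"},
        "backup": {"carrier": "CarrierB", "last_mile": "conduit-17",
                   "power": "UPS-2"},
    }
    print(shared_fate(site_paths["primary"], site_paths["backup"]))
```

The check is only as good as the inventory behind it, which is why the carrier's written path confirmation matters more than the code.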

Time synchronization is a design element, not a checkbox

Sequence-of-events analysis after a trip or a relay operation depends on device timestamps agreeing. Substation SOE recorders commonly resolve to a millisecond; reconstructing which breaker opened first across two sites requires their clocks to agree better than that. Design the time architecture explicitly: GPS-disciplined clocks at major sites distributing NTP (and IRIG-B or PTP where protection-grade accuracy is required), a clear stratum hierarchy for everything else, and holdover behavior you have actually checked. Treat GPS spoofing/jamming as a real, if secondary, concern for exposed sites — dual-constellation receivers and sanity alarms on time jumps are cheap. And monitor drift: a silently free-running clock corrupts your event history for months before anyone notices.
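A sanity alarm on offset and time jumps can be as simple as the sketch below. It assumes you can sample a device clock against a trusted reference and that both are expressed in integer microseconds; the thresholds and the `clock_alarms` name are illustrative, not taken from any standard.

```python
def clock_alarms(samples, max_offset_us=1000, max_step_us=500):
    """Flag samples where a device clock is off or has jumped.

    samples: iterable of (reference_us, device_us) pairs in time order.
    Returns a list of (reference_us, kind, amount_us) tuples, where kind is
    'offset' (too far from reference) or 'step' (sudden change in offset).
    """
    alarms = []
    previous_offset = None
    for reference_us, device_us in samples:
        offset = device_us - reference_us
        if abs(offset) > max_offset_us:
            alarms.append((reference_us, "offset", offset))
        if previous_offset is not None:
            step = offset - previous_offset
            if abs(step) > max_step_us:
                alarms.append((reference_us, "step", step))
        previous_offset = offset
    return alarms


if __name__ == "__main__":
    # One sample per minute; the third shows a clock that has jumped.
    samples = [
        (0, 200),
        (60_000_000, 60_000_300),
        (120_000_000, 120_002_100),
    ]
    for alarm in clock_alarms(samples):
        print(alarm)
```

A slow, steady growth in offset with no step alarms is the signature of a free-running clock, so trending the offset over days is as useful as alarming on it.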

Remote sites: the part attackers can reach

Remote outstations are where SCADA design meets the public internet, and the record here is grim — search engines like Shodan routinely index HMIs and PLCs answering on public IPs, which is how several publicized water and energy incidents began. The rules I hold the line on:

  1. No outstation is internet-addressable. Ever. Cellular connectivity goes through a private APN or at minimum carrier NAT plus IPsec back to the control center — the SIM should have no public route.
  2. Site-to-center tunnels terminate in a SCADA edge zone, not on the SCADA master itself, and the VPN architecture uses certificates or strong PSK hygiene, not one shared key for fifty sites installed in 2011.
  3. Local fallback logic lives at the site. The RTU or local controller must hold the process safe through an extended comms outage. This is a controls-engineering requirement, but the network designer owns making the failure modes (link down, link flapping, link degraded) clean and distinguishable.
  4. Maintenance ports are part of the design. The dial-up modem or vendor cellular dongle wired to the RTU’s serial port bypasses all of the above. Find them during site surveys and bring that access under the remote access architecture or remove it.
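Rule 1 lends itself to a quick audit of the site inventory. The sketch below uses Python's standard `ipaddress` module to flag any device whose configured address is globally routable; the inventory contents are hypothetical. Passing this check is necessary but not sufficient, since a private address behind a port forward is still reachable.

```python
import ipaddress


def internet_addressable(inventory):
    """Return the names of devices configured with a globally routable IP.

    inventory: dict of device name -> IP address string.
    Private (RFC 1918) and carrier-grade NAT (100.64.0.0/10) addresses are
    not global, so they are not flagged.
    """
    exposed = []
    for name, address in inventory.items():
        if ipaddress.ip_address(address).is_global:
            exposed.append(name)
    return sorted(exposed)


if __name__ == "__main__":
    # Hypothetical inventory; names and addresses are illustrative only.
    inventory = {
        "pump-07": "10.20.7.2",      # private APN
        "lift-12": "100.64.3.9",     # carrier NAT range
        "valve-03": "8.8.8.8",       # stand-in for a public address
    }
    print(internet_addressable(inventory))
```

Pair this with an external scan of your own address space, because the inventory only knows about the devices someone remembered to record.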

What to write down

The poll schedule and worst-case scan times per link; the authorized master/outstation matrix the firewalls enforce; both physical paths per site with carrier confirmations; the time-source hierarchy and holdover specs; and the per-site inventory of every communications device, including the embarrassing ones. Six months from now, a slow-scan complaint, a failed failover, or an incident timeline will make one of these documents the most valuable file you own.

The wire outlives the fashion

SCADA networks age differently than the systems above them. The protocols in this article were designed when the attacker did not exist and the modem was modern, and they will still be polling outstations long after today’s security products have been renamed twice and acquired once. That is the constraint the design has to respect: you are not building for the next refresh cycle, you are building a communications layer that has to stay boring for decades. Get the fundamentals right — bandwidth honesty, network-enforced authorization, redundancy without shared fate, time you can trust — and the network does what good infrastructure always does.

It disappears into the plant.

Frequently asked questions

What is the difference between polling and report-by-exception?

In a polled architecture the master interrogates every outstation on a fixed cycle and devices speak only when asked: simple, deterministic, bandwidth-hungry. Report-by-exception, native to DNP3, lets outstations buffer events locally and report changes when polled for event classes or via unsolicited responses, which cuts bandwidth dramatically and preserves event history across communication outages.

Is Modbus secure?

No. Classic Modbus, serial or TCP, has no authentication, no authorization, and no encryption; any device that can reach TCP port 502 can read and write registers. Security must come from the network: strict segmentation, protocol-aware firewalls that restrict function codes and sources, and encrypted transport over untrusted links.

Does DNP3 have security built in?

Partially. DNP3 Secure Authentication (from IEEE 1815) adds challenge-response authentication for critical operations, and more recent work layers on encryption. In practice deployment is limited because both master and every outstation must support it, keys must be managed, and mixed fleets are the norm, so network-layer controls still carry most of the load.

How should remote SCADA sites be connected?

Over private transport where possible (MPLS, private cellular APN, licensed radio), with IPsec on anything that crosses shared infrastructure, and never with outstations directly addressable from the internet. Design each site with a primary and a diverse backup path, and verify the paths do not share a last mile or a single carrier failure domain.