Backup Security: Protecting the Last Line From Attackers
Backup security design for the era when attackers hunt backups first: isolation architectures, immutability options, air gaps, and restore testing that counts.
Executive summary
Backup security is the practice of protecting backup infrastructure and backup data from the same adversary who compromises production, because modern attackers target backups first to remove the victim's alternative to paying. This article covers why backup systems are a priority target, the isolation architectures that keep a domain compromise from reaching the repositories, the real differences between immutability options and air gaps, and why the only backup metric that means anything is a timed, verified restore. By the end you will be able to evaluate whether your backup platform would survive an attacker holding your highest-privilege credentials.
Backups changed sides
For most of my career, backups were an availability concern: hardware dies, people delete things, auditors ask questions. That era is over.
In a modern intrusion the backup platform is a primary target. Operators locate the backup console during reconnaissance, destroy or encrypt the repositories, and only then launch encryption of production. Read enough incident reports and the sequence is almost monotonous. The backups are attacked first precisely because they are the victim’s alternative to paying.
That inverts the design question. It is no longer “can we restore after a disk failure?” but “can the last copy survive an attacker who owns our directory, our hypervisors, and our admin workstations?” Most backup deployments I review fail that test in the first ten minutes. The usual reason is that the backup server is a domain-joined VM on the same cluster it protects, managed over RDP from the same workstations the attacker landed on.
How backup platforms actually get destroyed
The kill patterns are consistent and worth designing against explicitly:
- Credential reuse. The backup console accepts the same AD credentials the attacker already harvested. Game over before any exploit is needed.
- Shared platform fate. The backup server is a VM; the attacker encrypts at the hypervisor layer, as in the ESXiArgs campaign documented in CISA advisory AA23-039A, and the backup server’s disks are encrypted alongside everything else.
- Reachable repositories. Repositories exposed as SMB/NFS shares are encrypted directly, no console access required.
- API-driven deletion. Cloud backup buckets deleted with harvested access keys, retention policies shortened, then data purged.
- Quiet corruption. The patient version: backups sabotaged or encryption keys rotated weeks before detonation, so the “good” restore points aren’t.
Every one of these is an architecture failure, not a product failure.
No backup vendor’s feature list fixes a design in which the backup platform trusts the identity system it exists to survive. Only an architect can make that decision, and it is the recurring lesson of this entire subject: security is architecture, not a product category.
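One way to act on the kill patterns is to turn them into a self-audit. The sketch below assumes you can answer five yes/no questions about your deployment. `BackupDeployment`, its fields, and `exposed_kill_patterns` are hypothetical names and are not part of any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupDeployment:
    """Hypothetical yes/no facts about a backup deployment."""
    console_uses_production_ad: bool       # console accepts production directory logins
    server_on_protected_hypervisors: bool  # backup server VM runs on the estate it protects
    repositories_on_smb_or_nfs: bool       # repositories exposed as file shares
    cloud_keys_can_delete_backups: bool    # harvestable keys can delete data or shorten retention
    restores_verified_recently: bool       # a timed, verified restore has actually been done


def exposed_kill_patterns(d: BackupDeployment) -> list[str]:
    """Return the kill patterns this deployment is exposed to."""
    checks = [
        ("credential reuse", d.console_uses_production_ad),
        ("shared platform fate", d.server_on_protected_hypervisors),
        ("reachable repositories", d.repositories_on_smb_or_nfs),
        ("API-driven deletion", d.cloud_keys_can_delete_backups),
        # Quiet corruption cannot be ruled out without a verified restore.
        ("quiet corruption", not d.restores_verified_recently),
    ]
    return [name for name, exposed in checks if exposed]
```

Take the common case: a domain-joined backup VM on the cluster it protects, with no recent restore test. It is exposed to three of the five patterns before anyone looks at a vendor feature list.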
Isolation architecture
The design goal is a separate failure domain: identity, network, platform, and administration all independent from production.
Production domain                   Backup enclave (own identity)
┌────────────────────┐              ┌──────────────────────────┐
│ Workloads          │  pull only   │ Backup server (no AD)    │
│ Hypervisors        │ ◄─────────── │ Primary repository       │
│ AD / IdP           │  no inbound  │          │ replicate     │
└────────────────────┘  mgmt path   │          ▼               │
                                    │ Immutable copy (locked)  │
                                    └──────────┬───────────────┘
                                               ▼
                                    Offline / offsite copy
The rules that make the diagram real:
- Separate identity. Backup infrastructure lives in its own directory or is standalone. Its credentials are unique, vaulted outside the production password manager, and the console requires MFA. This is the same tier-crossing discipline described in identity-first security, applied to the one system whose compromise ends the game.
- Separate network. The repository network accepts backup traffic and nothing else — a dedicated enclave inside the restricted zone of your segmentation model. No RDP or SSH from the user LAN. Management happens from a dedicated workstation or bastion.
- Pull, not push, for the isolated copy. The secondary repository authenticates to the primary and pulls data. Production-side credentials have no route or rights to the isolated copy, so harvesting them accomplishes nothing.
- Physical or platform separation. At least one repository must not run on the hypervisor estate it protects. A physical server with local disks is unfashionable and extremely effective.
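The network rules above can be checked mechanically against a list of permitted flows. This is a minimal sketch under stated assumptions. The zone names (`workloads`, `user_lan`, `immutable_copy`, `mgmt_bastion`, and so on) are hypothetical placeholders for your own segmentation model. It also assumes each flow records which side opens the connection, because that is what "pull, not push" depends on.

```python
from typing import NamedTuple

# Hypothetical zone names; map them to your own segmentation model.
PRODUCTION_ZONES = {"workloads", "hypervisors", "directory", "user_lan"}
ISOLATED_COPY = "immutable_copy"
MGMT_BASTION = "mgmt_bastion"


class Flow(NamedTuple):
    initiator: str  # zone that opens the connection
    responder: str  # zone that accepts it
    purpose: str    # "backup", "replication", or "management"


def isolation_violations(flows: list[Flow]) -> list[str]:
    """Check permitted flows against the pull-only, separate-management rules."""
    problems = []
    for f in flows:
        label = f"{f.initiator} -> {f.responder} ({f.purpose})"
        # Rule 1: production never opens a connection into the backup enclave.
        if f.initiator in PRODUCTION_ZONES and f.responder not in PRODUCTION_ZONES:
            problems.append(f"{label}: inbound path from production")
        # Rule 2: the isolated copy pulls; only the bastion may connect to it.
        if f.responder == ISOLATED_COPY and f.initiator != MGMT_BASTION:
            problems.append(f"{label}: push into the isolated copy")
        # Rule 3: management originates only from the dedicated bastion.
        if f.purpose == "management" and f.initiator != MGMT_BASTION:
            problems.append(f"{label}: management from outside the bastion")
    return problems
```

With this model, the backup server opening connections to workloads, the immutable copy pulling from the primary repository, and the bastion managing the backup server all pass. RDP from the user LAN and push replication into the locked copy are both flagged.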
Immutability and air gaps: what you are actually buying
The market says “immutable” about very different mechanisms. The differentiating question is always: who can turn it off?
| Mechanism | Restore speed | Revocable by compromised admin? | Real-world caveat |
|---|---|---|---|
| S3 Object Lock, compliance mode | Fast | No, until retention expires | Governance mode can be bypassed by any identity granted s3:BypassGovernanceRetention; check the mode, not the brochure |
| Hardened Linux repository | Fast | Not through the backup app | Only as strong as the OS hardening and access discipline |
| Storage array snapshots with lock | Fastest | Frequently yes, via array console | Array admin compromise defeats it |
| WORM tape | Slow | No | Requires working drives and practiced handling |
| True offline air gap (removable media, powered-off target) | Slowest | No | Human procedure is the weak point — automate the schedule, audit the logs |
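For S3 Object Lock, answering "who can turn it off?" comes down to reading the bucket's configuration. The sketch below assumes the dict shape returned by boto3's `get_object_lock_configuration` call. It inspects only the bucket's default retention rule, so objects written with their own explicit retention are not covered. `audit_object_lock` is a hypothetical helper, and `min_days` is a parameter you choose (the sizing notes that follow argue for 30 days on the copy of last resort).

```python
def audit_object_lock(response: dict, min_days: int) -> list[str]:
    """Audit a get_object_lock_configuration response for revocability and length.

    Only the bucket's default retention rule is checked in this sketch.
    """
    config = response.get("ObjectLockConfiguration", {})
    if config.get("ObjectLockEnabled") != "Enabled":
        return ["object lock is not enabled on this bucket"]
    retention = config.get("Rule", {}).get("DefaultRetention")
    if retention is None:
        return ["no default retention rule; new objects are not locked by default"]
    findings = []
    if retention.get("Mode") != "COMPLIANCE":
        findings.append(
            f"mode is {retention.get('Mode')}: bypassable by identities "
            "holding s3:BypassGovernanceRetention"
        )
    # Default retention is expressed in Days or Years; approximate a year as 365 days.
    days = retention.get("Days") or 365 * retention.get("Years", 0)
    if days < min_days:
        findings.append(f"default retention {days} days is below the {min_days}-day floor")
    return findings
```

In practice you would pass in the result of `boto3.client("s3").get_object_lock_configuration(Bucket=...)` for each repository bucket. Run it from an identity that has read access only, so the audit itself cannot weaken the lock.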
Two sizing notes from experience:
- Retention must exceed dwell time. Attackers commonly wait weeks between initial access and encryption. A 7-day immutability window protects against an impatient attacker; 30 days is a more honest floor for the copy of last resort.
- Immutability is not integrity. A locked copy of backups that were already corrupted is a locked copy of garbage. That is why the entire program comes down to restore testing.
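The cheapest integrity signal is a checksum manifest recorded when the backup is written and verified later. The sketch below makes two assumptions. First, the manifest is a hypothetical `{relative_path: sha256}` mapping. Second, it is stored outside the backup platform, since an attacker who can rewrite both the data and its manifest defeats the check. Checksums only catch changes made after the manifest was recorded. Backups that were sabotaged at the source still need a restore drill to expose them.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare files under root against a {relative_path: sha256} manifest."""
    failures = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            failures.append(f"missing: {rel_path}")
        elif sha256_file(path) != expected:
            failures.append(f"checksum mismatch: {rel_path}")
    return failures
```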
Restore testing is the only real metric
Backup success rate is a vanity metric.
Jobs can complete at 100% for a year while the restore you eventually need fails on the first attempt. The causes are mundane: wrong dependency order, a missing encryption key, an undersized restore network, or an application that starts but won’t take traffic. The metric that matters is the time from the decision to restore until the service takes production load, measured in a drill. A measured number is what uncertainty-reducing security actually looks like: not another dashboard, but a figure you have timed and can defend in front of the board.
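To measure that metric honestly, the clock has to start at the restore decision and stop only when production load is flowing. It should not start or stop at the boundaries of the restore job. Below is a minimal sketch of a drill recorder. `RestoreDrill` and its methods are hypothetical names, and the clock is injectable so the drill can be replayed or tested.

```python
import time
from contextlib import contextmanager


class RestoreDrill:
    """Record per-step durations; the overall clock starts at the restore decision."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._decided_at = clock()  # the moment someone says "restore"
        self.steps: list[tuple[str, float]] = []

    @contextmanager
    def step(self, name: str):
        """Time one runbook step; it is recorded even if the step raises."""
        begin = self._clock()
        try:
            yield
        finally:
            self.steps.append((name, self._clock() - begin))

    def service_takes_load(self) -> float:
        """Stop the clock once production traffic is served; return measured RTO in seconds."""
        return self._clock() - self._decided_at
```

The measured RTO deliberately includes the gaps between steps, such as waiting for approvals, hunting for keys, or rerouting DNS. Those gaps are often where most of the real recovery time goes, and per-step timings alone would hide them.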
A testing program worth the name has three layers:
- Automated verification, every job. Checksums plus automated boot/mount verification where the platform supports it. This is the “0” in the 3-2-1-1-0 rule covered in ransomware-resilient architecture.
- Application-level restore drills, quarterly for the critical tier. Restore the whole stack — database, app, dependencies — into an isolated network and prove the application actually works. Time every step.
- Full-scenario exercise, at least annually. Assume production identity is hostile and primary storage is gone. Restore from the immutable copy into clean infrastructure, including standing up temporary identity and DNS. This is the drill that finds the circular dependency nobody modeled — classically, the backup console that authenticates against the directory you are trying to restore.
Publish the measured times as the official RPO/RTO. If the business finds them unacceptable, that is a budget conversation, not a documentation adjustment.
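Turning drill results into published objectives needs a rule for which measurement counts. The sketch below assumes a conservative one: publish the worst measured RTO and RPO per service, not the best or the average. Any shortfall against the business target is then expressed as a number for the budget conversation. `DrillResult`, the minute-based fields, and both helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrillResult:
    """One timed drill (hypothetical record format). Times are in minutes."""
    service: str
    rto_minutes: float  # decision-to-restore until the service takes production load
    rpo_minutes: float  # age of the restore point used, relative to the simulated incident


def official_objectives(drills: list[DrillResult], service: str) -> tuple[float, float]:
    """Publish the worst measured (RTO, RPO) for a service."""
    results = [d for d in drills if d.service == service]
    if not results:
        raise ValueError(f"no measured drill for {service}; nothing to publish")
    return max(d.rto_minutes for d in results), max(d.rpo_minutes for d in results)


def budget_gap(measured_rto: float, target_rto: float) -> float:
    """Minutes by which the measured RTO misses the business target (0 if met)."""
    return max(0.0, measured_rto - target_rto)
```

Refusing to publish a number for a service with no drill is a deliberate choice. An objective that was never measured is exactly the kind of documentation adjustment the paragraph above warns against.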
Implementation notes
- Encrypt backup data in flight and at rest, and store the encryption keys outside the backup platform — in an HSM or a separate vault. Attackers who cannot delete backups will settle for stealing them.
- Alert on the events that precede backup destruction: retention policy changes, repository deletions, immutability setting modifications, new administrator accounts on the backup platform. These deserve the same page-someone-now priority as a domain controller alert.
- Include OT and edge environments in scope. In plants I’ve supported, the historian and engineering workstation backups were an afterthought — yet they are the systems with the longest rebuild times. The industrial systems side of the house needs this architecture at least as much as IT does.
- Document the restore runbook so that someone other than its author can execute it at 3 a.m. — the realistic condition under which it will run, as anyone who has worked a real incident will confirm.
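The alerting note above translates into a small set of match rules over the backup platform's audit log. This sketch assumes the events have already been normalized into dicts with a `type` field. The event type names, the `actor`, `old_days`, and `new_days` keys, and `page_alerts` itself are all hypothetical. Map them from whatever your platform actually logs.

```python
# Hypothetical normalized event types that precede backup destruction.
PAGE_NOW = {
    "retention_policy_changed",
    "repository_deleted",
    "immutability_setting_changed",
    "admin_account_created",
}


def page_alerts(events: list[dict]) -> list[str]:
    """Return one page-worthy alert line per matching audit event."""
    alerts = []
    for event in events:
        kind = event.get("type")
        if kind not in PAGE_NOW:
            continue
        detail = f"{kind} by {event.get('actor', 'unknown')}"
        # Shortened retention is the step that usually precedes purging data.
        if kind == "retention_policy_changed" and event.get("new_days", 0) < event.get("old_days", 0):
            detail += f" (shortened {event['old_days']} -> {event['new_days']} days)"
        alerts.append(detail)
    return alerts
```

Route the output to the same on-call path as domain controller alerts, not to a daily report.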
The uncomfortable summary: a backup you have not restored is a rumor, and a backup an attacker can delete is a hostage.
Design for recovery, not perfection. Prevention will eventually fail somewhere — the architecture question is whether that failure becomes an outage with a measured end or a negotiation with a criminal. Backup products will keep changing. The principle that the last copy must live in a different failure domain than the threat will not.
Frequently asked questions
- Why do ransomware attackers target backups first?
- Because working backups eliminate their leverage. If the victim can restore, the ransom demand collapses to a data-extortion threat. Intrusion reports consistently show operators locating backup consoles, deleting or encrypting repositories, and only then triggering encryption of production. A backup platform reachable with production credentials is therefore part of the attack surface, not a safety net.
- What is the difference between an immutable backup and an air-gapped backup?
- An immutable backup is online but write-locked: the storage platform refuses modification or deletion until a retention period expires. An air-gapped backup is offline or network-isolated, unreachable without a physical or procedural step. Immutability preserves fast restores; air gaps provide stronger isolation at the cost of recovery speed. Mature designs use both for different copies.
- How often should restores be tested?
- Automated integrity verification should run on every backup job. Meaningful restore drills (a full application stack, timed, restored to isolated infrastructure) should run quarterly for the critical tier and at least twice a year for everything else, plus after any major platform change. If the last full restore test is more than six months old, you have a backup system, not a recovery capability.
- Should backup servers be joined to Active Directory?
- No. Domain-joining the backup platform to the directory it protects means a domain compromise is automatically a backup compromise. Run backup infrastructure in a separate identity domain or standalone, with its own credentials stored outside the production password vault, MFA on the console, and no inbound management access from the user network.