Infrastructure · Pillar Guide
Cloud architecture: decisions that matter more than the venue
Cloud architecture as engineering, not ideology: landing zones, identity, network, data gravity, and egress economics — the decisions that actually matter.
Executive summary
Cloud architecture is the discipline of deciding where workloads run, how they are governed, how they connect, and where data lives — treating cloud as a deployment model to be engineered, not an objective to be reached. The decisions that determine long-term cost and operability are made early and quietly: the landing zone, the identity plane, the network design, and the placement of data whose gravity and egress economics will constrain everything after it. This page maps those decisions, takes an honest look at hybrid and multi-cloud, names the two failure modes — cloud-by-default and on-prem-by-inertia — and lays out the reading path through the deeper articles.
Cloud architecture is the discipline of deciding where workloads run, under what identity and governance structure, connected by what network, with data placed where its gravity and egress economics are acceptable. It treats cloud as a deployment model to be engineered — not an objective to be reached.
That second sentence is the position this entire topic cluster is built on, so I will say it plainly: “we are moving to the cloud” is not an architecture. It is a venue change. The architecture is everything the sentence leaves out. Organizations that skip those decisions pay for them for a decade, whichever venue they chose.
I spent twelve years running workloads on owned hardware in hosting-company operations, operated ITAR-regulated US cloud environments where jurisdiction constrained every design choice, and led cloud-scale accounts in enterprise consulting. The pattern across all of it is consistent: the venue matters far less than the half-dozen decisions this page maps.
Cloud is a deployment model, not an objective
NIST defined cloud computing in SP 800-145 as a model for on-demand, elastic, measured resource access. A model. The definition contains no claim that the model fits every workload, and no moral ranking of venues — those were added later, by people selling something or by people afraid of appearing behind.
Treating a deployment model as a destination produces a recognizable failure signature: migration count as the KPI instead of cost, reliability, or delivery speed; lift-and-shift of workloads whose shape never suited elasticity; surprise egress bills; and, several years in, a quiet repatriation project nobody wants to present to the board. The inverse ideology fails just as reliably — datacenters kept alive by sunk-cost reasoning, refresh cycles rubber-stamped because “our data can’t leave,” a hiring pool that thins every year.
Both failures share a root cause. The organization argued about the venue and skipped the architecture.
The venue question itself — which workloads belong where, decided by egress economics, compliance, latency, workload shape, and staffing — has its own deep article: the cloud vs on-prem decision framework. This page assumes you will run that decision honestly, and concentrates on what has to be engineered regardless of its outcome.
The decisions that actually matter
Four decision areas determine most of a cloud environment’s long-term cost and operability. All four are made early, usually implicitly, and all four are expensive to reverse.
The landing zone
A landing zone is the foundation workloads deploy into: account and subscription structure, the identity baseline, network topology, centralized logging, and policy guardrails. It is the least glamorous artifact in cloud architecture and the most consequential, because every workload inherits its assumptions and almost nothing about it can be retrofitted cheaply.
Environments without one converge on the same state: dozens of accounts with drifting configurations, IAM nobody can audit, and shadow networks stitched together by peering requests. Environments with a good one make the right thing the easy thing — new workloads land governed, connected, and observable by default. Good infrastructure disappears; a landing zone is how cloud infrastructure disappears. The hybrid cloud landing zone reference architecture documents a concrete design, including the on-prem connectivity most real enterprises need on day one.
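To make “new workloads land governed, connected, and observable by default” concrete, here is a minimal sketch of a baseline check a landing-zone pipeline might run against each account before workloads deploy. The `Account` record, its field names, and the four rules are hypothetical and simply mirror the landing-zone elements listed above. Real providers express guardrails in their own policy engines, not in code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    """Hypothetical account record; the field names are illustrative, not a provider API."""
    name: str
    org_unit: Optional[str]   # position in the account hierarchy (None = unplaced)
    central_logging: bool     # audit logs shipped to the central logging account
    network_attached: bool    # attached to the landing zone's shared network
    federated_only: bool      # human access only through the federated identity source


def baseline_violations(acct: Account) -> list[str]:
    """Return the landing-zone guardrails this account fails; an empty list means compliant."""
    violations = []
    if acct.org_unit is None:
        violations.append("not placed in an organizational unit")
    if not acct.central_logging:
        violations.append("central logging disabled")
    if not acct.network_attached:
        violations.append("not attached to the shared network")
    if not acct.federated_only:
        violations.append("local users bypass federated identity")
    return violations


# Example inventory: one account that landed through the baseline, one that predates it.
accounts = [
    Account("payments-prod", "prod", True, True, True),
    Account("legacy-sandbox", None, False, True, False),
]
for acct in accounts:
    print(acct.name, baseline_violations(acct) or "compliant")
```

The useful property is that the check runs on every account, including the ones nobody remembers creating, which is exactly where drifting configurations accumulate.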
Identity
Identity is the control plane of any cloud environment — there is no perimeter to fall back on, and every API call is an authenticated action. The decisions that matter: one identity source federated everywhere, human access separated from workload access, workload identity done with short-lived credentials rather than static keys, and privilege boundaries that assume a credential will eventually leak. Get this wrong early and every subsequent security effort becomes remediation.
The failure mode here is quiet. Nothing breaks when a build pipeline runs on a static access key with administrative scope; it just works, for years, until the key appears in a repository or a log aggregator and the environment’s entire blast radius belongs to whoever found it. Cloud identity mistakes rarely announce themselves at design time. They announce themselves in the incident report.
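Because the failure is silent, it has to be looked for. A minimal sketch of such an audit, assuming a hypothetical credential inventory (the `Credential` record is illustrative, not a provider API) and a hypothetical one-hour maximum lifetime for workload credentials, flags static keys and escalates the static-key-with-admin-scope case described above:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy: the longest lifetime accepted for a workload credential.
MAX_CREDENTIAL_LIFETIME = timedelta(hours=1)


@dataclass
class Credential:
    """Hypothetical inventory record for a workload credential (not a provider API)."""
    principal: str
    issued_at: datetime
    expires_at: Optional[datetime]  # None means a static key that never expires
    admin_scope: bool


def audit(cred: Credential, now: datetime) -> list[str]:
    """Flag credentials that violate the short-lived, least-privilege policy."""
    findings = []
    if cred.expires_at is None:
        age_days = (now - cred.issued_at).days
        findings.append(f"static key for {cred.principal}, {age_days} days old")
        if cred.admin_scope:
            # The quiet failure: it works for years, and a leak exposes everything.
            findings.append(f"CRITICAL: static key for {cred.principal} has admin scope")
    elif cred.expires_at - cred.issued_at > MAX_CREDENTIAL_LIFETIME:
        findings.append(f"credential for {cred.principal} outlives the policy maximum")
    return findings


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
pipeline_key = Credential("build-pipeline", datetime(2019, 3, 1, tzinfo=timezone.utc), None, True)
session_token = Credential("build-pipeline", now - timedelta(minutes=10), now + timedelta(minutes=50), False)
print(audit(pipeline_key, now))
print(audit(session_token, now))  # a one-hour, non-admin credential passes
```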
Network
Cloud networking looks free and simple until it is neither. The choices that matter are hub-and-spoke versus flat topology, transit design, private connectivity back to remaining on-prem estates, DNS across environments, and where traffic inspection lives, and all of them harden fast. The enterprise network design fundamentals apply fully here; the cloud did not repeal them, it just re-billed them per gigabyte.
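One reason these choices harden fast is address space: once spokes are peered and on-prem is connected over private links, overlapping ranges cannot be routed cleanly, and renumbering live networks is expensive. A minimal sketch of the check worth running before anything is connected, using Python's standard `ipaddress` module; the address plan and its names are hypothetical:

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: a hub, two spokes, and an on-prem range reached over private links.
ADDRESS_PLAN = {
    "hub": "10.0.0.0/22",
    "spoke-payments": "10.1.0.0/16",
    "spoke-analytics": "10.2.0.0/16",
    "on-prem-dc1": "10.1.128.0/17",  # sits inside spoke-payments: a collision
}


def overlapping_ranges(plan: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named ranges whose address space overlaps."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


print(overlapping_ranges(ADDRESS_PLAN))  # [('spoke-payments', 'on-prem-dc1')]
```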
Data gravity and egress economics
Data placement is the most strategic decision on this list, because data is the one thing that does not move easily once it is large. Applications, analytics, and new projects accumulate around wherever the data already lives; egress pricing turns that gravity into a commercial mechanism. Every terabyte placed with a provider raises the price of ever choosing differently.
The practical rule I give teams: decide where your primary data stores live first, deliberately, and let compute placement follow. The reverse ordering — compute placed by preference, data dragged along behind — is how organizations wake up architecturally captive without ever having made the decision.
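The commercial side of gravity is plain arithmetic. The sketch below assumes a hypothetical flat egress rate of $0.05 per GB (real pricing is tiered and varies by region and destination) and shows two costs: the one-time price of leaving, which grows with every terabyte stored, and the recurring price paid when compute in one venue keeps reading data that lives in another.

```python
# Hypothetical flat egress rate in USD per GB. Real provider pricing is tiered and
# varies by region and destination; treat this number purely as an assumption.
EGRESS_USD_PER_GB = 0.05
GB_PER_TB = 1000  # decimal units, for simplicity


def exit_cost_usd(stored_tb: float, rate_per_gb: float = EGRESS_USD_PER_GB) -> float:
    """One-time egress cost of moving an entire data set out of the provider."""
    return stored_tb * GB_PER_TB * rate_per_gb


def yearly_cross_venue_usd(tb_read_per_month: float, rate_per_gb: float = EGRESS_USD_PER_GB) -> float:
    """Recurring cost when compute in one venue reads data that lives in another."""
    return 12 * tb_read_per_month * GB_PER_TB * rate_per_gb


for tb in (10, 100, 500):
    print(f"{tb:>4} TB stored -> ${exit_cost_usd(tb):,.0f} to leave")
print(f"Compute reading 20 TB/month across venues -> ${yearly_cross_venue_usd(20):,.0f} per year")
```

The second function is the cost of the reverse ordering: compute placed by preference, with the data left behind and billed on every read.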
Hybrid and multi-cloud, honestly
Almost no established organization runs one venue. The honest question is not whether you will be hybrid but whether your hybrid will be deliberate.
Deliberate hybrid places each workload by constraint, connects venues with a properly designed network, and runs one operating model — one identity plane, one deployment pipeline, one observability strategy — across all of it. Done this way, hybrid is simply architecture. Multi-site design disciplines apply directly, and the resilient multi-site infrastructure whitepaper covers the failure-domain and recovery design that spans venues.
Accidental hybrid is what entropy produces: workloads placed by whichever team moved first, three sets of tooling, and a network that grew by exception. It delivers the costs of every venue and the benefits of none.
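The difference between the two is whether each placement carries a stated reason. The sketch below is a deliberately crude illustration, not the decision framework from the linked article: the `Workload` fields map to the five constraints named earlier (compliance, latency, workload shape, egress, staffing), the thresholds are hypothetical, and treating cloud as the fallback is itself an assumption. What matters is that every placement comes out with its reasons recorded.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload profile; each field stands in for one placement constraint."""
    name: str
    residency_locked: bool     # compliance: data must stay on infrastructure you operate
    latency_pinned: bool       # latency: must sit next to on-prem systems or equipment
    avg_utilization: float     # workload shape: average demand / peak demand, 0..1
    monthly_egress_tb: float   # egress economics: data leaving the venue each month
    cloud_team: bool           # staffing: a team already operates the cloud platform


# Hypothetical thresholds, chosen only to make the example concrete.
STEADY_UTILIZATION = 0.7
HEAVY_EGRESS_TB = 50.0


def place(w: Workload) -> tuple[str, list[str]]:
    """Return a venue plus the recorded reasons; cloud is the assumed default."""
    reasons = []
    if w.residency_locked:
        reasons.append("compliance: residency requires owned infrastructure")
    if w.latency_pinned:
        reasons.append("latency: pinned to on-prem systems")
    if w.avg_utilization >= STEADY_UTILIZATION:
        reasons.append("shape: steady high utilization")
    if w.monthly_egress_tb >= HEAVY_EGRESS_TB:
        reasons.append("egress: heavy outbound data")
    if not w.cloud_team:
        reasons.append("staffing: no team operates the cloud platform")
    if reasons:
        return "on-prem", reasons
    return "cloud", ["no constraint points away from the default"]


for w in (
    Workload("new-customer-portal", False, False, 0.2, 1.0, True),
    Workload("plant-control", False, True, 0.9, 0.1, True),
):
    print(w.name, *place(w))
```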
Multi-cloud deserves particular honesty because it is sold as insurance. As a portability strategy — “we can move workloads between providers” — it almost never pays: you build to the lowest common denominator, forgo the managed services that justified cloud in the first place, and buy an option you will exercise approximately never. As a placement strategy — different workloads on different providers, each for a reason, with no pretense of interchange — it is unremarkable and fine. The distinction is whether anyone claims workloads will move. When a vendor or a slide claims it, ask to see the tested migration. There usually isn’t one.
The operating model underneath
The most reliable predictor of cloud outcomes I have observed is not the provider, the region, or the service catalog. It is whether the operating model changed with the venue.
Cloud priced for datacenter habits is the worst of both worlds: peak-capacity thinking billed by the hour, hand-built environments that drift, tickets flowing to a central team that has become an API with a queue. The model that works treats the environment as a software artifact — infrastructure defined as code, changes through review and pipeline, environments reproducible from the repository. That discipline, and the ownership structure around it, is the subject of the infrastructure-as-code operating model.
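“Environments reproducible from the repository” implies a routine comparison between what the repository declares and what actually exists. Infrastructure-as-code tools perform a far richer version of this comparison; the sketch below only illustrates the idea, using plain dictionaries as a hypothetical stand-in for declared and observed state. It sorts differences into the three kinds of drift: resources that are missing, resources that were hand-built, and resources whose attributes changed.

```python
from typing import Any

Resource = dict[str, Any]


def detect_drift(declared: dict[str, Resource], actual: dict[str, Resource]) -> dict[str, Any]:
    """Compare the state declared in the repository with the state observed in the environment."""
    missing = sorted(declared.keys() - actual.keys())    # declared but absent from the environment
    unmanaged = sorted(actual.keys() - declared.keys())  # hand-built, absent from the repository
    changed = {}
    for rid in sorted(declared.keys() & actual.keys()):
        want, have = declared[rid], actual[rid]
        diffs = {k: (want.get(k), have.get(k)) for k in want.keys() | have.keys() if want.get(k) != have.get(k)}
        if diffs:
            changed[rid] = diffs  # attribute -> (declared, actual)
    return {"missing": missing, "unmanaged": unmanaged, "changed": changed}


declared = {"vm-web": {"size": "medium", "public_ip": False}, "bucket-logs": {"versioning": True}}
actual = {"vm-web": {"size": "xlarge", "public_ip": False}, "vm-debug": {"size": "small", "public_ip": True}}
print(detect_drift(declared, actual))
```

In an operating model built on review and pipelines, a non-empty result is a defect to fix through the repository, not something to patch by hand in the console.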
Observability is the other half. Elastic infrastructure that scales itself faster than you can inspect it demands telemetry designed in from the landing zone onward — observability stack design covers the architecture. And for organizations carrying an existing estate toward any of this, sequencing the change without breaking production is its own discipline, covered in infrastructure modernization.
Anyone can deploy to the cloud. Operating it well for five years is the engineering.
Where to go deeper
The cluster reads best in this order:
- Cloud vs on-prem: an honest decision framework — the placement decision itself: TCO math, the five constraints, and why deliberate hybrid is the defensible default.
- Hybrid cloud landing zone reference architecture — the foundation design: accounts, identity baseline, network topology, and on-prem connectivity.
- The infrastructure-as-code operating model — the operating discipline that makes any venue governable.
- Infrastructure modernization — sequencing change across an existing estate without betting production on a big bang.
- Resilient multi-site infrastructure — the whitepaper on failure domains, replication, and recovery across sites and venues.
The principle underneath
Strip away the branding and cloud architecture is just architecture: failure domains, trust boundaries, data placement, and operating discipline, applied to rented infrastructure instead of owned. The venue changes the billing model and some of the physics. It does not change the engineering questions, and it has never once answered them for you.
Providers will keep launching services. Fashion will keep swinging between all-in and repatriation. The organizations that do well through all of it are the ones that put the workload where the constraints point, built the foundation before the workloads, and changed how they operate — not just where.
The cloud was never the decision. It was the setting for a hundred of them.
Frequently asked questions
- What is cloud architecture in one sentence?
- Cloud architecture is the discipline of deciding where each workload runs, under what identity and governance structure, connected by what network, with data placed where its gravity and egress costs are acceptable — engineering cloud as one deployment model among several rather than treating adoption itself as the goal.
- What is a landing zone and why does it matter so much?
- A landing zone is the pre-built foundation workloads deploy into: account or subscription structure, identity and access baseline, network topology, logging, and policy guardrails. It matters because it is the part of cloud architecture that is nearly impossible to retrofit — every workload inherits its assumptions. Environments that skip it accumulate accounts the way old datacenters accumulated cabling, and unwinding that costs years.
- Is multi-cloud a good strategy?
- As a portability strategy, rarely — building to the lowest common denominator of two providers costs more than the vendor risk it hedges against. As a reality, almost universally: acquisitions, SaaS choices, and team preferences produce multiple clouds whether you plan them or not. The defensible posture is deliberate placement — each workload where its constraints point — with identity, networking, and observability designed to span, and no pretense that workloads will float freely between providers.
- Why do cloud migrations end up costing more than projected?
- Usually three compounding reasons: workloads were lifted unchanged, so they rent peak capacity around the clock the way they owned it; egress and cross-availability-zone traffic were absent from the business case; and the operating model never changed, so the organization pays cloud prices for datacenter practices. The projection assumed the platform would do the engineering. It never does.
- Should new applications always be built cloud-first?
- It is a reasonable default for new, uncertain workloads — elasticity is worth the most when demand is unknown. But it is a default, not a rule. Steady high-utilization compute, latency-pinned systems, large data sets with heavy egress, and regulated data with residency constraints can all point the other way even for greenfield builds. The discipline is running the placement decision honestly instead of inheriting it from fashion.
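The first reason in the migration-cost answer, renting peak capacity around the clock, is easy to quantify. A minimal sketch with a hypothetical $0.20 instance-hour rate and a hypothetical business-hours demand profile compares a lifted workload sized for peak with idealized autoscaling that tracks demand exactly (no headroom, no scaling lag, so real savings will be smaller):

```python
HOURS_PER_MONTH = 730           # common approximation: 8,760 hours / 12
USD_PER_INSTANCE_HOUR = 0.20    # hypothetical rate, not a real price


def lifted_cost(peak_instances: int) -> float:
    """Lift-and-shift: capacity sized for peak and billed every hour of the month."""
    return peak_instances * HOURS_PER_MONTH * USD_PER_INSTANCE_HOUR


def demand_following_cost(hourly_instances: list[int]) -> float:
    """Idealized autoscaling: pay only for the instances each hour actually needs."""
    return sum(hourly_instances) * USD_PER_INSTANCE_HOUR


# Hypothetical demand: 20 instances from 08:00 to 18:00, 4 instances otherwise.
profile = [20 if h % 24 in range(8, 18) else 4 for h in range(HOURS_PER_MONTH)]
print(f"lifted: ${lifted_cost(20):,.2f}  demand-following: ${demand_following_cost(profile):,.2f}")
```

With a flat profile running at peak, which is the steady high-utilization case from the last answer, the two numbers converge and elasticity buys nothing. That is the arithmetic behind treating cloud-first as a default rather than a rule.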