PAVEL GLUKHIKH

BGP for enterprises: when you need it and how to run it

A practitioner's guide to BGP for enterprises: when multihoming justifies it, route filtering per RFC 7454 and MANRS, communities, and the classic mistakes.

Executive summary

BGP is the routing protocol that connects independent networks on the internet, and an enterprise needs to run it in exactly one situation: when it holds its own address space and wants to control how the internet reaches that space across multiple providers. This article covers when multihoming genuinely justifies BGP, the filtering hygiene that RFC 7454 and MANRS codify, how to use communities to steer traffic without renumbering, and the mistakes — becoming accidental transit, missing max-prefix, ignoring RPKI — that I see most often when auditing enterprise edges.

The honest threshold for running BGP

An enterprise needs BGP when three things are simultaneously true: you hold provider-independent (PI) address space, you connect to two or more providers, and a provider failure must not take your public reachability down with it. Remove any one of those and BGP is optional at best.

Everything short of that is preference dressed up as requirement.

I say this as someone who runs BGP for a living — Nubinity is a connectivity provider, and eBGP sessions are how we exist on the internet. Precisely because I operate it daily, I talk more enterprises out of BGP than into it. If you are single-homed, BGP gains you nothing: your provider is your only path regardless of what protocol you speak to them. If you are dual-homed behind NAT with tolerable failover via DNS, SD-WAN, or a cloud front door, BGP is complexity without a payoff. The protocol earns its keep the day you have addresses that must remain reachable when an entire provider — not a link, a provider — fails.

What crossing the threshold costs: an ASN and PI space from your RIR (with annual fees), routers sized for the table you take, and an ongoing operational duty of care. A BGP speaker is a participant in the global routing system. Your filtering mistakes become other people’s outages, which is why the hygiene section below is the longest part of this article.

Multihoming that actually fails over

The standard enterprise topology is two edge routers, each with an eBGP session to a different provider, iBGP between them, announcing one or a few aggregate prefixes.

Outbound (your traffic to the internet) is easy: take a default route or partial routes from each provider, prefer one with local-preference, and let the other carry the load if the first path vanishes. Full tables are justified only when you make genuine per-prefix egress decisions (latency-sensitive paths to specific SaaS, cost-based steering) and your hardware’s FIB can hold well over a million IPv4 routes with growth headroom.

Inbound (the internet’s traffic to you) is where multihoming gets real, because you can only influence, never command, other networks’ choices. Your levers, weakest to strongest:

  1. AS-path prepending on the backup link. Blunt, sometimes ignored by networks that prefer customer routes at maximum local-pref anyway.
  2. Provider action communities — the precision tool (next section).
  3. Selective announcement: advertise more-specific prefixes on the preferred path and only the aggregate on backup. Effective, but you are contributing to global table growth; use it sparingly and never announce anything longer than a /24 (IPv4) expecting it to propagate.
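Why prepending is the weakest lever can be shown with a toy model of how a remote network chooses between two paths to you. This is a minimal sketch, assuming only the first two best-path steps (local-pref, then AS-path length); real implementations have more tie-breakers, and the `Path` class, provider labels, and documentation ASNs here are illustrative, not taken from any real deployment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    via: str                   # hypothetical label for the upstream carrying the route
    local_pref: int            # assigned by the remote network's own policy
    as_path: Tuple[int, ...]   # ASNs as received, prepends included


def best_path(paths: List[Path]) -> Path:
    """Highest local-pref wins; AS-path length only breaks ties.

    Later tie-breakers (origin, MED, router ID, ...) are deliberately omitted.
    """
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


OUR_ASN = 64496  # documentation ASN (RFC 5398), stands in for your own

primary = Path("provider-a", 100, (64500, OUR_ASN))
# Backup link, prepended twice: three copies of our ASN in the path.
backup = Path("provider-b", 100, (64501, OUR_ASN, OUR_ASN, OUR_ASN))

# Equal local-pref: the prepend works and the shorter primary path wins.
choice_equal_pref = best_path([primary, backup])

# Same prepended route, but the remote network treats provider-b's routes
# as customer routes with a higher local-pref: the prepend is never consulted.
backup_preferred = Path("provider-b", 200, backup.as_path)
choice_higher_pref = best_path([primary, backup_preferred])
```

In the second case no amount of prepending changes the outcome, which is the situation where provider action communities become the better tool.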

Test the failover by actually shutting a session during a maintenance window. Convergence that has never been exercised is a hypothesis, not a design — the same principle as any redundancy model: you have not built failover until you have watched it happen.

Communities: traffic engineering without renumbering

Communities are 32-bit (or larger, with large communities) tags riding on route announcements. Two uses matter at the enterprise edge.

Provider action communities. Every serious transit provider publishes a list: tag your announcement with X:80 and they set local-pref low; tag X:1000 and they prepend once toward a specific peer group; tag with a regional value and they suppress the announcement in that geography. This is how you engineer inbound traffic surgically. Read your providers’ community documentation before you buy transit — the quality of that page is a decent proxy for the quality of their NOC.
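To make the "32-bit tag" concrete: a standard community written as `ASN:value` is two 16-bit fields packed into one 32-bit number. The sketch below assumes that layout and uses a hypothetical action community, `64500:80`, built from a documentation ASN; your provider's real values will differ. It also shows why 4-byte ASNs do not fit, which is the gap large communities fill.

```python
def encode_community(text: str) -> int:
    """Pack 'ASN:value' into 32 bits: ASN in the high 16, value in the low 16."""
    asn_str, value_str = text.split(":")
    asn, value = int(asn_str), int(value_str)
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError(f"standard communities hold two 16-bit fields: {text!r}")
    return (asn << 16) | value


def decode_community(raw: int) -> str:
    """Inverse of encode_community."""
    return f"{raw >> 16}:{raw & 0xFFFF}"


LOWER_LOCAL_PREF = "64500:80"  # hypothetical value, not from any real provider
raw = encode_community(LOWER_LOCAL_PREF)
round_trip = decode_community(raw)
```

Calling `encode_community("4200000000:80")` raises `ValueError` in this sketch, because a 4-byte ASN cannot occupy a 16-bit field.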

Internal tagging. Tag routes at ingress by origin (customer, peer, transit, internal) and build egress policy by matching tags instead of maintaining prefix lists in two places. Even a two-router enterprise edge benefits: deny anything without your own origin tag from being announced, and route leaks become structurally difficult.

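! OUR-PREFIXES is a community-list matching your own origin tag, defined elsewhere
route-map TRANSIT-OUT permit 10
 match community OUR-PREFIXES
! Anything without the tag falls through to an explicit deny
route-map TRANSIT-OUT deny 20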
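The same tag-based egress logic can be modeled offline, for example to sanity-check a policy against a dump of routes before it goes live. This is a sketch only: the tag values `64496:100` and `64496:300`, the `Route` class, and the sample table are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical internal tagging scheme; pick your own values.
ORIGIN_INTERNAL = "64496:100"   # routes we originate
ORIGIN_TRANSIT = "64496:300"    # routes learned from a transit provider


@dataclass(frozen=True)
class Route:
    prefix: str
    communities: FrozenSet[str] = frozenset()


def transit_out(routes: List[Route]) -> List[Route]:
    """Permit only routes carrying our own origin tag; deny everything else."""
    return [r for r in routes if ORIGIN_INTERNAL in r.communities]


rib = [
    Route("203.0.113.0/24", frozenset({ORIGIN_INTERNAL})),  # ours
    Route("198.51.100.0/24", frozenset({ORIGIN_TRANSIT})),  # learned from a provider
    Route("192.0.2.0/24"),                                  # untagged, so denied
]
announced = [r.prefix for r in transit_out(rib)]
```

The point of the design is that an untagged or mis-tagged route fails closed: it is never announced unless something explicitly marked it as yours.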

Filtering hygiene: RFC 7454 and MANRS as a checklist

RFC 7454 (BGP Operations and Security) and the MANRS program exist because the default behavior of BGP is promiscuous trust. Treat these as the minimum bar, not aspiration. Note that RFC 8212 made “accept and announce nothing without an explicit policy” the standard default for eBGP. IOS XR and recent FRR releases behave this way; classic IOS/IOS-XE does not, and Junos has historically needed an explicit knob to opt in, so build the policy regardless.

Outbound: announce only what is yours.

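! Exact match only: no "le" keyword, so more-specifics are not permitted
ip prefix-list OUR-AGGREGATES seq 5 permit 203.0.113.0/24
!
route-map TO-PROVIDER permit 10
 match ip address prefix-list OUR-AGGREGATES
! Explicit deny for everything else, including routes learned from other providers
route-map TO-PROVIDER deny 20
!
! 64512 is a placeholder private ASN; substitute your own public ASN
router bgp 64512
 neighbor 192.0.2.1 remote-as 64500
 neighbor 192.0.2.1 route-map TO-PROVIDER out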

The single most damaging enterprise BGP mistake is announcing routes learned from provider A to provider B. You have just offered yourself as free transit between two carriers, and if either accepts it, some fraction of the internet’s traffic now flows through your edge routers until they fall over. An explicit outbound prefix-list makes this impossible; relying on “we would never redistribute” does not.

Inbound: filter the garbage.

  • Drop bogons: RFC 1918 space, default, your own prefixes (hearing your own space from a provider is an attack or a leak — either way, drop it).
  • Drop prefixes longer than /24 (IPv4) and /48 (IPv6).
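  • Set maximum-prefix on every session. A provider-facing session that suddenly offers 1M routes into a router sized for 10k is how edges melt.

! Example values: limit 1,000,000 prefixes, warn at 90%, retry 30 minutes after a teardown; size the limit to what this session should carry
neighbor 192.0.2.1 maximum-prefix 1000000 90 restart 30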
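The inbound rules above are easy to express as a small checker, which is handy for auditing a received-routes dump against the policy you think you have. The sketch below is illustrative, not a complete filter: the bogon list holds only RFC 1918 space (a production list is longer), `OUR_PREFIXES` is a placeholder, and the exact boundary at which a router logs the max-prefix warning is an assumption.

```python
import ipaddress
from typing import Optional

OUR_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder for your space
# RFC 1918 only, for brevity; a real bogon list has many more entries.
BOGONS_V4 = [ipaddress.ip_network(p)
             for p in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
MAX_PREFIX_LEN = {4: 24, 6: 48}


def reject_reason(prefix: str) -> Optional[str]:
    """Return why an inbound prefix should be dropped, or None to accept it."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == 0:
        return "default route"  # drop only if you do not intend to take a default
    if net.prefixlen > MAX_PREFIX_LEN[net.version]:
        return "too specific"
    if net.version == 4 and any(net.subnet_of(b) for b in BOGONS_V4):
        return "bogon"
    if any(net.version == o.version and net.subnet_of(o) for o in OUR_PREFIXES):
        return "our own space"
    return None


def max_prefix_state(count: int, limit: int, warn_pct: int) -> str:
    """Model of a max-prefix limit with a warning threshold (boundaries assumed)."""
    if count > limit:
        return "teardown"
    if count * 100 > limit * warn_pct:
        return "warn"
    return "ok"
```

For example, `reject_reason("198.51.100.0/25")` returns `"too specific"`, and `max_prefix_state(950000, 1000000, 90)` returns `"warn"`.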

Session protection. MD5/TCP-AO on eBGP sessions, GTSM (ttl-security hops 1) so remote attackers cannot spoof session packets, and control-plane policing so a scanning flood cannot starve the BGP process.

RPKI, both directions. Create ROAs for your prefixes in your RIR portal — it takes an afternoon and means networks doing origin validation will drop hijacks of your space. If your platform supports it, validate inbound and reject invalids; if not, prefer upstreams that do it for you. MANRS participation is essentially this list formalized, and asking a prospective provider whether they are MANRS-conformant is a useful filter in itself.
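What "origin validation" decides is worth seeing once, because it explains both why ROAs protect you and why a sloppy ROA can hurt you. The following is a simplified sketch of the valid / invalid / not-found classification described in RFC 6811; the `Roa` class and the sample data are hypothetical, and a real validator gets its ROAs from a relying-party cache rather than a Python list.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Roa:
    prefix: str
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, roas: List[Roa]) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA says nothing about the route
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: treated as a hijack or leak.
    return "invalid" if covered else "not-found"


roas = [Roa("203.0.113.0/24", 24, 64496)]  # hypothetical ROA for our aggregate
```

With that ROA in place, the same prefix originated by another ASN comes back `"invalid"`, and so does your own `/25` more-specific, because it exceeds the ROA's maximum length. That second case is the one to remember if you plan selective announcements: the ROA has to allow the lengths you intend to advertise.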

The mistake catalog

Every one of these is something I have found in real enterprise edges:

| Mistake | Consequence | Prevention |
| --- | --- | --- |
| No outbound prefix filter | Accidental transit; potential global route leak | Explicit prefix-list out, deny-by-default |
| No maximum-prefix inbound | Full-table dump crashes or wedges the edge router | Max-prefix with warning threshold on every session |
| Private ASN (64512–65534, RFC 6996) leaked in AS-path | Announcements rejected or unreachable from parts of the internet | remove-private-as toward providers; get a real ASN |
| Prepending with a private or wrong ASN | Path treated as bogus by strict validators | Prepend only your own ASN |
| No ROAs published | Your space is hijackable with no automatic remediation | Publish ROAs; alarm on RPKI status changes |
| iBGP without next-hop-self | Recursion failures that appear only when one eBGP session drops | next-hop-self on iBGP toward interior peers |
| Timers left at 60/180 defaults with no BFD | Up to 3 minutes of blackholing on a silent path failure | BFD where supported; tuned holdtimers where not |
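The timer row is simple arithmetic, but the gap is large enough to be worth writing out. A rough sketch, assuming a silent failure (no link-down event) and illustrative BFD settings of 300 ms with a multiplier of 3; those BFD values are an example, not a recommendation from this article.

```python
def holdtime_detection_ms(hold_s: int) -> int:
    """Worst case with keepalives only: the full hold time must expire."""
    return hold_s * 1000


def bfd_detection_ms(interval_ms: int, multiplier: int) -> int:
    """BFD declares the peer down after `multiplier` consecutive missed packets."""
    return interval_ms * multiplier


default_hold_ms = holdtime_detection_ms(180)   # 60/180 defaults from the table
bfd_ms = bfd_detection_ms(300, 3)              # assumed example settings
speedup = default_hold_ms // bfd_ms            # how many times faster BFD detects
```

Under these assumptions detection drops from 180,000 ms to 900 ms, a factor of 200, before the routers even start to reconverge.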

The flap-diagnosis side of this — what to check when a session won’t stay up — is its own topic, covered in diagnosing BGP session flaps.

What to write down

Your BGP edge should be reconstructible from documentation alone: ASN, prefixes and their ROA status, every session with its policies and max-prefix values, the provider community values you rely on, and the intended traffic distribution in both directions. When a session flaps at 2 a.m., the on-call engineer needs to know what “normal” looks like — how much of your inbound usually arrives via provider A — or they cannot tell recovery from a new failure. That artifact set belongs in the same living documentation system as the rest of the network, which is exactly the discipline covered in network documentation that works.

BGP is unusual among enterprise technologies in that your mistakes are billed to strangers. The global table works because tens of thousands of independent operators filter what they announce and question what they hear, and joining it makes that duty of care yours. Router platforms and best-path tweaks will keep changing. The obligation that comes with speaking BGP has not changed since the first route leak, and it will outlast whatever replaces your edge hardware.

Frequently asked questions

When does an enterprise actually need BGP?

When you have provider-independent address space and at least two upstream providers, and you need reachability to survive a provider failure without renumbering or waiting on DNS. If you are single-homed, or dual-homed with NAT and can tolerate failover via DNS or SD-WAN, you almost certainly do not need to run BGP with your providers.

Do I need a full BGP table from my providers?

Rarely. A default route plus your providers' customer routes covers most enterprise needs and runs on far cheaper hardware. Take full tables only when you genuinely make per-prefix egress decisions, and remember the global table exceeds a million IPv4 routes, which dictates your router's TCAM/FIB sizing.

What is RPKI and should an enterprise care?

RPKI lets address holders cryptographically state which AS may originate their prefixes, as Route Origin Authorizations. Enterprises should do two things: create ROAs for their own prefixes so hijacks of their space get dropped by validating networks, and prefer providers who reject RPKI-invalid routes. Both are low-effort and materially reduce hijack exposure.

What are BGP communities used for?

Communities are tags attached to routes that carry instructions or metadata. Enterprises mostly use provider action communities: published values that tell the provider to lower local preference, prepend toward specific peers, or suppress announcement regionally. They let you engineer inbound traffic with far more precision than blunt AS-path prepending.
