BGP Aggregate Addresses

This month I am studying for my CCIP BGP+MPLS exam (booked for June 27th). I decided to go through with the CCIP certification, despite my annoyance with the new Service Provider track, because the BGP, MPLS and QoS topics are covered at length on the CCIE Routing & Switching blueprint. I figure this is a good bridge to fill in the gaps that CCNP R&S leaves (in particular MPLS and QoS, which aren’t covered at all in any of the CCNP blueprints).

As such, I’ve been able to dive into all the knobs and switches that BGP offers to control routing policy. For those who have gone through the newer CCNP R&S track, the BGP fundamentals are covered well enough to familiarize engineers with its operation. There’s a lot of depth lacking in CCNP, and for good reason: BGP can be a career in and of itself. In service provider environments, when you’re pulling half a million IPv4 routes from upstream peers and providing L3VPN services to your customers via MPLS, you need a protocol like BGP that can scale.

With half a million routes in the global Internet table, route summarization helps keep unnecessarily specific routes from propagating to upstream providers, which reduces the memory and CPU needed to carry those thousands of routes. To summarize a set of routes in BGP, you have two options:

  • Manual static Null0 routes advertised in BGP
  • aggregate-address command

Let’s look at a scenario. This is taken out of a BGP topology I’ve been working on this week to help me gain a better understanding of some of the more advanced BGP topics.

Subnets*:

  • 100.100.255.0/24 for all CE-facing Point-to-Point links
  • 100.100.254.0/24 for BGP update source loopbacks
  • 100.100.253.0/24 for all inter-AS Point-to-Point links
  • 100.100.200.0/24 allocated to Enterprise A from this ISP
  • 100.100.0.0/16 allocated to ISP from registry

*Note: this is my best guess of how an ISP would assign addressing in its network. Being an enterprise guy, I’ve yet to be exposed to any service provider network. For those with more experience, any corrections on this please let me know in the comments below 🙂

In this topology, we have one route reflector “RR” with IBGP running between RR and all the PE routers (just PE1 and PE2 in this case).
We want to aggregate all of the ISP’s routes to advertise upstream to Upstream SP at AS 200.
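
As a quick sanity check on the addressing plan, here is a minimal Python sketch (not part of the lab itself) that uses the standard ipaddress module to confirm that every subnet listed above falls inside the 100.100.0.0/16 allocation we are about to aggregate. The variable names and labels are my own.

```python
import ipaddress

# The ISP's registry allocation from the subnet list above.
allocation = ipaddress.ip_network("100.100.0.0/16")

# The more-specific subnets carved out of that allocation.
subnets = {
    "CE-facing links": "100.100.255.0/24",
    "BGP update source loopbacks": "100.100.254.0/24",
    "inter-AS links": "100.100.253.0/24",
    "Enterprise A": "100.100.200.0/24",
}

# True for every subnet that the /16 aggregate would cover.
covered = {
    name: ipaddress.ip_network(prefix).subnet_of(allocation)
    for name, prefix in subnets.items()
}

for name, ok in covered.items():
    status = "covered" if ok else "NOT covered"
    print(f"{name}: {status} by {allocation}")
```

If any subnet were reported as not covered, a single /16 aggregate would not be enough and that prefix would need its own advertisement.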

Below is the BGP RIB on our route reflector before any aggregation. These are all the routes advertised by the PE routers as well as any allocations given to customers who require more than a single address.

RR#sh ip bgp
BGP table version is 31, local router ID is 100.100.254.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.200.0/24 100.100.254.3            0    100      0 65501 i
*>i                 100.100.254.1            0    100      0 65501 i
*>i100.100.255.0/31 100.100.254.1            0    100      0 i
*>i100.100.255.8/31 100.100.254.3            0    100      0 i
!
!
AS200#sh ip bgp
BGP table version is 37, local router ID is 100.100.253.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.200.0/24 100.100.253.1                          0 400 i
*> 100.100.255.0/31 100.100.253.1                          0 400 i
*> 100.100.255.8/31 100.100.253.1                          0 400 i

As you can see, in a huge service provider network, the BGP RIB would be filled with any public IP addresses used to connect its customers to the outside world, as well as any allocations given by this ISP to its larger customers (such as Enterprise A in this case, which is dual-homed at PE1 and PE2). Also included is the BGP RIB of the upstream AS 200 router, which receives these specific prefixes from AS 400.

Now let’s reduce the routing table by aggregating them into a summarized route. First, we’ll start by adding in a static route to the Null0 interface and advertise it in BGP:

! On RR:
conf t
 ip route 100.100.0.0 255.255.0.0 Null0
!
router bgp 400
 network 100.100.0.0 mask 255.255.0.0
!
RR#sh ip ro static
100.0.0.0/8 is variably subnetted, 11 subnets, 4 masks
  S 100.100.0.0/16 is directly connected, Null0
RR#sh ip bgp
BGP table version is 25, local router ID is 100.100.254.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   0.0.0.0                            32768 i
*>i100.100.255.0/31 100.100.254.1            0    100      0 i
*>i100.100.255.8/31 100.100.254.3            0    100      0 i

And to verify on the upstream AS:

AS200#sh ip bgp
BGP table version is 31, local router ID is 100.100.253.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   100.100.253.1            0             0 400 i
*> 100.100.200.0/24 100.100.253.1                          0 400 i
*> 100.100.255.0/31 100.100.253.1                          0 400 i
*> 100.100.255.8/31 100.100.253.1                          0 400 i

Since we’re still advertising the more-specific routes inside the ISP’s AS 400, manual filtering will be required on the route reflector. This can be accomplished with a simple prefix list or route-map on RR.

! on RR
ip prefix-list OurAlloc permit 100.100.0.0/16 
!
! Match only our allocated address space
!
router bgp 400
 neighbor 100.100.253.0 prefix-list OurAlloc out

The problem with this approach is that, while it is fairly simple, it requires you to manually filter any more-specific routes at the edge of your AS. Also, if you are serving multihomed customers with their own address allocation (independent of this ISP’s allocation), you will have to take those into account in your filtering as well.
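
To make the filtering behaviour concrete, here is a rough Python model of what the OurAlloc prefix list does to outbound updates. It is a simplified sketch, not how IOS implements it: it only handles plain permit entries with no ge/le keywords, which match a prefix exactly, and treats everything else as hitting the implicit deny. The function and variable names are made up for illustration.

```python
import ipaddress

def prefix_list_permits(entries, route):
    """Return True if a permit entry matches the route exactly.

    Models only plain 'permit <prefix>' entries (no ge/le), so a
    more-specific route inside the permitted block does NOT match.
    Anything that matches no entry falls through to the implicit deny.
    """
    route = ipaddress.ip_network(route)
    return any(route == ipaddress.ip_network(entry) for entry in entries)

# ip prefix-list OurAlloc permit 100.100.0.0/16
our_alloc = ["100.100.0.0/16"]

# Best paths RR would otherwise send to AS 200.
rr_best_paths = [
    "100.100.0.0/16",
    "100.100.200.0/24",
    "100.100.255.0/31",
    "100.100.255.8/31",
]

advertised = [r for r in rr_best_paths if prefix_list_permits(our_alloc, r)]
print(advertised)  # ['100.100.0.0/16']
```

A multihomed customer’s independent allocation would need its own permit entry, which is exactly the maintenance burden described above.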

The other way to aggregate a set of routes in BGP is through the aggregate-address command. This command creates the aggregate and its Null0 route automatically, provided at least one more-specific route exists in the BGP table. With the summary-only keyword, it also suppresses advertisement of the more-specific routes (they stay in the BGP RIB, marked “s”). With summary-only configured, upstream peers receive only the aggregated route and not the individual more-specific prefixes.

! on RR
router bgp 400
 aggregate-address 100.100.0.0 255.255.0.0 summary-only
!
RR#sh ip bgp
BGP table version is 31, local router ID is 100.100.254.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   0.0.0.0                            32768 i
s i100.100.200.0/24 100.100.254.3            0    100      0 65501 i
s>i                 100.100.254.1            0    100      0 65501 i
s>i100.100.255.0/31 100.100.254.1            0    100      0 i
s>i100.100.255.8/31 100.100.254.3            0    100      0 i

AS200#sh ip bgp
BGP table version is 37, local router ID is 100.100.253.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   100.100.253.1            0             0 400 i

As you can see, after using the aggregate-address command on RR in AS 400, only our configured summary address is advertised out to AS 200. You can also see the suppressed routes in RR’s BGP RIB, since we used the “summary-only” parameter in the aggregate-address command. All more-specific routes are suppressed from being advertised to BGP peers, reducing what used to be many routes to just the configured aggregate.
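
The effect of summary-only can be modelled in a few lines of Python. This is only a sketch of the behaviour seen in the output above, under the assumption that the aggregate is originated only when at least one more-specific prefix is present in the table; the function name and return shape are my own invention.

```python
import ipaddress

def apply_summary_only(bgp_table, aggregate):
    """Model 'aggregate-address <prefix> <mask> summary-only'.

    Returns (advertised, suppressed) as lists of prefix strings.
    More-specifics of the aggregate are suppressed; the aggregate is
    only originated if at least one more-specific exists in the table.
    """
    agg = ipaddress.ip_network(aggregate)
    nets = [ipaddress.ip_network(p) for p in bgp_table]

    suppressed = [n for n in nets if n != agg and n.subnet_of(agg)]
    advertised = [n for n in nets if n not in suppressed]
    if suppressed and agg not in advertised:
        advertised.append(agg)

    return [str(n) for n in advertised], [str(n) for n in suppressed]

# Prefixes in RR's BGP table before aggregation.
rr_table = ["100.100.200.0/24", "100.100.255.0/31", "100.100.255.8/31"]

advertised, suppressed = apply_summary_only(rr_table, "100.100.0.0/16")
print(advertised)  # ['100.100.0.0/16']
print(suppressed)  # ['100.100.200.0/24', '100.100.255.0/31', '100.100.255.8/31']
```

Prefixes outside the aggregate, such as a customer’s independent allocation, pass through untouched, and an empty table produces no aggregate at all.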

Aggregation, combined with proper filtering, should be performed wherever and whenever possible. As of today, CIDR Report indicates that over 410,000 routes exist in the global table, and it estimates that as many as half of them could be aggregated.

Cosmetic Bug: IS-IS Network Entity Title

cosmetic bug:

a software error condition that does not impact a system in any functional way; types of errors can include spelling mistakes, transient error messages, etc.

I thought I’d start a series of blog posts dedicated to what I call “cosmetic bugs” in networking technology. By that I mean things that we learn, see and do in networking without knowing the reason why, because they don’t impact a router, switch or protocol in any way. The whys have simply been lost in translation over the years.

One such case is related to the lovely link-state protocol IS-IS. IS-IS stands for “Intermediate System to Intermediate System” and was originally developed to facilitate routing between “intermediate systems” (the OSI term for what IP calls a router) over the OSI Connectionless Network Service (CLNS) protocol stack. It was later extended in RFC 1195 to support both OSI and TCP/IP networks (and renamed Integrated IS-IS or Dual IS-IS). Although the OSI protocol stack has been displaced by TCP/IP, IS-IS lives on and is typically used in service provider core networks due to its scalability and link-state properties.

Having taken CCNP BSCI in college and gone through ROUTE in my current profession, I’ve always been intrigued by the mystical awe that is the IS-IS protocol. Being a link-state routing protocol, IS-IS is similar to OSPF in that networks are learned through flooding of link-state information throughout a domain. However, since IS-IS originated from the ISO to work in tandem with the OSI protocol stack, certain “legacy” properties remain. As indicated in the title of this blog post, I just wanted to spend some time on the “why” behind the Network Entity Title, also known as the IS-IS NET.

The NET is a configured identifier on IS-IS routers that defines a topology. It is a hexadecimal value and indicates both an area ID and a System ID.

An IS-IS NET is made up of an Area ID and a System ID. The Area ID performs the same function as it does in OSPF (with some key differences that I won’t go into in this blog post) and is topology-driven. The System ID performs the same function as the Router ID does in OSPF. Unlike in OSPF, it does not have to be derived from an IP address, nor does it require an IP address to be configured on any interface to function. Also, unlike OSPF, which sits at Layer 3 (i.e. it has an IP header below the OSPF header), IS-IS runs directly over Layer 2 (i.e. the IS-IS PDU header comes directly after the Layer 2 header).

To further compare the two, IS-IS NETs must be defined within a certain structure, whereas OSPF uses arbitrary values for Area IDs and Router IDs. Some of the details I won’t go into, simply because they have nothing to do with the TCP/IP stack. If, like me, you’ve ever wondered why Cisco uses the same configuration example in all IS-IS documentation, hopefully I can shed some light on that. Let’s look at the structure of a NET to give us some more detail:

As indicated in the diagram above, the following rules must be followed when defining the NET:

  • AFI must be 1 byte
  • Area ID can be 0 to 12 bytes long
  • System ID must be 6 bytes long
  • SEL must be 1 byte

The reason for these “rules” is that a NET is a special version of an ISO network service access point (NSAP) address, familiar to anyone who has worked with ISO protocols.
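
The length rules above translate directly into a small parser. The following Python sketch splits a dotted-hex NET into its four fields by counting from the right (SEL is the last byte, System ID the six bytes before it). The parse_net function and its dictionary keys are hypothetical names of my own, not anything from IOS.

```python
import re

def parse_net(net):
    """Split a dotted-hex NET into AFI, Area ID, System ID and SEL.

    Fields are located by position: 1-byte AFI on the left, 1-byte SEL
    and 6-byte System ID on the right, and whatever remains in between
    (0 to 12 bytes) is the Area ID.
    """
    digits = net.replace(".", "")
    if not re.fullmatch(r"[0-9a-fA-F]+", digits) or len(digits) % 2:
        raise ValueError("NET must be a whole number of hex bytes")

    total_bytes = len(digits) // 2
    # 1 (AFI) + 0..12 (Area ID) + 6 (System ID) + 1 (SEL) = 8..20 bytes
    if not 8 <= total_bytes <= 20:
        raise ValueError("NET must be 8 to 20 bytes long")

    sysid = digits[-14:-2]
    return {
        "afi": digits[:2],
        "area": digits[2:-14],
        "system_id": ".".join(sysid[i:i + 4] for i in range(0, 12, 4)),
        "sel": digits[-2:],
    }

print(parse_net("49.0001.0cad.83b4.03e9.00"))
# {'afi': '49', 'area': '0001', 'system_id': '0cad.83b4.03e9', 'sel': '00'}
```

Parsing from the right is what makes the variable-length Area ID workable: the System ID and SEL always occupy the last seven bytes.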

The AFI, or the Authority & Format Identifier, holds no real value in an IP-only environment. In relation to ISO protocols, the AFI was used similarly to an OUI (Organizationally Unique Identifier) in a MAC address, which would have identified the assigning authority of the address. However, in an IP-only environment, this number has no meaning separate from the Area ID itself. Most vendors and operators tend to stay compliant with the defunct protocols by specifying an AFI of “49”. This is analogous to RFC 1918 IP addresses: it is privately administered and not assigned to any one specific organization. While using 49 is best practice, the AFI byte can instead be treated as part of a single Area ID value; this is left to the discretion of the network admin.

Area IDs function just as they do in OSPF but, like the rest of the NET, are written in hexadecimal only.

The System ID can be anything chosen by the administrator, similar to an OSPF Router ID. However, best practice with NETs is to keep the configuration as simple as humanly possible. The System ID is typically derived from either the 48-bit MAC address of an interface (“0cad.83b4.03e9”) or an IP address, such as one configured on a loopback interface. When deriving a System ID from an IP address, you need a conversion method, since the System ID must be 6 bytes long and an IPv4 address is only 4 bytes long. The simplest is to strip the dots from the dotted-decimal address and add enough zeros to fill the 6-byte (12-digit) requirement, as in the example below. You can also convert the IP address to a decimal or hexadecimal number.

Loopback IP address of 10.255.255.200
NET System ID = 1025.5255.2000
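
Here is a short Python sketch of that conversion. The first function reproduces the trailing-zeros method used in this post; the second is a commonly used alternative (my addition, not something this post relies on) that zero-pads each octet to three digits so the original address can be read back out. All function names are hypothetical.

```python
def group_system_id(digits):
    """Format 12 digits as xxxx.xxxx.xxxx."""
    if len(digits) != 12:
        raise ValueError("a System ID needs exactly 12 digits")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))

def system_id_append_zeros(ip):
    """Method used in this post: strip the dots, pad with trailing zeros."""
    return group_system_id(ip.replace(".", "").ljust(12, "0"))

def system_id_pad_octets(ip):
    """Alternative: zero-pad every octet to three digits (reversible)."""
    return group_system_id("".join(octet.zfill(3) for octet in ip.split(".")))

print(system_id_append_zeros("10.255.255.200"))  # 1025.5255.2000
print(system_id_pad_octets("10.255.255.200"))    # 0102.5525.5200
```

One caveat with the trailing-zeros method is that it is not reversible and different addresses can collide: 10.1.1.1 and 101.1.1.0 both come out as 1011.1000.0000. If System IDs are derived from loopbacks across a large domain, the per-octet padding avoids that.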

The System ID is solely up to the administrator to choose, but it must be unique within a routing domain. MAC addresses are the easiest choice since they are globally unique burned-in addresses and *should not* under normal circumstances be the same on different devices.

The final piece in a NET is the SEL byte, or the NSAP Selector byte. In ISO, this value is used to indicate an upper-layer function. Think of this as being similar to a TCP or UDP port number. In an IP-only network, where no upper-layer ISO protocols exist, an IP router will expect a SEL value of 0x00, which indicates that the router itself is the “upper layer” protocol. The takeaway here is that the SEL is not relevant in an IP network and should always be set to 00 to keep NET assignment simple.

*note: As pointed out by Marko Milivojevic on Twitter, a non-0 SEL value indicates a pseudonode. IS-IS on multiaccess networks elects a Designated Intermediate System (DIS). Think DR in OSPF. I’m leaving a lot of details out, but just keep in mind that configuring a non-zero value for the SEL will throw a syslog message, since IOS expects this to be configured as 0. Non-zero values indicate pseudonodes, such as a DIS, which are “virtual nodes”. More on this later.

Below I’ll list some examples of NETs based on the above rules.

For an NSAP-format-compliant NET with an AFI of 49, an Area ID of 0001, a System ID of 0cad.83b4.03e9 (example MAC address) and a SEL of 00:


Router(config)#router isis
Router(config-router)#net 49.0001.0cad.83b4.03e9.00

Routers in different areas can simply use a different Area ID, no different than in OSPF. You just need to be sure the System ID is still unique, as shown below:


Router(config-router)#net 49.0002.0cad.83b4.03f0.00

For smaller networks with fewer areas, you can also fold the AFI into a single one-byte Area ID (01 here), this time deriving the System ID from a loopback IP address of 172.31.255.254:

Router(config)#router isis
Router(config-router)#net 01.1723.1255.2540.00

An important note about NETs is that a router can only be part of ONE area. This is different from OSPF, where ABRs will typically have at least one interface in area 0 and another interface in a standard or stub area. There are slight topology differences that account for this, which will be the topic of a future post.

The biggest thing to note when it comes to IS-IS NETs is to Keep-It-Simple-Stupid! Personally, I got hung up on why a NET is always shown with an AFI value of 49. Details like this are just “cosmetic”: your IS-IS network will function just fine if you don’t follow ISO standards, since they’re really not relevant in an IP-only world. However, as you can see on Cisco’s website, best practices and simplicity determine what we’re told when learning the protocols. The “why” may not be important, but it’s still worth knowing a thing or two about it, even if just to quell your own curiosity.

More on IS-IS in future post(s). It’s worth knowing, being another tool in the Network Wizard’s tool belt.

EDIT: Thanks to Marko for his corrections and clarifications on some of the key terms and concepts. More posts in the future will be needed to explain IS-IS in more depth…stay tuned 😉