Junos public/private key SSH authentication

Hi Everyone,
Just a quick one today. I was reconfiguring my lab SRX for direct SSH access and in the interest of security, wanted to use RSA public/private keys for authentication. I did my usual key generation using puttygen (sorry guys, Windows user here), copied the OpenSSH authorized_keys public key string that Junos uses, applied it to the user of my choice and off I went…or so I thought. Here was my initial configuration:

[edit]
admin@LabSRX# show system login
user admin {
    uid 2002;
    class super-user;
    authentication {
        encrypted-password "<plaintext passwd hash>"; ## SECRET-DATA
        ssh-rsa "ssh-rsa <key data>"; ## SECRET-DATA
    }
}

Seems simple enough. However, when I went to login using the private key that I had just created for this public key pair, my SRX complained:

Using username "admin".
Authenticating with public key ""
Server refused public-key signature despite accepting key!

Huh? I could’ve sworn that pair was correct. I tried generating another pair just to be sure, but the SRX still refused it.

After fiddling with the SSH protocol version and other unrelated parameters, I logged into one of the lab SRXs at work to see if anyone there was using RSA keys.

Lo and behold, I had forgotten the one part of the key string needed for authentication: the username appended to the end of the public key string:

admin@LabSRX# show system login
user admin {
    uid 2002;
    class super-user;
    authentication {
        encrypted-password ...
        ssh-rsa "ssh-rsa <key data> admin"; ## SECRET-DATA
    }
}
[edit system login user admin]
admin@LabSRX# commit
commit complete

After my commit, I was able to use my private key to authenticate to the SRX.

You can have puttygen append the username by entering it in the “Key comment” field.

I did some digging around but couldn’t find any mention of this in the Junos documentation. My guess is that it goes unmentioned because ssh-keygen on Linux/Unix puts the username in the key comment by default. Regardless, it’s just something I’ll have to remember next time.
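If you build these statements often, a tiny helper makes the missing comment hard to forget. The Python sketch below is hypothetical (`junos_ssh_rsa_statement` is my own name, not a Juniper tool): it takes the single-line OpenSSH public key that puttygen shows, replaces whatever comment is there with the username, and prints the statement in the form shown in the working config above, ready to paste into `set system login user admin authentication ssh-rsa "..."`. It assumes the `ssh-rsa <base64> [comment]` layout and only sanity-checks the key body.

```python
import base64


def junos_ssh_rsa_statement(pubkey_line: str, username: str) -> str:
    """Return a Junos 'ssh-rsa' statement whose key comment is the username.

    Hypothetical helper: any existing comment (puttygen defaults to something
    like 'rsa-key-YYYYMMDD') is replaced by the username, mirroring the fix above.
    """
    fields = pubkey_line.strip().split()
    if len(fields) < 2 or fields[0] != "ssh-rsa":
        raise ValueError("expected 'ssh-rsa <base64 key data> [comment]'")
    key_data = fields[1]
    # Basic sanity check: the key body must be valid base64.
    base64.b64decode(key_data, validate=True)
    return f'ssh-rsa "ssh-rsa {key_data} {username}";'


if __name__ == "__main__":
    # Truncated placeholder key body, for illustration only.
    line = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7 rsa-key-20120101"
    print(junos_ssh_rsa_statement(line, "admin"))
    # ssh-rsa "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7 admin";
```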


Basics of a QFabric

Earlier this month, I attended Juniper’s Configuring & Monitoring QFabric Systems course in preparation for our customers interested in QFabric for their data centers. Having listened to Packet Pushers Show 51 on Juniper QFabric, I thought I knew all there was to know about QFabric. During the course, I quickly realized that while I did get the “gist” of what QFabric looks like and what problems it solves, there is quite a bit to know about getting the system up and running. I suggest that anyone interested listen to the Packet Pushers show to at least get the basic idea of what makes up a QFabric. Below I’ll list each piece and its function:

  • QFabric Nodes: Comparing the system to a traditional chassis, the QFabric Nodes are the equivalent of line cards. These provide the ports to your external devices such as servers, storage and networking devices (routers, firewalls, load balancers, etc.). They are high-density 10GbE (QFX3500) and 40GbE (QFX3600) switches that can be positioned where your traditional top-of-rack switches would sit in the data center. QF Node switches can be implemented in brownfield deployments and run as standalone ToR switches, supporting all the classic switch features such as STP, LAG, etc., until an organization decides to go forward with a full QFabric deployment.
  • QFabric Interconnect: Back to our chassis analogy, the Interconnects act as the backplane of the system. Their sole purpose is to forward packets from one Node to the next, providing the high-speed transport that interconnects (hence the name) everything in the fabric.
  • QFabric Directors: Continuing the chassis analogy, the Directors are the Routing Engine (RE) or supervisor of the system. The Directors manage the QFabric, providing the CLI to admins, and also handle the control plane side of things, such as building routing and forwarding tables and managing the QFabric devices. All of the work to configure and monitor a QFabric system is done on your Directors.
  • Out-of-Band Control Plane (EX4200s in Virtual Chassis)*: An out-of-band control plane network is required to connect all the Nodes, Interconnects and Directors. Note that this network is used only within the QFabric, for control and management plane communication between all your QF pieces; it does not interact with your existing OOB management network. Juniper provides the configuration for the EX4200 switches used in this network, so no configuration *should* need to be performed on them. Because this control plane network is out of band, no configuration, management or Layer 2/Layer 3 network control traffic goes over the data path.
  • *Note: For simplicity’s sake, Juniper recommends that customers follow the port cabling detailed in its techpubs on connecting the QF Directors, the QF Interconnects and the QF Nodes to the control plane switches. All of the EX4200 control plane switch configurations assume this cabling, and you will most likely run into support issues if you do not follow it. As always, YMMV. Keep in mind that Juniper offers two different QFabric deployments, -G and -M, and cabling may vary depending on which one you choose!

    Now that you have the basics of what makes up a QFabric, let’s look at some of the finer details of the system.

    Director Group/Cluster

    For any QFabric deployment, at least two QF Directors are required. QF Directors are grouped into Director Groups or clusters, which can load-balance certain functions between them. Configuration, topology information, device status and state information are synchronized between all QF Directors in a Director Group (DG). The DG also hosts a number of Routing Engines (REs), each with a specific purpose. For example, the DG runs a Fabric Manager RE, which provides routing and forwarding functions to QF devices such as topology discovery, internal IP address assignment and inter-fabric communication. Another RE running on the DG handles the Layer 3 functions of the Network Node Group (see below). All REs are virtualized under the hood, running on a Juniper CentOS hypervisor, and are shared across individual Directors in either an active/active or active/standby setup (depending on the function required of the RE). Most of this happens under the hood and does not require any direct interaction. The part most operators will care about is the single point of management for the entire QFabric: your DG provides the JUNOS CLI as well as DNS, DHCP, NFS, SNMP, syslog and all the other management pieces you would expect on a traditional Juniper switch.

    Topology & Device Discovery

    Devices are discovered via internal routing processes on each QF device. The Fabric Manager RE on the Director Group, as well as the QF Nodes and Interconnects, use what Juniper calls the “system discovery protocol”. This protocol is essentially IS-IS extended for use with QFabric, with each device sending IS-IS-style Hellos across both the control plane EX4200 VCs and the 40Gbps/100Gbps* data path to discover one another. The end result is that each node knows about every other node, and all data paths can be used from ingress to egress through the fabric, similar to multipathing in Layer 3. On the control plane side, instead of using simple signaling on a backplane for each “line card” and RE, QFabric is one big TCP/IP LAN and communicates as such. I’ll leave this post at this simplistic explanation of the under-the-hood workings, but I suggest reading Ivan’s excellent post at ipspace.net on QFabric’s inner BGP/MPLS-like functions. The internal workings are somewhat obscured in the current literature, and unfortunately I didn’t save my SSH sessions from the course. Things like the internal addressing (which uses both 169.254.0.0/16 and 128.0.128.0/24 addresses) and routing will be the topic of a future post.
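To make the multipathing point concrete, here is a small Python sketch. It is not Juniper’s system discovery protocol; it assumes a hypothetical fabric in which every Node has a data-path link to every Interconnect (device names like `node1` and `ic1` are made up), builds the adjacency list that link-state flooding would give each device, and counts the equal-cost shortest paths between two Nodes with a breadth-first search.

```python
from collections import deque


def build_fabric(nodes, interconnects):
    """Full mesh between Nodes and Interconnects (hypothetical topology)."""
    adj = {dev: set() for dev in nodes + interconnects}
    for n in nodes:
        for ic in interconnects:
            adj[n].add(ic)
            adj[ic].add(n)
    return adj


def count_shortest_paths(adj, src, dst):
    """BFS that tracks hop distance and the number of equal-cost shortest paths."""
    dist = {src: 0}
    paths = {src: 1}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nbr in adj[cur]:
            if nbr not in dist:
                # First time we reach nbr: it inherits cur's path count.
                dist[nbr] = dist[cur] + 1
                paths[nbr] = paths[cur]
                queue.append(nbr)
            elif dist[nbr] == dist[cur] + 1:
                # Another equal-cost way to reach nbr.
                paths[nbr] += paths[cur]
    return dist.get(dst), paths.get(dst, 0)


if __name__ == "__main__":
    fabric = build_fabric(["node1", "node2", "node3"], ["ic1", "ic2"])
    hops, ecmp = count_shortest_paths(fabric, "node1", "node3")
    print(f"node1 -> node3: {hops} hops, {ecmp} equal-cost paths")
```

In this model the number of equal-cost Node-to-Node paths equals the number of Interconnects, which is the intuition behind "all data paths can be used".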

    *Note: 100Gbps is on the roadmap; currently only a 40Gbps backplane is available.

    Node Groups

    Each Node in a QFabric is assigned to one of three kinds of “node groups”. These node groups define the node’s role and the type of connectivity it requires. Note that each QF Node uses its own local Packet Forwarding Engine (PFE) and Routing Engine (RE) to perform line-rate forwarding. Forwarding is distributed across all the QF Nodes instead of being punted to a central controller such as a supervisor. Below is a brief explanation of the three kinds of node groups:

    • Server Node Group: Consists of a single QF Node and only runs host-facing protocols such as LACP, LLDP, ARP and DCBX. Used to connect servers that do not require cross-node redundancy (i.e., servers connected to a single Node). This is the default Node Group for QF Nodes.
    • Redundant Server Node Group: Consists of two QF Nodes and only runs host-facing protocols, similar to a Server Node Group. The difference is that servers can create LAGs across both QF Nodes in a Redundant Server Node Group. Of the two Nodes in an RSNG, one is selected as the “active” RE; the other is the standby and takes over should the active Node fail. Both Nodes use their PFEs for local forwarding.
    • Network Node Group: Consists of one or more Nodes (up to eight/sixteen* in future releases). This group runs your L2/L3 network-facing protocols such as Spanning Tree, OSPF, BGP and PIM. Only one Network Node Group exists in a QFabric system. RE functions for a Network Node Group are sent up to the Directors for control plane processing.
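The membership rules above can be written down as a small validation sketch. This is my own hypothetical model, not a Juniper tool: the `NodeGroup` class and `validate_node_groups` helper are made up, and the limits encode only what the list states (a Server Node Group has exactly one Node, an RSNG exactly two, and at most one Network Node Group exists). The cap of eight Network Node Group members is an assumption; adjust it for your release.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    name: str
    kind: str                      # "server", "redundant-server" or "network"
    nodes: list = field(default_factory=list)


def validate_node_groups(groups, max_network_nodes=8):
    """Return a list of rule violations; an empty list means the layout is valid."""
    errors = []
    seen_nodes = set()
    network_groups = 0
    for g in groups:
        count = len(g.nodes)
        if g.kind == "server":
            if count != 1:
                errors.append(f"{g.name}: a Server Node Group has exactly one Node")
        elif g.kind == "redundant-server":
            if count != 2:
                errors.append(f"{g.name}: an RSNG has exactly two Nodes")
        elif g.kind == "network":
            network_groups += 1
            if not 1 <= count <= max_network_nodes:
                errors.append(f"{g.name}: needs 1-{max_network_nodes} Nodes")
        else:
            errors.append(f"{g.name}: unknown node group kind {g.kind!r}")
        for n in g.nodes:
            if n in seen_nodes:
                errors.append(f"{n}: a Node can belong to only one node group")
            seen_nodes.add(n)
    if network_groups > 1:
        errors.append("only one Network Node Group may exist per QFabric")
    return errors


if __name__ == "__main__":
    layout = [
        NodeGroup("SNG-0", "server", ["Node-0"]),
        NodeGroup("RSNG-1", "redundant-server", ["Node-1", "Node-2"]),
        NodeGroup("NW-NG-0", "network", ["Node-3", "Node-4"]),
    ]
    print(validate_node_groups(layout) or "layout OK")
```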

    By the way, to convert a QFX3500 or QFX3600 switch to become a QF Node and join a QFabric, simply run the following command & reload the box:

    root@qfabric> request chassis device-mode node-device
    Device mode set to `node-device' mode.
    Please reboot the system to complete the process.

    All interface-specific configuration uses the aliases assigned to each QF Node (by default, each Node is named by its serial number; this can be changed under the edit fabric aliases stanza). Below is a small JUNOS config snippet for a QFabric:

    chassis {
        node-group NW-NG-0 {
            aggregated-devices {
                ethernet {
                    device-count 1;
                }
            }
        }
        node-group RSNG-1 {
            aggregated-devices {
                ethernet {
                    device-count 48;
                }
            }
        }
    }
    interfaces {
        NW-NG-0:ae0 {
            aggregated-ether-options {
                lacp {
                    active;
                }
            }
            unit 0 {
                family ethernet-switching {
                    port-mode trunk;
                    vlan {
                        members all;
                    }
                }
            }
        }
        Node-0:ge-0/0/12 {
            unit 0 {
                family ethernet-switching;
            }
        }
    ...

    This is where it becomes apparent that a QFabric “looks like” (from a configuration standpoint) a single giant switch.

    There are quite a few moving parts and I’ve only scratched the surface here. I’ll be diving deeper myself and will update my blog accordingly :).

    Thanks to Juniper for the excellent CMQS course. Other references used are the QFabric Architecture whitepaper and the QFabric deployment guides on Juniper’s website.

CCIP completed, onto a different brand of Koolaid

Earlier this month, I sat for my QoS 642-642 exam to complete my CCIP certification. Other than a few gripes with outdated information, the exam went pretty smoothly and I hammered out a pass. I’ve written previously about my motivations for obtaining the CCIP cert and am glad to have stuck with it. Even though the certification will officially retire in a week or so, a lot of the topics covered will also be on the CCIE R&S version 4.0 blueprint. I doubt I’m finished with BGP, MPLS and QoS, so I’m keeping that knowledge tucked away for the time being 😉

Just one last note on CCIP: I would highly recommend Wendell Odom’s Cisco QoS Exam Cert Guide for anyone looking to learn about QoS on Cisco IOS. It is one of the best Cisco Press books I’ve read, and I continue to reference it for everything IOS QoS.

Now that I have broad coverage of Cisco R&S technologies with my CCNP and CCIP, I’ve decided to revisit my Juniper studies. While we don’t work all that much with Juniper at $DAYJOB, we have Juniper gear in the lab to play with. Recently, I’ve been using EX4200 and EX4500 switches as well as working through Juniper’s free JNCIS-ENT study guide. Coming from a Cisco background, and particularly having gone through CCNP, I’m finding there’s a good amount of overlap. It’s mostly a matter of learning all the JUNOS hierarchies and “where is that feature” in JUNOS.

Upcoming posts will cover some basic JUNOS switching on EX and interoperating with Cisco Catalyst 3560/3750s. I’ll also be finishing a lot of my draft posts from earlier this year covering BGP, MPLS and some vendor ranting 😛

Stay tuned.

MPLS VPN Label Basics – The LIB, the LFIB and the RIB(s)

LDP, or Label Distribution Protocol, is used to advertise label bindings to peers in an MPLS network.

The Label Information Base, or LIB, contains all labels received from remote peers and is similar to the IP RIB. Not all labels received from LDP neighbors are used, since a single best path is selected for forwarding to each prefix. Forwarding decisions are based on the Label Forwarding Information Base, or LFIB, once the best path towards the next-hop LSR is determined. That determination relies on the close relationship between the LIB, the LFIB and the IP routing table (RIB).
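Here is a minimal Python sketch of that relationship, under simplified assumptions: the LIB keeps every remote binding for a prefix (liberal label retention, as used in frame-mode MPLS), the RIB supplies the IGP next hop, and each LDP neighbor advertises its bound addresses. Only the binding from the neighbor that owns the RIB next hop is installed in the LFIB. The dictionaries and `lfib_entry` are hypothetical; the second neighbor, 10.255.255.9, and its label are made up, while the rest mirrors the PE2 outputs later in this post.

```python
# Hypothetical PE2 view. The 10.255.255.9 neighbor and its label 30 are made up
# to show a binding that stays in the LIB but never reaches the LFIB.
LIB = {  # prefix -> {LDP neighbor (LSR ID): remote label}
    "10.255.255.3/32": {"10.255.255.1": 17, "10.255.255.9": 30},
}
RIB = {  # prefix -> (IGP next hop, outgoing interface)
    "10.255.255.3/32": ("10.10.1.9", "FastEthernet2/0"),
}
BOUND_ADDRESSES = {  # LDP neighbor -> addresses it advertised as bound
    "10.255.255.1": {"10.10.1.1", "10.255.255.1", "10.10.1.5", "10.10.1.9", "10.10.1.13"},
    "10.255.255.9": {"10.10.2.1", "10.255.255.9"},
}


def lfib_entry(prefix):
    """Install only the LIB binding learned from the LSR that owns the RIB next hop."""
    next_hop, interface = RIB[prefix]
    for lsr, addresses in BOUND_ADDRESSES.items():
        if next_hop in addresses and lsr in LIB.get(prefix, {}):
            return {"prefix": prefix, "out_label": LIB[prefix][lsr],
                    "interface": interface, "next_hop": next_hop}
    return None  # no binding from the next-hop LSR; the prefix would go out unlabeled


if __name__ == "__main__":
    print(lfib_entry("10.255.255.3/32"))
```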

For clarity, we’ll be talking about non-ATM MPLS forwarding. ATM MPLS uses different LDP discovery, label retention and distribution methods because of ATM’s unique forwarding method and encapsulation(s).

Here’s our simple MPLS topology. We have two PE routers, connecting two customer sites. We also have a route reflector to reduce the number of IBGP connections required between PE routers. This is part of my MPLS lab so the irrelevant routers and configs will be omitted.

PE1 Router ID: 10.255.255.3/32
PE2 Router ID: 10.255.255.4/32
RR Router ID: 10.255.255.2/32

Routing within the MPLS network is provided by basic single-area IS-IS.

So how does MPLS build its Label FIB? First, let’s look at the VRFs defined for this customer. We’ll be using VRF “Red” on both PE routers:

PE1#show ip vrf
  Name                             Default RD          Interfaces
  Red                              65000:1             Fa1/0
----
PE2#show ip vrf
  Name                             Default RD          Interfaces
  Red                              65000:1             Fa1/0

For VPNv4 routing between customer sites, MP-BGP is used to distribute label bindings for VRF routes. LDP distributes label bindings for the Loopback0 BGP next hops. OSPF is used between CE and PE routers.

On PE1, here are all the customer routes learned via Fa1/0:

PE1#show ip route vrf Red ospf | in FastEthernet1/0
O IA    10.10.1.0/24 [110/2] via 10.1.1.2, 00:23:55, FastEthernet1/0
O       10.30.100.0/30 [110/101] via 10.1.1.2, 00:23:55, FastEthernet1/0
O       10.30.1.101/32 [110/2] via 10.1.1.2, 00:23:55, FastEthernet1/0

OSPF routes running in VRF Red are redistributed into MP-BGP under “address-family ipv4 vrf Red”.

PE1#show ip bgp vpnv4 rd 65000:1 10.10.1.0/24
BGP routing table entry for 65000:1:10.10.1.0/24, version 14
Paths: (1 available, best #1, table Red)
  Advertised to update-groups:
        1
  Local
    10.1.1.2 from 0.0.0.0 (10.255.255.3)
      Origin IGP, metric 2, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:65000:1 OSPF DOMAIN ID:0x0005:0x000000010200 
        OSPF RT:0.0.0.0:3:0 OSPF ROUTER ID:10.100.1.101:0
      mpls labels in/out 25/nolabel
PE1#

Here we can see the MPLS label binding (local label 25) that will be sent to other PE routers. PE routers whose VRFs import the same route target (RT:65000:1) will install these routes into the VRFs of the other sites.

In the MPLS forwarding table (LFIB), an entry is created for these “local” VRF routes, that is, the routes reachable via the next-hop CE router:

PE1#show mpls forwarding-table vrf Red 10.10.1.0
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop    
Label  Label or VC   or Tunnel Id      Switched      interface              
25     No Label      10.10.1.0/24[V]   0             Fa1/0      10.1.1.2

This is the label that will be advertised to MP-BGP peers (in this case, reflected to PE2).

PE1 will also have a label binding for its own BGP next-hop IP address, which is the Loopback0 interface under the global routing table:

PE1#show mpls ldp bindings local 10.255.255.3 32
  lib entry: 10.255.255.3/32, rev 4
        local binding:  label: imp-null

This is advertised as an implicit-null label, so the penultimate (core P) router pops the top label and PE1 avoids performing two lookups (once in the LFIB and again in the RIB for its connected prefix). Core P routers will have a label binding for this prefix:

CoreP#show mpls ldp bindings local
...
  lib entry: 10.255.255.3/32, rev 14
        local binding:  label: 17
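The effect of that implicit-null binding on the penultimate hop can be sketched in Python. This is a simplified model, not IOS code: the `swap_or_pop` helper and the one-entry table are hypothetical, and only the labels (CoreP’s local label 17 and PE1’s imp-null advertisement) come from the outputs above. Because PE1 advertised implicit null for 10.255.255.3/32, CoreP pops the transport label instead of swapping it, so PE1 receives only the VPN label.

```python
IMPLICIT_NULL = "imp-null"

# Simplified CoreP LFIB: local (incoming) label -> outgoing label learned from PE1.
CORE_P_LFIB = {17: IMPLICIT_NULL}


def swap_or_pop(lfib, label_stack):
    """Process the top label of a stack (index 0 is the top of the stack)."""
    top, rest = label_stack[0], label_stack[1:]
    out_label = lfib[top]
    if out_label == IMPLICIT_NULL:
        return rest               # penultimate hop popping: remove the top label
    return [out_label] + rest     # ordinary label swap; inner labels untouched


if __name__ == "__main__":
    # PE2 imposes [transport 17, VPN 25]; CoreP pops 17 and forwards [25] to PE1.
    print(swap_or_pop(CORE_P_LFIB, [17, 25]))  # [25]
```

This matches the traceroute further down, where the second hop shows only label 25.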

To forward VPN traffic correctly, two labels are needed. The top (transport) label is used to forward packets across the core (P) MPLS network to the BGP next hop (the loopback of either PE1 or PE2, depending on which CE site the packet is destined for). The bottom (VPN) label identifies the VRF and the outgoing interface used to route packets towards the customer router(s).

So, for customer at Site B to reach network 10.10.1.0/24 at Site A, PE2 will use the following labels:

  • Label 17 as the transport label to PE1, received from the MPLS core router(s) via LDP; selected after a RIB lookup in VRF “Red” identifies the next-hop IP address
  • Label 25 for the VPN label, received from PE1 via MP-BGP; identified in the VPNv4 BGP RIB

To verify:

A packet destined for 10.10.1.1 is received on Fa1/0 from the Site B router(s), and PE2 performs a lookup in the VRF Red RIB:

PE2#show ip route vrf Red ospf  
Routing Table: Red

     10.0.0.0/8 is variably subnetted, 9 subnets, 3 masks
O IA    10.10.1.0/24 [110/52] via 10.255.255.3, 00:46:32

PE2 identifies the next-hop IP address, which is PE1’s BGP next hop. Since the packet will traverse the MPLS network via outgoing interface FastEthernet2/0 into the core, it needs to be labeled before transit:

PE2#show ip bgp vpnv4 rd 65000:1 10.10.1.0/24
BGP routing table entry for 65000:1:10.10.1.0/24, version 34
Paths: (1 available, best #1, table Red, RIB-failure(17))
  Not advertised to any peer
  Local
    10.255.255.3 (metric 20) from 10.255.255.2 (10.255.255.2)
      Origin IGP, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:65000:1 OSPF DOMAIN ID:0x0005:0x000000010200 
        OSPF RT:0.0.0.0:3:0 OSPF ROUTER ID:10.100.1.101:0
      Originator: 10.255.255.3, Cluster list: 10.255.255.2
      mpls labels in/out nolabel/25
PE2#show mpls ldp bindings 
...
lib entry: 10.255.255.3/32, rev 8
        local binding:  label: 17
        remote binding: lsr: 10.255.255.1:0, label: 17

Therefore, packets destined for customer Site A will be sent with the labels 17 and 25.

PE2#traceroute vrf Red 10.10.1.1

Type escape sequence to abort.
Tracing the route to 10.10.1.1

  1 10.10.1.9 [MPLS: Labels 17/25 Exp 0] 76 msec 52 msec 72 msec
  2 10.1.1.1 [MPLS: Label 25 Exp 0] 84 msec 40 msec 40 msec
  3 10.1.1.2 132 msec *  60 msec
PE2#

Below I will attempt to illustrate the decision process and how the entries in an MPLS router relate to one another:

  1. An incoming packet from Site B, destined for 10.10.1.1, is received on PE2’s VRF interface Fa1/0.
  2. An IP lookup in VRF table “Red” identifies a next-hop IP address that is known via the global routing table. This route was redistributed from BGP into OSPF (hence the RIB failure) with PE1’s BGP next hop of 10.255.255.3.
  3. A BGP RIB lookup is performed to identify the VPN label. Under the VPNv4 address family, the outgoing label is 25, as advertised by PE1.
  4. A global RIB lookup is performed for the BGP next hop learned in the VRF. The actual IP next hop in the MPLS core (10.10.1.9) is identified, via outgoing interface FastEthernet2/0.
  5. The outgoing interface is MPLS-enabled, so a LIB lookup is performed to find which LDP neighbor has the core next hop 10.10.1.9 as a bound address. The remote label received from that LDP neighbor is used as the transport label to PE1’s loopback.
  6. An LFIB entry is created with label 17, outgoing interface FastEthernet2/0 and next hop 10.10.1.9, and the packet is label-switched through the core MPLS network to PE1.
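The six steps above can be condensed into a Python sketch that walks the same tables on PE2. The table contents come from the outputs in this post; the dictionaries, `longest_match` and `impose_labels` are hypothetical simplifications, and a real router resolves this ahead of time in CEF rather than walking these tables per packet.

```python
import ipaddress

# Step 2: VRF "Red" RIB on PE2 -> BGP next hop (PE1's loopback)
VRF_RED_RIB = {"10.10.1.0/24": "10.255.255.3"}
# Step 3: VPNv4 BGP table -> VPN label advertised by PE1
VPNV4_LABELS = {("65000:1", "10.10.1.0/24"): 25}
# Step 4: global RIB -> (IGP next hop, outgoing interface)
GLOBAL_RIB = {"10.255.255.3/32": ("10.10.1.9", "FastEthernet2/0")}
# Step 5: LDP neighbors' bound addresses and the remote bindings they sent
LDP_BOUND = {"10.255.255.1": {"10.10.1.1", "10.255.255.1", "10.10.1.5", "10.10.1.9", "10.10.1.13"}}
LIB_REMOTE = {("10.255.255.3/32", "10.255.255.1"): 17}


def longest_match(table, address):
    """Return the most specific prefix in table that covers address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [p for p in table if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen, default=None)


def impose_labels(dest, rd="65000:1"):
    """Return ([transport, VPN] label stack, outgoing interface, IGP next hop)."""
    vrf_prefix = longest_match(VRF_RED_RIB, dest)             # step 2
    bgp_next_hop = VRF_RED_RIB[vrf_prefix]
    vpn_label = VPNV4_LABELS[(rd, vrf_prefix)]                 # step 3
    core_prefix = longest_match(GLOBAL_RIB, bgp_next_hop)     # step 4
    igp_next_hop, out_if = GLOBAL_RIB[core_prefix]
    ldp_peer = next(lsr for lsr, addrs in LDP_BOUND.items()   # step 5
                    if igp_next_hop in addrs)
    transport_label = LIB_REMOTE[(core_prefix, ldp_peer)]
    return [transport_label, vpn_label], out_if, igp_next_hop  # step 6


if __name__ == "__main__":
    print(impose_labels("10.10.1.1"))  # ([17, 25], 'FastEthernet2/0', '10.10.1.9')
```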

The output of “show mpls ldp neighbor” displays the bound addresses of the core P router(s). The LIB entry selected for forwarding in the LFIB is based on which LDP neighbor the global RIB next-hop address is bound to. In this case, only one LDP neighbor exists:

PE2#show mpls ldp neighbor 
    Peer LDP Ident: 10.255.255.1:0; Local LDP Ident 10.255.255.4:0
        TCP connection: 10.255.255.1.646 - 10.255.255.4.34846
        State: Oper; Msgs sent/rcvd: 118/119; Downstream
        Up time: 01:33:30
        LDP discovery sources:
          FastEthernet2/0, Src IP addr: 10.10.1.9
        Addresses bound to peer LDP Ident:
          10.10.1.1       10.255.255.1    10.10.1.5       10.10.1.9      
          10.10.1.13      
PE2#

In an MPLS VPN network, the label bindings received from remote peers (LIB), the label forwarding table (LFIB) and the various IP routing tables (VRF RIB, global RIB, BGP RIB, etc.) work in tandem to build the label stack used to forward packets from one VPN site to another. This is the basic forwarding paradigm of Multiprotocol Label Switching, and it enables service providers to offer L3VPN services to customers while keeping customer routing separated through VRFs. References used in this post are Luc De Ghein’s MPLS Fundamentals book from Cisco Press and Cisco documentation, found at http://www.cisco.com/go/mpls.

BGP+MPLS Exam Passed! QoS and other things

Hi All,
I’ve been staying away from the Twitters and blogging to focus on my BGP+MPLS composite exam. I wrote it this afternoon and passed, w00t! I wanted to give a HUGE thanks to Jarek Rek of hackingcisco.blogspot.com. His labs are great for practicing Cisco IP routing configuration, and I recommend that anyone preparing for CCNP ROUTE, CCIE R&S or anything routing-related check them out. Thanks again Jarek!

So other than beating my chest, I will be finishing up some outstanding blog posts around my BGP and MPLS studies before moving on to my QoS exam. I’ve also been involved more and more with Juniper at work, along with trying to get up to speed with L2VPN technologies like basic EoMPLS. Metro Ethernet is a whole other rabbit hole that I wish to descend into eventually, but at the moment it’s still a bit of a mystery. All of this makes keeping up with blogging and goofing off at home challenging, since I’m in study mode for CCIP while getting pulled in twenty different directions for real-world job stuff.

I’m currently looking for my next book to go through in preparation for my QoS exam. My coworker recommended Cisco Press’ “End to End QoS Network Design”, while most of Learning@Cisco seems to recommend the IP Telephony QoS Exam study guide. That’s still up in the air until I review the exam topics. If anyone has a solid recommendation for 642-642, please let me know in the comments!

One last update: I picked up the newest edition of “TCP/IP Illustrated Volume 1”. Stevens’ book is often recommended by the experts and is considered the bible of Layers 4 and up. It’s a comprehensive tome and a great reference.

More technical posts coming shortly.

BGP Aggregate Addresses

This month I am studying in preparation for my CCIP BGP+MPLS exam (booked June 27th). I decided to go through with the CCIP certification despite my annoyance with the new Service Provider track, because the BGP, MPLS and QoS topics are covered at length on the CCIE Routing & Switching blueprints. I figure this is a good bridge to fill in the gaps that CCNP R&S leaves out (in particular MPLS and QoS, which aren’t covered at all in any of the CCNP blueprints).

As such, I’ve been able to dive into all the knobs and switches that BGP offers to control routing policy. For those who have gone through the newer CCNP R&S track, the BGP fundamentals are explained and covered enough to get engineers familiarized with its operations. There’s a lot of depth lacking in CCNP and for good reason…BGP can be a career in and of itself. In service provider environments, when you’re pulling half a million IPv4 routes from upstream peers and providing L3VPN services to your customers via MPLS, you need a protocol like BGP that can scale.

With hundreds of thousands of routes in the global Internet table, route summarization helps keep specific, unnecessary routes from propagating to upstream providers, reducing the memory and CPU required to carry them. To summarize a set of routes in BGP, you have a few options:

  • Manual static Null0 routes advertised in BGP
  • aggregate-address command

Let’s look at a scenario. This is taken out of a BGP topology I’ve been working on this week to help me gain a better understanding of some of the more advanced BGP topics.

Subnets*:

  • 100.100.255.0/24 for all CE-facing Point-to-Point links
  • 100.100.254.0/24 for BGP update source loopbacks
  • 100.100.253.0/24 for all inter-AS Point-to-Point links
  • 100.100.200.0/24 allocated to Enterprise A from this ISP
  • 100.100.0.0/16 allocated to ISP from registry

*Note: this is my best guess of how an ISP would assign addressing in its network. Being an enterprise guy, I’ve yet to be exposed to any service provider network. For those with more experience, any corrections on this please let me know in the comments below 🙂

In this topology, we have one route reflector “RR” with IBGP running between RR and all the PE routers (just PE1 and PE2 in this case).
We want to aggregate all of the ISP’s routes and advertise them to the upstream SP at AS 200.

Below is the BGP RIB on our route reflector before any aggregation. These are all the routes advertised by the PE routers as well as any allocations given to customers who require more than a single address.

RR#sh ip bgp
BGP table version is 31, local router ID is 100.100.254.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.200.0/24 100.100.254.3            0    100      0 65501 i
*>i                 100.100.254.1            0    100      0 65501 i
*>i100.100.255.0/31 100.100.254.1            0    100      0 i
*>i100.100.255.8/31 100.100.254.3            0    100      0 i
!
!
AS200#sh ip bgp
BGP table version is 37, local router ID is 100.100.253.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.200.0/24 100.100.253.1                          0 400 i
*> 100.100.255.0/31 100.100.253.1                          0 400 i
*> 100.100.255.8/31 100.100.253.1                          0 400 i

As you can see, in a huge service provider network, the BGP RIB would be filled with any public IP addresses used to connect its customers to the outside world, as well as any allocations given by this ISP to its larger customers (such as Enterprise A in this case, which is dual-homed at PE1 and PE2). Also included is the BGP RIB of the upstream AS 200 router, which receives these specific prefixes from AS 400.

Now let’s reduce the routing table by aggregating these prefixes into a summarized route. First, we’ll add a static route to the Null0 interface and advertise it in BGP:

! On RR:
conf t
 ip route 100.100.0.0 255.255.0.0 Null0
!
router bgp 400
 network 100.100.0.0 mask 255.255.0.0
!
RR#sh ip ro static
100.0.0.0/8 is variably subnetted, 11 subnets, 4 masks
  S 100.100.0.0/16 is directly connected, Null0
RR#sh ip bgp
BGP table version is 25, local router ID is 100.100.254.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   0.0.0.0                            32768 i
*>i100.100.255.0/31 100.100.254.1            0    100      0 i
*>i100.100.255.8/31 100.100.254.3            0    100      0 i

And to verify on the upstream AS:

AS200#sh ip bgp
BGP table version is 31, local router ID is 100.100.253.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   100.100.253.1            0             0 400 i
*> 100.100.200.0/24 100.100.253.1                          0 400 i
*> 100.100.255.0/31 100.100.253.1                          0 400 i
*> 100.100.255.8/31 100.100.253.1                          0 400 i

Since we’re still advertising the more-specific routes inside the ISP AS 400, manual filtering will be required on the route reflector. This can be accomplished with a simple prefix list or route-map on RR.

! on RR
ip prefix-list OurAlloc permit 100.100.0.0/16 
!
! Match only our allocated address space
!
router bgp 400
 neighbor 100.100.253.0 prefix-list OurAlloc out

The problem with this approach is that, while it is fairly simple, it requires you to manually filter any more-specific routes at the edge of your AS. Also, if you are serving multihomed customers with their own address allocation (independent of this ISP’s allocation), you will have to account for those in your filtering as well.
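It helps to be precise about what `ip prefix-list OurAlloc permit 100.100.0.0/16` matches: without `ge` or `le`, an IOS prefix-list entry matches only that exact prefix and length, and anything not permitted hits the implicit deny. The Python sketch below models a single permit entry (the `prefix_list_permits` helper is my own and ignores sequence numbers and multiple entries) and applies it to the routes AS200 was receiving.

```python
import ipaddress


def prefix_list_permits(entry, route, ge=None, le=None):
    """Model one 'permit' prefix-list entry, including optional ge/le length ranges."""
    entry_net = ipaddress.ip_network(entry)
    route_net = ipaddress.ip_network(route)
    # The route must fall inside the entry's prefix.
    if not route_net.subnet_of(entry_net):
        return False
    if ge is None and le is None:
        return route_net.prefixlen == entry_net.prefixlen   # exact match only
    low = ge if ge is not None else entry_net.prefixlen
    high = le if le is not None else route_net.max_prefixlen
    return low <= route_net.prefixlen <= high


if __name__ == "__main__":
    advertised = ["100.100.0.0/16", "100.100.200.0/24",
                  "100.100.255.0/31", "100.100.255.8/31"]
    sent = [r for r in advertised if prefix_list_permits("100.100.0.0/16", r)]
    print(sent)  # ['100.100.0.0/16']
```

Adding `le 24` to the entry would let 100.100.200.0/24 through as well, which is exactly the kind of leak the exact-match entry prevents.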

The other way to aggregate a set of routes in BGP is the aggregate-address command. This command not only creates a Null0 route automatically but, with the summary-only keyword, also suppresses the more-specific routes in the BGP RIB so they are not advertised. Using aggregate-address summarization alone, upstream peers receive only the aggregated route and not the individual more-specific prefixes.

! on RR
router bgp 400
 aggregate-address 100.100.0.0 255.255.0.0 summary-only
!
RR#sh ip bgp
BGP table version is 31, local router ID is 100.100.254.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   0.0.0.0                            32768 i
s i100.100.200.0/24 100.100.254.3            0    100      0 65501 i
s>i                 100.100.254.1            0    100      0 65501 i
s>i100.100.255.0/31 100.100.254.1            0    100      0 i
s>i100.100.255.8/31 100.100.254.3            0    100      0 i

AS200#sh ip bgp
BGP table version is 37, local router ID is 100.100.253.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
          r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   100.100.253.1            0             0 400 i

As you can see, after using the aggregate-address command on the AS 400 RR, only our configured summary address is advertised to AS200. You can also see the suppressed routes in RR’s BGP RIB, since we used the “summary-only” parameter in the aggregate-address command. All more-specific routes are suppressed from being advertised to BGP peers, reducing what used to be many routes to just the configured aggregate.
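A rough Python model of what `aggregate-address 100.100.0.0 255.255.0.0 summary-only` does to RR’s table: the aggregate is created only when at least one more-specific route inside the range is present in the BGP table, and with `summary-only` every contributing more-specific is marked suppressed (the `s` code) and withheld from peers. The `aggregate` function is a hypothetical sketch of that behavior, not the IOS implementation, and it ignores attributes such as AS_SET and ATOMIC_AGGREGATE.

```python
import ipaddress


def aggregate(bgp_table, summary, summary_only=True):
    """Return (advertised, suppressed) prefix lists for one aggregate-address."""
    agg = ipaddress.ip_network(summary)
    contributors = [p for p in bgp_table
                    if ipaddress.ip_network(p) != agg
                    and ipaddress.ip_network(p).subnet_of(agg)]
    if not contributors:
        # No more-specific routes in range: no aggregate is generated.
        return list(bgp_table), []
    suppressed = contributors if summary_only else []
    advertised = [str(agg)] + [p for p in bgp_table
                               if p not in suppressed and p != str(agg)]
    return advertised, suppressed


if __name__ == "__main__":
    rr_table = ["100.100.200.0/24", "100.100.255.0/31", "100.100.255.8/31"]
    advertised, suppressed = aggregate(rr_table, "100.100.0.0/16")
    print("advertised:", advertised)   # ['100.100.0.0/16']
    print("suppressed:", suppressed)
```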

Aggregation, combined with proper filtering, should be performed wherever and whenever possible. As of this writing, the CIDR Report indicates that over 410,000 routes exist in the global table, and it estimates that as many as half of them could be aggregated.
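As a small illustration of how aggregation shrinks a route count, Python's standard `ipaddress.collapse_addresses` merges adjacent and overlapping networks into the fewest covering prefixes. The announced prefixes below are hypothetical values chosen from this lab's 100.100.0.0/16 block.

```python
import ipaddress

# Hypothetical announcements: four contiguous /24s plus one stray /24.
announced = [ipaddress.ip_network(f"100.100.{i}.0/24") for i in range(4)]
announced.append(ipaddress.ip_network("100.100.200.0/24"))

# collapse_addresses() merges adjacent/overlapping networks into the fewest prefixes.
collapsed = list(ipaddress.collapse_addresses(announced))

print(f"{len(announced)} routes -> {len(collapsed)} routes")
for net in collapsed:
    print(net)
```

Note that this is stricter than aggregate-address 100.100.0.0/16: `collapse_addresses` never covers address space that was not announced, so the four contiguous /24s become one /22 while the stray /24 stays on its own. Summarizing up to the /16 is a policy decision made by the operator.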

Cosmetic Bug: IS-IS Network Entity Title

cosmetic bug:

a software error condition that does not impact a system in any functional way; types of errors can include spelling mistakes, transient error messages, etc.

I thought I'd start a series of blog posts dedicated to what I call “cosmetic bugs” in networking technology. By that I mean things we learn, see and do in networking without knowing the reason why, because they don't impact a router, switch or protocol in any way… The whys have just somehow been lost in translation over the years.

One such case relates to the lovely link-state protocol IS-IS. IS-IS stands for “Intermediate System to Intermediate System” and was originally developed to facilitate routing between “intermediate systems” (the ISO term for a router) over the OSI Connectionless Network Service (CLNS) protocol stack. It was later extended in RFC 1195 to support both OSI and TCP/IP networks (renamed Integrated IS-IS or Dual IS-IS). Although TCP/IP has long since displaced the OSI protocol stack, IS-IS remains in use, typically in service provider core networks, thanks to its scalability and link-state properties.

Having taken CCNP BSCI in college and gone through ROUTE in my current profession, I've always been intrigued by the mystical awe that is the IS-IS protocol. As a link-state routing protocol, IS-IS is similar to OSPF in that networks are learned by flooding link-state information throughout a domain. However, since IS-IS originated with the ISO to work in tandem with the OSI protocol stack, certain “legacy” properties remain. As the title of this post indicates, I want to spend some time on the “why” behind the Network Entity Title, also known as the IS-IS NET.

The NET is an identifier configured on each IS-IS router that places the router within the topology. It is written in hexadecimal and encodes both an Area ID and a System ID.

The Area ID performs the same function as it does in OSPF (with some key differences that I won't go into in this post) and is topology-driven. The System ID performs the same function as the Router ID does in OSPF. Unlike an OSPF Router ID, however, it does not have to be derived from an IP address, nor does it require an IP address to be configured on any interface. Also, unlike OSPF, which runs over IP (i.e. an IP header precedes the OSPF header), IS-IS runs directly over Layer 2 (i.e. the IS-IS PDU header immediately follows the Layer 2 header). To further compare the two, IS-IS NETs must follow a defined structure, whereas OSPF uses arbitrary values for Area IDs and Router IDs. I'll skip some of the details simply because they have nothing to do with the TCP/IP stack. If, like me, you've ever wondered why Cisco uses the same configuration example in all of its IS-IS documentation, hopefully I can shed some light on that. Let's look at the structure of a NET in more detail:

As indicated in the diagram above, the following rules must be followed when defining the NET:

  • AFI must be 1 byte
  • Area ID can be 0 to 12 bytes long
  • System ID must be 6 bytes long
  • SEL must be 1 byte
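The size rules above add up to a NET of 8 to 20 bytes (1 + 0..12 + 6 + 1). Here is a minimal Python sketch that splits a dotted NET into its fields and enforces those rules. The `Net` class and `parse_net` function are hypothetical helpers for illustration; the sketch assumes the usual dotted-hex notation and does not check the SEL value itself.

```python
from dataclasses import dataclass

HEX_DIGITS = set("0123456789abcdef")


@dataclass
class Net:
    afi: str        # 1 byte
    area_id: str    # 0 to 12 bytes (may be empty)
    system_id: str  # 6 bytes, shown in the usual xxxx.xxxx.xxxx form
    sel: str        # 1 byte


def parse_net(net: str) -> Net:
    """Split a dotted NET into AFI, Area ID, System ID and SEL, enforcing the size rules."""
    digits = net.replace(".", "").lower()
    if len(digits) % 2 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"{net!r} is not a whole number of hex bytes")
    size = len(digits) // 2
    # 1 (AFI) + 0..12 (Area ID) + 6 (System ID) + 1 (SEL) = 8..20 bytes
    if not 8 <= size <= 20:
        raise ValueError(f"{net!r} is {size} bytes; a NET must be 8 to 20 bytes")
    sysid = digits[-14:-2]  # the 6 bytes just before the SEL
    return Net(
        afi=digits[:2],
        area_id=digits[2:-14],  # whatever sits between the AFI and the System ID
        system_id=".".join(sysid[i:i + 4] for i in range(0, 12, 4)),
        sel=digits[-2:],
    )


print(parse_net("49.0001.0cad.83b4.03e9.00"))
```

Because the System ID and SEL have fixed sizes, the parser works from the right-hand end of the NET; the Area ID is simply whatever is left between the AFI and the System ID.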

The reason for these “rules” is that a NET is a special version of an ISO network service access point (NSAP) address, familiar to anyone who has worked with ISO protocols.

The AFI, or Authority and Format Identifier, holds no real value in an IP-only environment. With ISO protocols, the AFI was used much like the OUI (Organizationally Unique Identifier) in a MAC address: it identified the authority that assigned the address. In an IP-only environment, however, this number has no meaning separate from the Area ID itself. Most vendors and operators stay compliant with the defunct protocols by specifying an AFI of “49”, which is analogous to RFC 1918 IP addresses: it is privately administered and not assigned to any one specific organization. While 49 is best practice, the AFI byte effectively combines with the Area ID to form a single area value, so its value is left to the discretion of the network admin.

Area IDs function much as they do in OSPF. Like the rest of the NET, they are written as hexadecimal digits.

A System ID can be anything chosen by the administrator, much like an OSPF Router ID. However, best practice with NETs is to keep the configuration as simple as humanly possible. The System ID is typically derived from either the 48-bit MAC address of an interface (“0cad.83b4.03e9”) or an IP address, such as the one configured on a loopback interface. When deriving a System ID from an IP address, you need some conversion method, since the System ID must be 6 bytes long and an IPv4 address is only 4 bytes. The simplest is to add enough zeros to meet the 6-byte requirement, as in the example below. You can also convert the IP address to decimal or hexadecimal format.

Loopback IP address of 10.255.255.200
NET System ID = 1025.5255.2000
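Here is a short Python sketch of two of the conversions mentioned above. The function names are hypothetical. `sysid_zero_fill` is my reading of the zero-padding example (concatenate the decimal octets, then append zeros up to 12 digits), and `sysid_hex` writes the 32-bit address in hexadecimal with leading zeros.

```python
import ipaddress


def _group(digits12: str) -> str:
    """Format 12 digits in the dotted xxxx.xxxx.xxxx System ID notation."""
    return ".".join(digits12[i:i + 4] for i in range(0, 12, 4))


def sysid_zero_fill(ip: str) -> str:
    """Concatenate the decimal octets and pad with trailing zeros to 12 digits."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return _group("".join(octets).ljust(12, "0"))


def sysid_hex(ip: str) -> str:
    """Write the 32-bit address in hex, padded with leading zeros to 6 bytes."""
    return _group(format(int(ipaddress.IPv4Address(ip)), "012x"))


print(sysid_zero_fill("10.255.255.200"))  # matches the example above
print(sysid_hex("10.255.255.200"))
```

One caveat with the zero-fill method: because octets are not padded to a fixed width, different addresses can collide (1.22.3.4 and 12.2.3.4 both become 1223.4000.0000). The hex method is one-to-one, so it cannot produce that kind of duplicate.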

The System ID is entirely up to the administrator, but it must be unique within the routing domain. MAC addresses are the easiest choice, since they are globally unique burned-in addresses and *should not*, under normal circumstances, be duplicated between devices.
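To illustrate both points, here is a small Python sketch that turns a MAC address into the dotted System ID form and flags System IDs that appear in more than one NET. The helper names are hypothetical, and `duplicate_system_ids` assumes each NET is written in the standard dotted form, with the System ID as the three 4-digit groups just before the SEL.

```python
import re
from collections import Counter


def sysid_from_mac(mac: str) -> str:
    """Normalize a MAC (any common notation) to the xxxx.xxxx.xxxx System ID form."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def duplicate_system_ids(nets):
    """Return the System IDs that appear in more than one NET (sorted)."""
    counts = Counter(".".join(n.split(".")[-4:-1]) for n in nets)
    return sorted(sid for sid, seen in counts.items() if seen > 1)


print(sysid_from_mac("0c:ad:83:b4:03:e9"))
```

The duplicate check ignores the area on purpose: two routers in different areas still need different System IDs.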

The final piece of a NET is the SEL byte, or NSAP Selector byte. In ISO, this value identifies an upper-layer function; think of it as similar to a TCP or UDP port number. In an IP-only network, where no upper-layer ISO protocols exist, an IP router expects a SEL value of 0x00, which indicates that the router itself is the “upper layer”. The takeaway here is that the SEL is not relevant in an IP network and should always be set to 00 to keep NET assignment simple.

*note: As pointed out by Marko Milivojevic on Twitter, a non-zero SEL value indicates a pseudonode. IS-IS routers on multiaccess networks elect a Designated Intermediate System (DIS); think DR in OSPF. I'm leaving a lot of details out, but keep in mind that configuring a non-zero SEL will generate a syslog message, since IOS expects it to be 0. Non-zero values indicate pseudonodes (the “virtual node” represented by a DIS). More on this later.

Below I’ll list some examples of NETs based on the above rules.

For an NSAP-format-compliant NET with an AFI of 49, an Area ID of 0001, a System ID of 0cad.83b4.03e9 (an example MAC address) and a SEL of 00:


Router(config)#router isis
Router(config-router)#net 49.0001.0cad.83b4.03e9.00

Routers in different areas can simply use a different Area ID, no different than in OSPF. You just need to be sure the System ID is still unique, as shown below:


Router(config-router)#net 49.0002.0cad.83b4.03f0.00

For smaller networks with fewer areas, you can also use a shorter format that omits a separate Area ID, leaving the single AFI byte (here 01) as the area. This time, the System ID is derived from a loopback IP address of 172.31.255.254:
Router(config)#router isis
Router(config-router)#net 01.1723.1255.2540.00
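All three examples follow the same recipe, so here is a minimal Python sketch that assembles a NET string from its parts. `build_net` is a hypothetical helper; it defaults to an AFI of 49 and a SEL of 00 as recommended above, and it leaves the Area ID as a single dotted group rather than reformatting longer areas.

```python
import string

HEX = set(string.hexdigits)


def _is_hex(s: str) -> bool:
    return all(c in HEX for c in s)


def build_net(system_id: str, area: str = "", afi: str = "49", sel: str = "00") -> str:
    """Assemble a dotted NET string from its parts (all given as hex digits)."""
    sid = system_id.replace(".", "").lower()
    if len(sid) != 12 or not _is_hex(sid):
        raise ValueError("System ID must be 6 bytes (12 hex digits)")
    if len(afi) != 2 or len(sel) != 2 or not _is_hex(afi + sel):
        raise ValueError("AFI and SEL must each be 1 byte (2 hex digits)")
    if len(area) % 2 or len(area) > 24 or not _is_hex(area):
        raise ValueError("Area ID must be 0 to 12 whole bytes of hex")
    parts = [afi.lower()]
    if area:
        parts.append(area.lower())  # kept as a single group for readability
    parts += [sid[0:4], sid[4:8], sid[8:12], sel.lower()]
    return ".".join(parts)


print(build_net("0cad.83b4.03e9", area="0001"))  # first example above
print(build_net("1723.1255.2540", afi="01"))     # short format, no separate Area ID
```

The output strings can be pasted after the net command in router isis configuration mode.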

An important note about NETs is that a router can only be part of ONE area. This differs from OSPF, where ABRs typically have at least one interface in area 0 and another interface in a standard or stub area. There are slight topology differences that account for this, which will be the topic of a future post.

The biggest thing to remember about IS-IS NETs is to Keep-It-Simple-Stupid! Personally, I got hung up on why a NET is always shown with an AFI value of 49. Details like this are just “cosmetic”: your IS-IS network will function just fine if you don't follow ISO standards, since they're really not relevant in an IP-only world. However, as you can see on Cisco's website, best practices and simplicity determine what we're taught when learning the protocols. The “why” may not be important, but it's still worth knowing a thing or two about it, if only to quell your own curiosity.

More on IS-IS in future post(s) – it’s worth knowing, being another tool in the Network Wizard’s tool belt.

EDIT: Thanks to Marko for his corrections and clarifications on some of the key terms and concepts. More posts in the future will be needed to explain IS-IS in more depth…stay tuned 😉