Basics of a QFabric

Earlier this month, I attended Juniper's Configuring & Monitoring QFabric Systems (CMQS) course in preparation for our customers interested in QFabric for their data centers. Having listened to Packet Pushers Show 51 on Juniper QFabric, I thought I knew all there was to know about QFabric. Throughout the course, I quickly realized that while I did get the "gist" of what QFabric looks like and what problems it solves, there is quite a bit to know about getting the system up and running. I suggest anyone interested listen to the Packet Pushers show to at least get the basic idea of what composes a QFabric. Below I'll list each piece and its function:

  • QFabric Nodes: Comparing the system to a traditional chassis, the QFabric Nodes are the equivalent of line cards. They provide the ports for your external devices such as servers, storage and networking devices (routers, firewalls, load balancers, etc.). They are high-density 10GbE (in the case of the QFX3500) and 40GbE (QFX3600) switches that can be positioned where your traditional top-of-rack switches might sit in the data center. QF Node switches can be implemented in brownfield deployments and can run as standalone ToR switches, supporting all the classic switch features such as STP, LAG, etc., until an organization decides to go forward with a full QFabric deployment.
  • QFabric Interconnect: Back to our chassis analogy, the Interconnects act as the backplane of the system. Their sole purpose is to forward packets from one Node to the next; they provide the high-speed transport that interconnects (hence the name) everything in the fabric.
  • QFabric Directors: Lastly, returning to our chassis example, this is the Routing Engine (RE) or supervisor of the system. The Director is responsible for managing the QFabric by providing the CLI to the admins, and it also handles the control plane side of things such as building routing and forwarding tables and managing the QFabric devices. All configuration and monitoring of a QFabric system is done on your Directors.
  • Out-of-Band Control Plane (EX4200s in Virtual Chassis)*: An out-of-band control plane network is required to connect all the Nodes, Interconnects and Directors. Note that this network is only used within the QFabric for control and management plane communication between all your QF pieces; it does not interact with your existing OOB management network. Juniper provides the configuration for the EX4200 switches used in this network, so no configuration *should* be performed on these switches. This network serves as an out-of-band control plane so that no configuration, management, or Layer 2/Layer 3 network control traffic rides over the data path.
  • *Note: For simplicity's sake, Juniper recommends that customers follow the port cabling detailed in the following techpubs. All EX4200 control plane switch configurations assume this cabling, and you will most likely run into support issues if you do not follow it. As always, YMMV. See: connecting the QF Directors, connecting the QF Interconnects, and connecting the QF Nodes to the control plane switches. Keep in mind that Juniper offers two different QFabric deployments, -G and -M (QFX3000-G and QFX3000-M); cabling may vary depending on which deployment you choose!

    Now that you have the basics of what makes up a QFabric, let’s look at some of the finer details of the system.

    Director Group/Cluster

    For any QFabric deployment, at least two QF Directors are required. QF Directors are grouped into a Director Group (DG), or cluster, which can load-balance certain functions between the two. Configuration, topology information, device status and state information are synchronized between all QF Directors in a Director Group. The DG also hosts a number of Routing Engines (REs), each with a specific purpose. For example, the DG runs a Fabric Manager RE, which provides routing and forwarding functions to QF devices such as topology discovery, internal IP address assignment and inter-fabric communication. Another RE running on the DG handles the Layer 3 functions of the Network Node group (see below). All REs are virtualized under the hood, running on a CentOS-based hypervisor on the Directors, and are shared across the individual Directors in either an active/active or active/standby setup (depending on the function of the RE). Most of this is very under-the-hood and does not require any direct interaction. The part most operators will be interested in is the single point of management for the entire QFabric: your DG provides the JUNOS CLI as well as DNS, DHCP, NFS, SNMP, syslog and all the other management pieces you would expect on traditional Juniper switches.
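
    Since the DG is your single point of management, checking on the Directors themselves is done from that same CLI. A minimal sketch from my course notes (verify the exact command against the docs for your release; output omitted here):

    root@qfabric> show fabric administration inventory director-group status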

    Topology & Device Discovery

    Devices are discovered via internal routing processes on each QF device. The Fabric Manager RE on the Director Group, as well as the QF Nodes and Interconnects, use what Juniper calls the "system discovery protocol". This protocol is essentially IS-IS extended for use with QFabric, with each device sending IS-IS-style Hellos across both the control-plane EX4200 VCs and the 40Gbps/100Gbps* data path to discover one another. The end result is that each Node knows about every other Node and all data paths can be used from ingress to egress through the fabric, similar to multipathing in Layer 3. On the control plane side of things, instead of using simple signaling on a backplane between each "line card" and RE, QFabric is one big TCP/IP LAN and communicates as such (a quick way to see what the fabric has discovered is sketched after the note below). While I'll leave this blog post with this simplistic explanation of the under-the-hood workings, I suggest reading Ivan's excellent post at ipspace.net on QFabric's inner BGP/MPLS-like functions. The internal workings are a little obscured in the current literature and unfortunately I don't have the SSH sessions saved from my time on the course. Things like the internal addressing (which uses both 169.254.0.0/16 and 128.0.128.0/24 addresses) and routing will be the topic of a future post.

    *Note: 100Gbps is roadmap; the backplane is currently 40Gbps only.
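
    As mentioned above, the quickest way to see what the fabric has discovered is the Director Group's inventory commands, which list the Directors, Interconnects and Node devices along with their connection state. A sketch from my course notes (command names from memory, output omitted; confirm against the documentation for your release):

    root@qfabric> show fabric administration inventory
    root@qfabric> show fabric administration inventory infrastructure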

    Node Groups

    Each Node in a QFabric is designated as part of one of three kinds of "node groups". These node groups define what role and type of connectivity is required of the node. Note that each QF Node uses its own local Packet Forwarding Engines (PFEs) and Routing Engines (REs) to perform line-rate forwarding; forwarding is distributed across all the QF Nodes instead of being punted to a central controller such as a supervisor. Below is a list with a brief explanation of the three different kinds of node groups (a configuration sketch follows the list):

    • Server Node Group: Consists of a single QF Node and only runs host-facing protocols such as LACP, LLDP, ARP and DCBX. Used to connect servers that do not require cross-Node redundancy (i.e., servers connected to a single Node). This is the default node group for QF Nodes.
    • Redundant Server Node Group: Consists of two QF Nodes and, like a Server Node group, only runs host-facing protocols. The difference is that servers can create LAGs across both QF Nodes in a Redundant Server Node group (RSNG). Of the two Nodes in an RSNG, one is selected as the "active" RE; the other Node is a standby, and the group fails over to it should the active fail. Both Nodes utilize their PFEs for local forwarding.
    • Network Node Group: Consists of one or more Nodes (up to eight today; sixteen is planned for future releases). This group runs your L2/L3 network-facing protocols such as Spanning Tree, OSPF, BGP and PIM. Only one Network Node group exists in a QFabric system. RE functions for the Network Node group are sent up to the Directors for control plane processing.
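
    To make that concrete, node-group membership is configured on the Directors under the fabric resources stanza. A hedged sketch based on my course notes (the Node aliases are placeholders, and the exact hierarchy should be confirmed against the deployment guides): NW-NG-0 is the Network Node group and RSNG-1 a Redundant Server Node group, while any Node not listed stays in its own default Server Node group.

    fabric {
        resources {
            /* the one Network Node group in the fabric; Node aliases are placeholders */
            node-group NW-NG-0 {
                network-domain;
                node-device Node-3;
                node-device Node-4;
            }
            /* a Redundant Server Node group spanning two Nodes */
            node-group RSNG-1 {
                node-device Node-0;
                node-device Node-1;
            }
        }
    }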

    By the way, to convert a QFX3500 or QFX3600 switch into a QF Node so it can join a QFabric, simply run the following command and reboot the box:

    root@qfabric> request chassis device-mode node-device
    Device mode set to `node-device' mode.
    Please reboot the system to complete the process.
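
    To check a switch's current mode, or to turn a QF Node back into a standalone switch, the companion commands below should do it (quoting these from memory, so verify against the documentation for your release):

    root> show chassis device-mode
    root> request chassis device-mode standalone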

    All interface-specific configuration uses the alias assigned to each QF Node (the default name uses each Node's serial number; this can be changed under the edit fabric aliases stanza, as sketched after the snippet below). Below is a small JUNOS config snippet for a QFabric:

    chassis {
        node-group NW-NG-0 {
            aggregated-devices {
                ethernet {
                    device-count 1;
                }
            }
        }
        node-group RSNG-1 {
            aggregated-devices {
                ethernet {
                    device-count 48;
                }
            }
        }
    }
    interfaces {
        NW-NG-0:ae0 {
            aggregated-ether-options {
                lacp {
                    active;
                }
            }
            unit 0 {
                family ethernet-switching {
                    port-mode trunk;
                    vlan {
                        members all;
                    }
                }
            }
        }
        Node-0:ge-0/0/12 {
            unit 0 {
                family ethernet-switching;
            }
        }
    ...

    This is where it becomes apparent that a QFabric “looks like” (from a configuration standpoint) a single giant switch.
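
    On the aliases mentioned above: renaming a Node from its serial number to something friendlier is a one-liner on the Directors. A hedged sketch (the serial number below is a made-up placeholder, and the exact statement syntax should be checked against the configuration guide):

    ## BBAK1234 is a placeholder serial number; Node-0 is the alias you want
    root@qfabric# set fabric aliases node-device BBAK1234 Node-0

    Once the alias is applied, interface names pick up the new prefix, which is why the snippet above can reference Node-0:ge-0/0/12 rather than a serial-number prefix.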

    There are quite a few moving parts and I've just scratched the surface here. I'll be diving deeper myself and will update this blog accordingly. :)

    Thanks to Juniper for the excellent CMQS course. Other references used were the QFabric Architecture whitepaper and the QFabric deployment guides on Juniper's website.


3 Responses to Basics of a QFabric

  1. Amit says:

    Hi Thomas,

    I have a couple of questions hopefully you can answer-

    1) How does the FMRE assign IP addresses to QF/Nodes and QF/Interconnects over the Out-of-band interfaces?
    2) Does the FMRE assign a /30 over the data link interfaces between QF/Nodes and QF/Interconnects?

    Thanks.
    Amit.

    • tomcooperca says:

      Hi Amit,
      From what I can tell, this is part of Juniper's "secret sauce". The FM RE does the assignment behind the scenes, and all the devices use a combination of the 169.254/16 and 128/8 addressing for "fabric management" and "fabric control". Fabric management includes things like pushing configuration done on the Directors out to the individual Node REs and PFEs. I'm not sure of the specifics yet and I don't think Juniper has published anything on it. The fabric control plane is what Ivan alludes to in his blog post here (blog.ioshints.info/2012/09/qfabric-behind-curtain-i-was-spot-on.html): the FC-0 and FC-1 REs on the Director Group act as BGP route reflectors, which then peer with every RE internal to the fabric (including the Interconnects) to distribute all the routing/forwarding tables.
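
      If you do get onto those internal REs, standard Junos operational commands should (I believe, though I haven't verified it on the FC REs specifically) show those route-reflector peerings, e.g.:

      qfabric-admin@FC-0> show bgp summary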

      I was able to get some time on QF today and you can see some of these under-the-hood workings by logging into the individual RE’s such as FM-0, FC-0 and FC-1. Will have to dissect it for a future post. 🙂

      Here’s a little snippet:

      qfabric-admin@FM-0> show interfaces bme0 extensive
      Physical interface: bme0, Enabled, Physical link is Up
        Interface index: 64, SNMP ifIndex: 1082654757, Generation: 1
        Type: Ethernet, Link-level type: Ethernet, MTU: 1500, Clocking: Unspecified,

        Logical interface bme0.0 (Index 4) (SNMP ifIndex 1082654945) (Generation 2)
          Flags: LinkAddress 0-0 Encapsulation: ENET2
          Protocol inet, MTU: 1482, Generation: 153, Route table: 1
            Flags: Is-Primary
            Addresses, Flags: Primary Is-Default Is-Preferred Is-Primary
              Destination: 128/2, Local: 128.0.0.1, Broadcast: 191.255.255.255, Generation: 12
            Addresses, Flags: Primary
              Destination: 128/2, Local: 128.0.0.4, Broadcast: 191.255.255.255, Generation: 7
            Addresses, Flags: None
              Destination: 128/2, Local: 128.0.32.0, Broadcast: 191.255.255.255, Generation: 2

        Logical interface bme0.2 (Index 6) (SNMP ifIndex 1082654972) (Generation 4)
          Flags: Encapsulation: ENET2
          Traffic statistics:
            Input bytes : 260802941
            Output bytes : 254917787
            Input packets: 2734662
            Output packets: 2749766
          Local statistics:
            Input bytes : 260802941
            Output bytes : 254917787
            Input packets: 2734662
            Output packets: 2749766
          Protocol inet, MTU: 1486, Generation: 156, Route table: 36736
            Flags: Is-Primary
            Addresses, Flags: Primary Is-Default Is-Preferred Is-Primary
              Destination: 128/8, Local: 128.0.0.1, Broadcast: 128.255.255.255, Generation: 4
            Addresses, Flags: Primary
              Destination: 128/8, Local: 128.0.128.34, Broadcast: 128.255.255.255, Generation: 14
            Addresses, Flags: Primary Is-Preferred
              Destination: 169.254/16, Local: 169.254.128.1, Broadcast: 169.254.255.255, Generation: 5
            Addresses, Flags: Primary
              Destination: 169.254/16, Local: 169.254.192.17, Broadcast: 169.254.255.255, Generation: 15

      As you can see, internally they use a bunch of logical interfaces under the bme0 management interface, using the 169.254/16 and 128/8 addressing. How those are derived… sorry, couldn't tell ya!

      I’ve got lots of output to parse through from what I collected today so I’ll try to formulate another post once I know more.

      HTH!

      • Amit says:

        Thanks. I guess we will be watching closely to find out the internal architecture of QFabric. It seems like even Dell is doing the same with their Dell Fabric Manager.
