Basics of a QFabric

Earlier this month, I attended Juniper’s Configuring & Monitoring QFabric Systems course in preparation for our customers interested in QFabric for their data centers. Having listened to Packet Pushers Show 51 on Juniper QFabric, I thought I knew all there was to know about QFabric. Throughout the course, I quickly realized that while I had the “gist” of what QFabric looks like and what problems it solves, there is quite a bit more to know about getting the system up and running. I suggest anyone interested listen to the Packet Pushers show to at least get the basic idea of what composes a QFabric. Below I’ll list each piece and its function:

  • QFabric Nodes: Comparing the system to a traditional chassis, the QFabric Nodes are the equivalent of line cards. They provide the ports for your external devices such as servers, storage and networking devices (routers, firewalls, load balancers, etc.). They are high-density 10GbE (in the case of the QFX3500) and 40GbE (QFX3600) switches that can be positioned where your traditional top-of-rack switches might sit in the data center. QF Node switches can be deployed in brownfield environments and run as standalone ToR switches, supporting all the classic switch features such as STP, LAG, etc., until an organization decides to move forward with a full QFabric deployment.
  • QFabric Interconnect: Back to our chassis analogy, the Interconnects act as the backplane of the system. Their sole purpose is to forward packets from one Node to the next; they are the high-speed transport that interconnects (hence the name) everything in the fabric.
  • QFabric Directors: Lastly, sticking with the chassis example, this is the Routing Engine (RE) or supervisor of the system. The Director is responsible for managing the QFabric: it provides the CLI to administrators and handles the control plane side of things, such as building routing and forwarding tables and managing the QFabric devices. All of the work done to configure and monitor a QFabric system is done on your Directors.
  • Out-of-Band Control Plane (EX4200s in Virtual Chassis)*: An out-of-band control plane network is required to connect all the Nodes, Interconnects and Directors. Note that this network is used only within the QFabric for control and management plane communication between the QF pieces; it does not interact with your existing OOB management network. Juniper provides the configuration for the EX4200 switches used in this network, so no configuration *should* need to be performed on them. This network exists so that no configuration, management, or Layer 2/Layer 3 network control traffic rides over the data path.
  • *Note: For simplicity’s sake, Juniper recommends that customers follow the port cabling detailed in the following techpubs: connecting the QF Directors, connecting the QF Interconnects, and connecting the QF Nodes to the control plane switches. All of the EX4200 control plane switch configurations assume this cabling, and you will most likely run into support issues if you do not follow it. As always, YMMV. Keep in mind that Juniper offers two different deployments of QFabric, -G and -M, and cabling may vary depending on which deployment you choose!

    Now that you have the basics of what makes up a QFabric, let’s look at some of the finer details of the system.

    Director Group/Cluster

    For any QFabric deployment, at least two QF Directors are required. QF Directors are grouped into Director Groups, or clusters, which can load-balance certain functions between the two Directors. Configuration, topology information, device status and state information are synchronized between all QF Directors in a Director Group (DG). The DG also hosts a number of Routing Engines (REs), each with a specific purpose. For example, the DG runs a Fabric Manager RE, which provides routing and forwarding functions to QF devices such as topology discovery, internal IP address assignment and inter-fabric communication. Another RE running on the DG handles the Layer 3 functions of the Network Node group (see below). All of these REs are virtualized under the hood, running on a Juniper-provided CentOS hypervisor, and are spread across the individual Directors in either an active/active or active/standby arrangement (depending on the function of the RE). Most of this is very much under the hood and does not require any direct interaction. The part most operators will care about is the single point of management for the entire QFabric: your DG provides the JUNOS CLI as well as DNS, DHCP, NFS, SNMP, syslog and all the other management pieces you would expect on traditional Juniper switches.
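
    To tie this back to operations: the DG’s CLI is also where you check on the health of the DG itself. As a hedged sketch (I don’t have my session captures, so I’m omitting the output, which will vary by release), the first command below reports the status of each Director within the group, and the second lists the virtualized RE/infrastructure pieces and where they are currently running:

    root@qfabric> show fabric administration inventory director-group status
    root@qfabric> show fabric administration inventory infrastructure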

    Topology & Device Discovery

    Devices are discovered via internal routing processes on each QF device. The Fabric Manager RE on the Director Group, along with the QF Nodes and Interconnects, uses what Juniper calls the “system discovery protocol”. This protocol is essentially IS-IS extended for use with QFabric: each device sends IS-IS-style Hellos across both the control plane EX4200 VCs and the 40Gbps/100Gbps* data path to discover its neighbors. The end result is that every node knows about every other node and all data paths can be used from ingress to egress through the fabric, similar to multipathing at Layer 3. On the control plane side, instead of simple backplane signaling between each “line card” and RE, QFabric is one big TCP/IP LAN and communicates as such. While I’ll leave this post with that simplistic explanation of the under-the-hood workings, I suggest reading Ivan’s excellent post at ipspace.net on QFabric’s inner BGP/MPLS-like functions. The internal workings are a little obscured in the current literature, and unfortunately I don’t have the SSH sessions saved from my time on the course. Things like the internal addressing (which uses both 169.254.0.0/16 and 128.0.128.0/24 addresses) and routing will be the topic of a future post.

    *Note: 100Gbps is roadmap; the backplane currently runs at 40Gbps only.
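
    While the discovery machinery itself is hidden, its end result is easy to check from the Director CLI. As a hedged example (output omitted, and formatting will differ between releases), the inventory command lists every Director, Interconnect and Node device the fabric has discovered, along with its connection state:

    root@qfabric> show fabric administration inventory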

    Node Groups

    Each Node in a QFabric is designated as part of one of three kinds of “node groups”. These node groups define the role of the node and the type of connectivity it provides. Note that each QF Node uses its own local Packet Forwarding Engines (PFEs) and Routing Engine (RE) to perform line-rate forwarding; forwarding is distributed across all the QF Nodes instead of being punted to a central control point like a supervisor. Below is a brief explanation of the three kinds of node groups; a sketch of how Nodes get assigned to them follows the list:

    • Server Node Group: Consists of a single QF Node and only runs host-facing protocols such as LACP, LLDP, ARP and DCBX. Used to connect servers that do not require cross-node redundancy (i.e. servers connected to a single Node). This is the default node group for a QF Node.
    • Redundant Server Node Group: Consists of two QF Nodes and, like a Server Node group, only runs host-facing protocols. The difference is that servers can form LAGs across both QF Nodes in a Redundant Server Node Group (RSNG). Of the two Nodes in an RSNG, one is selected as the “active” RE; the other is a standby that takes over should the active fail. Both Nodes still use their own PFEs for local forwarding.
    • Network Node Group: Consists of one or more Nodes (up to eight today, sixteen* in future releases). This group runs your L2/L3 network-facing protocols such as Spanning Tree, OSPF, BGP and PIM. Only one Network Node group exists in a QFabric system, and RE functions for the Network Node group are sent up to the Directors for control plane processing.
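
    Assigning Nodes to node groups is done from the Director CLI under the [edit fabric resources] stanza. The snippet below is only a sketch with made-up node aliases (Node-0 through Node-3); as I understood it from the course, the network-domain statement is what designates a group as the Network Node group:

    fabric {
        resources {
            /* network-domain marks this group as the single Network Node group */
            node-group NW-NG-0 {
                network-domain;
                node-device Node-0;
                node-device Node-1;
            }
            /* two Nodes in a group without network-domain = a Redundant Server Node group */
            node-group RSNG-1 {
                node-device Node-2;
                node-device Node-3;
            }
        }
    }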

    By the way, to convert a QFX3500 or QFX3600 switch into a QF Node so it can join a QFabric, simply run the following command and reboot the box:

    root@qfabric> request chassis device-mode node-device
    Device mode set to `node-device' mode.
    Please reboot the system to complete the process.
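
    If you want to confirm what mode a box is in before or after that reboot, the switch will tell you. As a hedged example (output omitted), this operational command on a standalone QFX3500/QFX3600 reports both the current device mode and the mode that will take effect after the next reboot:

    root@switch> show chassis device-mode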

    All interface-specific configuration uses the alias assigned to each QF Node (by default the alias is the node’s serial number; this can be changed under the [edit fabric aliases] stanza).
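    As a hedged example of what that stanza looks like (made-up serial numbers, and if I recall the ordering correctly, the serial number comes first, followed by the alias):

    root@qfabric# set fabric aliases node-device ABC0123 Node-0
    root@qfabric# set fabric aliases node-device ABC0124 Node-1

    Below is a small JUNOS config snippet for a QFabric: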

    chassis {
        node-group NW-NG-0 {
            aggregated-devices {
                ethernet {
                    device-count 1;
                }
            }
        }
        node-group RSNG-1 {
            aggregated-devices {
                ethernet {
                    device-count 48;
                }
            }
        }
    }
    interfaces {
        NW-NG-0:ae0 {
            aggregated-ether-options {
                lacp {
                    active;
                }
            }
            unit 0 {
                family ethernet-switching {
                    port-mode trunk;
                    vlan {
                        members all;
                    }
                }
            }
        }
        Node-0:ge-0/0/12 {
            unit 0 {
                family ethernet-switching;
            }
        }
    ...

    This is where it becomes apparent that a QFabric “looks like” (from a configuration standpoint) a single giant switch.
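
    And because it behaves as one switch, the usual operational commands work fabric-wide from that single CLI. As a hedged example using the interface names from the snippet above, checking the LAG and an edge port looks just like it would on any other JUNOS switch, only with the node group or node alias prefixed to the interface name:

    root@qfabric> show interfaces NW-NG-0:ae0 terse
    root@qfabric> show lacp interfaces NW-NG-0:ae0
    root@qfabric> show interfaces Node-0:ge-0/0/12 terse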

    There are quite a few moving parts and I’ve just scratched the surface here. I’ll be diving deeper myself and will update my blog accordingly :).

    Thanks to Juniper for the excellent CMQS course. Other references used are the QFabric Architecture whitepaper and the QFabric deployment guides on Juniper’s website.