Network Programmability with NX-OS & Python – First Steps

*Note: Some basic introductory knowledge of Python will be assumed for these blogposts. If you’re new to Python, I suggest having a look at Al Sweigart’s Automate The Boring Stuff online book for a beginner’s guide. Also, I’ll come back and update this post with real working code samples that you can try it yourself – for now, I’m using loose Python-like syntax to demonstrate the behavior rather than the individual lines of code.

Preface

Over the last year, I’ve had a shift in focus (mostly thanks to $DAYJOB) in my daily responsibilities to focus in the datacentre space of networking. The largest part of these new responsibilities has been improving our deployments of network gear, reducing time-to-market and eliminating as many common errors caused by these rollouts as possible. Without too many details, these implementations are accomplished via manual work of logging into devices, applying a configuration and verifying the change was completed successfully. This involves all copy-pasta of CLI commands and has caused our department myriads of headaches (and late night phones calls).

Hardware & Software

We needed to deploy a lot of new gear. This was brought up by the common enterprise IT culprits for putting in new kit, such as end-of-life concerns  and adding capacity to the current network. I’m sure everyone has been in the same planning meetings, where you needed to replace lots of old, aging and creaky equipment as well as addressing new business requirements.

The platform of our choice has been the Cisco Nexus 9300 line of top-of-rack switches. After rolling the standard 3-tier network architecture for many years, we decided to switch gears and build a modern leaf-spine Clos fabric using Cisco’s VXLAN EVPN technology. This has given us great capacity for 10G server access and scalability using 40G & 100G in the spine & core layers.

The best benefit of rolling with this technology & newer hardware has been the programmatic interfaces that Cisco has included. This comes to us in the form of NX-API.

NX-API

Cisco’s NX-API is an interface that allows an operator to interact with a Nexus device via standard HTTP calls. While this by itself doesn’t seem to be all that useful (and admittedly for most network engineers, adds complexity to their workflow) , the real benefits comes in with the way in which NX-API interacts with the device’s state.

Take this simple example of a manual CLI configuration:

switch1# conf t
 switch1(config)# vlan 100
 switch1(config-vlan)# name MyVlan100
 switch1(config)# interface Vlan100
 switch1(config-if)# ip address 10.0.100.1/24
 switch1(config-if)# no ip redirects
 switch1(config-if)# ip arp timeout 300
 switch1(config-if)# int Ethernet1/4
 switch1(config-if)# switchport mode trunk
 switch1(config-if)# switchport trunk allowed vlan 100

How would you typically verify this config was applied?

show vlan id 100
show interface Vlan100
show interface Ethernet1/4

All good so far. Now, imagine you had to verify this configuration was active on 5 devices? What about 50 devices? You can see how this starts to get unruly extremely quickly. Unfortunately, this has been the reality for most network engineers for decades as our vendors have not exposed interfaces for us to interact with that return this in a structured manner.

Let’s say you had 36 Cisco Nexus 9K’s that you wanted to verify this exact configuration. With NX-API, you use these same commands and get something back like this:

{ 'result': 
 [
  { 'body': 
   [
    { 'TABLE_VLAN':
     [
      { 'ROW_VLAN':
       [ 
         { 'vlan_id': 100,
           'vlan_name': 'MyVlan100',
           'vlan_state': 'active' } ],
            ... 
          }
       ]
      }
    ]
   }
 ]
}

This particular example shows that same “show vlan id 100” command but formatted in JSON. NX-API takes most common CLI commands and puts wrappers on individual components of the data that is typically shown in the CLI output. What this allows you to do is, using a scripting language, to drill down to exactly the data you want and present it in a deterministic fashion. For example, the ‘vlan_id’ will always return an integer of 1-4095, ‘vlan_name’ will return a string of characters representing the name of the VLAN, etc. This means that you perform basic operations such as checking that the VLAN ID is greater than or less than a given number, that the VLAN name conforms to some pattern that you set for your naming conventions or that it’s even active at all. Here’s how you might use it in a script:

import requests
import json

url='http://YOURIP/ins'
switchuser='USERID'
switchpassword='PASSWORD'

myheaders={'content-type':'application/json-rpc'}
payload=[
  {
    "jsonrpc": "2.0",
    "method": "cli",
    "params": {
      "cmd": "show interface",
      "version": 1
    },
    "id": 1
  }
]
response = requests.post(url,data=json.dumps(payload), headers=myheaders,
                         auth=(switchuser,switchpassword)).json()

The above code snippet is directly taken from NX-API Sandbox.

Another nice feature of this API is the Cisco NX-API Sandbox. So long as you’ve added “feature nxapi” to the device configuration, you can open up your web browser and log straight into this sandbox (by simply opening http://switch_ip_address/). The Sandbox will also give you example code snippets for building the proper payload and sending this HTTP request in Python so you can put in your scripts. Check out Cisco’s Programmability Guide for a quick guide on using the Sandbox.

Once you have a few working scripts that are using the API, you can even send the configuration commands using similar HTTP calls. I’ll cover this more on a later post.

Consistent Data

With an API such as this available, you can combine your CLI-fu with some basic knowledge of Python data types to interact with a device in interesting ways. For example, if you pulled the list of BGP peers via “show ip bgp summary” and the returned JSON data, you can loop through what is essentially a list of dictionaries to pull out the data you’re looking to extract. That is what you see in the for loop in the code sample above.

for each list item in result['key1']['key2']['key3']['list_of_peers']:   
   # Whenever we loop over this particular list of dictionaries ('list_of_peers'), 
   # the dictionary I get back will almost always have the same set of keys that I can access
   print(item['bgp_peer_ip']) 
   print(item['bgp_asn'])
   print(item['bgp_prefixes_received'])

Every time you send that command to any switch, you’re always going to receive the same nested list of key-value pairs, to which you can loop over quickly in Python and pull out only the data you care about. No longer do you need to SSH to a device (either manually or via an SSH client library), authenticate to a device, execute the desired command and scrape through the output manually (again, either manually in a terminal window or parsing it as a giant string in your interpreter).

Conclusion

If you get overwhelmed from the code above and all the fancy brackets, don’t worry. A good starting point would be to go through a beginner Python course such as LPTHW and familiarize yourself with variable types (such as integers, strings, lists, dictionaries, etc) and how to interact with them. This also can be done in any language of choice. All modern programming languages and interpreters have the necessary libraries (typically in their standard libraries) for sending HTTP1.1 requests to an endpoint and parsing well-known notations such as JSON or XML.

When you combine that with what you already know about using network devices everyday, you can start automating your repetitive, boring and simple day-to-day tasks and/or troubleshooting. It can be extended from everything to gathering routing & interface statistics to parsing MAC address tables to verifying SNMP community strings. The possibilities truly are endless.

In my next post, I’ll focus on taking another step in using tools such as Jinja2 combined with NX-API to generate plain-text configuration files, which can help build your configs in an error-free & consistent way.

Encapsulating, shim headers, tunnelling – does it matter?

I overheard my manager today giving one of the new junior guys the run-down on basic encapsulation methods and general IP. Basically, how to TCP/IP. In explaining 802.1Q and VLAN tagging, my boss uttered the following phrase:

“…a switch encapsulates the frame with the VLAN number…”

It was at that point I ran over to his desk and put in my point (a bit of semantics, especially for a newbie) that technically, a 802.1Q tag actually modifies the Ethernet header and is more like a shim header.


An 802.1Q frame

But then GRE and IP tunnelling was brought up. And MPLS. And IPSec. And after much debate, I let the coaching continue and probably caused quite a bit of confusion for my green colleague. I’ve come to realize that I tend to fly off the handle and get a bit too far in a technical discussion that I lose those who might not have learnt or had exposure to the technology that myself and others know intimately.

Anyways, onto the whole encap-vs-tunnel-vs-shim debate. Here’s how I like to explain it and understand it myself as it applies generally:

  • encapsulation distinctly divides bits on the wire. When viewing a packet in Wireshark for example, an IP packet is encapsulated in an Ethernet header and an IP header, followed by whatever transport protocol (TCP, UDP or ICMP), and then finally application data. Encapsulation forms the basic structure of data packets, with a clear division of labor. Routers inspect IP headers for destination networks. Switches inspect Ethernet headers for destination MAC addresses.
  • tunnels utilize tunnel and multiple IP headers to create overlay networks riding over underlying infrastructure. GRE (with IPSec for encryption) is the most widely used tunnelling mechanism. It works great for connecting remote sites over the Internet. Internet routers only inspect the outer IP header, which is routed to the tunnel endpoint, at which point the far-end will strip the outer IP and GRE headers, and then make its own routing decisions base on the “inside” IP header. MPLS* also functions in a similar way for VPN applications.
  • shim headers are the muddiest of the bunch. MPLS is often called a shim header because it inserts a small 4 byte* (32 bit) header within a data packet, which is then processed and/or inspected by MPLS-enabled routers. It also doesn’t have only one location where it could appear. In the case of pure IP L3VPNs, it’s shimmed between inner- and outer IP headers. It could also appear between disparate Layer 2 headers, in the case of Any Transport over MPLS (AToM). 802.1Q is further harder to define since it modifies existing Ethernet headers. A frame could have multiple .1Q tags in the case of dot1q tunnelling (confused yet?).

* MPLS adds an additional 20 bits per label.

And yet, after writing those short descriptions of each, it’s apparent to me that all of those terms are muddy. You can’t always define a data type as one over the other, since you’ll find numerous exceptions and special use cases (that are not so special and very widely deployed) that break any rigidly-defined “layering rules”. Smarter people than me have agreed to that fact as it relates to explaining TCP and using dated reference models such as the OSI model. The IETF has also agreed that reference models
that adhere to strict onion layers as it relates to data networks hurts more than it helps.

But I digress. The concepts of encapsulation/decapsulation and tunnelling are central concepts that all networks use. Never will you (or should you) see an end host spit out an IP packet without its data link header (mostly Ethernet these days) along with its IP header and any associated transport/application data. It’s just the way TCP/IP evolved over its development several decades ago. And it’s the best we got. Sometimes, it’s less important about terminology and semantics, and more important of the overall goal said method is trying to achieve.

If I’m grossly mistaken, be sure to let me know in the comments. I’ll try my best not to harass anyone less technical and nerdy than myself with (sometimes) unimportant details.

OSPF sequence numbers – why 80 million is smaller than 70 million

So a bit of a specific topic today. Going through Doyle’s Routing TCP/IP Volume 1, I felt my brain melt as he went through explaining sequence numbers in link-state advertisements (in a general sense, not specific to just OSPF). He describes two types of number “spaces” – the range of possible values – to describe how protocols sequence their LSA’s.

Ignoring the historic bits, such as Radia Perlman’s “lollipop space”, which is essentially a combination of cycling numbers with a fixed initialization value (this was part of the first version of OSPF drafts – not relevant for OSPFv2 or anything else), numbering spaces either follow linearly or circular.

In linear spaces, numbers start at x and end at y. The issue with linear space is that you could potentially “run out” of number space. This could cause a link-state protocol to be unable to distinguish between LSA’s that are the most recent from the originating router or just LSA’s being flooded from one router to the next. Link-state protocols, when receiving an LSA with the highest possible sequence number, shut down and age out it’s link-state database (LSDB) to flush all the older LSA’s out. To mitigate this, the designers had to make sure the field for a sequence number was large enough so as to never reasonably hit that highest possible value (y). Both OSPFv2 and IS-IS uses this number space scheme.

Circular number spaces never end – once a maximum value number is reached, it “resets” back to the lower boundary of the space. Since IS-IS and OSPFv2 use linear spaces, this is included for completeness. Perlman’s lollipop scheme used both linear and circular as a combination but these are not included in modern link state protocols.

IS-IS uses a rather simple scheme for it’s number space. A router will originate it’s own directly-connected link states with a sequence number of one (0x00000001), with a maximum sequence number of 4.2 billion (0xFFFFFFFF). This is because the IS-IS field for sequence numbers in its LSP’s (link state packet) uses unsigned 32-bit integers. These values range from 1 – 4294967295 in decimal.

OSPF, on the other hand, uses signed 32-bit integers. While it uses the same scheme for number spaces as IS-IS (linear), the way the values are represented (especially on a router’s database outputs) is…different.

Observe:

Net Link States (Area 1)

Link ID         ADV Router      Age         Seq#       Checksum
192.168.1.112   10.0.0.112      1862        0x80000237 0x00D860
192.168.7.113   10.0.0.113      12          0x80000001 0x00E8F5

So…it starts are 80 000 000?

Obviously, the seq. number is represented in hexadecimal format…but why 0x80000001? Doesn’t that translate to 2 billion decimal? The detail to note is the fact that this field is a signed integer. That means the integers actually range from – 2147483648 to + 2147483648. When processing this field in binary, the CPU needs a way of comparing sequence numbers to determine which one is “higher” – in this case, closer to positive +2147483648.

Programming languages such as C/C++ must pay particular attention to integers declared as signed vs unsigned. Some google- and wiki-fu later, the reason we see sequence numbers starting at 0x80000001 (0x80000000 is reserved via the RFC standard) is because the left-most/most significant bit determines whether a number is represented as a positive value or a negative value. When the MSB is set, the integer is a negative value. When the MSB is not set, it is a positive integer.

 

So…
0x80000001 is 1000 0000 …. 0000 0001 in binary
Since the MSB is set, this is the “first” integer value in a 32-bit signed integer range. It doesn’t make sense to think of these values in decimal values, since this does indeed translate “directly” to 2 billion. These sequence numbers will increment 0x80000002….all the way to 0xFFFFFFFF (-1 in decimal). Incrementing one more time would start the sequence at decimal 0. This is because the MSB must become “unset” for it to represent positive values. The range then continues from 0x00000001 until 0x7FFFFFFE. Again, from the RFC, 0x7FFFFFFF is reserved (actually, an LSA received with this maximum possible sequence number triggers OSPF to flush its LSDB…more nuts and bolts to be expanded on later).

 

The choice of using signed vs unsigned gets kind of blurred between hardware and software. The use of signed integers simplifies ALU designs for CPUs and most (if not all) programming languages implement signedness in their integer data types…Why the IETF chose to use signed integers as part of the OSPFv2 spec? Who knows…

 

Anyways, this really bothered me for a couple days. I feel better now that it’s on paper. Any gross errors or omissions, leave it in the comments!

 

PS: More math-savvy folks will scream at this in regards to two’s complement role here with signed integer binary representation…I just wanted to know and jot down why IOS shows the starting sequence numbers in show ip ospf database as 0x80000001. So there you have it. Further reading for the curious

CCIE or bust! And other going-on’s

The time has come…

I’ve finally made the committed decision to pursue my number for CCIE Routing and Switching. Like most folks in networking, I’ve gotten to the point where I’m feeling quite confident in my skills; solid foundations with just a few cobwebs here and there to knock out (mostly due to re-focusing). This decision has come to me after moving on from my VAR support job, which covered the entire breadth of Cisco but prevented my skills from becoming specialized, to a network engineer doing implementation for a financial org. Since I’m settling in to the new job, I’ve come to realize that all the nitty-gritty routing and switching bits are what interest me the most. Sure, I’ve done a bit of this and a bit of that in other areas (mostly in wireless and data center) but I’m an R&S guy.

Which brings me to my next bit of personal news – I’ve now gone from support to implementation. For those in the NOC’s or VAR’s, I would highly recommend it as a next step after you’ve gotten your feet wet in the trenches of support. It’s nice to learn what happens when things break and how to resolve issues, however, in my humble opinion, in order to have that deep understanding, you have to be there to know *why* something is configured or designed a certain way. Delving into the world of real-world business challenges and requirements, as well as ITIL and change management (ugh, how I loathe thee…a “necessary evil” some may say), I know get to make decisions on how my network looks and how it functions to accomplish a certain goal. Whatever those goals may be, such as a new project or business requirement. For those who are looking to move up in the world of networking, implementation is required experience.

So, while I haven’t been blogging much here (seriously, just so much to learn and write about…some may say too much!), I will be focusing on hitting the books and lab prep. I’m shooting for a Q2 2014 target. Wish me luck!

PS: There are so many good blogs out there with CCIE “notes” – however, I could start banging out tidbits here and there for things that stump me or just bother me…More to come.

Modular Chassis vs Fixed Configuration Switches – Part 2 When/Where

Part 2 of my chassis vs fixed config switch showdown. In this post, I’ll provide some examples of where you might find both form factors deployed and some of the use cases for both. Click here for part 1 of this series.

 

Fixed Configuration

  • Campus

One of the more obvious places to find fixed configuration switches is in the campus network. This may include a campus of all sizes, from SMB to large enterprise. Here, the best bang for your buck lends itself to fixed-port switches since you typically will see “dumb switches” deployed in the access layer for port density and provide host connectivity to your network. Examples include lower-end Catalyst 2900 series and Catalyst 3600/3700 series switches.

Not only will you find these switches in the access of campus networks. Many smaller joints will use mid-range Catalyst 3750’s (for example) and stack them for distribution and/or core layer functions. Some of the advantages of using a switch stack for your core are distributed control plane across the stack, as well as port density for the rack space. If you’re connecting a bunch of dumb Layer 2 switches across your access, you can easily uplink them to slightly-beefier Layer 3 switches such as 3560’s and 3750’s arranged in a stack configuration. Of course, you are subject to hardware failures since these platforms do not have any redundancy built in to each member switch. However, for smaller organizations that just require the number of ports in a (relatively) small package, these do just fine.

  • Data Center

Fixed-port switches also find themselves in many “top-of-rack” (ToR) deployments. One only has to look as far as the Nexus 5000 Series, Juniper’s EX switching line, Brocade’s VDX line and countless other vendors. These switches typically have the 10 Gigabit density requires to connect large numbers of servers. Depending on your requirements, native Fibre Channel can also be used (in the case of Cisco Nexus 5500’s and Brocade VDX 6700’s) to provide FCoE connectivity down to servers. Rack space and port density again are factors when deploying these fixed configuration switches. In the case of data centers, the FCoE capabilities are also typically found on these platforms where they may not exist in chassis offerings.

  • ISP

One lesser-seen place to find fixed-port switches is in an ISP environment. Metro Ethernet comes to mind here. Providing Layer 2 VPN (L2VPN) services, ISP’s need equipment that can interface with the CPE and provide the connectivity up to the SP core. Cisco’s ME Series finds that place on this list, providing Q-in-Q/PBB as “dumb SP access”.

Modular Chassis

  • Campus

In the campus network, you’ll seen chassis’s in all places. On the one hand, with proper density and power requirements, you can easily deploy something like a Cisco Catalyst 4500E in a wiring closest for host connectivity. On the other hand, you would also see Catalyst 6500E’s in the campus core, providing high-speed switching with chassis-style redundancy to protect against single-device failures. Redundant power and supervisors help mitigate failures of one core device.

  • Data Center

In classical designs, the data center core is where you’d most likely see a chassis-based switch deployed. High-speed switching is the name of the game here, as well as resiliency against failures. Again, the redundant hardware found in chassis’s protect against failures in the data, control and management plane. Cisco Nexus 7000’s and Juniper’s high-end EX8200 platforms

In some designs, you may see a chassis switch deployed in an “End of Row” (EoR) fashion. This follows the design principles of the fixed-config ToR deployments, except here you could deploy a chassis switch to (again) improve redundancy. While definitely not required for all environments, if you couldn’t possibly allow a failure of your first switch that touches the end hosts, the extra redundancy (supervisors, power, switch fabrics, etc.) fit the bill.

  • ISP

Since I don’t work in the field of service providers, I’ll present what I feel are appropriate places to find a chassis-based switch. More than likely you’ll be using chassis-based routers here but I’ll include it for completeness. Cisco 6500/7600, ASR1000/9000’s, Juniper MX, all chassis offerings that you could see in an ISP’s network. This includes (but not limited to) ISP core for high-speed MPLS or IP switching/routing and PE deployments. This is where rich services provided by these offerings live, in order for service providers to be able to offer an array of MPLS-based services. This extends from L3VPNs to L2VPNs as well (either L3TPv3-based, EoMPLS-based or VPLS-based deployments). Being critical/core devices, I would imagine most service providers to require the level of fault-tolerance offered by a chassis-based system, especially with strict SLA’s with customers.

 

And there you have it. Hopefully, I’ve given you a good overview of the what, where, when, and why of fixed configuration and modular chassis switches. Given the state of things with our vendors, you can typically find any and all form factors provided by each and hopefully will choose the fit platform to fit the requirements of your environment. Cisco, Juniper, Brocade and HP are the first names that I can think of. There are other players as well including Arista, BNT, Dell, Huawei, Alcatel-Lucent, and a slew of others might fit specific markets and environments but may not cover the broad spectrum required for every network. As always, do your research and you’ll find what you need just fine.

 

Any corrections or feedback, you know what to do! Thanks for reading.

Modular Chassis vs Fixed Configuration Switches – Part 1 The Why

One thing that seems to be brought up a lot in conversation around the office, especially for newer folks entering the networking biz, is the choice of using larger modular chassis-based switches versus the smaller simpler fixed-configuration cousins. In fact, most people (myself included when I got my start) don’t even know that “chassis” are an option for switch platforms. This is completely understandable for the typical college and/or Cisco NetAcad. graduate, since the foundational education is focused almost exclusively on networking theory and basic IOS operations. So when, where and why do you use a chassis-based versus a fixed configuration switch?

PS: I’ve decided to make this a two-part series, due to the verbosity of the information. This article will focus on comparing the two different classes of switches and why you might use over the other. In the second part, I’ll provide some real world use cases for both and where they might be typically deployed.

In this article, I’ll be using the following Cisco platforms* for comparison between the two options:

*Note: I’ll be sticking to Cisco purely for simplicity’s sake. Other vendors such as Juniper, HP and Brocade carry their own lines of switching platforms with similar properties. With a bit of research, you can apply the same logic when evaluating the different platforms for your specific implementation.

WHY

Port Density

This is one of the more obvious variables. The Catalyst 6509, a 14RU 9-slot chassis, can have upwards of 384 (336 in dual-supervisor setups) Gigabit copper interfaces. These chassis can also utilize 10Gbps line cards, with those densities in just over 100 10GbE ports per chassis. However, it’ll depend on (especially on 6500’s) what supervisor modules you’re running along with your PFC/CFC/DFC daughtercards that will determine if those densities are a bit lower or higher than those numbers. This is where it’s critical to do your research before placing your orders or moving forward with implementations.

On the flip-side, fixed-configuration switches are just that – fixed chassis with limited modular functionality. Some exceptions apply, such as the Nexus 5500 switches that have slots to add additional modules to. However, generally speaking, WYSIWYG with these classes of switches. If we look at the same rack space of 14RU as with the previous example, that could potential be fourteen 48-port Gigabit Ethernet switches. A slew of Catalyst 3750’s gives you a whooping 672 ports in the same rack space. However, keep in mind that’s fourteen switches, as opposed to a single chassis-based switch. You’ll have to keep this in mind when putting this hardware into your topology (to be discussed below). Unless of course you plan to run your switches in a stack via technology such as Cisco StackWise, which helps reduce the burden of managing so many switches separately. Since Cisco stacks are restricted to a maximum of 9 switches per stack, you’ll be looking to manage at least two different switch stacks to match the 14RU and achieve the most port density in this comparison.

A side note for 10-Gig connectivity. The Nexus 5548UP can use up to 32 10GbE plus 16 with optional expansion module in a 1RU form factor. To compare to the 6509, that’s over 600 10GbE ports in a 14RU space. While it may be unfair to compare a newer Nexus series switch directly to a 6500 chassis, it’s comparison is purely for the discussion of form factor differences. A quick look at the Nexus 7009 chassis has similar 10GbE density as the 6509, which increasing densities with the larger chassis’s.

At the end of the day, fixed-configuration switches (generally speaking) packs more ports in the same amount of rack space as chassis switches.

Interface Selection

Typically, fixed-configuration switches are copper-based FastEthernet and/or GigE. Exceptions again exist, as with the Nexus 5500’s (Unified Ports models) being able to run their interfaces in Ethernet or native Fibre Channel. In your fixed-config’ers, you’ll also typically have higher-speed uplink ports that support higher-speed optics such as SFP/SFP+, GBIC, XENPAK and X2’x. And of course, there are SFP-based flavours that give you all fiber ports if that’s what tickles your fancy.

On your chassis-based switches, you will see that you have a wider choice of interface types. You’ll have your 10GbE line cards, which can be a combination of SFP/SFP+, XENPAK (older), X2, RJ45 and GBIC transceivers. You can also make use of DWDM/CWDM modules and transceivers for your single-mode long-range runs. Also, with 40 Gigabit Ethernet becoming more relevant and deployed, QSFP+ and CFP connectivity is an option as well (if the chassis in question can properly support it). The only restriction on chassis-based switches are what line cards you have to work with.

Due to the nature of fixed switches, it’s natural that a chassis comprised of modular line cards has more interface selection.

Performance

Here’s where things become a little more subtle. Again, for simplicity’s sake, I’ll restrict this section to the following models of switches:

  • Catalyst 6509 Chassis w/ Supervisor 720
  • Catalyst 3750-X 48-port Gigabit Switch
  • Nexus 5548UP Switch

Let’s start with the fixed switches. The fixed-configuration Catalyst 3750-X switches are equipped with 160 Gbps switch fabrics which should be ample capacity for line-rate gigabit speeds. Let’s also not forget that fabric throughput is not the only measure of performance. These switches, and most similarly designed fixed-configuration “access” switches, have smaller shared port buffers which can become problematic with very bursty traffic.

On the Nexus 5548UP’s, we see a different story. Being a 10GbE switch with a 960Gbps fabric, naturally we see the 5548 have much higher performance than it’s LAN cousins. Port buffers on the Nexus 5500’s are dedicated per port (640KB), allowing these switches to handle bursts very easily.

The Catalyst 6509, being a modular chassis-based switch, bases its performance on the Supervisor modules in use, as well as the specific line card+daughter card combination. For simplicity’s sake, let’s assume a Supervisor 720 (since the switch fabric is located on the supervisor in these switches, that’s 720Gbps switching capacity across the entire chassis) and X6748-GE-TX 48-port 1Gig ethernet modules. Due to the hardware architecture of these chassis, each slot is constrained to 40 Gbps capacity per slot, so slight over-subscription for 48x1Gbps line cards will occur. Luckily, each port is given a 1.3MB buffer so bursty traffic is handled just fine on these line cards. When using DFC daughter cards, line cards will even handle their own local switching and so won’t be constrained to the 40Gbps-per-slot restriction. This is because packets don’t need to traverse the backplane when switched port-to-port on the same line card. May I reiterate that for these kind of switches, do your homework. The performance of a chassis-based switch depends on more factors due to the combination of supervisor, switch fabric and line cards in use.

Redundancy/HA

One of the most obvious benefits of a chassis-based switch is redundant and highly-available hardware. The Catalyst 6500 is typically deployed with two Supervisor modules. By having two supervisors, you’re protected by failures in the data plane (due to redundant active/standby switch fabrics), the control plane (NSF/SSO) as well as the management plane (IOS is loaded on both supervisors and thus can continue to operate even when an active Sup fails). Chassis switches also utilize redundant power supplies to protect against electrical failures.

On the other side of the coin, you have fixed configuration switches. While some newer switches do utilize redundant power supplies, none of them use separate modules or components for data, control or management plane. They utilize on-board switch fabrics for forwarding as well as a single CPU for running your IOS images and control plane protocols.

Chassis here is the clear winner.

Services

Let me start this section by delving into what I mean by “services”. This includes features that are considered additional to basic switching. This will include support for things such as partial and/or full Layer 3 routing, MPLS, service modules such as firewalls and wireless LAN controllers (WLC) and other extras you may find on these platforms.

I think it’s safe to say that, generally speaking, fixed configuration switches are simple devices. As with the Catalyst line, with software, you can utilize full Layer 3 routing and most normal routing protocols such as OSPF, EIGRP and BGP. Keep in mind, however, that you will often be limited by TCAM capacity for IP routes so forget about full BGP tables and the like. However, they get the job done. One great benefit is support for Power over Ethernet (PoE), which is usually standard on copper-based fixed configuration switches.

The Nexus 5500’s are a bit of an exception. On the one hand, out of the box, they are Layer 2 only devices that can only support (albeit limited) Layer 3 with an expansion module. However, with the Unified Ports, they also support native Fibre Channel as well as modern NX-OS software. I would say that, for specific use cases, the compromise is quite reasonable. Elaborating on that will be in my next post.

The Catalyst 6500 is the champion of services. Being a modular chassis, Cisco developed many modules that were specifically designed with services in mind. This includes the Firewall Services Module (FWSM), Wireless Service Module (WiSM) controllers, SSL VPN module, and ACE/CSS load balancers. While these modules have fallen out of favour due to performance constraints of the chassis itself as well as lack of development interest and feature parity with standalone counterparts, the fact of the matter is that there are still many Cisco customers in the field with these in place. The WiSM, for example, is essentially a WLC4400 on a blade. Being able to convenient integrate that directly into a chassis saves rack space as well as ports by using the backplane directly to communicate with the wired LAN. Other services supported on the 6500 from a software standpoint include Virtual Switching System (VSS) (with use of the VS-SUP720 or SUP2T), full Layer 3 routing with large TCAM, MPLS and VPLS support (with proper line cards) and PoE.

The chassis win in this category due to the modular “swiss army knife” designs.

Cost

I’ll just briefly mention the cost comparison between the two configurations. You will typically see a chassis-based switch eclipse a fixed configuration switch in terms of cost, due to the complexity of the hardware design as well as all the modularity with chassis. You’ll always have a long laundry list of parts that will have to be purchased in order to build a chassis-based switch, including the chassis itself, power supplies, supervisors, line cards and daughter cards (if applicable). Fixed configuration switches typically have a much lower cost of entry, with only limited modularity with certain platforms.

And there you have it. Hopefully, this will give you an insight into why you might use one form factor over the other. In my next part, I’ll provide some use-case examples for each and where you may typically deploy one versus the other.

On preventing burn out and spreading yourself too thin

Networking is full of what I like to call “rabbit holes”. You start looking into a technology or a solution and before you know it, you’ve lost hours of time pouring over white papers, best-practice design guides, sample configurations, blog posts and labs. There’s a lot of pieces that make up the networks that we work with daily, from QOS to routing, switching, WAN, hardware architectures, protocols…the list goes on and on. Depending on your role in your organization, you could be working with a few technologies and platforms very intimately or you could be spread across multiple parts of the overall infrastructure.

Working for a VAR for multiple vendors, I find it difficult sometimes to find a middle ground between knowing enough about the equipment and environments we support to solve the problems our customers have, what’s to come from the vendors and getting the expert knowledge I crave. In the last year alone, I’ve touched probably everything under the sun from our lovely vendors, including data center gear, wireless, security and SP (only missing voice). While I certainly know a lot more than I did a year ago, I also find that I’m unable to really dive into any one part in particular.

Like I said, its highly dependent on your role in your organization. I’m sure there’s a lot of folks out there that would love to get away from the hundreds of ASA’s they support or the 6500’s that are still chugging along in their campus core (*shudder*) and wiring closets to get their hands on something new. A compromise between both extremes is, in my opinion, a sweet spot.

I’m constantly challenged, engaged on new projects and new solutions, realizing customer goals and solving complex technical issues. I just warn to my fellow networking colleagues that it’s very easy to spread yourself too thin. I find myself having to stop myself from sticking my nose into every new thing that comes into my office, just so that I can focus on what’s on my plate.

Don’t get me wrong, I’m loving the challenge and wouldn’t want to work in any other part of our IT industry. I just want to avoid being that “jack of all trades, master at none“. With all the new technologies coming out (especially in the data center), you got to keep your head above water from drowning in all the stuff that puts everything together.

Maybe it’s time for a change of pace or at least a change in attitude. I’m currently back reviewing my R&S to possibly put myself on the coveted path of the CCIE lab (I actually had a dream/nightmare about getting thrown into the lab exam…it was exciting but terrifying at the same time). I just hope that I don’t spread out too thin that I burn out. I’m sure we’ve all been there at some point or another.

I’d love to hear your thoughts on this so please leave a comment or send a tweet my way. In the meantime, keeping plumbing!

PS: I love Ivan’s post about knowledge and complexity. Given the nature of this post, I find it rings true to home/work a lot. Great advice from Mr. Pepelnjak as always.

Junos public/private key SSH authentication

Hi Everyone,
Just a quick one today. I was reconfiguring my lab SRX for direct SSH access and in the interest of security, wanted to use RSA public/private keys for authentication. I did my usual key generation using puttygen (sorry guys, Windows user here), copied the OpenSSH authorized_keys public key string that Junos uses, applied it to the user of my choice and off I went…or so I thought. Here was my initial configuration:

[edit]
admin@LabSRX# show system login
user admin {
    uid 2002;
    class super-user;
    authentication {
        encrypted-password "<plaintext passwd hash>"; ## SECRET-DATA
        ssh-rsa "ssh-rsa <key data>"; ## SECRET-DATA
    }
}

Seems simple enough. However, when I went to login using the private key that I had just created for this public key pair, my SRX complained:

Using username "admin".
Authenticating with public key ""
Server refused public-key signature despite accepting key!

Huh? I could’ve sworn that pair was correct. I tried generating another pair, just to be sure but the SRX still didn’t want to accept it.

After fiddling with the SSH protocol version and other non-related parameters, I logged into one of my work’s lab SRX’s to see if anyone was using RSA there.

Lo and behold, I forgot the one part in key string needed to authenticate with it: appending the user name to the public key string:

admin@LabSRX# show system login
user admin {
    uid 2002;
    class super-user;
    authentication {
        encrypted-password ...
        ssh-rsa "ssh-rsa <key data> admin"; ## SECRET-DATA
    }
}
[edit system login user admin]
admin@LabSRX# commit
commit complete

After my commit, I was able to use my private key to authenticate to the SRX.

You can have puttygen append the username using the “Key comment” field:

I did some digging around but couldn’t find any mention of this in the Junos documentation. My guess is that OpenSSH includes the username when using ssh-keygen in Linux/Unix. Regardless, just something I’ll have to remember when doing this again.

Basics of a QFabric

Earlier this month, I attended Juniper’s Configuring & Monitoring QFabric Systems in preparation for our customers interested in QFabric for their data centers. Having listened to Packet Pusher Show 51 on Juniper QFabric, I thought I had known all there is to know to QFabric. Throughout the course, I quickly realized that while I did get the “gist” of what QFabric looks like and what problems it solves, there is a bit to know on getting the system up and running. I suggest all of those interested to listen to the Packet Pushers show to at least get the basic idea of what composes a QFabric. Below I’ll list each piece and its function:

  • QFabric Nodes: Comparing the system to a traditional chassis, the QFabric Nodes are the equivalent to line cards. These provide the ports to your external devices such as servers, storage and networking devices (routers, firewalls and load balancers, etc). They are high-density 10GbE (in the case of QFX3500) and 40GbE (QFX3600) switches that can be positioned where your traditional top-of-rack switches might be in the data center. QF Node switches can be implemented in brownfield deployments and can be run as standalone ToR switches, supporting all the classic switch features such as STP, LAG, etc., until an organization decides to go forward with a full QFabric deployment.
  • QFabric Interconnect: Back to our chassis analogy, the Interconnects act as a backplane for the system. It’s sole purpose is to forward packets from one Node to the next. This is high-speed transport to interconnect (hence the name) everything in the fabric.
  • QFabric Directors: Lastly, thinking to our chassis example, this is the Routing Engine (RE) or supervisor of the system. The Director is responsible for managing the QFabric by providing the CLI to the admins and also handles the control plane side of things such as building routing and forwarding tables, as well as managing the QFabric devices. All of the work done to configure and monitor a QFabric system is done on your Directors.
  • Out-of-Band Control Plane (EX4200 in Virtual Chassis’s)*: An out-of-band control plane network is required to connect all the Nodes, Interconnects and Directors. Note that this network is only used within the QFabric for control and management plane communication between all your QF pieces. It does not interact with your existing OOB management network. Juniper provides configuration of EX4200 switches that are to be used for this network so no configuration *should* be performed on these switches. This network serves as an out-of-band control plane network so that no configuration, management, or Layer 2/Layer 3 network control goes over the data path.
  • *Note: For simplicity’s sake, Juniper recommends customers to follow the port cabling as detailed in the following techpubs. All EX4200 control plane switch configurations follow this cabling and you will most likely run into support issues if you do not follow this. As always, YMMV. Connecting the QF Directors, connecting the QF Interconnects, and connecting the QF Nodes to the control plane switches. Keep in mind that Juniper offers two different deployments of QFabric, -G and -M. Cabling may vary depending on which deployment you choose!

    Now that you have the basics of what makes up a QFabric, let’s look at some of the finer details of the system.

    Director Group/Cluster

    For any QFabric deployment, at least two QF Directors are required. QF Directors are grouped into Director Groups or clusters, which can load-balance certain functions between the two. Configuration, topology information, device status and state information is synchronized between all QF Directors in a Director Group (DG). The DG also hosts a number of Routing Engines (RE), each with a specific purpose. For example, DG run a Fabric Manager RE, which provides routing and forwarding functions to QF devices such as topology discovery, internal IP address assignment and inter-fabric communication. Another RE running on the DG is used for the Layer 3 functions of the Network Node group (see below). All REs are virtualized under the hood, running off of a Juniper CentOS hypervisor, and are shared across individual directors in either an active/active or active/standby setup (depending on the function required for the RE). Most of this is very under-the-hood and does not require any direct interaction. The only parts that most operators will be interested in is the single point of management for the entire QFabric. Your DG provides the JUNOS CLI as well as DNS, DHCP, NFS, SNMP, syslog and all your other expected management pieces on traditional Juniper switches.

    Topology & Device Discovery

    Devices are discovered via internal routing processes on each QF device. The Fabric Manager RE on the Director Group, as well as QF Nodes and Interconnects, use what Juniper calls “system discovery protocol”. This protocol is essentially IS-IS extended for use with QFabric, with each device sending out IS-IS-style Hellos across the both the control plane EX4200 VC’s and the 40Gbps/100Gbps* data path to discover one another. The end result is that each node knows about every other node and all data paths can be used for ingress-to-egress through the fabric, similar to multipathing in Layer 3. On the control plane side of things, instead of using simple signaling on a backplane for each “line card” and RE, QFabric is one big TCP/IP LAN and communicates as such. While I’ll leave this blog post with this simplistic explanation of the under-the-hood workings, I suggest reading Ivan’s excellent post at ipspace.net of QFabric’s inner BGP/MPLS-like functions. The internal workings are a little obfuscated from current literature and unfortunately I don’t have the SSH sessions saved from my time on the course. Things like the internal addressing (uses both 169.254.0.0/16 and 128.0.128.0/24 addresses) and routing will be the topic of a future post.

    *Note: Roadmap, currently only 40Gbps backplane.

    Node Groups

    Each Node in a QFabric is designated as part of one of three kinds of “node groups”. These node groups define what role and type of connectivity is required for the node. Note that each QF Node uses its own local Packet Forwarding Engines (PFE) and Route Engines (RE) to perform line-rate forwarding. Forwarding performance is distributed across all the QF Nodes, instead of being punted to a central control like a supervisor. Below is a list with a brief explanation of the three different kinds of node groups:

    • Server Node Group: consists of a single QF Node and only runs host-facing protocols such as LACP, LLDP, ARP and DCBX. Used to connect servers that do not require cross-node redundancy (ie. servers connected to a single Node). This is the default Node Group for QF Nodes.
    • Redundant Server Node Group: Consists of two QF Nodes and only runs host-facing protocols similar to a Server Node group. The difference is that servers can create LAGs across both QF Nodes in a Redundant Server Node group. Of the two Nodes in a RSNG, one is selected as the “active” RE. The other node is a standby and fails over to it should the active fail. Both Nodes utilize their PFEs for local forwarding.
    • Network Node Group: Consists of one or more Nodes (up to eight/sixteen* in future releases). This group runs your L2/L3 network-facing protocols such as Spanning Tree, OSPF, BGP and PIM. Only one Network Node group exists in a QFabric system. RE functions for a Network Node group are sent up to the Directors for control plane processing –

    By the way, to convert a QFX3500 or QFX3600 switch to become a QF Node and join a QFabric, simply run the following command & reload the box:

    root@qfabric> request chassis device-mode node-device
    Device mode set to `node-device' mode.
    Please reboot the system to complete the process.

    All interface-specific configuration uses the aliases assigned to each QF Node (default names uses each nodes serial number, this can be changed under the edit fabric aliases stanza). Below is a small JUNOS config snippet for a QFab:

    chassis {
        node-group NW-NG-0 {
            aggregated-devices {
                ethernet {
                    device-count 1;
                }
            }
        }
        node-group RSNG-1 {
            aggregated-devices {
                ethernet {
                    device-count 48;
                }
            }
        }
    }
    interfaces {
        NW-NG-0:ae0 {
            aggregated-ether-options {
                lacp {
                    active;
                }
            }
            unit 0 {
                family ethernet-switching {
                    port-mode trunk;
                    vlan {
                        members all;
                    }
                }
            }
        }
        Node-0:ge-0/0/12 {
            unit 0 {
                family ethernet-switching;
            }
        }
    ...

    This is where it becomes apparent that a QFabric “looks like” (from a configuration standpoint) a single giant switch.

    There’s quite a bit of moving parts and I’ve just scratched the surface here. Will be diving deep myself and will update my blog accordingly :).

    Thanks to Juniper for the excellent course CMQS. Other references used are the QFabric Architecture whitepaper and the QFabric deployment guides on Juniper’s website.

CCIP completed, onto a different brand of Koolaid

Earlier this month, I sat for my Qos 642-642 exam to complete my CCIP certification. Other than a few gripes with out-dated information, the exam went over pretty smoothly and I hammered out a pass. I’ve written previously of my motivations for obtaining the CCIP cert and am glad to have stuck with it. Even though the certification will officially retire in a week or so, a lot of the topics covered will also be on the CCIE R&S version 4.0 blueprint. I doubt I’m finished with BGP, MPLS and QoS so I’m keeping that knowledge tucked away for the time being 😉

Just one last note on CCIP, I would highly recommend Wendell Odom’s Cisco QoS Exam Cert Guide for anyone looking to learn about QoS on Cisco IOS. This is one of the best Cisco Press books I’ve read and continue to reference it for everything IOS QoS.

Now that I’ve a broad brush of Cisco R&S technologies with my CCNP and CCIP, I’ve decided to re-visit my Juniper studies. While we don’t work all that much with Juniper at $DAYJOB, we have Juniper gear in the lab to play with. Recently, I’ve been using EX4200 and EX4500 switches as well as working through Juniper’s free JNCIS-ENT study guide. Coming from a Cisco background and particular having gone through CCNP, I’m finding there’s a good amount of overlap. It’s just learning all the JUNOS hierarchies and “where is that feature” in JUNOS.

Upcoming posts will cover some basic JUNOS switching on EX and interoperating with Cisco Catalyst 3560/3750’s. I’ll also be finishing a lot of my draft posts from earlier this year covering BGP, MPLS and some vendor ranting 😛

Stay tuned.