Channel: John McAuley » Network Virtualization
VLAN Port Limit (Continuing to Gain an Understanding)

Wouldn’t you know it!  Just as we start digging into the VLAN port count (also called the VLAN port instance or STP logical interface count) in preparation for our Vblock cloud deployment, a mysterious error pops up on one of the Nexus 5020 clusters we deployed in our lab environment.  We use this environment to test the limits of our cloud pods as we scale them.  Here is the error we got:

STP-2-VLAN_PORT_LIMIT_EXCEEDED

After bringing up our concerns about VLAN port limits within UCSM and the Fabric Interconnects with VCE a few weeks ago, we had some discussion around the topic, and as you can see from earlier posts, we received lots of different answers about how this limit is calculated and how to work around it.  At the time, it was our understanding that this was a UCS-specific challenge.

It turns out that it also applies to the Nexus switch products, as we found out by hitting the error above and then working through the fix.  We have a somewhat official answer on how to calculate this within a UCS environment, and I’ll address that in a follow-up post.  For now, I wanted to make sure everyone was aware of this limitation on the Nexus 5000 and Nexus 5500 series switches.  It’s a very real problem, especially in a service provider/multi-tenant cloud environment.

On the Nexus switches, this number is calculated by summing the VLANs allowed on every trunk:

VLAN port count = VLANS_ON_TRUNK_1 + VLANS_ON_TRUNK_2 + … + VLANS_ON_TRUNK_N

So in the small cloud pod where we experienced the error, we had 30 trunks, each carrying all 112 VLANs configured within the environment.  This brought our total VLAN port count to 3360 on each Nexus 5020 in the pod.  We were running an older firmware version (this cloud pod has been running without failure for over two years), and the VLAN port limit on that version was 3140.
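The arithmetic above can be sketched in a few lines (a toy calculation, not anything the switch itself runs; the numbers are the ones from our pod):

```python
# Each (trunk, VLAN) pair is one STP logical interface, so the total
# VLAN port count is just the sum of VLANs allowed on every trunk.
def vlan_port_count(vlans_per_trunk):
    return sum(vlans_per_trunk)

trunks = [112] * 30      # 30 trunks, each carrying all 112 VLANs
limit = 3140             # VLAN port limit on our older Nexus 5020 firmware

total = vlan_port_count(trunks)
print(total)             # 3360
print(total > limit)     # True -- over the limit
```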

So we were clearly over the limit, and yet the cloud pod was operating just fine with no issues.  What, then, is the effect of exceeding this limit?  We asked Cisco, and what they told us was fairly alarming: you can continue to provision new VLANs and new trunks, but some VLANs will end up without a spanning-tree instance.  That means if you have redundancy in the network with multiple paths for such a VLAN, a loop would form because spanning tree would not be blocking for that VLAN.

To address this, our plan of action was:

  • Immediately put together a plan to prune the VLANs.  Since every trunk port was carrying all of the management VLANs and customer VLANs, we could shave a lot off the count simply by allowing only the needed VLANs on each trunk.  This got us back to a safe number until the next step could be implemented.
  • After we had pruned as much as possible, we began planning a firmware upgrade.  For the Nexus 5000 series, 5.0(3) in L2 configurations raises the VLAN port limit to 12000.  That will let us add many more blades and VLANs without reaching the limit anytime soon (at least we hope we don’t reach it before the next update).  Obviously this was a major change for our cloud pod.  The environment had been up and running with no issues for over two years, so this put us to the test.  Everything worked as planned thanks to full redundancy and the extensive failure testing we did two years ago, but it’s still nerve-wracking to touch a stable environment like this.
  • Our next step would be to convert the trunks from each ESX host into a single Virtual Portchannel (VPC).  Now that we have a version of code that supports VPC, we can use it on this cloud pod (which tells you how long we’ve been doing cloud services).  You’ll see why this is critical below.
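To put rough numbers on the pruning step (the post-pruning per-trunk count below is a hypothetical figure for illustration; our real allocation varied per host):

```python
# Model the effect of VLAN pruning on the VLAN port count.
FULL = 112                      # every VLAN in the pod

before = [FULL] * 30            # all 30 trunks carry everything
# Hypothetical post-pruning figure: a few management VLANs plus only
# the customer VLANs actually needed on each host.
after = [40] * 30

print(sum(before))              # 3360 -- over the 3140 limit
print(sum(after))               # 1200 -- back to a safe number
```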

If you are experiencing this same problem or think you might in your environment, here are a few things you need to know.  Again, keep in mind that this is for the Nexus 5000 and 5500 specifically.  I’ll address the other parts of UCS in a future post:

  1. Portchannels and members of Virtual Portchannels (VPCs) count as one trunk!  This is major.  In our environment, this effectively reduces our trunk count by half.  Since we are dealing with small limits here, every number counts.  If I can shave 15 trunks carrying 112 VLANs each out of my environment, I just saved 1680 VLAN port instances.
  2. Trunks or portchannels created between a parent Nexus switch and a Fabric Extender (such as a Nexus 2148 in our case) count as a trunk in the overall count.
  3. Trunks or portchannels created on a FEX will count towards the trunk count on the parent Nexus.  In hindsight this seems obvious since we’re dealing with a single control plane and the FEX is viewed like a line card, but this surprised us a bit during our testing.
  4. SPAN ports don’t count towards the overall number.  We thought they would.  Part of our VLAN pruning plan was to take the trunks going to our cloud IDS appliances and trunk only the customer VLANs for the customers who had purchased that service, instead of allowing all VLANs for ease of future provisioning.  We did this, but the port count did not decrease, indicating that SPAN ports aren’t counted as trunks.  That makes sense, since a SPAN port wouldn’t be carrying spanning-tree instances for those VLANs.
  5. Most importantly (perhaps I should have put this first), take this limit into consideration as you design your environment.  This is the kind of thing no one talks about up front, but it will sneak up on you down the road at a time when you aren’t expecting any issues.  If you are in a service provider or multi-tenant cloud environment, it is critical because of the sheer number of VLANs you need to carry.
  6. We are deploying vCD (vCloud Director).  I know they want you to use VCD-NI, but anyone who has been in networking as long as I have understands the challenges of building extremely large layer-2 bridging environments.  I’ve also read some good posts where people ran thorough packet captures in a VCD-NI environment and demonstrated some real security concerns.  If you’re in an enterprise environment where segmentation is nice to have but not business-impacting if it breaks, that’s OK.  If you’re in a multi-tenant environment where a customer’s data is their business, and a compromise would put them out of business, that is another story.  I understand that Cisco is working on using IP encapsulation to accomplish the same types of goals as VCD-NI without the L2 bridging challenges; I think we’ll see something about that around VMworld in a few weeks.
  7. For the Nexus 5500 series switches, the VLAN port limit is 14,500 so you get 2500 more than on the Nexus 5000s with the same code.
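Point 1 above is worth quantifying.  A minimal sketch of the savings from pairing our 30 host trunks into 15 Virtual Portchannels, using the figures from this post:

```python
VLANS = 112                       # VLANs carried on every trunk

standalone = 30                   # trunks before converting to VPCs
vpcs = standalone // 2            # 15 VPCs, each counted as ONE trunk

before = standalone * VLANS       # 3360 VLAN port instances
after = vpcs * VLANS              # 1680
print(before - after)             # 1680 instances saved
```

Against the 12000 limit of 5.0(3) on the Nexus 5000 (or 14,500 on the 5500), that halving buys a lot of headroom for new blades and VLANs.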

