I recently introduced VTP pruning on a LAN, and now I have some connectivity problems on certain VLANs. The more I look at the problems, the more I wonder whether there is some strange behavior in VTP pruning. The questions I need to answer are:
- Is pruning based on whether the switch has any downstream clients; that is, whether there are any active access ports or unpruned downstream trunks on the VLAN? Or is it based on whether there are any downstream CAM entries for the VLAN?
- Is it possible for a switch to prune a VLAN off a trunk that is the root port for that VLAN?
Here is my apparently anomalous situation. I have four VLANs, 21-24, that serve as point-to-point links between remote sites to carry server heartbeats. On each of these VLANs, there are only two hosts: one on each site. Here is the spanning-tree for VLAN 21:
CC80#show spanning-tree vlan 21
VLAN0021
  Spanning tree enabled protocol rstp
  Root ID    Priority    24576
             Address     0007.4f62.a014
             Cost        15
             Port        72 (Port-channel1)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    32789  (priority 32768 sys-id-ext 21)
             Address     001b.2ae8.b280
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Gi0/16           Desg FWD 4         128.16   Edge P2p
Po1              Root FWD 3         128.72   P2p
Po2              Desg FWD 3         128.80   P2p
Notice three things about this output. First, Po1 is the root port. Second, Gi0/16 is an “up” access port on this VLAN. Third, Po2 is a trunk to another access-layer switch at the same level, but the other switch keeps its end of Po2 in the blocking state except in an emergency. That is why this switch prunes most VLANs off Po2, as we will see.
Now let us look at the trunks:
CC80#show int trunk
Port        Mode         Encapsulation  Status        Native vlan
Gi0/3       on           802.1q         trunking      2
Po1         on           802.1q         trunking      12
Po2         on           802.1q         trunking      12
Port        Vlans allowed on trunk
Gi0/3       1,169
Po1         1-2,5,12,21-26,169
Po2         1-2,5,12,21-26,169
Port        Vlans allowed and active in management domain
Gi0/3       1,169
Po1         1-2,5,12,21-26,169
Po2         1-2,5,12,21-26,169
Port        Vlans in spanning tree forwarding state and not pruned
Gi0/3       1,169
Po1         1-2,5,12,22,24-26,169
Po2         1
As expected, most of the VLANs are pruned off Po2, as the other end of Po2 is in the STP blocking state. Ignore Gi0/3; this is a server trunk. The interesting thing is that the switch has pruned VLAN 21 from the root port trunk, Po1. Why? This has effectively cut this switch off from VLAN 21.
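To make the comparison between the sections of the trunk output less error-prone, here is a minimal Python sketch that expands the VLAN lists and reports, per trunk, which VLANs are allowed and active but absent from the "forwarding and not pruned" list. The VLAN strings are copied from the output above; the function names `expand_vlans` and `missing_vlans` are my own, not part of any Cisco tool. Note that the last section of `show int trunk` cannot tell an STP-blocked VLAN from a pruned one, so the result only says a VLAN is missing, not why.

```python
def expand_vlans(spec):
    """Expand a Cisco-style VLAN list such as '1-2,5,21-26' into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part or part == "none":
            continue
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def missing_vlans(active, forwarding):
    """Per port, VLANs that are active on the trunk but not forwarding/unpruned."""
    return {
        port: sorted(expand_vlans(active[port]) - expand_vlans(forwarding[port]))
        for port in active
    }


# Values copied from the "show int trunk" output above.
active = {
    "Gi0/3": "1,169",
    "Po1": "1-2,5,12,21-26,169",
    "Po2": "1-2,5,12,21-26,169",
}
forwarding = {
    "Gi0/3": "1,169",
    "Po1": "1-2,5,12,22,24-26,169",
    "Po2": "1",
}

missing = missing_vlans(active, forwarding)
for port, vlans in missing.items():
    print(port, vlans)
```

For Po1 this reports VLANs 21 and 23 as missing, so VLAN 23 is absent from the root port trunk as well as VLAN 21.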
VLAN 21 is pruned from both Po1 and Po2, and yet it has an access port on it. That access port, Gi0/16, is apparently not receiving any frames from its connected host: the only entry in the CAM table is one learned on Po1, towards the upstream switch. The host is isolated, so it cannot see any traffic from the remote part of the VLAN, and so it does not respond:
CC80#show mac-address-table dyn vlan 21
          Mac Address Table
-------------------------------------------
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
  21    0016.c73d.a22b    DYNAMIC     Po1
Total Mac Addresses for this criterion: 1
One last possible clue. We have four VLANs, each with two host connections. Two of them work, two of them don’t. The difference is that they take different paths. Two of them are rooted on a 4506 running IOS 12.2(25)EWA2, and they work; they are not pruned anywhere between the two sites. The two that do not work are rooted on a 4003 running CatOS 8.4(5)GLX.
Update 19/04/2008:
I think I have sorted it out, and I think it is a bug in the root switch for VLAN 21. Unlike our other VLANs, VLAN 21 is rooted in a CatOS switch. Due to various circumstances in our network, it tripped over a bug.
The bug is related to one I found a couple of years ago. I found that on a CatOS switch, if you manually disallow the native VLAN (in my case VLAN 12) from a trunk, the trunk stops passing BPDUs for VLAN 1 as well. At the time, that resulted in a 5-minute meltdown of my network.
Here are the notes I have made for this new bug. Sorry about the generalisations … I do not know the VTP protocol very well yet.
Normally, VTP signalling is carried on the native VLAN of each trunk. By default, the native VLAN is VLAN 1, but you are allowed other values. We use VLAN 12 as native, a VLAN that is unused anywhere on the network. Now, an IOS switch will never prune VLAN 1 from a trunk. Nor will it prune the native VLAN. However, CatOS has a bug: if the (non-1) native VLAN is unused, it will prune it from the trunk regardless of the fact that it is the native. Once the native VLAN is pruned, of course, the VTP signal cannot be propagated to other switches.
It happens that we have one CatOS switch in our core loop. That switch is the root for VLANs 21 and 23. (And fortunately only for VLANs 21 and 23.) Because that root switch had pruned the native VLAN from its trunks, it was no longer able to send VTP unprune signals for VLANs 21 and 23 to its neighbors. Its neighbors therefore pruned VLANs 21 and 23 from the trunks to the root. The result was that there was no connectivity in VLANs 21 and 23, and every switch pruned all ports on those VLANs.
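The chain of events I am proposing can be sketched as a toy model in Python. This is only an illustration of my explanation, not the real VTP pruning state machine: the function name `vlans_kept_by_neighbor`, its parameters, and the assumption that every allowed VLAN except VLAN 1 is prune-eligible are all mine. The idea is that a neighbor keeps a prune-eligible VLAN on its trunk only while it hears a request for it from the switch at the other end.

```python
def vlans_kept_by_neighbor(allowed, prune_eligible, wanted_by_peer, peer_can_signal):
    """Toy model: which VLANs a neighbor keeps on its trunk towards a peer.

    allowed         -- VLANs allowed on the trunk
    prune_eligible  -- VLANs that pruning is permitted to remove
    wanted_by_peer  -- VLANs the peer still needs (hypothetical input)
    peer_can_signal -- False models the suspected CatOS bug, where the peer
                       has pruned the native VLAN and its requests never arrive
    """
    heard = wanted_by_peer if peer_can_signal else set()
    never_pruned = allowed - prune_eligible
    return never_pruned | (heard & allowed)


# VLANs on the trunk, taken from the output earlier in this post.
allowed = {1, 2, 5, 12, 21, 22, 23, 24, 25, 26, 169}
# Assumption for the sketch: everything except VLAN 1 is prune-eligible.
prune_eligible = allowed - {1}
# The CatOS switch is the root for VLANs 21 and 23 and needs them kept open.
root_wants = {21, 23}

healthy = vlans_kept_by_neighbor(allowed, prune_eligible, root_wants, True)
buggy = vlans_kept_by_neighbor(allowed, prune_eligible, root_wants, False)

print("healthy:", sorted(healthy))
print("buggy:  ", sorted(buggy))
```

In the healthy case the neighbor keeps VLANs 21 and 23 towards the root; in the buggy case it keeps only VLAN 1, which matches what we saw.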
I resolved the problem by rolling back the VTP pruning.