Kevin Dorrell, CCIE #20765

26 Apr 2008

NMC 17.9.3 – NAT related bug?

Filed under: IOS Bugs, NAT — dorreke @ 15:50

I cannot do the NAT part of this lab.  On R6, as soon as I put NAT on either of the Fa subinterfaces, it locks up.  Stone dead.  No ping responses, no adjacencies, nothing.

R6#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R6(config)#int f0/0.30
R6(config-subif)#ip nat out

Strange, because R3 seems to be happy with it.  (Apart from complaining it took too long, but then 12.4(2)T always does that when you introduce NAT.  It only seems to do that the first time you introduce ip nat inside or ip nat outside; subsequent interfaces are OK.)

R3#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R3(config)#int s0/0
R3(config-if)#ip nat out
*Apr  8 16:20:00.186: %SYS-3-CPUHOG: Task is running for (2003)msecs, more than (2000)msecs
(1/0),process = Exec.
-Traceback= 0x813C6850 0x813C496C 0x813C874C 0x813C8A1C 0x813C8AF8 0x813C48B0 0x813C4954
 0x813C9120 0x813C48B0 0x813C4954 0x813C7384 0x813C48B0 0x813C4954 0x813C49DC 0x813F9A08
*Apr  8 16:20:00.939: %LINEPROTO-5-UPDOWN: Line protocol on Interface NVI0, changed state to up
R3(config-if)#int s0/0.134
R3(config-subif)#ip nat out
R3(config-subif)#int f0/0
R3(config-if)#ip nat in
R3(config)#ip nat inside source static

I wonder what happens if I introduce ip nat inside on Lo106 first, as a dummy.  No, that locks it up as well.  I wonder whether my problem on R6 is anything to do with the ISL trunking?  Tried shutting down Fa0/0 and then introducing ip nat inside on Fa0/0.20.  Same thing … total lockup.

The nearest I could find in the bug database is CSCse48814.  But that applies specifically to RSP routers, and only when using NBAR, and only mentions ip nat outside.  That one is fixed in 12.4(10.4)T.

It’s just as well they use that special bug-free version of IOS in the real exam!


I tried the same config on another router.  Same model (2611XM), same IOS (12.4(2)T advent), same hardware config (WIC-1T in slot 0).  This one allowed me ip nat inside, although still with the CPUHOG warning.  Maybe I have a faulty router in my stack.  Not good news.  (But then neither would a bug be good news.)  I shall look out for future instances of this problem.

20 Apr 2008

Oh-Oh! Have I hit a bug?

Filed under: General, IOS Bugs, IPv6 — dorreke @ 18:02

Following NMC Lab16, I reloaded my stack, only to find R1 is in a reload loop.  This isn’t the first time it has happened, and it was R1 last time as well.  I wonder if I have a hardware problem, or whether it is an IOS bug.  Last time I managed to fix it by loading without the NVRAM, just like a password recovery, then loading the config once the router was running.  This time, the technique didn’t work; as soon as I pasted in the old config, it would crash again.

So instead, I loaded up the router without the NVRAM config, then added the config section by section.  I narrowed down the crash to a distribute-list in my IPv6 RIP process.  The distribute list was called “default-only”, and I think it didn’t like that for some reason.  It referred to a prefix list that has only the default route in it. Anyway, I shouldn’t have needed it, because the SHOWiT solution is much more elegant: ipv6 rip RIP default-information only.

As someone recently commented, it’s just as well they use that special bug-free version of IOS for the CCIE exam labs!


19 Apr 2008

This, that, and the other

Filed under: IOS Bugs, LAN Switching, VTP — dorreke @ 13:27

Added the resolution to my posting a couple of weeks ago about a VTP pruning problem.  I came to the conclusion it was a bug in CatOS.  CatOS will quite happily prune the native VLAN of a trunk if it thinks it is not needed.  However, in doing so, it can screw up the VTP pruning (or more specifically, grafting) mechanisms for other VLANs.  I need to investigate this further to make a more coherent explanation, but I’m sure that is the basis of the problem.  Here is me discussing it on NetPro.



09 Apr 2008

VTP Pruning : Is this a bug?

Filed under: IOS Bugs, LAN Switching, VTP — dorreke @ 14:29

I recently introduced VTP pruning on a LAN, and now I have some connectivity problems on certain VLANs.  The more I look at the problems, the more I wonder whether there is some strange behavior in VTP pruning.  The questions I need to answer are:

  1. Is pruning based on whether the switch has any downstream clients; that is, whether there are any active access ports or unpruned downstream trunks on the VLAN?  Or is it based on whether there are any downstream CAM entries for the VLAN?
  2. Is it possible for a switch to prune a VLAN off a trunk that is the root port for that VLAN?

Here is my apparently anomalous situation.  I have four VLANs, 21-24, that serve as point-to-point links between remote sites to carry server heartbeats.  On each of these VLANs, there are only two hosts: one on each site.  Here is the spanning-tree for VLAN 21:

CC80#show spanning-tree vlan 21
 VLAN0021   Spanning tree enabled protocol rstp
   Root ID    Priority    24576
              Address     0007.4f62.a014
              Cost        15
              Port        72 (Port-channel1)
              Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

   Bridge ID  Priority    32789  (priority 32768 sys-id-ext 21)
              Address     001b.2ae8.b280
              Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
              Aging Time 300
 Interface        Role Sts Cost      Prio.Nbr Type
 ---------------- ---- --- --------- -------- --------------------------------
 Gi0/16           Desg FWD 4         128.16   Edge P2p
 Po1              Root FWD 3         128.72   P2p
 Po2              Desg FWD 3         128.80   P2p

Notice one or two things about this.  Firstly, Po1 is the root port.  Secondly, G0/16 is an “up” access port on this VLAN.  Thirdly, I should mention that Po2 is a trunk to another access layer switch at the same level.  But Po2 spends its time in blocking state on the other switch except in an emergency.  That is why this switch prunes most VLANs off Po2, as we will see.

Now let us look at the trunks:

CC80#show int trunk
 Port        Mode         Encapsulation  Status        Native vlan
 Gi0/3       on           802.1q         trunking      2
 Po1         on           802.1q         trunking      12
 Po2         on           802.1q         trunking      12

 Port        Vlans allowed on trunk
 Gi0/3       1,169
 Po1         1-2,5,12,21-26,169
 Po2         1-2,5,12,21-26,169

 Port        Vlans allowed and active in management domain
 Gi0/3       1,169
 Po1         1-2,5,12,21-26,169
 Po2         1-2,5,12,21-26,169

 Port        Vlans in spanning tree forwarding state and not pruned
 Gi0/3       1,169
 Po1         1-2,5,12,22,24-26,169
 Po2         1

As expected, most of the VLANs are pruned off Po2, as the other end of Po2 is in STP blocking state.  Ignore G0/3; this is a server trunk.  The interesting thing is that the switch has pruned VLAN 21 from the root port trunk, Po1.  Why?  This has effectively cut this switch off from VLAN 21.
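To make the comparison less error-prone than reading the lists by eye, here is a minimal Python sketch that subtracts the "forwarding and not pruned" list from the "allowed and active" list for each trunk.  The VLAN lists are copied by hand from the show int trunk output above; the helper names expand_vlans and missing_vlans are my own invention for illustration, not anything from IOS.  Note that a VLAN can be missing from the last list either because it is pruned or because the port is not forwarding in spanning tree for that VLAN, so the result only tells you where to look.

```python
def expand_vlans(spec):
    """Expand a Cisco-style VLAN list such as '1-2,5,21-26' into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def missing_vlans(active, forwarding):
    """Per port: VLANs that are allowed and active but not forwarding/unpruned."""
    return {
        port: sorted(expand_vlans(active[port]) - expand_vlans(forwarding.get(port, "")))
        for port in active
    }


# "Vlans allowed and active in management domain", copied from the output above.
active = {
    "Gi0/3": "1,169",
    "Po1": "1-2,5,12,21-26,169",
    "Po2": "1-2,5,12,21-26,169",
}

# "Vlans in spanning tree forwarding state and not pruned", copied from the output above.
forwarding = {
    "Gi0/3": "1,169",
    "Po1": "1-2,5,12,22,24-26,169",
    "Po2": "1",
}

missing = missing_vlans(active, forwarding)
for port, vlans in missing.items():
    print(port, vlans)
```

For Po1 this leaves VLANs 21 and 23, which are two of the four heartbeat VLANs.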

VLAN 21 is pruned from both Po1 and Po2, and yet it has an access port on it.  Now, that access port, G0/16, is apparently not receiving any MAC traffic from its connected host: the only entry in the CAM table points at the upstream switch.  But the switch is still isolated on this VLAN, so the host cannot see any traffic from the remote part of the VLAN, and so it does not respond:

CC80#show mac-address-table dyn vlan 21
           Mac Address Table
 Vlan    Mac Address       Type        Ports
 ----    -----------       --------    -----
   21    0016.c73d.a22b    DYNAMIC     Po1

 Total Mac Addresses for this criterion: 1

One last possible clue.  We have four VLANs, each with two host connections.  Two of them work, two of them don’t.  The difference is that they take different paths.  Two of them are rooted on a 4506 running IOS 12.2(25)EWA2, and they work; they are not pruned anywhere between the two sites.  The two that do not work are rooted on a 4003 running CatOS 8.4(5)GLX.

Update 19/04/2008:

I think I have sorted it out, and I think it is a bug in the root switch for VLAN 21. Unlike our other VLANs, VLAN 21 is rooted in a CatOS switch. Due to various circumstances in our network, it tripped over a bug.

The bug is related to one I found a couple of years ago. I found that in a CatOS switch, if you manually disallow the native VLAN (in my case VLAN 12) from a trunk, then it stops the trunk passing BPDUs for VLAN 1 as well. At the time, that resulted in a 5-minute meltdown of my network.

Here are the notes I have made for this new bug. Sorry about the generalisations … I do not know the VTP protocol very well yet.

Normally, VTP signalling is carried on the native VLAN of each trunk. By default, the native VLAN is VLAN 1, but you are allowed other values. We use VLAN 12 as native, a VLAN that is unused anywhere on the network. Now, an IOS switch will never prune VLAN 1 from a trunk. Nor will it prune the native VLAN. However, CatOS has a bug: if the (non-1) native VLAN is unused, it will prune it from the trunk regardless of the fact that it is the native. Once the native VLAN is pruned, of course, the VTP signal cannot be propagated to other switches.

It happens that we have one CatOS switch in our core loop. That switch is the root for VLANs 21 and 23. (And fortunately only for VLANs 21 and 23.) Because that root switch had pruned the native VLAN from its trunks, it was no longer able to send VTP unprune signals for VLANs 21 and 23 to its neighbors. Its neighbors therefore pruned VLANs 21 and 23 from the trunks to the root. The result was that there was no connectivity in VLANs 21 and 23, and every switch pruned all ports on those VLANs.

I resolved the problem by rolling back the VTP pruning.
