Kevin Dorrell, CCIE #20765

02 Apr 2008

Gobsmacked : BGP route selection algorithm

Filed under: BGP — dorreke @ 10:44

I’m gobsmacked.  (For non-speakers of London English, “gobsmacked” literally means “punched in the mouth”, but is used as a metaphor for “severely taken aback”.  In case you were wondering.) UPDATE.  Apparently my etymology was wrong, and the word originated in Liverpool, with a possible Scottish Gaelic origin for the word “gob”, or “gab” in Irish.  See here.)

I thought I knew the BGP path selection algorithm, but I have just read something that has turned my world upside down.  (OK, now I’m exaggerating a bit.)

I had always thought the BGP path selection algorithm whittles down the alternative paths by elimination until only one candidate remains.  But according to Wendell Odom’s CCIE Certification Guide, page 445, that is not so.  It appears that each of the 12 steps (or however many there are) is considered separately.  If any step fails to produce a unique clear winner, then all the paths are carried forward to the next step.  That is, unless there is a clear winner, the “non-best” paths are not eliminated.  To quote the book:

For example, imagine that R1 has five routes to 9.0.0.0/10, two with AS_PATH length 3 and the others with AS_PATH length 5. The decision process did not determine a best route before reaching step 4 (AS_PATH length). Step 4’s logic can determine that two routes are better than the others because they have a shorter AS_PATH length. However, because this step does not produce the single winner, the process moves on to the next step, and all five routesare still considered as candidates to be the best route. Each step either produces a winner or moves on to the next step, examining all routes to that NLRI at the next step.

This has some strange consequences that I would like to test.  Suppose we have two paths:

  • Path A, LOCAL_PREF = 100, AS_PATH length = 3
  • Path B, LOCAL_PREF = 80, AS_PATH length = 2

Now, according to the decision algorithm, Path A wins on local preference.  Now, suppose we get a new path coming in:

  • Path C, LOCAL_PREF = 100, AS_PATH length = 4

According to the algorithm as described by Wendell Odom, what should happen?

Well, at the LOCAL_PREF stage, there are two candidates that should win: A and C.  So, according to the algorithm, all 3 paths get passed on to the next step.  Let us suppose that none of them are locally injected.  So we arrive at the AS_PATH step.  Which path does it choose?  Path B, because that has the shortest AS_PATH.

So …. the arrival of path C has caused us to discard path A, and install path B in its place.  Is that fair?

P.S.

After a long discussion on NetPro, we came to the conclusion that this was a mistake in the book.  The algorithm works just as I thought it did up to now.  I am honoured that Russ White joined in the conversation.  Russ is Cisco’s “Mr. Routing”.  So that is pretty conclusive.  Phew!

Advertisements

3 Comments »

  1. Dear Kevin you may Un-gobsmack yourself.
    Here’s what the RFC has to say:

    RFC 4271

    9.1.2.2. Breaking Ties (Phase 2)

    In its Adj-RIBs-In, a BGP speaker may have several routes to the same destination that have the same degree of preference. The local speaker can select only one of these routes for inclusion in the associated Loc-RIB. The local speaker considers all routes with the same degrees of preference, both those received from internal peers, and those received from external peers.
    The following tie-breaking procedure assumes that, for each candidate route, all the BGP speakers within an autonomous system can ascertain the cost of a path (interior distance) to the address depicted by the NEXT_HOP attribute of the route, and follow the same route selection algorithm.

    The tie-breaking algorithm begins by considering all equally preferable routes to the same destination, and then selects routes to be removed from consideration. The algorithm terminates as soon as only one route remains in consideration. The criteria MUST be applied in the order specified.

    Several of the criteria are described using pseudo-code. Note that the pseudo-code shown was chosen for clarity, not efficiency. It is not intended to specify any particular implementation. BGP implementations MAY use any algorithm that produces the same results as those described here.

    **** a) Remove from consideration all routes that are not tied for
    having the smallest number of AS numbers present in their
    AS_PATH attributes. ********** Note that when counting this number, an
    AS_SET counts as 1, no matter how many ASes are in the set.
    b) Remove from consideration all routes that are not tied for
    having the lowest Origin number in their Origin attribute.
    c) Remove from consideration routes with less-preferred
    MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable
    between routes learned from the same neighboring AS (the
    neighboring AS is determined from the AS_PATH attribute).
    Routes that do not have the MULTI_EXIT_DISC attribute are
    considered to have the lowest possible MULTI_EXIT_DISC value.
    This is also described in the following procedure:
    for m = all routes still under consideration
    for n = all routes still under consideration
    if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m))
    remove route m from consideration

    Comment by Nick — 02 Apr 2008 @ 11:58

  2. Interesting. So my world may not be turned upside down after all. Strange, because Wendell Odom seems very insistant on this fact in the book. Strange, because the rest of the book seems pretty accurate. I’m looking forward to seeing any other responses on NetPro. It’s a mystery!

    Comment by dorreke — 02 Apr 2008 @ 12:16

  3. Wendell is a great author, but who doesn’t make mistakes?

    There was another couple with Multiple STP and BGP timers in that book as far as I remember.

    Comment by Nick — 02 Apr 2008 @ 12:24


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: