Whispers & Screams
And Other Things

OSPF DR and BDR Election

Participating routers in an OSPF network have varying roles to play in ensuring that the processes which route the data around the network maintain a true and accurate picture of the network topology. The two primary roles in this structure are the Designated Router (DR) and the Backup Designated Router (BDR). Their roles, whilst primarily aimed at maintaining network function from a routing perspective, are equally focused on ensuring that network bandwidth used to accomplish this is used judiciously.

An Election


Consider a multi-access environment (such as LAN or MAN), where three or more routers are connected together. If all the routers in the OSPF network had to form adjacencies with every other OSPF router present, forming a fully meshed OSPF adjacency network, the resultant conceptualised routing mesh would be overly chatty flooding Link State Advertisements (LSAs) with each and every other OSPF router. As a direct result, router CPU load and network bandwidth would be consumed wastefully. OSPF is designed to be far more efficient in its use of these resources, and, in order to prevent this from happening, OSPF holds an election process to determine and fill roles named Designated Router (DR) and a Backup Designated Router (BDR) so that the workload of propagating routing information around the network is more effectively managed. Election for the DR and BDR is determined primarily on the Router Priority (which by default is 1) and the Router ID. If the value of the router interface priority is changed to 0, it prevents that router from becoming the DR or the BDR.

Router priority can be adjusted on Cisco routers on a per interface basis. The Router ID however is a 32-bit number that uniquely identifies the router in the Autonomous System. One algorithm for Router ID assignment is to choose the largest or smallest IP address assigned to the router. If a router's OSPF Router ID is changed, the router's OSPF software should be restarted before the new Router ID takes effect. Before restarting in order to change its Router ID, the router should flush its self-originated LSAs from the routing domain or they will persist for up to MaxAge minutes. Cisco uses a method that some other vendors choose to follow, but it is not a requirement. If you have a loopback interface, since that's the most stable interface on your router, that will be used. If there is no loopback interface, the highest IP address on the router is used. If there is more than one loopback then the highest of them is used. In many elections in OSPF, the higher RID wins. this is the logic for choosing higher over lower. It should be noted however that you can manually specify an RID that isn't even a valid ip address such as 224.1.1.1.

Drothers


The DR is the router which receives LSAs and other updates when there is a change in the inter-neighbor communications. These LSAs are sent out by the DROTHERS routers (all non-DR/BDR routers), and consequently, any further updates are propagated by the DR to the rest of the DROTHERS routers. The show ip ospf neighbor command, when executed, indicates the non-DR/BDR routers as DROTHERS. Every network segment in OSPF has a DR and a BDR.

The Process


What actually happens is that whenever there is a change in network routing status, instead of flooding each and every path with LSAs advertising new information about network topology, the update is only sent to the DR. The DR then takes on this job and floods the routers in its network segment with the update. If the DR fails or is not functioning, the BDR takes over. When this happens, the BDR replaces the existing DR as the new Designated Router, and a new BDR is elected.

Continue reading
332 Hits
0 Comments

The EIGRP (Enhanced Interior Gateway Routing Protocol) metric

EIGRP (Enhanced Interior Gateway Routing Protocol) is a network protocol that lets routers exchange information more efficiently than was the case with older routing protocols. EIGRP which is a proprietary protocol evolved from IGRP (Interior Gateway Routing Protocol) and routers using either EIGRP and IGRP can interoperate because the metric (criteria used for selecting a route) used with one protocol can be translated into the metrics of the other protocol. It is this metric which we will examine in more detail.

Using EIGRP, a router keeps a copy of its neighbour’s routing tables. If it can’t find a route to a destination in one of these tables, it queries its neighbours for a route and they in turn query their neighbours until a route is found. When a routing table entry changes in one of the routers, it notifies its neighbours of the change. To keep all routers aware of the state of neighbours, each router sends out a periodic “hello” packet. A router from which no “hello” packet has been received in a certain period of time is assumed to be inoperative.

EIGRP uses the Diffusing-Update Algorithm (DUAL) to determine the most efficient (least cost) route to a destination. A DUAL finite state machine contains decision information used by the algorithm to determine the least-cost route (which considers distance and whether a destination path is loop-free).

Figure 1




The Diffusing Update Algorithm (DUAL) is a modification of the way distance-vector routing typically works that allows the router to identify loop free failover paths.  This concept is easier to grasp if you imagine it geographically. Consider the map of the UK midlands shown in Figure1. The numbers show approximate travel distance, in miles. Imagine that you live in Glasgow. From Glasgow, you need to determine the best path to Hull. Imagine that each of Glasgow’s neighbours advertises a path to Hull. Each neighbour advertises its cost (travel distance) to get to Hull. The cost from the neighbour to the destination is called the advertised distance. The cost from Glasgow itself is called the feasible distance.
In this example, Newcastle reports that if Glasgow routed to Hull through Newcastle, the total cost (feasible distance) is 302 miles, and that the remaining cost once the traffic gets to Newcastle is only 141 miles. Table1 shows distances reported from Glasgow to Hull going through each of Glasgow’s neighbours.

Table 1




Glasgow will select the route with the lowest feasible distance which is the path through Newcastle.

If the Glasgow-Newcastle road were to be closed, Glasgow knows it may fail over to Carlisle without creating a loop. Notice that the distance from Carlisle to Hull (211 miles) is less than the distance from Glasgow to Hull (302 miles). Because Carlisle is closer to Hull, routing through Hull does not involve driving to Carlisle and then driving back to Glasgow (as it would for Ayr). Carlisle is a guaranteed loop free path.

The idea that a path through a neighbour is loop free if the neighbour is closer is called the feasibility requirement and can be restated as "using a path where the neighbour's advertised distance is less than our feasible distance will not result in a loop."

The neighbour with the best path is referred to as the successor. Neighbours that meet the feasibility requirement are called feasible successors. In emergencies, EIGRP understands that using feasible successors will not cause a routing loop and instantly switches to the backup paths.

Notice that Ayr is not a feasible successor. Ayr's AD (337) is higher than Newcastle's FD (302). For all we know, driving to Hull through Ayr involves driving from Glasgow to Ayr, then turning around and driving back to Glasgow before continuing on to Hull (in fact, it does). Ayr will still be queried if the best path is lost and no feasible successors are available because potentially there could be a path that way; however, paths that do not
meet the feasibility requirement will not be inserted into the routing table without careful consideration.

EIGRP uses a sophisticated metric that considers bandwidth, load, reliability and delay. That metric is:




[latex]256, *, left(K_1, *, bandwidth ,+, dfrac {K_2 ,*, bandwidth}{256 - load}, +, K_3 ,*, delayright), *,dfrac {K_5}{reliability ,+, K_4}[/latex]


Although this equation looks intimidating, a little work will help you understand the maths and the impact the metric has on route selection.

You first need to understand that EIGRP selects path based on the fastest path. To do that it uses K-values to balance bandwidth and delay. The K-values are constants that are used to adjust the relative contribution of the various parameters to the total metric. In other words, if you wanted delay to be much more relatively important than bandwidth, you might set K3 to a much larger number.

You next need to understand the variables:

    • Bandwidth—Bandwidth is defined as (100 000 000 / slowest link in the path) kbps. Because routing protocols select the lowest metric, inverting the bandwidth (using it as the divisor) makes faster paths have lower costs.

 

    • Load and reliability—Load and reliability are 8-bit calculated values based on the performance of the link. Both are multiplied by a zero K-value, so neither is used.

 

    • Delay—Delay is a constant value on every interface type, and is stored in terms of microseconds. For example, serial links have a delay of 20,000 microseconds and Ethernet lines have a delay of 1000 microseconds. EIGRP uses the sum of all delays along the path, in tens of microseconds.



By default, K1=K3=1 and K2=K4=K5=0. Those who followed the maths will note that when K5=0 the metric is always zero. Because this is not useful, EIGRP simply ignores everything outside the parentheses. Therefore, given the default K-values the equation becomes:




[latex]256, *, left(1, *, bandwidth ,+, dfrac {0 ,*, bandwidth}{256 - load}, +, 1 ,*, delayright), *,dfrac {0}{reliability ,+, 0}[/latex]


Substituting the earlier description of variables, the equation becomes 100,000,000 divided by the chokepoint bandwidth plus the sum of the delays:




[latex]256, *, left(dfrac {10^7}{min(bandwidth)}, +,sum,dfrac {delays}{10}right)[/latex]


As a final note, it is important to remember that routers running EIGRP will not become neighbours unless they share K-values. That said however you really should not change the K-values from the default without a compelling reason.

Continue reading
322 Hits
0 Comments

Could ants power Web3.0 to new heights? OSPF v’s ANTS

Having recently completed my latest M.Eng block on the subject of “Natural and Artificial Intelligence“, I became aware of advances made in the recent decade towards a new paradigm of network traffic engineering that was being researched. This new model turns its back on traditional destination based solutions, (OSPF, EIGRP, MPLS) to the combinatorial problem of decision making in network routing  favouring instead a constructive greedy heuristic which uses stochastic combinatorial optimisation. Put in more accessible terms, it leverages the emergent ability of sytems comprised of quite basic autonomous elements working together, to perform a variety of complicated tasks with great reliability and consistency.

In 1986, the computer scientist Craig Reynolds set out to investigate this phenomenon through computer simulation. The mystery and beauty of a flock or swarm is perhaps best described in the opening words of his classic 1986 paper on the subject:

The motion of a flock of birds is one of nature’s delights. Flocks and related synchronized group behaviors such as schools of fish or herds of land animals are both beautiful to watch and intriguing to contemplate. A flock … is made up of discrete birds yet overall motion seems fluid; it is simple in concept yet is so visually complex, it seems randomly arrayed and yet is magnificently synchronized. Perhaps most puzzling is the strong impression of intentional, centralized control. Yet all evidence dicates that flock motion must be merely the aggregate result of the actions of individual animals, each acting solely on the basis of its own local perception of the world.

An analogy with the way ant colonies function has suggested that the emergent behaviour of ant colonies to reliably and consistently optimise paths could be leveraged to enhance the way that the combinatorial optimisation problem of complex network path selection is solved.

The fundamental difference between the modelling of a complex telecommunications network and more commonplace problems of combinatorial optimisation such as the travelling salesman problem is that of the dynamic nature of the state at any given moment of a network such as the internet. For example, in the TSP the towns, the routes between them and the associated distances don’t change. However, network routing is a dynamic problem. It is dynamic in space, because the shape of the network – its topology – may change: switches and nodes may break down and new ones may come on line. But the problem is also dynamic in time, and quite unpredictably so. The amount of network traffic will vary constantly: some switches may become overloaded, there may be local bursts of activity that make parts of the network very slow, and so on. So network routing is a very difficult problem of dynamic optimisation. Finding fast, efficent and intelligent routing algorithms is a major headache for telcommunications engineers.

So how you may ask, could ants help here? Individual ants are behaviourally very unsophisticated insects. They have a very limited memory and exhibit individual behaviour that appears to have a large random component. Acting as a collective however, ants manage to perform a variety of complicated tasks with great reliability and consistency, for example, finding the shortest routes from their nest to a food source. 



These behaviours emerge from the interactions between large numbers of individual ants and their environment. In many cases, the principle of stigmergy is used. Stigmergy is a form of indirect communication through the environment. Like other insects, ants typically produce specific actions in response to specific local environmental stimuli, rather than as part of the execution of some central plan. If an ant’s action changes the local environment in a way that affects one of these specific stimuli, this will influence the subsequent actions of ants at that location. The environmental change may take either of two distinct forms. In the first, the physical characteristics may be changed as a result of carrying out some task-related action, such as digging a hole, or adding a ball of mud to a growing structure. The subsequent perception of the changed environment may cause the next ant to enlarge the hole, or deposit its ball of mud on top of the previous ball. In this type of stigmergy, the cumulative effects of these local task-related changes can guide the growth of a complex structure. This type of influence has been called sematectonic. In the second form, the environment is changed by depositing something which makes no direct contribution to the task, but is used solely to influence subsequent behaviour which is task related. This sign-based stigmergy has been highly developed by ants and other exclusively social insects, which use a variety of highly specific volatile hormones, or pheromones, to provide a sophisticated signalling system. It is primarily this second mechanism of sign based sigmergy that has been successfully simulated with computer models and applied as a model to a system of network traffic engineering.

In the traditional network model, packets move around the network completely deterministically. A packet arriving at a given node is routed by the device which simply consults the routing table and takes the optimum path based on its destination. There is no element of probability as the values in the routing table represent not probabilities, but the relative desirability of moving to other nodes.

In the ant colony optimisation model, virtual ants also move around the network, their task being to constantly adjust the routing tables according to the latest information about network conditions. For an ant, the values in the table are probabilities that their next move will be to a certain node.The progress of an ant around the network is governed by the following informal rules:

    • Ants start at random nodes.

 

    • They move around the network from node to node, using the routing table at each node as a guide to which link to cross next.

 

    • As it explores, an ant ages, the age of each individual being related to the length of time elapsed since it set out from its source. However, an ant that finds itself at a congested node is delayed, and thus made to age faster than ants moving through less choked areas.

 

    • As an ant crosses a link between two nodes, it deposits pheromone however, it leaves it not on the link itself, but on the entry for that link in the routing table of the node it left. Other ‘pheromone’ values in that column of the nodes routing table are decreased, in a process analogous to pheromone decay.

 

    • When an ant reaches its final destination it is presumed to have died and is deleted from the system.R.I.P.



Testing the ant colony optimisation system, and measuring its performance against that of a number of other well-known routing techniques produced good results and the system outperformed all of the established mechanisms however there are potential problems of the kind that constantly plague all dynamic optimisation algorithms. The most significant problem is that, after a long period of stability and equilibrium, the ants will have become locked into their accustomed routes. They become unable to break out of these patterns to explore new routes capable of meeting new conditions which could exist if a sudden change to the networks conditions were to take place. This can be mitigated however in the same way that evolutionary computation introduces mutation to fully explore new possibilities by means of the introduction of an element of purely random behaviour to the ant.

‘Ant net’ routing has been tested on models of US and Japanese communications networks, using a variety of different possible traffic patterns. The algorithm worked at least as well as, and in some cases much better than, four of the best-performing conventional routing algorithms. Its results were even comparable to those of an idealised ‘daemon’ algorithm, with instantaneous and complete knowledge of the current state of the network.

It would seem we have not heard the last of these routing antics…. (sorry, couldnt resist).

Continue reading
741 Hits
1 Comment