11 Detecting bad behaviors

While it is important to strengthen security in order to prevent network intrusions, and therefore misbehaviors, it is also useful to use audit tools to detect these possible misbehaviors in the network. As remarked by Schneier [140], a prevention-only strategy works only if the prevention mechanism is perfect. In this case, we accept the possibility that, despite the security measures, an intrusion can be successfully carried out; what we want now is to identify the intruder as soon as it starts perturbing the network, and neutralize it. The misbehaving node can be a former legitimate node that has been compromised, or an intruder node that managed to join the network. Note that node misbehavior may also be not due to malice, e.g. in the case of malfunctioning or battery exhaustion. Nevertheless also these misbehaviors should be detected and stopped, since they can damage the correct functioning of the network.

Once a node detects a misbehaving node, it alerts the network. Upon reception of the alert, the other legitimate nodes should take a corrective action to exclude the misbehaving node to participate further in the network.

The techniques mentioned can be used with or without an infrastructure to sign messages, i.e. both on secured or unsecured routing protocols. They do not replace message signatures and other security measures; they work in parallel to them. Of course, the danger in not using signatures is that an attacker could revert the technique against the network, falsely accusing well-behaving nodes and/or issuing alerts with a spoofed originator.

11.1 State of the art

Different ways exist for countering attacks such as the blackhole (in which, we recall, a node causes a Denial of Service by failing forwarding packets in accordance to the protocol) or similar. An example is to overhear transmissions to detect incorrect forwarding behavior, as in the Watchdog/Pathrater [104] or in CONFIDANT [20]. Bloodhound [103] is a modified version of the Watchdog, designed in order to patch a security flaw in SRP which makes SRP vulnerable to the invisible node attack. Another issue is the WATCHERS [19, 70] approach based on the principle of conservation of flow. Other ways use detection through acknowledgements [122] or probe packets [26].

11.1.1 Watchdog/Pathrater

The Watchdog/Pathrater system is designed as an extension to DSR. It is composed of a module called Watchdog that identifies misbehaving nodes, and a module called Pathrater that computes a route avoiding these nodes. These modules are run by each node in the network.

The Watchdog listen promiscuously to the next node’s transmission, checking that the node correctly forwards the packet it has received. The Watchdog can also detect if the node has tampered with the payload. This is done by comparing the listened packet to a buffer of recently sent packets. The Pathrater processes the information obtained from the Watchdog to rate the reliability of every other node it knows in the network, and to calculate a path metric obtained by averaging the node ratings in the path. The packets are then routed through the path with the highest metric.

This system cannot be reverted against the network because such behavior would be easily detected. In a path A - X - B - C - D, node X (misbehaving) could falsely report that node B is not forwarding packets. However, the acknowledgement of a message from A to D travels correctly from D to A (node X cannot drop neither packets nor their acknowledgement because A and B would detect this misbehavior), and then A becomes aware that B is not misbehaving because it is included in the path.

We consider a path A - B - C. The weaknesses of this system is that the Watchdog running in node A may fail in identifying a misbehaving node in some condition.

11.1.2 CONFIDANT

CONFIDANT (Cooperation Of Nodes: Fairness In Dynamic Ad hoc NeTworks) is a protocol similar to the Watchdog/Pathrater. It is developed with analogy to an ecological system, where natural selection ensures the survival of those elements of population who bears a grudge against the elements which does not behave altruistically. Hence, CONFIDANT too is based on cooperation between well-behaving nodes, and isolation of misbehaving nodes.

The protocol is composed of four modules which are run in each node: the Monitor, to listen to neighbors’ transmissions and identify misbehaviors; the Reputation System, to rate the nodes; the Path Manager, to create and delete paths according to a security metric derived from the nodes’ ratings; and the Trust Manager, to manage the trust level of the nodes and issue alarms.

11.1.3 WATCHERS

The principle of conservation of flow in a network states: “All data bits sent to a node, and not destined to that node, must exit the node”, or “An input must either be absorbed or sent on as an output”. WATCHERS (Watching for Anomalies in Transit Conservation: a Heuristic for Ensuring Router Security) is a distributed protocol based on the principle of conservation of flow. Each participating node checks that incoming packets have been correctly routed, and counts the data bits passing through neighbor nodes. The results are periodically reported to other participating nodes, in order to allow each node to check if its neighbors have respected the principle. If a node finds that one of its neighbors is misrouting packets or violating the principle, it stops sending packets to the misbehaving node.

A drawback of this method is that proving that the proper number of packets was forwarded by a node, does not prove that the proper packets were sent, as it cannot be proven that packets have not been tampered with.

11.2 A trust system for OLSR

As said previously, a node can be able to notice if its neighbor is not forwarding packets; in this case, the noticing node may take appropriate action and propagate the alert to the other nodes.

However, a critical problem when exchanging trust evaluation between nodes is how to distinguish false alarms from good ones. Compromised nodes may issue false alarms regarding legitimate nodes, in order to exclude them from the network and therefore cause a Denial of Service. In the CONFIDANT protocol, the problem is lessened by timeout and subsequent recovery of nodes that have behaved well for a specific period of time.

The problem lies in the difficulty to evaluate node A’s statement about B. If A states that B is misbehaving, is A a good node that reports B’s misbehavior, or is B good and A is compromised? Hence the situation is symmetric. In fact, once a node has fallen in enemy’s hands, as long as its messages strictly follow the protocol and the node uses its private key to sign its messages, these messages appear to be perfectly valid (and in fact they are) and are accepted by the rest of the network. We nevertheless present here our approach, which could partly solve this dilemma, and which consists in a balanced accusation system and link removal.

In our opinion, there is a criterion that allows us to distinguish between compromised and non-compromised nodes: as cracking a PKI costs highly in terms of efforts and time, we may suppose that compromised nodes are outnumbered by good nodes. We can then use this advantage by requiring that alerts must be confirmed by more than one node. There is safety in number. With this approach in mind, we outline a protocol for misbehavior detection, which imitates the Pathrater and which uses an evaluation and rating system of accusations.

11.2.1 Specifications

A global Trust Table lists each node in the network, with a numerical value associated to each node and representing its level of trust. This value means how much this node should be trusted to be well-behaving. Every node maintains a local copy of the Trust Table.

When a node detects a neighbor misbehaving, it immediately alerts the network by broadcasting an accusation message containing:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Accused Neighbor Address                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           Timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      :                           Signature                           :
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 11.1:

Accusation message format.

Every node also keeps an Accusation Table storing all past heard accusations broadcast on the network. Upon reception of an accusation, a node handles and evaluates it independently. This avoids the need of maintaining a centralized entity (which could be a weakness) evaluating the trust. If broadcast is done properly, each node should have the same copy of the Accusation Table.

Optionally, the accusation message could contain also a valid Proof, as described in Section 9, consisting of a HELLO message reporting a symmetric link with the neighbor node. This of course can be done only if the accusation system is implemented over the ADVSIG infrastructure. The advantage of the Proof is that it certifies the accuser node being in the neighborhood of the accused node, and therefore limits the damage a compromised node could do by maliciously accusing non-neighbor nodes (possibly, every other node in the network).

11.2.2 Punishment and reward

A node has a trust rating of τ_s at the boot-up. This trust rating changes according to the behavior and reported accusations against the node.

If a node has at least n_i accusations against him (all accusations coming from n_i different accusers) within a time interval t_i, its rating is decreased by α_i and its accusers are rewarded by increasing their rating by ρ_i. However, if within t_i the network fails to find n_i accusations against the node, these accusations are dropped; in this case the node’s rating is increased by ρ_i′ and the m nodes which accused it (m < n_i) are punished by decreasing their rating by ′
αmi

. With this, we implement a reward and punishment system which remunerates nodes which behave correctly and sincerely report misbehaviors, and punishes false accusers in the opposite way. Each set i of values can be decided depending on the type of misbehavior detected: see Table 11.1 for an example.


Index	Misbehavior

i = 0	Failure in reporting a misbehavior
i = 1	Failure in forwarding
i = 2	Malformed control message
i = 3	Stale timestamp
i = 4	Invalid signature
i = 5	Identity spoofing
i = 6	DoS (message bombing) attempted

Table 11.1:

Misbehaviors, in order of increasing severity.

When the trust level of a node drops negative, the node is considered as bad (i.e. compromised or malfunctioning) and specific actions should be taken against it. The list of possible actions to take is discussed in Section 11.2.3.

An accusation from a node is automatically discarded if the node has already sent an accusation (as reported in the Accusation Table) within a time τ_i ≥ t_i.

As packet collisions and transmission errors happen even on a network where no node is compromised, a well-behaving node might be accused and eventually have its trust rate dropping below zero after a finite time. To avoid this, the rating of all nodes is raised by a bonus β_e every time interval t_e. The values β_e and t_e are computed depending on the probability p_e of a packet collision or transmission error. These values should be set once at network boot-up and not changed anymore, as otherwise this could make the network automatically adjusting as more and more nodes become compromised.

The value of the n_i variable must be chosen as proportional to the density of the network, and depending on the balance we want between different kinds of protection. An attacker could compromise a number of nodes, use them to surround a good node and accuse it until the good node is considered bad, then pass to another good node and so on. A high value for n_i means that a bigger number of accusers is needed against a presumed misbehaving node. Therefore, the network is better protected against the aforementioned attack, as the attacker needs to compromise a bigger number of nodes. However, we must take into account the fact that the network might fail in finding n_i witnesses at the same time t_i for the same event, and therefore that more misbehaviors are unreported. On the other hand, setting a low value for n_i causes more misbehaviors to be detected, but makes the aforementioned attack easier to accomplish.

11.2.3 Detection of a misbehaving node: countermeasures

We have now spotted a node that, with high probability, is misbehaving due to compromission or malfunctioning. However we are not sure which of the two possibilities is correct. A malfunctioning node may fail to route packets, but its other functionalities like message emission may or may not be affected as well. On the other hand, a compromised node is likely to behave as much maliciously as it can, and should therefore be removed from the network definitely and as soon as possible.

As a safety rule, we may choose to exclude the misbehaving node, and deny routing packets through it. This exclusion is operated by requiring that all routes containing the guilty node are removed from all routing tables. This exclusion may be definitive or may be revoked after a complete recovery over the failing node, e.g. after human intervention.

11.2.4 Variations on the theme of trust evaluation

Another strategy is evaluating trust locally instead of globally; this involves performing link removal instead of node removal. This strategy states that in the case of a successful accusation, the link between the accuser and the accused node must be removed. This is done by the accusing node, by removing the misbehaving node from its routing tables and not accepting messages anymore from the misbehaving node. In this case, there is no need to broadcast accusations; a different Trust Table is maintained independently by each node, and the boot-up trust rate τ_s is set to a smaller value. This strategy is probably safer, as it better maintains network availability. As we do not know whether of the two nodes is misbehaving, we cut the link between them. Should a node misbehave for more than a short time, it isolates itself from the rest of the network.

Another criterion, unsuitable for OLSR but which could be applied to a source routing protocol, is to use the trust rating as a routing metric. A route rating is calculated as the sum (or the average, or another appropriate mathematical operation) of the trust ratings of all nodes included in the path, and the route with the highest rating is chosen. This is similar to the SAR protocol [163] which organizes nodes in a trust hierarchy and incorporates security ranks of nodes into routing metrics; the difference is that trust ratings in SAR are not dynamic.

11.2.5 Precise checks on flow conservation

An additional measure for misbehavior check, which requires all traffic to be authenticated, applies the principle of conservation of flow enunciated in Section 11.1.3 to perform precise checks on flows. Network flow conservation checks are the basis for a set of algorithms that essentially count packets received and sent by a node to each of its neighbors. These counts verify if the node is exhibiting proper routing behavior. The total number of packets which come into a node to be relayed should be equal to the total number of relayed packets coming out from the same node.

When applied to two neighbor nodes A and B, the principle of conservation of flow states that the number of packets sent by A to B must be equal to the number of packets received by B from A.

Now, when wanting to verify the behavior of a third node C being both A’s and B’s neighbor, the packet counts should be consistent considering each pair involving C and its neighbors, i.e. C -A and C -B. Additionally, nodes A and B should compute statistics about packets in transit through C, where a transit packet of C is a packet that is neither destinated to nor originating from C. On a node C, the principle of conservation of flow hence translates into “The sum of the number of transit packets sent to C by all other nodes must be equal to the sum of the number of transit packets sent by C to all other nodes”.

Therefore, a more reliable misbehavior detection is obtained through precise flow conservation checks. Each pair of neighbors (X,Y ) records the following information about the packets from X to Y :

Each quantity can be seen either from the point of view of node X or node Y . For instance, r_X→Y is denoted r_X→Y[X] for X’s perspective, and r_X→Y[Y ] for Y ’s perspective. Similarly, m_X→Y[X] is the number of packets misrouted by X to Y from X’s perspective, which should normally be 0.

The complete relation for flow conservation in a node Z is therefore specified as: output - input = produced packets - consumed packets. This quantity is not necessarily zero. Therefore we have that the number of packets sent by Z to neighbors, minus the number of packets sent by neighbors to Z, is equal to the number of packets sent by Z originating from Z, minus the number of packets sent to Z destinated to Z, minus the number of packets sent to Z that Z judged misrouted.

This can be translated into Equation 11.1 for a node Z and the set of its neighbors N_i:

The weakness of this technique is that it ensures the proper number of packets is exchanged, but it does not make any assumption about the content of the packets. An expensive possible solution would be to keep a digest list or Bloom filter [14], instead of a counter, about the packets received and sent.

11.3 A last word about enforcing security

An important point that should not be forgotten is that security often adds redundancy to an architecture, and the burden it gives may be exploited by an adversary to cause damage to the system. For instance, where a secure protocol, which implements message signatures, is deployed, an attacker may send a large number of malformed signatures over the network, in order to keep the nodes busy verifying the signatures and therefore to perform a Denial of Service. This problem would not exist in the non-secure version of the protocol, where the nodes would simply discard the malformed messages sent by the attacker. The following scheme, adapted from Garfinkel [51], shows how each new security countermeasure can be exploited for a new attack, by illustrating at each step a different profile of the attacker and the defender.

This simply means that the security level must be tuned according to the previsible attacks and to the desired level of protection: a military network would obviously need a greater defence infrastructure than that of a public network. Furthermore, securing a network (or, in a general way, a system) is a dynamic process. This process must make use of several and different tools, in order to be ready to counteract different types of assaults.

It should also be noted, as shown by the elusive and elegant jellyfish attack (Section 3.1.1), that misbehaviors may be very difficult to detect. We give here an example of a very theoretical attack against OLSR that, while strictly respecting the protocol, has the effect of muting a node completely. The OLSR protocol adds an amount of jitter to the interval at which control messages are generated. This is done in order to avoid emitting messages at the same time, and hence provoking packet collisions. We may safely assume that, in any implementation of OLSR, the jitter is randomly generated using a PRNG (Pseudo Random Number Generator)¹, which does not give real-random numbers. If the implementation of the PRNG is not carefully chosen, an attacker could replicate its results and command a compromised node to use the same jitter as that which a neighbor node is using. As a consequence, the compromised node could synchronize its transmissions with those of its neighbor, causing message collisions and impeding the neighbor from communicating.

This behavior, while being fully protocol-compliant, can severely degrade the network functioning. Nonetheless, it is an impractical and nonrealistic attack. Denials of Service can be carried on the physical layer as well (e.g. by radio interferences), where they are much easier to carry out.

Chapter 11Detecting bad behaviors