Key Change Strategies for TCP-MD5

The TCP-MD5 option is most commonly used to secure BGP sessions between routers. However, changing the long-term key is difficult, since the change needs to be synchronized between different organizations. Worse yet, if the keys are out of sync, it may break the connection between the two routers, rendering repair attempts difficult. The proper solution involves some sort of key management protocol. Apart from the complexity of such things, RFC 2385 was not written with key changes in mind. In particular, there is no KeyID field in the option, which means that even a key management protocol would run into the same problem. Fortunately, a heuristic permits key change despite this protocol deficiency. The change can be installed unilaterally at one end of a connection; it is fully compatible with the existing protocol.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .

Separate algorithms are necessary for transmission and reception. Reception is easier; we explain it first.

A receiver has a list of valid keys. Each key has a (conceptual) timestamp associated with it. When a segment arrives, each key is tried in turn. The segment is discarded if and only if it cannot be validated by any key in the list. In principle, there is no need to test keys in any particular order. For performance reasons, though, a simple most-recently-used (MRU) strategy -- try the last valid key first -- should work well. More complex mechanisms, such as examining the TCP sequence number of an arriving segment to see whether it fits in a hole, are almost certainly unnecessary. On the other hand, validating that a received segment is putatively legal, by checking its sequence number against the advertised window, can help avoid denial of service attacks. The newest key that has successfully validated a segment is marked as the "preferred" key; see below. Implicit in this scheme is the assumption that older keys will eventually be unneeded and can be removed. Accordingly, implementations SHOULD provide an indication of when a key was last used successfully.

Transmission is more complex, because the sender does not know which keys can be accepted at the far end. Accordingly, the conservative strategy is to delay using any new keys for a considerable amount of time, probably measured in days. This time interval is the amount of asynchronicity the parties wish to permit; it is agreed upon out of band and configured manually. Some automation is possible, however. If a key has been used successfully to validate an incoming segment, clearly the other side knows it. Accordingly, any key marked as "preferred" by the receiving part of a stack SHOULD be used for transmissions. A sophisticated implementation could try alternate keys if the TCP retransmission counter gets too high. (This is analogous to dead gateway detection.) In particular, if a key change has just been attempted but such segments are not acknowledged, it is reasonable to fall back to the previous key and issue an alert of some sort. Similarly, an implementation with a new but unused key could occasionally try to use it, much in the way that TCP implementations probe closed windows. Doing this avoids the "silent host" problem discussed in . This should be done at a moderately slow rate. Note that there is an ambiguity when an acknowledgment is received for a segment transmitted with two different keys. The TCP Timestamp option can be used for disambiguation.

Suppose only one end of the connection has this algorithm implemented. The new key is provisioned on that system, with a start time far in the future -- sufficiently far, in fact, that it will not be used spontaneously. After the key is ready, the other end is notified, out-of-band, that a key change can commence. At some point, the other end is upgraded. Because it does not have multiple keys available, it will start using the new key immediately for its transmission, and will drop all segments that use the old key. As soon as it tries to transmit, the upgraded side will designate the new key as preferred, and will use it for all of its transmissions. Note specifically that this will include retransmissions of any segments rejected because they used the old key. There is a problem if the unchanged machine is a "silent host" -- a host that has nothing to say, and hence does not transmit. The best way to avoid this is for an upgraded machine to try a variety of keys in the event of repeated unacknowledged packets, and to probe for new unused keys during silent periods, as discussed in . Alternatively, application-level KeepAlive messages may be used to ensure that neither end of the connection is completely silent. See, for example, Section 4.4 of or Section 3.5.4 of .

Double-ended operations are similar, save that both sides deploy the new key at about the same time. One should be configured to start using the new key at a point where it is reasonably certain that the other side would have it installed, too. Assuming that has in fact happened, the new key will be marked "preferred" on both sides.

As noted, implementations should monitor when a key was last used for transmission or reception. Any monitoring mechanism can be used; most likely, it will be one or both of a MIB object or objects and the vendor's usual command-line mechanism for displaying data of this type. Regardless, the network operations center should keep track of this. When a new key has been used successfully for both transmission and reception for a reasonable amount of time -- the exact value isn't crucial, but it should probably be longer than twice the maximum segment lifetime -- the old key can be marked for deletion. There is an implicit assumption here that there will not be substantial overlap in the usage period of such keys; monitoring systems should look for any such anomalies, of course.

As implied in , this is an interim strategy, intended to make TCP-MD5 operationally usable today. We do not suggest or recommend it as a long-term solution. In this section, we make some suggestions about the design of a future TCP authentication option. The first and most obvious change is to replace keyed MD5 with a stronger MAC . Today, HMAC-SHA1 is the preferred choice, though others such as UMAC should be considered as well. A new authentication option should contain some form of a Key ID field. Such an option would permit unambiguous identification of which key was used to create the MAC for a given segment, sparing the receiver the need to engage in the sort of heuristics described here. A Key ID is useful with both manual and automatic key management. (Note carefully that we do not prescribe any particular Key ID mechanism here. Rather, we are stating a requirement: there must be a simple, low-cost way to select a particular key, and it must be possible to rekey without tearing down long-lived connections.) Finally, an automated key management mechanism should be defined. The general reasoning for that is set forth in ; specific issues pertaining to BGP and TCP are given in .

In theory, accepting multiple keys simultaneously makes life easier for an attacker. In practice, if the recommendations in are followed, this should not be a problem. New keys must be communicated securely. Specifically, new key messages must be kept confidential and must be properly authenticated. Having multiple keys makes CPU denial-of-service attacks easier. This suggests that keeping the overlap period reasonably short is a good idea. In addition, the Generalized TTL Security Mechanism , if applicable to the local topology, can help. Note that most of the time, only one key will exist; virtually all of the remaining time there will be only two keys in existence.

There are no IANA actions required. The TCP-MD5 option number is defined in , and is currently listed by IANA.

I'd like to thank Ron Bonica, Randy Bush, Ross Callon, Rob Evans, Eric Rescorla, and Sam Weiler for their comments and inspiration.