Bounds on the time to reach agreement in the presence of timing uncertainty
Abstract
Upper and lower bounds are proved for the real time complexity of the problem of reaching agreement in a distributed network, in the presence of process failures and inexact information about time. It is assumed that the amount of (real) time between any two consecutive steps of any nonfaulty process is at least c1 and at most CZ; thus, C = c2/c1 is a measure of the timing uncertainty. It is also assumed that the time for message delivery is at most d. Processes are assumed to fail by stopping, so that process failures can be detected by timeouts. Let T denote the worst-case time to detect a failure, i.e., the elapsed time between tlhe failure of some process p and the time when all correct processes determine that p has failed; a straightforward approach yields T roughly eqr.lid to Cd. Letting denote the number of faults to be tolerated, a simple adaptation of an (+ 1)-rownd syn-chronous agreement algorithm takes time (+ l)T, or roughly (f1 l) Cd. The first principal result of this paper is an agreement algorithm in which the worst-case time T for a timeout is incurred at most once, yielding a running time of approximately 2ffi + T in the worst case, where 6 is an upper bound on the message delay that actually occurs in a given execution. This represents a significant reduction in-complexity in case C G 1 or δ L d. The second principal result is a lower bound of (f - l)d + Cd on the running time of any agreement algorithm, for the case where 6 = d; this is close to the upper bound of 2fd + Cd for this case.