# Approximating edit distance efficiently

## Abstract

Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than l, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)2/3 gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. l gap problem, where l can be as small as O (k2) - still with a constant sketch - but works only for strings that are mildly "non-repetitive". Finally, we develop an n3/7-approximation quasi-linear time algorithm for edit distance, improving the previous best factor of n3/4 [5]; if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n1/3. © 2004 IEEE.