Leftmost longest regular expression matching in reconfigurable logic
Abstract
Regular expression (regex) matching is an essential part of text analytics and network intrusion detection systems. The leftmost longest regex matching feature enables finding a leftmost derivation of an input text and helps resolve ambiguities that can arise in natural-language parsing. We show that leftmost longest regex matching can be efficiently performed in a dataflow pipeline by combining a recently proposed regex-matching architecture with simple last-in first-out (LIFO) buffers and streaming filter units, without creating significant back-pressure or using costly sorting operations. The techniques we propose can be used to compute overlapping and non-overlapping leftmost longest and rightmost longest regex matches. In addition, we show that the latency of the LIFO buffers can be hidden by overlapping the processing of subsequent input streams, without replicating the buffer space. Experiments on an Altera Stratix IV FPGA show a 200-fold improvement of the processing rates compared with a multithreaded software implementation.