Publication
FPL 2013
Conference paper

Hardware-accelerated regular expression matching for high-throughput text analytics

View publication

Abstract

Advanced text analytics systems combine regular expression (regex) matching, dictionary processing, and relational algebra for efficient information extraction from text documents. Such systems require support for advanced regex matching features, such as start offset reporting and capturing groups. However, existing regex matching architectures based on reconfigurable nondeterministic state machines and programmable deterministic state machines are not designed to support such features. We describe a novel architecture that supports such advanced features using a network of state machines. We also present a compiler that maps the regexs onto such networks that can be efficiently realized on reconfigurable logic. For each regex, our compiler produces a state machine description, statically computes the number of state machines needed, and produces an optimized interconnection network. Experiments on an Altera Stratix IV FPGA, using regexs from a real life text analytics benchmark, show that a throughput rate of 16 Gb/s can be reached. © 2013 IEEE.

Date

Publication

FPL 2013