Publication
Journal of Signal Processing Systems
Paper
Feature-rich Regular Expression Matching Accelerator for Text Analytics
Abstract
The volume of textual data accessible on our planet is increasing every day. Extracting information hidden in this “Big Data” is a computationally intensive task. A key step of information extraction is the conversion of free text into a structured format. This step is typically achieved using regular expressions (regexs) and dictionaries. Unlike network intrusion detection systems, information extraction systems detect and report precisely where the specific and relevant information starts and ends within text documents. To improve precision and to eliminate ambiguity, regex matchers used in information extraction systems must support start and end offset position reporting, capturing groups, and specific regex-matching semantics, such as leftmost matching. This work describes a scalable regex-matching accelerator that supports such advanced regex-matching features and can be efficiently implemented in reconfigurable logic. Experiments on proprietary and open-source regex sets comprising hundreds of regexs demonstrate up to a sixfold improvement in the area-delay product with respect to previous work.
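The matching features named in the abstract (start and end offset reporting, capturing groups, and leftmost matching) can be illustrated in software. The sketch below uses Python's standard `re` module purely as an illustration; it is not the accelerator described in the paper, and the pattern, the sample text, and the helper name `find_spans` are hypothetical choices made for this example.

```python
import re

# Hypothetical pattern and text, chosen only for illustration.
# Two capturing groups: a 3-digit prefix and a 4-digit suffix.
PATTERN = re.compile(r"(\d{3})-(\d{4})")
TEXT = "Call 555-1234 or 555-9876."


def find_spans(pattern, text):
    """Return one record per match with overall and per-group offsets.

    Offsets are (start, end) pairs with an exclusive end, as reported
    by Match.span(). finditer scans left to right, so matches are
    returned in leftmost-first order and do not overlap.
    """
    results = []
    for m in pattern.finditer(text):
        results.append({
            "span": m.span(),      # start/end offsets of the whole match
            "text": m.group(0),    # matched substring
            # start/end offsets of each capturing group
            "groups": [m.span(i) for i in range(1, pattern.groups + 1)],
        })
    return results


matches = find_spans(PATTERN, TEXT)
for record in matches:
    print(record)
```

Each record carries the offsets that an information extraction pipeline would need to locate the matched text and its sub-fields in the original document, in contrast to a matcher that only signals that some match occurred.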