About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ASAP 2013
Conference paper
A high-speed and large-scale dictionary matching engine for Information Extraction systems
Abstract
Dictionary matching is a commonly used operation in Information Extraction (IE) systems. It involves matching a set of strings in a document against a dictionary of pre-defined patterns. In this paper, we describe a high performance and scalable hardware architecture to enable high throughput dictionary matching on very large dictionaries for text analytics applications. Our hardware accelerator employs a novel hashing based approach instead of commonly used deterministic finite automata (DFA) based algorithms. A limitation of the DFA based approaches is that they typically process one character every cycle, while the proposed hash based scheme can process a string token every cycle, thus achieving significantly higher processing throughput than the DFA based implementations. Our measurement results based on a prototype implementation on an Altera Stratix IV FPGA device indicate that our hardware dictionary matching engine can process typical document streams at a processing rate of ∼1.5GB/s (∼12 Gbps) while simultaneously allowing support for large dictionary sizes containing up to ∼100K patterns, thus making it very useful for IE workload acceleration. © 2013 IEEE.