Annotation-based finite-state transducers on reconfigurable devices
With the ever-growing amount of unstructured data, high-speed content analysis is becoming increasingly important. Efficient search functions that locate specific, relevant information hidden in this data are a crucial part of today's enterprise systems and can lead to valuable insights. Text parsers, which transform unstructured text into structured information, are a key component of content analysis systems. Cascaded grammars offer a popular and powerful representation of text parsers by enabling more complex patterns to be defined hierarchically in terms of simpler ones. This work presents a compilation framework that generates an optimized FPGA pipeline from a cascaded grammar description. We also describe the system integration and the way FPGA-based accelerators can be used as part of larger analysis tasks within Unstructured Information Management Architecture (UIMA) pipelines. We compare the performance of the hardware-accelerated system with that of a commercial software implementation using real-life UIMA pipelines from the healthcare domain. We show that the FPGA-accelerated system processes the parsing stage of a UIMA pipeline up to 31 times faster than the software implementation running on a high-end server, which results in a speedup of up to 5 times for the complete pipeline.
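To illustrate what defining complex patterns in terms of simpler ones can look like, the following is a minimal software sketch of a two-level cascaded grammar. It is not the compilation framework or the FPGA pipeline described here; the rule names (`Number`, `Unit`, `Dosage`), the `Annotation` type, and the helper functions are all hypothetical and chosen only for illustration. The first level matches characters and emits annotations; the second level matches sequences of those annotations.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str   # annotation type, e.g. "Number" (hypothetical rule name)
    begin: int  # start offset in the text (inclusive)
    end: int    # end offset in the text (exclusive)


# Level 1: simple patterns defined directly over characters.
BASE_RULES = {
    "Number": re.compile(r"\d+(?:\.\d+)?"),
    "Unit": re.compile(r"\b(?:mg|ml|g)\b"),
}

# Level 2: a pattern defined over level-1 annotations, not characters.
# "Dosage" is a Number immediately followed by a Unit.
CASCADED_RULES = {
    "Dosage": ("Number", "Unit"),
}


def annotate_base(text):
    """Run every level-1 rule and return its annotations in text order."""
    found = []
    for kind, pattern in BASE_RULES.items():
        for m in pattern.finditer(text):
            found.append(Annotation(kind, m.start(), m.end()))
    return sorted(found, key=lambda a: (a.begin, a.end))


def annotate_cascaded(text, base):
    """Combine adjacent level-1 annotations separated only by whitespace."""
    found = []
    for kind, (first, second) in CASCADED_RULES.items():
        for left, right in zip(base, base[1:]):
            gap = text[left.end:right.begin]
            if left.kind == first and right.kind == second and gap.strip() == "":
                found.append(Annotation(kind, left.begin, right.end))
    return found


def parse(text):
    """Return level-1 annotations followed by level-2 annotations."""
    base = annotate_base(text)
    return base + annotate_cascaded(text, base)
```

Calling `parse("Take 500 mg twice daily")` yields a `Number` and a `Unit` annotation plus a `Dosage` annotation spanning `"500 mg"`. The higher-level rule never inspects characters itself; it consumes only the annotations produced by the level below, which is the hierarchical property that lets a cascade be arranged as a sequence of stages.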