About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
Computer Science - Research and Development
Paper
DotStar: Breaking the scalability and performance barriers in parsing regular expressions
Abstract
Regular expressions (shortened as regexp ) are widely used to parse data, detect recurrent patterns and information, and are a common choice for defining configurable rules for a variety of systems. In fact, many data-intensive applications rely on regexp matching as the first line of defense to perform on-line data filtering. Unfortunately, few solutions can keep up with the increasing data rate and complexity of sets containing hundreds of expressions. In this paper we present DotStar (.*), a complete algorithmic solution and a software tool-chain, that can compile large sets of regexp into an automaton that can take advantage of the vector/SIMD extensions available on many commodity multi-core processors. DotStar relies on several algorithmic innovations to transform the user-provided regexp set into a sequence of manageable intermediate representations. The resulting automaton is both space and time efficient, and can search in a single pass without backtracking. The experimental evaluation, performed on a family of state-of-the-art processors, shows that DotStar can efficiently handle both small sets of regexp, used in protocol parsing, and larger sets designed for Network Intrusion Detection Systems (NIDS), achieving a performance between 1 and 5 Gbit/sec per core. © 2010 Springer-Verlag.