About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
IS&T/SPIE Electronic Imaging 2003
Conference paper
Document structure analysis algorithms: A literature survey
Abstract
Document structure analysis can be regarded as a syntactic analysis problem. The order and containment relations among the physical or logical components of a document page can be described by an ordered tree structure and can be modeled by a tree grammar which describes the page at the component level in terms of regions or blocks. This paper provides a detailed survey of past work on document structure analysis algorithms and summarize the limitations of past approaches. In particular, we survey past work on document physical layout representations and algorithms, document logical structure representations and algorithms, and performance evaluation of document structure analysis algorithms. In the last section, we summarize this work and point out its limitations.