Publication
IS&T/SPIE Electronic Imaging 1996
Conference paper

Document recognition: an attribute grammar approach

Abstract

A formulation of a hierarchical page decomposition technique for technical journal pages using attribute grammars is presented. In this approach, block grammars are applied recursively until a page is classified into its most significant sub-blocks. While the grammar devised for each block depends on its logical function, it is possible to formulate a generic description for all block grammars using attribute grammars. This attribute grammar formulation provides the generic framework on which the syntactic approach is based, while the attributes themselves are derived from publication-specific knowledge. The attribute extraction process and the formulation itself are covered in this paper. We also discuss an application of attribute grammars to a document analysis problem: the extraction of logical, relational information from images of tables.
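
The recursive application of block grammars described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the block labels, the `GRAMMARS` table, and the percentage-based height attributes are hypothetical stand-ins for the publication-specific knowledge the abstract mentions, and a real system would extract attributes from the page image rather than use fixed values.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    label: str   # logical function of the block, e.g. "page" or "body"
    top: int     # attribute: upper edge in pixels
    bottom: int  # attribute: lower edge in pixels
    children: list = field(default_factory=list)

    @property
    def height(self):
        return self.bottom - self.top


# Hypothetical block grammars, one per logical label. Each production lists
# the sub-blocks in top-to-bottom order with an assumed share of the parent's
# height in percent (a stand-in for publication-specific attributes).
GRAMMARS = {
    "page": [("header", 10), ("body", 80), ("footer", 10)],
    "body": [("title", 25), ("text", 75)],
}


def decompose(block):
    """Apply the block's grammar, then recurse into each sub-block."""
    rule = GRAMMARS.get(block.label)
    if rule is None:
        return block  # no grammar for this label: treat it as a terminal block
    y = block.top
    for i, (label, percent) in enumerate(rule):
        # The last sub-block absorbs any rounding remainder.
        last = i == len(rule) - 1
        end = block.bottom if last else y + block.height * percent // 100
        block.children.append(decompose(Block(label, y, end)))
        y = end
    return block


def leaves(block):
    """Collect the terminal blocks as (label, top, bottom) tuples."""
    if not block.children:
        return [(block.label, block.top, block.bottom)]
    result = []
    for child in block.children:
        result.extend(leaves(child))
    return result


page = decompose(Block("page", 0, 1000))
for leaf in leaves(page):
    print(leaf)
```

In this sketch the generic part is `decompose`, which is the same for every block, while everything specific to a publication lives in the `GRAMMARS` table. That mirrors, in a simplified way, the split between the generic framework and publication-specific attributes described above.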