Publication
Computer Graphics and Image Processing
Paper
Block segmentation and text extraction in mixed text/image documents
Abstract
The segmentation and classification of digitized printed documents into regions of text and images is a necessary first processing step in document analysis systems. It is shown that a constrained run length algorithm is well suited to partition most documents into areas of text lines, solid black lines, and rectangular boxes enclosing graphics and halftone images. During the processing these areas are labeled and meaningful features are calculated. By making use of the regular appearance of text lines as textured stripes, a linear adaptive classification scheme is constructed to discriminate text regions from others. © 1982 Academic Press, Inc.
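
The abstract does not spell out the constrained run length algorithm, so the following is only a minimal sketch of one common reading of it: in a binary image (1 = black, 0 = white), every white run no longer than a threshold is filled with black, first along rows and then along columns, and the two results are combined with a logical AND. The function names `smooth_runs` and `smooth_image`, the thresholds `c_h` and `c_v`, the AND combination, and the handling of runs touching the image border are all assumptions made for illustration, not details confirmed by this page.

```python
def smooth_runs(row, c):
    """Return a copy of `row` with every white run (0s) of length <= c set to 1.

    Assumption: runs that touch the start or end of the row are treated
    exactly like interior runs.
    """
    out = list(row)
    run_start = None  # index where the current white run began, if any
    # Iterate one step past the end so a trailing white run is also closed.
    for i in range(len(row) + 1):
        is_white = i < len(row) and row[i] == 0
        if is_white:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start <= c:
                out[run_start:i] = [1] * (i - run_start)
            run_start = None
    return out


def smooth_image(image, c_h, c_v):
    """Smooth rows with threshold c_h and columns with c_v, then AND the results.

    `image` is a list of equal-length rows of 0/1 values. The thresholds are
    hypothetical parameters; suitable values depend on scan resolution.
    """
    horizontal = [smooth_runs(r, c_h) for r in image]
    # Transpose, smooth each column as if it were a row, and transpose back.
    smoothed_cols = [smooth_runs(list(col), c_v) for col in zip(*image)]
    vertical = [list(r) for r in zip(*smoothed_cols)]
    return [[h & v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]
```

With small thresholds, the gaps between characters on a text line close up while the larger gaps between blocks remain, so each text line tends to merge into one connected stripe that can then be labeled and measured.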