Michael W. Blasgen, Richard G. Casey, et al.
CACM
The segmentation and classification of digitized printed documents into regions of text and images is a necessary first processing step in document analysis systems. It is shown that a constrained run length algorithm is well suited to partition most documents into areas of text lines, solid black lines, and rectangular boxes enclosing graphics and halftone images. During the processing these areas are labeled and meaningful features are calculated. By making use of the regular appearance of text lines as textured stripes, a linear adaptive classification scheme is constructed to discriminate text regions from others. © 1982 Academic Press, Inc.
Friedrich M. Wahl
Computer Graphics and Image Processing
Ronnie K. L. Poon, Kwan Y. Wong
ACM SIGIR Forum
Lionel M. Ni, Kwan Y. Wong, et al.
IEEE TC