Publication
IEEE TC
Paper

An Interactive System for Reading Unformatted Printed Text


Abstract

A system intended to provide input of printed text to computers is applied to published patents, annotated law reports, and technical journals. The principal improvement over previous methods is the elimination of training sets by relying on rejected characters to classify subsequent patterns, with intervention of the operator at the end of each run to attribute alphabetic identities to the classes. Another new feature is the application of a sequential search procedure based on accumulated symbol frequencies to speed classification of the approximately 400 different symbols encountered in a typical publication. An interactive mode of operation for formatting, scan control, labeling, and post-editing is programmed along with the classification process on an experimental system comprising a small digital computer, an opaque page scanner and monitor, a data-entry tablet, a graphic display console, auxiliary storage units, and a fast digital correlator. Results are reported on some 150 experimental runs on a total of about 30 000 characters from a dozen different source documents. Recognition rates of the order of 99.75 percent are achieved without resorting to post-editing. © 1971, IEEE. All rights reserved.
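
The two ideas named in the abstract, using rejected characters as new classes instead of a training set and searching the classes in order of accumulated frequency, can be illustrated with a minimal sketch. This is not the paper's implementation: the pixel-agreement similarity measure, the `threshold` value, and the names `Prototype`, `UnsupervisedReader`, and `label_run` are all assumptions made for illustration, and the original system relied on a hardware correlator rather than software matching.

```python
from dataclasses import dataclass


@dataclass
class Prototype:
    pattern: tuple   # flattened binary raster of the first pattern seen in this class
    count: int = 1   # accumulated number of patterns assigned to this class


def similarity(a, b):
    """Fraction of agreeing pixels (assumed measure; patterns must have equal length)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


class UnsupervisedReader:
    """Hypothetical classifier that needs no training set."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed acceptance level, not from the paper
        self.prototypes = []

    def classify(self, pattern):
        pattern = tuple(pattern)
        # Sequential search: try the most frequently seen classes first.
        order = sorted(range(len(self.prototypes)),
                       key=lambda i: -self.prototypes[i].count)
        for i in order:
            if similarity(pattern, self.prototypes[i].pattern) >= self.threshold:
                self.prototypes[i].count += 1
                return i
        # Rejected by every class: the pattern itself becomes a new prototype.
        self.prototypes.append(Prototype(pattern))
        return len(self.prototypes) - 1


def label_run(class_ids, labels):
    """End-of-run step: the operator's labels map class numbers to characters."""
    return "".join(labels.get(c, "?") for c in class_ids)


reader = UnsupervisedReader(threshold=0.8)
A = (1, 1, 1, 1, 0, 0, 0, 0)
B = (0, 0, 0, 0, 1, 1, 1, 1)
A_noisy = (1, 1, 1, 0, 0, 0, 0, 0)
ids = [reader.classify(p) for p in (A, B, A_noisy, B, B)]
text = label_run(ids, {0: "a", 1: "b"})
```

In this toy run the classes are created on first sight of `A` and `B`, the noisy `A` is absorbed into the first class, and the labels are attached only after the whole run has been classified, mirroring the operator's end-of-run intervention described above.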
