Enabling analysis and measurement of conventional software development documents using project-specific formalism
Abstract
We describe a new approach to modeling and analyzing software development documents that are typically written using conventional office applications. Our approach automates content extraction, quality checking and measurement for large volumes of document artifacts that, in industry today, tend to be handled through labor-intensive manual work. Rather than requiring contents to be created or rewritten in more rigid, machine-friendly representations such as standardized formal models and restricted languages, we provide a method that copes with the diversity of document artifacts by exploiting the project-specific formalism that already exists in the target documents. We demonstrate that such project-specific formalism often exists "naturally" at the syntactic level, and that it is possible to define a "document model": a logical data representation obtained through transformation rules that map the physical, syntactic structure of a document to its logical, semantic structure. With this transformation, quality checking rules for completeness, consistency, traceability and similar properties can be realized as constraints evaluated over data items in the logical structure, so that measurement of these quality aspects is automated. We developed a tool that allows a user to easily define document models and checking rules, and we report insights on the transformations required when defining document models for various industry specification documents written as word processor files, spreadsheets and presentations. We also demonstrate that natural language processing can improve document modeling and quality checking by compensating where the formalism is weak and by applying analysis to specific parts of the target documents. © 2011 IEEE.
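To make the idea of a document model more concrete, the following is a minimal Python sketch under stated assumptions, not the tool described in the abstract. It assumes the physical structure is a spreadsheet exported as rows of strings with a header row, and that the project-specific transformation rule is simply a mapping from header labels to logical field names. The names `Requirement`, `HEADER_RULE`, `to_document_model`, `incomplete`, `duplicate_ids`, `dangling_traces` and `pass_rate` are all hypothetical. Completeness, consistency and traceability checks are then written as constraints over the logical items, and a pass rate serves as a simple measurement.

```python
from dataclasses import dataclass, field
from typing import Callable

# Physical (syntactic) layer, assumed here to be a spreadsheet exported as
# rows of strings; the first row holds the column headers.
Sheet = list[list[str]]


@dataclass
class Requirement:
    """One data item in a hypothetical logical (semantic) document model."""
    req_id: str
    text: str
    traces_to: list[str] = field(default_factory=list)


# Hypothetical project-specific transformation rule: header label -> logical field.
HEADER_RULE = {"ID": "req_id", "Description": "text", "Derived from": "traces_to"}


def to_document_model(sheet: Sheet, rule: dict[str, str]) -> list[Requirement]:
    """Map physical rows to logical items using the header rule.

    Assumes every data row is as wide as the header row.
    """
    col = {rule[h]: i for i, h in enumerate(sheet[0]) if h in rule}
    items = []
    for row in sheet[1:]:
        if not any(cell.strip() for cell in row):
            continue  # blank rows are layout, not content
        traces = row[col["traces_to"]] if "traces_to" in col else ""
        items.append(Requirement(
            req_id=row[col["req_id"]].strip(),
            text=row[col["text"]].strip(),
            traces_to=[t.strip() for t in traces.split(",") if t.strip()],
        ))
    return items


# Checking rules are constraints evaluated over the logical items.
def incomplete(items: list[Requirement]) -> list[str]:
    """Completeness: IDs of items with an empty description."""
    return [r.req_id for r in items if not r.text]


def duplicate_ids(items: list[Requirement]) -> list[str]:
    """Consistency: IDs that occur more than once, in first-seen order."""
    seen: set[str] = set()
    dups: list[str] = []
    for r in items:
        if r.req_id in seen and r.req_id not in dups:
            dups.append(r.req_id)
        seen.add(r.req_id)
    return dups


def dangling_traces(items: list[Requirement], parents: set[str]) -> list[tuple[str, str]]:
    """Traceability: (item ID, target) pairs whose target is not a known parent ID."""
    return [(r.req_id, t) for r in items for t in r.traces_to if t not in parents]


def pass_rate(items: list[Requirement], ok: Callable[[Requirement], bool]) -> float:
    """Measurement: fraction of items satisfying a constraint (1.0 if there are none)."""
    return sum(ok(r) for r in items) / len(items) if items else 1.0
```

Keeping the transformation rule as data rather than code reflects the abstract's point that each project defines its own mapping from syntactic structure to semantic structure. For instance, `pass_rate(items, lambda r: bool(r.text))` yields a completeness ratio that could be tracked across document revisions.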