Essential issues in the design of shared document/image libraries
Abstract
We consider what is needed to create electronic document libraries which mimic physical collections of books, papers, and other media. The quantitative measures of merit for personal workstations - cost, speed, size of volatile and persistent storage - will improve by at least an order of magnitude in the next decade. Every professional worker will be able to afford a very powerful machine, but databases and libraries are not really economical and useful unless they are shared. We therefore see a two-tier world emerging, in which custodians of information make it available to network-attached workstations. A client-server model is the natural description of this world. In collaboration with several state governments, we have considered what would be needed to replace paper-based record management for a dozen different applications. We find that a professional worker can anticipate most data needs and that (s)he is interested in each clump of data for a period of days to months. We further find that only a small fraction of any collection will be used in any period. Given expected bandwidths, data sizes, search times and costs, and other such parameters, an effective strategy to support user interaction is to bring large clumps from their sources, to transform them into convenient representations, and only then start whatever investigation is intended. A system-managed hierarchy of caches and archives is indicated. Each library is a combination of a catalog and a collection, and each stored item has a primary instance which is the standard by which the correctness of any copy is judged. Catalog records mostly refer to 1 to 3 stored items. Weighted by the number of bytes to be stored, immutable data dominate collections. These characteristics affect how consistency, currency, and access control of replicas distributed in the network should be managed. We present the large features of a design for network document/image library services. A prototype is being built for State of California pilot applications. The design allows library servers in any environment with an ANSI SQL database; clients execute in any environment; communications are with either TCP/IP or SNA LU 6.2.