Publication
Journal of Proteome Research
Paper

Clinical and pharmacogenomic data mining: 2. A simple method for the combination of information from associations and multivariances to facilitate analysis, decision, and design in clinical research and practice

Abstract

The physician and researcher must ultimately be able to combine qualitative and quantitative features from many observations on data with many component items (i.e., many dimensions), and hence reach simple conclusions about interpretation, rational courses of action, and design. The first paper of this series noted that such needs challenge the classical use of statistics, and therefore proposed a Generalized Theory of Expected Information, or "Zeta Theory". In that theory, the conjoint event [a,b,c,...] is seen as a rule of association for a,b,c,... with a rule strength I(a;b;c;...) = ζ(s, o[a,b,c,...]) - ζ(s, e[a,b,c,...]), where ζ is the incomplete Zeta Function, o[a,b,c,...] is the observed frequency of occurrence of the conjoint event [a,b,c,...], and e[a,b,c,...] is its expected frequency. The present paper explores how output from this approach might be assembled in a form better suited to decision support. A related difficulty is that covariance and multivariance were previously rendered as a "fuzzy association", so that their output would take a form similar to that of true associations; this was a somewhat ad hoc approach in which only the final I() had any meaning. Users at clinical research sites subsequently requested an alternative in which "effective frequencies" o[] and e[], calculated from these variances and used to evaluate I(), give some intuitive feeling analogous to the treatment of associations, and this alternative is explored here. Although the present paper is theoretical, real examples are used to illustrate application. One clinical-genomic example illustrates experimental design by identifying data that are, or are not, statistically germane to the study.
We also report some impressions from applying these techniques in studies of real, extensive patient-record data that are now emerging, and of molecular design data originally studied in part to test whether the effects of simple natural sequence variations in patients ("SNPs") on their protein activity could be deduced. On the basis of these study experiences, the Appendix presents methods for rationalizing and condensing the rules implied by associations and variances between data, together with a discussion of the difficulty of defining what is meant by "condensed".
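As an illustration only, the rule strength defined in the abstract, ζ(s, o[a,b,c,...]) - ζ(s, e[a,b,c,...]), might be computed from a set of patient records as in the Python sketch below. It covers only associations between items in records, not the effective frequencies derived from variances. Several assumptions are not taken from the paper and should be checked against the first paper of the series: ζ(s, n) is taken to be the partial sum 1 + 2^-s + ... + n^-s; because expected frequencies are usually fractional, non-integer n is handled only for s = 1, through the standard identity ζ(1, n) = ψ(n + 1) + γ (digamma function plus Euler's constant); and the expected frequency is assumed to be the independence value e = N × (n_a/N) × (n_b/N) × ..., where N is the number of records and n_a the number of records containing item a. The helper names (incomplete_zeta, observed_expected, pairwise_rules) and the toy records are hypothetical.

```python
import math
from itertools import combinations

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma(x: float) -> float:
    """psi(x) for x > 0: recurrence up to x >= 10, then asymptotic series."""
    if x <= 0:
        raise ValueError("digamma is only implemented here for x > 0")
    result = 0.0
    while x < 10.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def incomplete_zeta(s: float, n: float) -> float:
    """zeta(s, n) = 1 + 2**-s + ... + n**-s (assumed definition)."""
    if n < 0:
        raise ValueError("frequencies must be non-negative")
    if float(n).is_integer():
        return sum(j ** -s for j in range(1, int(n) + 1))
    if s == 1:
        # Harmonic numbers extend to real n via H_n = psi(n + 1) + gamma.
        return digamma(n + 1.0) + EULER_GAMMA
    raise NotImplementedError("non-integer n is only handled for s = 1")


def rule_strength(o: float, e: float, s: float = 1.0) -> float:
    """Rule strength zeta(s, o) - zeta(s, e) for one conjoint event."""
    return incomplete_zeta(s, o) - incomplete_zeta(s, e)


def observed_expected(records, event):
    """Observed count of records holding every item in `event`, and the
    expected count under independence, N * prod(n_i / N) (an assumption)."""
    n = len(records)
    o = sum(1 for r in records if all(item in r for item in event))
    e = float(n)
    for item in event:
        e *= sum(1 for r in records if item in r) / n
    return o, e


def pairwise_rules(records, s: float = 1.0):
    """All two-item rules, ranked by |strength|, strongest first."""
    items = sorted({item for r in records for item in r})
    rules = []
    for event in combinations(items, 2):
        o, e = observed_expected(records, event)
        rules.append((event, o, e, rule_strength(o, e, s)))
    rules.sort(key=lambda rule: abs(rule[3]), reverse=True)
    return rules


RECORDS = [  # hypothetical toy records, not data from the paper
    {"smoker", "hypertension", "male"},
    {"smoker", "hypertension"},
    {"smoker", "male"},
    {"hypertension"},
    {"male"},
    {"smoker", "hypertension", "male"},
]

if __name__ == "__main__":
    for event, o, e, strength in pairwise_rules(RECORDS):
        print(f"{' & '.join(event)}: o={o}, e={e:.2f}, strength={strength:+.3f}")
```

Ranking rules by the magnitude of their strength is one simple way, assumed here, to present the output for decision support. Because ζ(s, n) increases with n, a positive value means the conjoint event occurs more often than expected, a negative value means it occurs less often, and a value near zero means the observed and expected frequencies nearly agree.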
