Hypothesis testing in the high privacy limit
Binary hypothesis testing under the Neyman-Pearson formalism is a statistical inference framework for distinguishing data generated by two different source distributions. Privacy restrictions may require the curator of the data, or the data respondents themselves, to share data with the test only after applying a randomizing privacy mechanism. Using mutual information as the privacy metric and the relative entropy between the two output (post-randomization) distributions as the utility metric (motivated by the Chernoff-Stein Lemma), this work seeks an optimal mechanism that maximizes the utility while keeping the mutual-information-based leakage bounded for both source distributions. For the high privacy regime, a Euclidean information-theoretic (E-IT) approximation to the tradeoff problem is presented. It is shown that the solution to the E-IT approximation is independent of the alphabet size. The solution also clarifies that a mutual-information-based privacy metric preserves the privacy of the source symbols in inverse proportion to their likelihood.
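To make the two quantities in the tradeoff concrete, the following minimal sketch evaluates the utility (relative entropy between the two output distributions) and the leakage (mutual information between input and output under each source) for a given randomizing mechanism. The function names, the example distributions, and the "uniform plus small perturbation" mechanism are illustrative assumptions and are not the optimal mechanism derived in this work; logarithms are natural, so all values are in nats.

```python
import math

def output_dist(p, W):
    """Output pmf when the source pmf is p and the mechanism is the
    row-stochastic matrix W, with W[x][y] = Pr(output = y | input = x)."""
    n_out = len(W[0])
    return [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(n_out)]

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(p, W):
    """Leakage I(X; Y) in nats for source pmf p passed through mechanism W."""
    q = output_dist(p, W)
    return sum(
        p[x] * W[x][y] * math.log(W[x][y] / q[y])
        for x in range(len(p))
        for y in range(len(q))
        if p[x] > 0 and W[x][y] > 0
    )

def perturbed_uniform_mechanism(n, eps):
    """Hypothetical mechanism (an assumption for illustration only):
    with probability eps release the true symbol, otherwise release a
    uniformly random symbol. Small eps corresponds to high privacy."""
    return [
        [(1.0 - eps) / n + (eps if x == y else 0.0) for y in range(n)]
        for x in range(n)
    ]

def evaluate(p1, p2, W):
    """Return (utility, leakage under p1, leakage under p2)."""
    utility = kl_divergence(output_dist(p1, W), output_dist(p2, W))
    return utility, mutual_information(p1, W), mutual_information(p2, W)

if __name__ == "__main__":
    # Example source distributions (assumed values, not from the paper).
    p1 = [0.5, 0.3, 0.2]
    p2 = [0.2, 0.3, 0.5]
    for eps in (0.01, 0.05, 0.1):
        W = perturbed_uniform_mechanism(len(p1), eps)
        utility, leak1, leak2 = evaluate(p1, p2, W)
        print(f"eps={eps}: utility={utility:.6f}, "
              f"leakage1={leak1:.6f}, leakage2={leak2:.6f}")
```

Sweeping `eps` toward zero shows both the utility and the two leakages shrinking together, which is the regime the E-IT approximation targets. An optimal mechanism would instead be chosen to maximize the utility subject to explicit bounds on both leakages.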