About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
AISTATS 2018
Conference paper
Probability – Revealing samples
Abstract
In the most popular distribution testing and parameter estimation model, one can obtain information about an underlying distribution D via independent samples from D. We introduce a model in which every sample comes with the information about the probability of selecting it. In this setting, we give algorithms for problems such as testing if two distributions are (approximately) identical, estimating the total variation distance between distributions, and estimating the support size. The sample complexity of all of our algorithms is optimal up to a constant factor for sufficiently large support size. The running times of our algorithms are near-linear in the number of samples collected. Additionally, our algorithms are robust to small multiplicative errors in probability estimates. The complexity of our model lies strictly between the complexity of the model where only independent samples are provided and the complexity of the model where additionally arbitrary probability queries are allowed. Our model finds applications where once a given element is sampled, it is easier to estimate its probability. We describe two scenarios in which all occurrences of each element are easy to explore once at least one copy of the element is detected.