Achieve data privacy and clustering accuracy simultaneously through quantized data recovery
This paper develops a data collection and processing framework that achieves individual users’ data privacy and the operator’s information accuracy simultaneously. Data privacy is enhanced by adding noise and applying quantization to the data before transmission, and the privacy of an individual user is measured by information-theoretic analysis. This paper develops a data recovery and clustering method for the operator to extract features from the privacy-preserving, partially corrupted, and partially observed measurements of a large number of users. To prevent cyber intruders from accessing the data of many users, it also develops a decentralized algorithm such that multiple data owners can collaboratively recover and cluster the data without sharing the raw measurements directly. The recovery accuracy is characterized analytically and showed to be close to the fundamental limit of any recovery method. The proposed algorithm is proved to converge to a critical point from any initial point. The method is evaluated on recorded Irish smart meter data and UMass smart microgrid data.