FPVI: A scalable method for discovering privacy vulnerabilities in microdata
Abstract
Governments are increasingly interested in making their data accessible through open data platforms to promote transparency and economic growth. At the same time, recent efforts towards personalized healthcare and smart transportation aim to analyze individuals' data, such as electronic medical records and user mobility patterns, to derive important insights. The implementation of a smart city largely depends on the ability to extract knowledge from such person-specific data. This, however, may come at a cost to individuals' privacy. In this paper, we propose FPVI, a fast algorithm for discovering privacy vulnerabilities in relational data. FPVI operates in a multi-threaded fashion to index and scan the data for vulnerabilities, while pruning the search space to boost performance. Our experimental evaluation shows that FPVI outperforms the state-of-the-art method and can analyze datasets of 11 million records and 20 attributes in less than 9 minutes.
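To make the idea of scanning with pruning concrete, the following is a minimal illustrative sketch and not the FPVI algorithm itself. It rests on assumptions that the abstract does not state: a "privacy vulnerability" is taken to be a minimal combination of attributes on which at least one group of fewer than k records is distinguishable, candidates are explored level by level, and any superset of an already reported combination is pruned. The names `rare_records`, `find_vulnerabilities`, `k`, `max_size` and `workers` are hypothetical, and the thread pool only mirrors the multi-threaded structure; it is not expected to reproduce the reported performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def rare_records(records, attrs, k):
    """Count records whose values on `attrs` are shared by fewer than k records."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return sum(c for c in counts.values() if c < k)


def find_vulnerabilities(records, attributes, k=2, max_size=3, workers=4):
    """Return minimal attribute combinations exposing at least one group of < k records.

    Assumption: this definition of a vulnerability is illustrative only.
    """
    found = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for size in range(1, max_size + 1):
            # Pruning: a superset of a reported combination is also vulnerable,
            # so it is skipped and only minimal combinations are reported.
            candidates = [
                c for c in combinations(attributes, size)
                if not any(set(f) <= set(c) for f in found)
            ]
            # Scan the candidates of this level concurrently.
            exposed = list(pool.map(lambda c: rare_records(records, c, k), candidates))
            found.extend(c for c, n in zip(candidates, exposed) if n > 0)
    return found


# Toy microdata: no single attribute isolates a record, but some pairs do.
records = [
    {"age": 30, "zip": "A", "sex": "F"},
    {"age": 30, "zip": "B", "sex": "F"},
    {"age": 40, "zip": "A", "sex": "M"},
    {"age": 40, "zip": "B", "sex": "M"},
]
result = find_vulnerabilities(records, ["age", "zip", "sex"])
print(result)  # [('age', 'zip'), ('zip', 'sex')]
```

In this toy dataset the full three-attribute combination is never scanned, because it contains an already reported pair; this is the kind of search-space reduction that pruning is meant to provide.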