Distance-aware encoding of numerical values for privacy-preserving record linkage
Abstract
In this work, we propose Bit Vectors (BV), an accurate, distance-preserving encoding scheme for representing numerical data values in privacy-preserving tasks. Although many methods have been proposed in the literature for encoding strings, the problem of encoding numerical values has not yet been effectively addressed. In Privacy-Preserving Record Linkage (PPRL), a number of data custodians encode their records and submit them to a trusted third party that is responsible for identifying those records that refer to the same real-world entity. BV is supported by a strong theoretical foundation for embedding numerical values into an anonymization space in a way that preserves the original distances. Key components of this embedding process are (a) the employed hash functions, which, by utilizing random intervals, allow for approximate matching, and (b) the threshold required by the distance computations, which we prove can be specified in a way that guarantees accurate results.