Title
Privacy-preserving distributed clustering
Author
Erkin, Z.
Veugen, T.
Toft, T.
Lagendijk, R.L.
Publication year
2013
Abstract
Clustering is an important data mining tool and is widely used in online services in medical, financial, and social settings. The main goal of clustering is to group similar objects in a data set. The data set may be owned by a single entity or, in some cases, pooled from several databases to enrich the data so that the merged database improves the clustering. In either case, the content of the database may be privacy sensitive and/or commercially valuable, so its owners may not want to share it with any other entity, including the service provider. Such privacy concerns create trust issues between entities, which harm the functioning of the service and can even prevent cooperation between entities with similar data sets. To enable joint efforts on private data, we propose a protocol for distributed clustering that limits information leakage to the untrusted service provider performing the clustering. To achieve this goal, we rely on cryptographic techniques, in particular homomorphic encryption, and improve on the state of the art in processing encrypted data by exploiting the distributed structure of the system and by packing data to reduce computation and communication costs. Although our construction can easily be adapted to a centralized or a distributed computing model, we rely on a set of particular users who help the service provider with computations. Experimental results indicate that our approach is an efficient way of deploying a privacy-preserving clustering algorithm in a distributed manner. © 2013 Erkin et al.; licensee Springer.
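The abstract rests on two ideas: additively homomorphic encryption, which lets an untrusted server combine encrypted values, and data packing, which fits several values into one ciphertext. The sketch below illustrates both in Python, assuming the Paillier cryptosystem as a representative additively homomorphic scheme. The exact scheme, packing layout, and protocol of the article are not given in this record; the primes `P` and `Q`, the slot width `SLOT_BITS`, and the party vectors are hypothetical toy values that provide no security.

```python
import math
import random

# Toy Paillier parameters (hypothetical): two Mersenne primes. Far too small
# and too structured for real use; chosen only so the example runs instantly.
P = 2**89 - 1
Q = 2**61 - 1
SLOT_BITS = 20  # hypothetical slot width: room for slot sums up to 2**20 - 1


def keygen(p, q):
    """Paillier key generation with the common choice g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n, m, rng):
    """E(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """D(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts mod n."""
    return (c1 * c2) % (n * n)


def pack(values, slot_bits=SLOT_BITS):
    """Place each small non-negative value in its own slot of one integer."""
    packed = 0
    for i, v in enumerate(values):
        packed += v << (i * slot_bits)
    return packed


def unpack(packed, count, slot_bits=SLOT_BITS):
    """Recover `count` slot values from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


rng = random.Random(2013)
n, sk = keygen(P, Q)

# Each (hypothetical) party holds a small vector, e.g. per-cluster counts.
party_vectors = [[3, 0, 5, 1], [2, 4, 0, 7], [1, 1, 1, 1]]

# One encryption per party instead of one per value.
ciphertexts = [encrypt(n, pack(v), rng) for v in party_vectors]

# The untrusted server combines ciphertexts without decrypting them.
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(n, total, c)

# Only the key holder can decrypt; unpacking yields the element-wise sums.
result = unpack(decrypt(n, sk, total), len(party_vectors[0]))
print(result)  # [6, 5, 6, 9]
```

Packing reduces the number of encryptions, and the volume of ciphertext sent to the server, by roughly the number of slots. The slot width must leave enough headroom that no slot sum overflows into its neighbor, and the packed total must stay below the modulus n.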
Subject
Communication & Information
ISEC - Information Security
TS - Technical Sciences
Infostructures
Informatics
Information Society
Cryptography
Data processing
Database systems
Distributed computing models
Distributed structures
Homomorphic encryptions
Information leakage
Privacy preserving
Social environment
Clustering algorithms
To reference this document use:
http://resolver.tudelft.nl/uuid:f465c037-86c2-41dc-a313-fd1fe2cd044e
DOI
https://doi.org/10.1186/1687-417x-2013-4
TNO identifier
503179
Source
EURASIP Journal on Information Security, 2013, 1-15
Document type
article