The k-anonymity Problem is Hard
Paola Bonizzoni, Gianluca Della Vedova and Riccardo Dondi
Abstract
The problem of publishing personal data without giving up privacy is
becoming increasingly important. An interesting formalization recently
proposed is k-anonymity. This approach requires that the rows of a
table be partitioned into clusters of size at least k and that all the
rows in a cluster become the same tuple after the suppression of some
entries. The problem of minimizing the number of suppressed entries has
been shown to be NP-hard when the values are over a ternary alphabet,
k = 3, and the row length is unbounded. In this paper we give a lower
bound on the approximability of two restrictions of the problem: when the
record values are over a binary alphabet and k = 3, and when the records
have length at most 8 and k = 4. This shows that both restrictions are
APX-hard.
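To make the definition concrete, here is a minimal Python sketch. It assumes the usual entry-suppression formulation, in which the cost is the number of suppressed entries. For a fixed partition of the rows into clusters, the cheapest way to make each cluster uniform is to suppress exactly the columns on which that cluster disagrees. The names `anonymize` and `is_k_anonymous` and the `*` marker are hypothetical, chosen for illustration only. The sketch evaluates a given partition and does not search for an optimal one; that search is the hard part the abstract refers to.

```python
from collections import Counter

STAR = "*"  # hypothetical marker for a suppressed entry


def anonymize(rows, clusters, k):
    """Suppress entries so that all rows in each cluster become the same tuple.

    rows: list of equal-length tuples.
    clusters: list of lists of row indices that must partition range(len(rows)).
    Returns (suppressed_rows, cost), where cost is the number of suppressed entries.
    """
    covered = sorted(i for c in clusters for i in c)
    if covered != list(range(len(rows))):
        raise ValueError("clusters must partition the rows")
    if any(len(c) < k for c in clusters):
        raise ValueError("every cluster needs at least k rows")

    out = [list(r) for r in rows]
    width = len(rows[0]) if rows else 0
    cost = 0
    for c in clusters:
        for j in range(width):
            # If column j is not constant inside the cluster, it must be
            # suppressed in every row of the cluster.
            if len({rows[i][j] for i in c}) > 1:
                for i in c:
                    out[i][j] = STAR
                cost += len(c)
    return [tuple(r) for r in out], cost


def is_k_anonymous(rows, k):
    """True if every distinct row occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


if __name__ == "__main__":
    # A binary table with k = 3, split into two clusters of three rows.
    table = [(0, 1, 0), (0, 1, 1), (0, 0, 1), (1, 1, 0), (1, 1, 1), (1, 0, 1)]
    result, cost = anonymize(table, [[0, 1, 2], [3, 4, 5]], 3)
    print(cost)  # 12
    print(is_k_anonymous(result, 3))  # True
```

In this example both clusters agree only on the first column, so the second and third columns are suppressed in all six rows.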