In the era of data deluge, encouraging data usage within an organization is an important task. Analyzing data, however, can breach an individual’s privacy. Expanding the scope of data provision while managing the risk of exposing personal information requires professional and technical effort. The Statistics Research Institute (SRI) has been continuously studying statistical techniques for controlling the disclosure risk of personal information when providing data, a field known as data privacy. Following the previous technical report, Park et al. (2018), this study applies differential privacy to real cases of disseminating statistics and supports the experiments with intensive simulations.
In this study, we first summarize the related theories so that readers who are familiar with the previous studies can understand the application of differential privacy, and then introduce several algorithms for generating differentially private histograms. We also conduct simulations for creating differentially private histograms and present the results of applying the algorithms to actual data. In addition, we briefly discuss the possibility of utilizing synthetic data generated from differentially private histograms. We hope that the technical efforts made in this study will contribute to data sharing and personal information protection across society.
Key words: Differential privacy, synthetic data, disclosure risk, information loss, data privacy