NRF: A Naive Re-identification Framework

Shubhra Kanti Karmaker Santu, Vincent Bindschaedler, ChengXiang Zhai, and Carl A. Gunter recently published a paper titled NRF: A Naive Re-identification Framework:

The promise of big data relies on the release and aggregation of data sets. When these data sets contain sensitive information about individuals, it has been scalable and convenient to protect the privacy of these individuals by de-identification. However, studies show that the combination of de-identified data sets with other data sets risks re-identification of some records. Some studies have shown how to measure this risk in specific contexts where certain types of public data sets (such as voter rolls) are assumed to be available to attackers. To the extent that it can be accomplished, such analyses enable the threat of compromises to be balanced against the benefits of sharing data. For example, a study that might save lives by enabling medical research may be justified in light of a sufficiently low probability of compromise from sharing de-identified data. In this paper, we introduce a general probabilistic re-identification framework that can be instantiated in specific contexts to estimate the probability of compromises based on explicit assumptions. We further propose a baseline of such assumptions that enable a first-cut estimate of risk for practical case studies. We refer to the framework with these assumptions as the Naive Re-identification Framework (NRF). As a case study, we show how we can apply NRF to analyze and quantify the risk of re-identification arising from releasing de-identified medical data in the context of publicly available social media data. The results of this case study show that NRF can be used to obtain meaningful quantification of the re-identification risk, compare the risk of different social media, and assess risks of combinations of various demographic attributes and medical conditions that individuals may voluntarily disclose on social media.

ACM Workshop on Privacy in the Electronic Society (WPES ’18), Toronto, Canada, October 2018. DOI: 10.1145/3267323.3268948
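The framework itself is probabilistic, but the core intuition behind re-identification risk can be illustrated with a much simpler uniqueness calculation. The sketch below is not the paper's model; the quasi-identifier attributes and records are hypothetical. It estimates a naive per-record re-identification probability as one over the size of the equivalence class a record falls into when grouped by its quasi-identifiers.

```python
from collections import Counter

# Hypothetical de-identified records, keyed by quasi-identifiers
# (age range, gender, 3-digit ZIP). Attributes and values are
# illustrative only, not taken from the paper.
records = [
    ("30-39", "F", "606"),
    ("30-39", "F", "606"),
    ("40-49", "M", "618"),
    ("40-49", "M", "618"),
    ("20-29", "F", "601"),
    ("50-59", "M", "606"),  # unique combination -> highest risk
]

# Group records into equivalence classes of identical quasi-identifiers.
class_sizes = Counter(records)

# Naive model: an attacker who knows a target's quasi-identifiers, and
# knows the target is in the data set, picks uniformly among the matching
# records, so P(re-identification) = 1 / (equivalence class size).
for quasi_ids, size in sorted(class_sizes.items()):
    print(f"{quasi_ids}: class size {size}, P(re-id) = {1 / size:.2f}")

# Fraction of records that are unique on these attributes, i.e. records
# an attacker with matching auxiliary data re-identifies with certainty.
unique = sum(1 for size in class_sizes.values() if size == 1)
print(f"Uniquely identifiable records: {unique}/{len(records)}")
```

The paper's contribution is to replace the auxiliary-data assumptions baked into calculations like this with explicit, context-specific ones, for instance, attributes voluntarily disclosed on social media.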

Property Inference Attacks

Karan Ganju, Qi Wang, Wei Yang, Carl A. Gunter, and Nikita Borisov recently published a paper titled Property Inference Attacks on Fully Connected Neural Networks using Permutation Invariant Representations:

With the growing adoption of machine learning, sharing of learned models is becoming popular. However, in addition to the prediction properties the model producer aims to share, there is also a risk that the model consumer can infer other properties of the training data the model producer did not intend to share. In this paper, we focus on the inference of global properties of the training data, such as the environment in which the data was produced, or the fraction of the data that comes from a certain class, as applied to white-box Fully Connected Neural Networks (FCNNs). Because of their complexity and inscrutability, FCNNs have a particularly high risk of leaking unexpected information about their training sets; at the same time, this complexity makes extracting this information challenging. We develop techniques that reduce this complexity by noting that FCNNs are invariant under permutation of nodes in each layer. We develop our techniques using representations that capture this invariance and simplify the information extraction task. We evaluate our techniques on several synthetic and standard benchmark datasets and show that they are very effective at inferring various data properties. We also perform two case studies to demonstrate the impact of our attack. In the first case study we show that a classifier that recognizes smiling faces also leaks information about the relative attractiveness of the individuals in its training set. In the second case study we show that a classifier that recognizes Bitcoin mining from performance counters also leaks information about whether the classifier was trained on logs from machines that were patched for the Meltdown and Spectre attacks.

ACM Conference on Computer and Communications Security (CCS ’18), Toronto, Canada, October 2018. DOI: 10.1145/3243734.3243834
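The key observation driving the attack, that an FCNN computes the same function when the neurons in a hidden layer are permuted (together with the matching weight rows and columns), is easy to verify directly. The following is a minimal NumPy sketch, not the authors' attack code; the network sizes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fully connected network: input (4) -> hidden (5, ReLU) -> output (3).
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)

def forward(x, W1, b1, W2, b2):
    h = np.maximum(0, W1 @ x + b1)  # hidden layer with ReLU
    return W2 @ h + b2

# Permute the hidden neurons: reorder the rows of W1 and b1 and, to
# compensate, the matching columns of W2. The function is unchanged.
perm = rng.permutation(5)
W1_p, b1_p = W1[perm], b1[perm]
W2_p = W2[:, perm]

x = rng.normal(size=4)
assert np.allclose(forward(x, W1, b1, W2, b2),
                   forward(x, W1_p, b1_p, W2_p, b2))
print("Hidden-layer permutation leaves the network's output unchanged.")
```

Because every one of the 5! reorderings above yields an equally valid set of trained weights, a meta-classifier that reads raw weight matrices sees many spurious variants of the same model; the paper's representations are designed to be invariant to exactly this permutation.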

Meaningful healthcare security

Juhee Kwon and Eric Johnson recently published an article addressing the question: Does “meaningful-use” attestation improve information security performance?

Certification mechanisms are often employed to assess and signal difficult-to-observe management practices and foster improvement. In the U.S. healthcare sector, a certification mechanism called meaningful-use attestation was recently adopted as part of an effort to encourage electronic health record (EHR) adoption while also focusing healthcare providers on protecting sensitive healthcare data. This new regime motivated us to examine how meaningful-use attestation influences the occurrence of data breaches. Using a propensity score matching technique combined with a difference-in-differences (DID) approach, our study shows that the impact of meaningful-use attestation is contingent on the nature of data breaches and the time frame. Hospitals that attest to having reached Stage 1 meaningful-use standards observe fewer external breaches in the short term, but do not see continued improvement in the following year. On the other hand, attesting hospitals observe short-term increases in accidental internal breaches but eventually see long-term reductions. We do not find any link between malicious internal breaches and attestation. Our findings offer theoretical and practical insights into the effective design of certification mechanisms.

The full paper appears in MIS Quarterly, Vol. 42, No. 4 (December 2018), pp. 1043-1067. DOI: 10.25300/MISQ/2018/13580
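For readers unfamiliar with the study's design, a difference-in-differences estimator compares the change in an outcome for treated units against the change for matched controls over the same period, so that time trends shared by both groups cancel out. The sketch below is a generic illustration with invented numbers, not the study's data; the actual analysis also involves propensity-score matching and covariates.

```python
# Difference-in-differences (DID) on hypothetical mean breach counts.
# All numbers are made up for illustration.

# Mean external breaches per hospital, before and after Stage 1 attestation.
treated_pre, treated_post = 0.42, 0.25   # hospitals that attested
control_pre, control_post = 0.40, 0.38   # matched non-attesting hospitals

# Each group's raw change over time.
treated_change = treated_post - treated_pre   # attestation effect + time trend
control_change = control_post - control_pre   # time trend only

# DID estimate: the attestation effect net of the shared time trend.
did = treated_change - control_change
print(f"DID estimate of attestation effect: {did:+.2f} breaches per hospital")
```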