Let’s look at how some current methods of data sharing measure up:
- Full access to raw data, that is, sharing unfiltered data, offers high analytic utility and is easy to set up, but provides very low privacy.
- Data pseudonymization – removing the associations between an individual’s identity and their PI – has high analytic utility, but ease-of-setup is lower, as the data custodian must configure appropriate protocols to de-identify the dataset. The level of privacy is also low, due to the risk of re-identification.
- Data masking, shuffling, suppression, and outlier removal are all tools used to protect PI; they vary in the level of protection offered, the analytic utility of the released data, and ease of setup. These tactics may require specialized software and skills, which makes setup more complex.
- Risk-based de-identification trades analytic utility against privacy: a low-risk approach sacrifices analytic utility for privacy and is difficult to set up; conversely, a high-risk approach provides high analytic utility at the expense of privacy.
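
To make the tactics in the list above more concrete, here is a minimal Python sketch of pseudonymization, masking, suppression, shuffling, and outlier removal applied to a toy dataset. Everything in it is an assumption made for illustration: the records, the field names, the `SECRET_KEY`, the income cutoff, and the helper functions are hypothetical, and the keyed hash (HMAC-SHA256) is only one possible way to pseudonymize an identifier. It is not a production de-identification pipeline.

```python
import hashlib
import hmac
import random

# Hypothetical records; names, fields, and values are invented for illustration.
records = [
    {"name": "Alice Ng", "phone": "555-0101", "postcode": "K1A 0B1", "income": 52000},
    {"name": "Bob Roy", "phone": "555-0102", "postcode": "M5V 2T6", "income": 61000},
    {"name": "Cara Lee", "phone": "555-0103", "postcode": "V6B 1A1", "income": 950000},
]

# Assumption: a secret held only by the data custodian.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Replace an identifier with a keyed hash, truncated for readability."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask(value, visible=4, fill="*"):
    """Hide every character except the last `visible` ones."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def suppress(record, fields):
    """Drop the named fields from a record entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def shuffle_column(rows, field, seed=None):
    """Permute one column across rows, breaking its link to each individual."""
    rng = random.Random(seed)
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(rows, values)]


def remove_outliers(rows, field, upper):
    """Drop rows whose value exceeds an (arbitrary, illustrative) cutoff."""
    return [row for row in rows if row[field] <= upper]


# Apply the tactics in sequence to produce a release candidate.
kept = remove_outliers(records, "income", upper=100_000)
released = [
    suppress(
        {**row, "name": pseudonymize(row["name"]), "phone": mask(row["phone"])},
        {"postcode"},
    )
    for row in kept
]
released = shuffle_column(released, "income", seed=0)

for row in released:
    print(row)
```

Each step gives up some analytic utility in exchange for protection: shuffling preserves the overall distribution of `income` but destroys its relationship to the other fields, and removing outliers discards the most unusual (and most identifiable) rows. The pseudonyms also remain linkable by anyone who holds the key, which is one reason pseudonymized data still carries re-identification risk.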