Data de-identification involves the removal of personal identifiers so that individuals cannot be re-identified from the data.
In addition to removing the 18 Protected Health Information (PHI) identifiers defined under HIPAA, accurate de-identification requires the removal of indirect identifiers that are not unique by themselves but could be combined with other information to identify an individual. Data de-identification becomes even more complex if the study may need to re-identify the data in the future.
We recommend that studies work with a data de-identification expert recognized by the Data Trust to ensure that their data are truly de-identified.
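As a rough illustration of de-identification that still allows later re-identification, the minimal sketch below drops direct identifier fields and replaces them with a random study code, keeping a separate key map. The field names (`name`, `mrn`, `email`, `study_id`) and the `deidentify` function are hypothetical, chosen only for this example. The sketch does not address indirect identifiers and is not a substitute for expert review.

```python
import secrets

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "mrn", "email"}


def deidentify(records, id_field="mrn"):
    """Drop direct identifiers and attach a random study code.

    Returns (cleaned_records, key_map). key_map maps each study code back
    to the original ID; it is what makes re-identification possible, so it
    should be stored separately from the shared data with restricted access.
    """
    code_for = {}  # original ID -> study code
    key_map = {}   # study code -> original ID
    cleaned = []
    for rec in records:
        original_id = rec[id_field]
        if original_id not in code_for:
            # Random code, not derived from the original ID.
            code = secrets.token_hex(8)
            code_for[original_id] = code
            key_map[code] = original_id
        # Copy every field except the direct identifiers.
        row = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        row["study_id"] = code_for[original_id]
        cleaned.append(row)
    return cleaned, key_map
```

If the study never needs to re-identify the data, the key map can simply be discarded instead of stored.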
Often, studies find that they cannot work with a fully de-identified data set and instead need to work with a Limited Data Set.
In some cases, especially when widely sharing data, studies may need to anonymize the data. Data anonymization involves purposeful “blurring” or “obfuscation” of the data in order to further reduce the possibility of re-identification. Date shifting is an example of a data anonymization technique.
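Date shifting, mentioned above, can be sketched as follows. This is one common variant, assumed here for illustration: every date belonging to the same participant is moved by the same random number of days, so intervals between that participant's events are preserved while the true calendar dates are hidden. The function name `shift_dates`, the field names, and the 365-day window are hypothetical choices, not recommendations.

```python
import secrets
from datetime import date, timedelta


def shift_dates(records, id_field="study_id",
                date_fields=("admit_date", "discharge_date"), max_days=365):
    """Shift each participant's dates by one random, nonzero offset.

    The offset is drawn once per participant from [-max_days, max_days]
    (excluding 0) and applied to all of that participant's date fields.
    """
    offsets = {}  # participant ID -> timedelta
    shifted = []
    for rec in records:
        pid = rec[id_field]
        if pid not in offsets:
            sign = 1 if secrets.randbelow(2) else -1
            days = secrets.randbelow(max_days) + 1  # 1..max_days
            offsets[pid] = timedelta(days=sign * days)
        row = dict(rec)  # copy so the input records are left unchanged
        for field in date_fields:
            if row.get(field) is not None:
                row[field] = row[field] + offsets[pid]
        shifted.append(row)
    return shifted


# Example: the 10-day length of stay survives the shift.
visits = [{"study_id": "a1", "admit_date": date(2020, 1, 1),
           "discharge_date": date(2020, 1, 11)}]
shifted_visits = shift_dates(visits)
```

The offsets themselves act as a re-identification key, so in this sketch they are discarded when the function returns; a study that needs to recover the original dates would have to store them securely instead.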
Useful Materials to Learn More
- A guide for protecting identifiers in human subjects data: https://guides.library.jhu.edu/protecting_identifiers
- A handout of de-identification tips: https://osf.io/7fpmw/
- De-identifying Human Subjects Data for Sharing Online Module: https://dataservices.library.jhu.edu/training-workshops/research-data-management-sharing/de-identifying-human-subjects-data-for-sharing/