De-identify data to share it safely

HIPAA doesn’t restrict sharing of de-identified health information

It’s been 20 years since Congress enacted the Health Insurance Portability and Accountability Act (HIPAA), and handling protected health information (PHI) is only becoming more complex as technology advances. The Privacy Rule outlines how your company, as a covered entity, can share health information. If you de-identify PHI, you can share health information without compromising patient privacy.

You just need to figure out how.

It might seem odd that HIPAA doesn’t restrict sharing of de-identified health information. Here are 2 things to help clear it up:

De-identified health information isn’t recognizable. That means it isn’t personal anymore. Remember, PHI has identifiers (like name or date of birth) and treatment, payment, or condition information (like billing information or procedure codes). Once you remove the identifiers, the data is just health information.
Health information is helpful. If HIPAA never allowed health information to be disclosed, it would limit potentially beneficial studies. The official guidance acknowledges this in its stated rationale for de-identification: it “supports the secondary use of data for comparative effectiveness studies, policy assessment, life sciences research, and other endeavors.”

De-identify data with the Safe Harbor method

The Safe Harbor method relies on two primary steps:

Remove identifiers. Without identifiers, you take the “P” out of “PHI.” The Office for Civil Rights (OCR) organized a workshop to create a concrete checklist of 18 identifiers.
Resolve actual knowledge. Your company has actual knowledge when it knows that recipients of the de-identified data can re-identify it. Note that this isn’t the same as knowing that there are theoretically ways to re-identify data.

Remove identifiers

We talked in our post last week about how HIPAA is a framework rather than a checklist for securing PHI. Well, the Safe Harbor method is a checklist—mostly. There are 18 identifiers that you’ll need to remove if your company chooses the Safe Harbor method.

You’ll need to remove the following:

Names
Geographic subdivisions smaller than a state (e.g., street address, city, county, or zip code). You may keep the first 3 digits of a zip code, provided that
1. The combined population of all zip codes with the same first 3 digits is over 20,000,
  AND
2. The first 3 digits of zip codes with a combined population of 20,000 or fewer are replaced with 000. Until the 2010 census data is released, the 17 restricted 3-digit prefixes are: 036, 059, 063, 102, 203, 556, 692, 790, 821, 823, 830, 831, 878, 879, 884, 890, and 893
Dates directly related to the individual (e.g., birthday, death date, or admission date)
1. You may keep the year of a date (e.g., 2016 instead of July 7, 2016), IF
2. You group all ages over 89, and any year that reveals such an age, into a single category of age 90 or older
Telephone numbers
Fax numbers
Email addresses
Social security numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate/license numbers
Vehicle identifiers and serial numbers (e.g., license plate)
Device identifiers and serial numbers
Web Universal Resource Locators (URLs)
Internet Protocol (IP) addresses
Biometric identifiers (e.g., fingerprints)
Full-face photographs and any comparable images
Any other unique identifying number, characteristic, or code
1. Number—e.g., a clinical trial record number
2. Characteristic—e.g., a unique occupation
3. Code—e.g., a barcode

Note: You’ll need to remove the individual’s identifiers, of course. But you’ll also need to remove identifiers of the individual’s relatives, employers, and household members.
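
To make the two conditional items on the checklist concrete, here is a minimal Python sketch of the zip code and date rules. It is an illustration under stated assumptions, not a complete Safe Harbor implementation: the field names (name, zip, birth_date, and so on) are hypothetical, the set of dropped fields covers only a few of the 18 identifiers, and the choice to measure age against an as_of date you supply is our assumption.

```python
from datetime import date

# The 17 restricted 3-digit zip prefixes listed in the checklist above.
RESTRICTED_ZIP3 = {
    "036", "059", "063", "102", "203", "556", "692", "790", "821",
    "823", "830", "831", "878", "879", "884", "890", "893",
}

# Hypothetical field names for a few direct identifiers; a real data set
# would need every one of the 18 identifier types mapped to its own fields.
DIRECT_IDENTIFIER_FIELDS = {"name", "phone", "fax", "email", "ssn", "mrn"}


def safe_harbor_zip(zip_code: str) -> str:
    """Keep the first 3 digits of a zip code, or 000 for restricted prefixes."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        return "000"  # assumption: treat malformed input as restricted
    return "000" if prefix in RESTRICTED_ZIP3 else prefix


def safe_harbor_birth_year(birth_date: date, as_of: date) -> str:
    """Keep only the birth year, grouping anyone over 89 as '90 or older'."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    age = as_of.year - birth_date.year - (0 if had_birthday else 1)
    return "90 or older" if age > 89 else str(birth_date.year)


def deidentify_record(record: dict, as_of: date) -> dict:
    """Drop direct identifiers and coarsen the zip code and birth date."""
    skip = DIRECT_IDENTIFIER_FIELDS | {"zip", "birth_date"}
    out = {k: v for k, v in record.items() if k not in skip}
    if "zip" in record:
        out["zip3"] = safe_harbor_zip(record["zip"])
    if "birth_date" in record:
        out["birth_year"] = safe_harbor_birth_year(record["birth_date"], as_of)
    return out


record = {
    "name": "Jane Doe",
    "zip": "03601",
    "birth_date": date(1924, 3, 2),
    "procedure_code": "99213",
}
print(deidentify_record(record, as_of=date(2016, 7, 7)))
```

In this made-up record, the name is dropped, the zip code falls under a restricted prefix and becomes 000, and the birth year is folded into the 90 or older category. The procedure code stays, because it is health information rather than an identifier.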

Resolve actual knowledge

This is the bit that isn’t a checklist, but it’s probably simpler than you think.

Combined with the 18th identifier on the checklist, this step requires a bit more analysis.

Here are some examples of actual knowledge:

The records your company is releasing contain career data. If the positions are general enough, this probably won’t be a problem. But say one of the records is from the CEO of a local company. That single piece of information is enough to identify him or her, so it would be considered a unique characteristic. If your company knew this, it would have actual knowledge, and the data would not be considered de-identified. You may have to omit that individual’s record altogether.
Someone tells you that the recipient has a way to re-identify the data. Don’t take just anyone’s word for it, though. Check with the recipient to make sure they don’t have a way to re-identify the data. Once they assure you that they don’t, you can release the data to them, confident that your patients’ privacy isn’t in jeopardy.
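
One rough way to spot candidates like the CEO record is to look for values that appear only once in a data set. The sketch below is our own illustration, not a method prescribed by HIPAA, and the occupation field and sample records are hypothetical. A value that appears once isn’t necessarily identifying, and a value that appears twice isn’t necessarily safe, so treat the output as a list of records for a person to review.

```python
from collections import Counter


def flag_unique_values(records: list, field: str) -> list:
    """Return the indexes of records whose value for `field` appears only once."""
    counts = Counter(r.get(field) for r in records)
    return [i for i, r in enumerate(records) if counts[r.get(field)] == 1]


# Hypothetical records; "occupation" is an illustrative field name.
records = [
    {"occupation": "nurse", "procedure_code": "99213"},
    {"occupation": "nurse", "procedure_code": "99214"},
    {"occupation": "CEO of a local company", "procedure_code": "99213"},
]
print(flag_unique_values(records, "occupation"))
```

Here only the third record (index 2) is flagged, since its occupation is the only one that appears once.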

We mentioned it above, but we’ll mention it again here: theoretical knowledge does not equal actual knowledge. There are plenty of people who aren’t satisfied by the de-identification methods offered by HIPAA. Rest assured, though, because separate studies have shown there was less than a 0.25 percent chance that PHI de-identified using either the Safe Harbor or Expert Determination methods could be re-identified using public information.

De-identify data with the Expert Determination method

Where the Safe Harbor method is a checklist, the Expert Determination method is a framework. For this method, you’ll need (you guessed it) an expert. In a nutshell, this method involves someone with experience determining that whatever remains of the data couldn’t identify an individual. They’ll record their methods and results to justify their conclusion that the data has been sufficiently de-identified.

There are more decisions inherent in the Expert Determination method:

Find an expert: There aren’t any degree requirements for experts. So, you’ll want to find someone with a background in mathematics, statistics, etc. who has trained under others who have performed de-identification.
Decide how to de-identify data: Your expert will be able to determine the most secure way to de-identify your data. Combining various methods will ensure the most privacy while giving recipients as much data as is helpful and possible.
Assess data recipients: The Safe Harbor method produces data that will likely be considered de-identified across the board. On the other hand, the Expert Determination method produces data that is de-identified for a specific recipient.

Find the right experts

Your experts won’t all have the same academic degree or amount of experience. They just need to know what they’re doing and how to document it. Outside of the data itself, documentation is the concrete evidence of your de-identification.

During the process, make sure you’re working alongside the experts to get them everything they need. In the official HIPAA guide, the OCR lays out a basic, three-step workflow for experts to follow:

Research: The experts work in conjunction with your company to determine which information is most important to the recipients. This step requires a bit of work alongside your recipients. And you may have to come back to this step several times.
Application: The experts apply what they’ve found from their research. Methodically, they’ll mitigate identifiers and other unique characteristics, codes, or numbers.
Testing: The experts test the de-identified data. If they’re able to re-identify the data based on what’s left and other public data, they’ll return to step one.

Throughout the workflow, keep communication lines with your experts open.

Determine how to de-identify the data

Where the Safe Harbor method redacts information, the Expert Determination method has multiple options. Depending on the data set and the recipient, you might suppress, generalize, or randomize the data.

Suppress: This is the same as redacting the information. Whatever information the expert determines is too telling is left out. Going back to an example above, you could include the CEO’s record if you suppressed the career section.
Generalize: Generalizing data puts it in ranges. By saying ages 10 to 15 instead of individual ages, you protect individual privacy while allowing recipients to draw age-specific conclusions.
Randomize: Instead of providing a range of ages, randomization selects an age from within the range. Someone who is 12 might fall into the range of 10 to 15, resulting in a randomized age of 14. The general age is preserved, and so is the individual’s privacy.

You may simply generalize all identifiers, or you may suppress select identifiers for select individuals and randomize the majority of identifiers. Frequently, your experts will use a combination of these techniques to de-identify your data.
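
The three techniques above can be sketched in a few lines of Python. This is a simplified illustration, not what an expert would necessarily run: the function names, the 5-year bucket width, and the field names are all assumptions. Because the buckets here are 5 years wide, a 12-year-old lands in 10-14 rather than the 10 to 15 used in the example above.

```python
import random


def suppress(record: dict, fields: set) -> dict:
    """Leave out the fields judged too telling."""
    return {k: v for k, v in record.items() if k not in fields}


def generalize_age(age: int, width: int = 5) -> str:
    """Replace an exact age with the range (bucket) it falls into."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def randomize_age(age: int, width: int = 5, rng=None) -> int:
    """Replace an exact age with a random age from the same bucket."""
    rng = rng or random.Random()
    low = (age // width) * width
    return rng.randint(low, low + width - 1)


# Hypothetical record reusing the CEO example.
ceo_record = {"age": 52, "occupation": "CEO of a local company"}
print(suppress(ceo_record, {"occupation"}))  # the career section is left out
print(generalize_age(12))                    # the bucket that contains age 12
print(randomize_age(12))                     # some age between 10 and 14
```

Passing your own random.Random instance as rng makes the randomized output repeatable, which can help when your experts need to document exactly what they did.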

Know your recipients

Because the Expert Determination method is not a one-size-fits-all approach, the same data set will likely need different items de-identified to different degrees for different recipients.

Guaranteeing that your recipient doesn’t share the data is important. Although HIPAA doesn’t require a data use agreement, your company may decide to require one. Such an agreement can prohibit many things, including re-identification.

Of course, this is a pivotal part of the de-identification process. You’ll want to review the recipient before releasing the final, de-identified information.

Apply this to your business

Know your recipients. Knowledge is power, right? Familiarize yourself with previous de-identified data recipients. Do they still have the data, or have they destroyed it per your data use agreement?
Look at de-identified data. Does your company use the Safe Harbor or Expert Determination method? By looking at a piece of de-identified data, you might be able to tell if you need to switch methods.
Try to re-identify data. If your company keeps a code to re-identify data, where is it and who knows it? Make sure it hasn’t fallen into the wrong person’s hands.
