Data Redaction: What it is, why you need it, and how to start.

It could have been avoided.

Have you ever published sensitive customer data on your website? The Medical Council of New South Wales didn’t think they had until Google indexed a report containing Protected Health Information (PHI).

Oops.

The lack of a proper redaction methodology caused a huge mess. The employee who published the report simply placed a black box over the area of the PDF containing the PHI. For better or worse, search engines can find what is hidden from our eyes. The sensitive information was still in the PDF file; the black box only hid it from anyone viewing the report on the website.

Google’s bots indexed the PHI along with the rest of the report, returning the information in search results. That, my friends, is a recipe for disaster.

What is Redaction?

Redact: to edit, or prepare for publishing.

It’s a simple concept, but it can be quite difficult to implement. Like so many simple things, the notion of editing out information before publishing or distributing a document has a deep impact on security.

Basic Example:

Before: I would like to go to the park.

After: [Redacted] would like to go to the park.

For all you know, there was a specific person’s name in the [Redacted] portion. The point is that the original content was edited, typically in preparation for publishing or distribution. Redaction isn’t simply removing the ability to see sensitive data; it is removing all traces of the content.
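To make the difference concrete, here is a toy model in Python. It is a hypothetical sketch: the `ToyDocument` class and its functions are invented for illustration and are not a real PDF library. A document holds its stored text plus any boxes drawn on top of it. Covering a span changes only what a person sees. Redacting changes the stored text that a crawler, or a simple copy-and-paste, would read.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical toy model: stored text plus black boxes drawn on top of it."""
    text: str
    boxes: list[tuple[int, int]] = field(default_factory=list)


def cover_with_box(doc: ToyDocument, start: int, end: int) -> ToyDocument:
    # Visual "redaction": the stored text is untouched; only the rendered view changes.
    return ToyDocument(doc.text, doc.boxes + [(start, end)])


def redact(doc: ToyDocument, start: int, end: int, label: str = "[Redacted]") -> ToyDocument:
    # Real redaction: the characters are removed from the stored text itself.
    return ToyDocument(doc.text[:start] + label + doc.text[end:])


def render(doc: ToyDocument) -> str:
    """What a person sees on screen: boxed characters are drawn as solid blocks."""
    chars = list(doc.text)
    for start, end in doc.boxes:
        for i in range(start, min(end, len(chars))):
            chars[i] = "\u2588"
    return "".join(chars)
```

If you cover the word "I" in "I would like to go to the park." with `cover_with_box`, the screen shows a solid block. The word is still sitting in the document’s `text`, exactly as the PHI was still in the Medical Council’s PDF. Only `redact` actually removes it.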

As pointed out by Law Technology Today, it is not just the process of redacting that is important, but redacting properly so as not to leave any leaks in your data trail.

Here you see a graph from Google Trends showing the growth in searches for the term “redact”. There weren’t any noticeable searches before 2005. Stringent regulations and a litigious culture ensure that redaction is growing and will be around for a long time to come.


Why is Redaction Important?

All companies have sensitive data that needs to be secured, and it can usually be boiled down to three categories: Employee Data, Customer Data, or Company Data. Revoking access to sensitive information is the first step in securing PII (personally identifiable information), PHI, and PCI (payment card) data. Redacting unnecessary information brings an elevated level of security for you and your customers.

Consider this example:

Your business has stored customer PII. You might only need this information for a short time before it sits idle, unused. As long as this information remains available, you are at risk, and in some cases at great risk.

There are many circumstances where redaction will mitigate your risk, but each may require a different workflow.

Follow this guide to help your company stay safe.

Methods of Redaction

Page location/region specific redaction

Let’s assume you receive standardized reports that contain personally identifiable information, protected health information, or another type of sensitive data. If your formats are consistent, so that the sensitive fields always appear in the same place on the page, you’ll need a tool that can strip out the content in those regions.
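For plain-text reports with a fixed layout, region-based redaction can be sketched as shown here. The template coordinates in `CLAIM_FORM_REGIONS` are hypothetical. Real PDFs need a tool that removes the underlying content in each region rather than drawing over it. This sketch only illustrates the idea of redacting by position.

```python
# Hypothetical template for one fixed-layout text report:
# field name -> (line index, start column, end column), end column exclusive.
CLAIM_FORM_REGIONS: dict[str, tuple[int, int, int]] = {
    "patient_name": (1, 9, 29),
    "ssn": (2, 9, 20),
}


def redact_regions(report: str, regions: dict[str, tuple[int, int, int]], fill: str = "#") -> str:
    """Overwrite each template region with fill characters, keeping the layout intact."""
    lines = report.split("\n")
    for line_no, start, end in regions.values():
        if line_no >= len(lines):
            continue  # this report is shorter than the template expects
        line = lines[line_no]
        end = min(end, len(line))  # the value may be shorter than its box
        if start < end:
            lines[line_no] = line[:start] + fill * (end - start) + line[end:]
    return "\n".join(lines)
```

Overwriting with fill characters of the same width, rather than deleting, keeps every other field at its expected position. Later steps that also rely on the layout continue to work.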

As the failed redaction above shows, these changes need to be made at every level where the content lives. If the data is stored in a database as well as in the source file, it must be redacted in both locations.
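Here is a minimal sketch of redacting the same value in both places. It assumes a hypothetical SQLite table `customers(id, ssn)` and assumes the report text is already loaded as a string. `PRAGMA secure_delete` is a real SQLite setting that asks SQLite to overwrite deleted content with zeros. Even so, backups, logs, and journal files may still hold copies, and each needs its own handling.

```python
import sqlite3


def redact_customer_ssn(conn: sqlite3.Connection, customer_id: int, report_text: str) -> str:
    """Redact one customer's SSN in the database row AND in the report text.

    Assumes a hypothetical table customers(id, ssn); returns the redacted report text.
    """
    row = conn.execute("SELECT ssn FROM customers WHERE id = ?", (customer_id,)).fetchone()
    if row is None or row[0] is None:
        return report_text  # nothing stored for this customer, so nothing to find
    ssn = row[0]

    # Ask SQLite to zero out freed content instead of leaving it in the file.
    conn.execute("PRAGMA secure_delete = ON")
    conn.execute("UPDATE customers SET ssn = NULL WHERE id = ?", (customer_id,))
    conn.commit()

    # Remove the same value everywhere it appears in the source document.
    return report_text.replace(ssn, "[Redacted]")
```

`str.replace` only catches the exact formatting stored in the database. An SSN written without hyphens would slip through, which is one reason to combine this with pattern matching.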


Pattern Matching Redaction

You might be receiving information in a variety of formats or need to scan your entire repository for a particular kind of data. One effective method of redaction in this environment is to use pattern matching to identify sensitive data.

To use this method, determine the patterns that the sensitive data follows.

For example, Social Security numbers follow the pattern xxx-xx-xxxx. Your account numbers might be a particular sequence of numbers and letters that should be redacted at some point.

Here are a few examples of patterns you can search for:

Social Security Numbers: xxx-xx-xxxx
Credit Card Numbers: xxxx xxxx xxxx xxxx or 3xxx xxxxxx xxxxx
Email addresses: xxxx@xxxx.com/org/net/(other TLDs)

In addition to these universal patterns, your organization will have its own proprietary patterns for account numbers and other identifying information.
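The patterns above translate naturally into regular expressions, as in this minimal Python sketch. The `ACCT-` account format is a hypothetical stand-in for your own proprietary pattern. These simple expressions will miss some variants, such as card numbers written with hyphens. They will also flag some numbers that only look sensitive.

```python
import re

# Patterns from the list above, plus one hypothetical in-house format.
# Deliberately simple: expect some false positives and some misses.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 16 digits in 4-4-4-4 groups, or 15 digits starting with 3 in 4-6-5 groups
    "CARD": re.compile(r"\b(?:\d{4} \d{4} \d{4} \d{4}|3\d{3} \d{6} \d{5})\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Hypothetical proprietary account number: "ACCT-" followed by 8 digits
    "ACCOUNT": re.compile(r"\bACCT-\d{8}\b"),
}


def redact_patterns(text: str) -> tuple[str, dict[str, int]]:
    """Replace every match with a labelled placeholder and count matches per pattern."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, counts[label] = pattern.subn(f"[REDACTED {label}]", text)
    return text, counts
```

The sketch replaces each match with a labelled placeholder rather than deleting it outright. This keeps the document readable and lets a reviewer see what kind of data was removed. The returned counts are a quick sanity check that the rules fired.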

Manual Redaction

In scenarios where data must be redacted after it has served its intended purpose, you may need to redact manually. If at all possible, avoid manual redaction: incomplete redactions leave sensitive data intact in places you didn’t know existed. It’s best to leave this to automated processes.

If the data you’re redacting doesn’t follow any recognizable pattern, moves around within a report, or otherwise can’t be redacted automatically, you will have to redact it manually. Make sure you follow a documented process so that the information is redacted completely. Ask yourself whether an auditor would accept your process as legal and thorough. If you’re making up your own rules and thinking, “it’s probably fine…”, it’s probably not.
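Automation can still help verify a manual redaction. One possible final step in a documented process is for the reviewer to list the specific values they meant to remove. They can then check the redacted output for any that survived. The function is a hypothetical sketch. It assumes you have already extracted the document’s text. For a PDF, that means the text actually stored in the file, not what is visible on screen.

```python
def find_leftovers(redacted_text: str, checklist: list[str]) -> list[str]:
    """Return every checklist value that still appears in the redacted output."""

    def normalize(s: str) -> str:
        # Collapse runs of whitespace and ignore case,
        # so "Jane   DOE" still counts as a hit for "Jane Doe".
        return " ".join(s.split()).casefold()

    haystack = normalize(redacted_text)
    return [value for value in checklist if normalize(value) in haystack]
```

An empty result does not prove the redaction is complete, because it only checks the values someone remembered to list. A non-empty result, however, is a clear signal to stop before publishing.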

When to redact your content

Every organization will have its own information-retention requirements and its own workflows around redaction. That said, there are five primary points at which you might want to redact content.

1. Upon Acquisition

When you receive your reports, you may want to redact sensitive data immediately. Perhaps the information is not pertinent to the job function of certain employees viewing the report. (As an example, would someone in finance need to know diagnostic codes outlined in some insurance documents? Perhaps not.) If this is the case for you, use the two automated methods of redaction discussed above:

Page Region Redaction
Pattern Matching Redaction

Making use of these two methods will help you remove sensitive data that your employees don’t need to access. Any areas that cannot be redacted automatically can be redacted manually.

This process will run before the report becomes available to anyone in the organization. You will want to have someone with proper security clearance double-check the report so you can rest assured that the report has been properly redacted before it is released.
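One way to enforce “redact first, release after review” is to model the report’s progress as a set of states. In this hypothetical sketch, `ingest` runs whatever automated redaction steps you have (here, just an SSN pattern). It then parks the report in a `pending_review` state. Only `release`, called by the reviewer, makes the report available.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingReport:
    text: str
    status: str = "received"
    reviewed_by: Optional[str] = None


def ingest(report: IncomingReport, redactors: list[Callable[[str], str]]) -> IncomingReport:
    """Apply every automated redaction step, then hold the report for review."""
    for redactor in redactors:
        report.text = redactor(report.text)
    report.status = "pending_review"  # not yet visible to the wider organization
    return report


def release(report: IncomingReport, reviewer: str) -> IncomingReport:
    """A reviewer with the right clearance signs off before anyone else sees it."""
    if report.status != "pending_review":
        raise ValueError(f"cannot release a report in state {report.status!r}")
    report.reviewed_by = reviewer
    report.status = "released"
    return report


# Example redactor: pattern-based SSN removal.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str) -> str:
    return SSN.sub("[Redacted]", text)
```

Because `release` refuses anything that has not passed through `ingest`, an unredacted report cannot be released by skipping a step.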

2. Prior to distribution

Your reports may be distributed to various people inside or outside of the organization. If all data is necessary in your initial workflow, it won’t be appropriate to redact it at the time of acquisition. When it is time to share it with others, however, it may be time to redact the sensitive data.

Again, you could use any of the methods to accomplish your redaction needs. It may make more sense in ad-hoc document distribution to manually redact the portions that should not be shown to those receiving the report. I say this because the data that’s important and useful likely changes based on who is using the document.
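When the same kinds of recipients come up repeatedly, the “who is using the document” question can be written down as a policy. This sketch assumes structured records. The audience names and field names are invented for illustration. An unknown audience raises an error instead of receiving an unredacted copy.

```python
# Hypothetical per-audience policies: which fields each recipient group must not see.
POLICIES: dict[str, set[str]] = {
    "finance": {"ssn", "diagnosis_code"},
    "external_partner": {"name", "ssn", "diagnosis_code"},
}


def copy_for(record: dict[str, str], audience: str) -> dict[str, str]:
    """Return a redacted copy for one audience; the original record is not modified."""
    if audience not in POLICIES:
        # Fail closed: no policy means no distribution, rather than "send everything".
        raise KeyError(f"no redaction policy defined for {audience!r}")
    hidden = POLICIES[audience]
    return {key: ("[Redacted]" if key in hidden else value) for key, value in record.items()}
```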


3. After the work is complete

Just like shredding old documents once they aren’t needed anymore, you can redact sensitive data in your reports upon completion of tasks that rely on that specific information. This ensures you can still retain the bulk of the report content without the heightened risk of keeping the PII or other types of data.

For these scenarios, since the redaction is part of a routine, an automated process makes the most sense. Be careful: routine operations are where we sometimes cut corners, especially when we must execute them manually. That’s dangerous in redaction. Systematizing the process will prevent cut corners and security loopholes.
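A routine post-completion redaction job might look like the following hypothetical sketch, which assumes each record carries a `completed_at` timestamp. The field names and the 30-day grace period are placeholders. The job is safe to run repeatedly, for example nightly.

```python
from datetime import datetime, timedelta
from typing import Any

# Hypothetical settings: which fields to clear, and how long after completion.
SENSITIVE_FIELDS = ("ssn", "card_number")
GRACE_PERIOD = timedelta(days=30)


def redact_completed(records: list[dict[str, Any]], now: datetime) -> int:
    """Clear sensitive fields on records whose task finished at least GRACE_PERIOD ago.

    Already-redacted records are left alone, so repeated runs are harmless.
    Returns the number of records changed on this run.
    """
    changed = 0
    for record in records:
        completed_at = record.get("completed_at")
        if completed_at is None or now - completed_at < GRACE_PERIOD:
            continue  # work still in progress, or finished too recently
        touched = False
        for name in SENSITIVE_FIELDS:
            if record.get(name) is not None:
                record[name] = None
                touched = True
        if touched:
            changed += 1
    return changed
```

Logging the returned count on each run gives you a simple audit trail showing that the routine actually ran.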

4. Prior to archive

When a given report has reached the end of its useful life and you’re prepping to archive it, redacting the identifying information lowers the risk of retaining the rest of the information. Many organizations have automated archiving procedures that should be coupled with a redaction tool to strip out sensitive information.

Seeing as this will ideally be an automated process for a wide array of content, you’ll need a tool that can handle multiple redaction rules for various documents.

For example:

You may wish to redact a particular property within the report when it is archived while leaving other information intact. This allows you to retrieve the most crucial information if necessary.
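Handling multiple redaction rules for various document types can start as a simple lookup table. In this hypothetical sketch, each document type maps to the fields that are stripped at archive time. Everything else is kept for later retrieval. The names (never the values) of the removed fields are recorded for auditing.

```python
# Hypothetical archive rules: fields to strip from each document type before archiving.
ARCHIVE_RULES: dict[str, set[str]] = {
    "insurance_claim": {"patient_name", "ssn", "diagnosis_code"},
    "invoice": {"card_number"},
}


def prepare_for_archive(doc_type: str, record: dict[str, str]) -> dict[str, object]:
    """Drop the rule's fields, keep everything else, and note what was removed."""
    if doc_type not in ARCHIVE_RULES:
        raise ValueError(f"no archive redaction rule for document type {doc_type!r}")
    drop = ARCHIVE_RULES[doc_type]
    archived: dict[str, object] = {k: v for k, v in record.items() if k not in drop}
    # Record which fields were removed (names only, never values) for later audits.
    archived["_redacted_fields"] = sorted(k for k in record if k in drop)
    return archived
```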

5. Prior to Disposal

Eventually, the time comes to destroy and dispose of old reports and documents containing PII or PHI data. Redacting as much of the information as feasible prior to deleting leaves you with the highest level of security. Not only have you removed the reports from your repository, but you first redacted any customer or private company data, making any of the information much harder to recover.

Putting your customers at ease by outlining this procedure is going above and beyond normal data disposal standards, and will give them extra assurance that you have properly handled this sensitive data.

How to apply this in your business:

First, determine if you have the proper tools to start redacting report content. If it means starting out with a manual process, you can still get started.
Call together stakeholders from different disciplines in your business. Find out what types of information you need to consider redacting and which reports might contain that information. If you need help scanning your current repository, you might start by having someone in IT write a script that will find data matching certain patterns like SSNs or your internal account numbers. This should give you a sense of where to start looking.
Decide what method(s) you will use to redact the information: page region, pattern matching, or manual redaction.
Decide when the data should be redacted: Upon acquisition, before distribution, after the work is complete, prior to archiving, or prior to disposal.
Once you have these action items sorted out, it’s time to implement your redaction process.
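For the repository scan mentioned in the steps above, a first-pass script might look like this hypothetical sketch. It only reads plain-text files with the listed extensions. PDFs and office documents would need their text extracted first. The `ACCT-` pattern stands in for your own account-number format.

```python
import re
from pathlib import Path

# Patterns to look for: SSNs plus a hypothetical internal account-number format.
SEARCH_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\bACCT-\d{8}\b"),  # hypothetical in-house format
}


def scan_text(text: str) -> dict[str, int]:
    """Count matches per pattern, omitting patterns with no hits."""
    counts = {label: len(p.findall(text)) for label, p in SEARCH_PATTERNS.items()}
    return {label: n for label, n in counts.items() if n}


def scan_repository(root: str, suffixes: tuple[str, ...] = (".txt", ".csv")) -> dict[str, dict[str, int]]:
    """Walk a directory tree and report which text files appear to hold sensitive data."""
    findings: dict[str, dict[str, int]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Calling `scan_repository("/path/to/reports")` returns a mapping from file path to match counts. This gives you a rough map of where sensitive data lives before you decide on redaction methods and timing.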