Tuesday, October 28, 2014

Tokenization has solved PCI night mare - Can it solve PII and PHI as well ?

Recently the topic handling sensitive information PII / PHI in most enterprises has been gaining a lot of heat. Since the industry has converged and solved the storing of CC numbers using tokenization techniques, the general theme from both the software vendors and internal implementation folks has been to apply the same hammer to this not so identical peg.
Well when you have only one hammer, every thing that you see is just a NAIL.

PII is much different than PCI. PII is also creatively confused with no clear demarcation of what fields constitute a linkage to a person

Name - First Name and Last Name
Nick Name
Email Address
Phone numbers
Residential address
Photo
Social profile - Linked In, Facebook, Twitter handle
National ID
SSN
Tax ID
Date of Birth
Place of Birth
Passport number
IMEI number of a cell phone
FingerPrint
Mothers Maiden Name
Users IP address
Medical
Financial
Educational
Utility Records
Drivers License Number
Vehicle Registration plate number

Ok this is a huge list, now most applications that face customers have some or many pieces of such information embedded in their data model, message model, object model, canonical models.
If they have to be treated like a CC number in a PCI domain, what that does it really mean ?
Do you think the statement makes any sense ?

Now flip the coin around and think of scenarios where you have to deal with the above fields in  your select statement, where clause, joins, orderby, filters - most common answer should be all the time.

If you were to compare the frequency of performing a select operation with a complex where clause on the above fields as compared to a CC number it would reveal the complexity behind the subject of tokenization Even though the practice of storing SSN in the clear is frowned upon there are many organizations that still store them in the clear, and some times these are multi billion dollar corporations.

The PII data comes into an organization via online channels ( user registration / interaction ) or via offline channels ( batch registrations, bulk imports etc). They are used in online transaction processing and post online usage they are used for a variety of reporting features within every organization.

So what is the right choice to deal with this problem ?
How do you store these sensitive information securely ?
How do you move these sensitive information across  Offline registrations to Online transaction processing to Online analytical processing.
How do you manage backups, tapes, decommissioning storage etc ?
How to make the storage of these sensitive information search friendly for use cases such as wild card searches, like searches, soundex searches etc.

i have some solutions in play and in past experiences, im keen to know how others are solving this today
Please post your opinions