Tuesday, October 28, 2014

Tokenization has solved the PCI nightmare - can it solve PII and PHI as well?

Recently, the handling of sensitive information (PII / PHI) has been gaining a lot of heat in most enterprises. Since the industry has converged on tokenization to solve the storage of credit card (CC) numbers, the general theme from both software vendors and internal implementation teams has been to apply the same hammer to this not-so-identical peg.
Well, when the only tool you have is a hammer, everything you see looks like a nail.

PII is very different from PCI. PII is also loosely defined, with no clear demarcation of which fields constitute a linkage to a person:

Name - First Name and Last Name
Nickname
Email Address
Phone numbers
Residential address
Photo
Social profile - LinkedIn, Facebook, Twitter handle
National ID
SSN
Tax ID
Date of Birth
Place of Birth
Passport number
IMEI number of a cell phone
Fingerprint
Mother's maiden name
User's IP address
Medical records
Financial records
Educational records
Utility records
Driver's license number
Vehicle registration plate number

OK, this is a huge list. Most customer-facing applications have some or many of these fields embedded in their data models, message models, object models, and canonical models.
If these fields have to be treated like a CC number in a PCI domain, what does that really mean?
Does that statement even make sense?

Now flip the coin and think about how often you deal with the above fields in your SELECT statements, WHERE clauses, joins, ORDER BY clauses, and filters. The most common answer should be: all the time.

If you compare how often you run a SELECT with a complex WHERE clause on the above fields with how often you do so on a CC number, the complexity behind tokenizing PII becomes clear. Even though storing SSNs in the clear is frowned upon, many organizations still do so, and some of them are multi-billion-dollar corporations.
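To make the querying problem concrete, here is a minimal sketch in Python. It uses a keyed hash (HMAC-SHA256) as a stand-in for a tokenization service, which is an assumption on my part: a real product may use a token vault with random tokens instead. The key, the `tokenize` helper, and the sample names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real system would fetch it from a key manager.
SECRET_KEY = b"demo-key-not-for-production"


def tokenize(value: str) -> str:
    """Return a deterministic keyed token for a normalized value (illustrative only)."""
    normalized = value.strip().lower()
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


names = ["Anderson", "Andrews", "Baker"]
tokens = [tokenize(n) for n in names]

# Equality lookups survive tokenization: the same input gives the same token.
equality_match = tokenize("anderson") == tokens[0]

# A prefix search (LIKE 'and%') works on clear text...
prefix_match_clear = [n for n in names if n.lower().startswith("and")]

# ...but the token of the prefix has no relationship to the tokens of the full values.
prefix_token = tokenize("and")
prefix_match_tokens = [t for t in tokens if t.startswith(prefix_token)]

print(equality_match, prefix_match_clear, prefix_match_tokens)
```

The exact-match lookup that is enough for a CC number still works, while the prefix search that is routine for a name or an address finds nothing. Sorting and range filters on the tokens break in the same way.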

PII comes into an organization via online channels (user registration / interaction) or offline channels (batch registrations, bulk imports, etc.). It is used in online transaction processing and, after that, in a variety of reporting features within every organization.

So what is the right way to deal with this problem?
How do you store this sensitive information securely?
How do you move this sensitive information from offline registration to online transaction processing to online analytical processing?
How do you manage backups, tapes, decommissioned storage, etc.?
How do you make the stored information search friendly for use cases such as wildcard searches, LIKE searches, Soundex searches, etc.?
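For the Soundex question, one approach sometimes used is a "blind index": store a keyed hash of the phonetic code next to the protected value and search on that hash. The sketch below is only an illustration under my own assumptions, not a description of any product. The key, the `rows` list, and the helper names are hypothetical, and the Soundex function follows the standard American rules.

```python
import hashlib
import hmac

INDEX_KEY = b"demo-index-key"  # hypothetical key, for illustration only

# Standard American Soundex letter groups.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name, or "" if it has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts again
    return (result + "000")[:4]


def blind_index(name: str) -> str:
    """Keyed hash of the Soundex code; this is what would be stored and searched."""
    return hmac.new(INDEX_KEY, soundex(name).encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical rows: (row id, blind index). The name itself would be stored encrypted.
rows = [(1, blind_index("Smith")), (2, blind_index("Robert")), (3, blind_index("Baker"))]


def phonetic_search(query: str) -> list:
    """Return ids of rows whose stored index matches the query's index."""
    target = blind_index(query)
    return [row_id for row_id, idx in rows if hmac.compare_digest(idx, target)]


smyth_hits = phonetic_search("Smyth")
rupert_hits = phonetic_search("Rupert")
print(smyth_hits, rupert_hits)
```

The trade-off is that the index reveals which rows share a phonetic code, so it gives up some secrecy in exchange for searchability. Wildcard and LIKE searches would need a different index again.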

I have some solutions in play and from past experience, and I'm keen to know how others are solving this today.
Please post your opinions.

Saturday, February 8, 2014

The opinion on NoSQL

A general impression among the cool kids on the block is that traditional RDBMSs really stink, and that organizing data as columnar structures or key-value pairs is incredible and provides functionality and performance that traditional RDBMSs have lacked.
In analyzing the NoSQL evolution, there are three main paradigms under which the products can be classified:

  1. Key-value pair - e.g. Redis
  2. Document style - e.g. MongoDB, CouchDB
  3. Column family - e.g. Cassandra
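As a rough illustration of how the three paradigms shape the same record, here is a sketch using plain Python dictionaries as in-memory stand-ins. None of this is an actual Redis, MongoDB, or Cassandra client API, and the record and field names are made up.

```python
import json

# 1. Key-value: the store sees an opaque value under a single key.
key_value_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# 2. Document: a nested structure whose inner fields can be queried.
document_store = [{"_id": 42, "name": "Ada", "address": {"city": "London"}}]

# 3. Column family: row key -> named columns, which can differ from row to row.
column_family = {42: {"address:city": "London", "name": "Ada"}}

# Key-value: fetch by key, then decode the whole value in the application.
kv_city = json.loads(key_value_store["user:42"])["city"]

# Document: filter on a nested field.
doc_hits = [d["_id"] for d in document_store if d["address"]["city"] == "London"]

# Column family: address a single column inside a row.
cf_city = column_family[42]["address:city"]

print(kv_city, doc_hits, cf_city)
```

Even in this toy form, the access pattern has to be decided before the data is laid out, which is the modeling shift that relational designers have to absorb.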

The underlying filesystem / storage structures span a wide variety, from the Bigtable and Dynamo variants and BSON (Binary JSON, used in MongoDB) to column families as in Cassandra and B-trees as in CouchDB.

The main argument is that NoSQL is schemaless: there is no concept of a table with a set number of columns, and the data can be unstructured.
I say fair enough, it's a good fit for such data.
The problem I have is that this movement, which is called NoSQL for a reason, is marching very fast towards SQL.
The classic constructs of a SQL database, such as primary keys and a query language, are reappearing.
From Cassandra's CQL to Hive, or even the dead UnQL, the NoSQL community is struggling to provide SQL-like interfaces to query NoSQL stores.

Does that not reveal a systemic issue? Converting relational data modelers into NoSQL data modeling experts is no overnight task. Organizations that have taken this path the right way first take the RDBMS experts who understand the business, the data, and its relationships, and put them through a strong NoSQL journey. This includes formal training, in-house presentations, vendor demos, vendor talks, local user group participation, conferences, knowledge sharing with other organizations doing similar things, etc.
This creates good energy at the grass roots of an organization and builds the appetite to take on workloads and model them in the NoSQL paradigm where it is an appropriate fit.
The most common failures in NoSQL implementations I have seen are in projects where an Oracle RDBMS expert is asked to build a Cassandra column family (CF) overnight, or a J2EE development lead is asked to design a MongoDB or CouchDB schema.
Both are recipes for failure. What is needed is serious investment in hiring talent and in cross-pollinating existing DBMS talent to appreciate the NoSQL model, so that transformation happens organically for new workloads, migration of old workloads, etc.

One note, though: the NoSQL paradigm is very interesting for good use cases, but I feel the NoSQL movement is in a state of denial, as the problems of traditional DBMSs are re-manifesting themselves in the NoSQL world.





Thursday, January 30, 2014

SOA BPM and More....

In a world of simplistic architecture deliveries, there are business needs for fairly sophisticated BPM architectures built on innovative technology, product, and architecture trends. It's one of my new areas of interest, and apart from cloud and high-volume, high-scale applications, I will be devoting time to immersing myself in complex BPMS/SOA systems and will post research notes on them.
In the world of BPM and SOA, the very first thought leader to mention is Paul C. Brown, for his tremendous contributions to this community. He is inspiring and motivational. Kudos to him.
Paul has some very interesting material on his Total Architecture site about BPM and SOA: their interdependencies, commonly misunderstood terminology, and design paradigms. It's a great read.
Extending this to how SOA and BPM can be implemented in a rapid delivery model, using methodologies such as Agile / Scrum, combining modern practices such as CI/CD, and leveraging DevOps in the cloud with BPM cloud offerings, would be cool, challenging work.
Such projects often require heavy upfront investment in setting up the foundation / plumbing and take a long time to get off the ground. Momentum gets lost as energy is spent on the heavy lifting of upfront planning and foundational elements.
An interesting domain to explore is how to stand up such delivery teams fairly quickly, automate the development needs, and onboard resources to be productive in delivering BPM processes, SOA services, UI page flows, composite apps, and composable services, just as traditional web apps are put together by end-to-end full-stack developers or by a distributed organization with separate teams for UI, web app, services, and data access.
This is going to be an interesting challenge to solve, and I will post what I find.

Wednesday, January 8, 2014

Top 10 Cloud stories of 2013

http://yourstory.com/2013/12/top-10-cloud-platforms-2013/#

Interesting to note that the stories from 2 to 10 are fighting to compete with 1.