All things Architecture

Tuesday, October 28, 2014

Tokenization has solved the PCI nightmare - can it solve PII and PHI as well?

Recently the topic of handling sensitive information (PII / PHI) in most enterprises has been gaining a lot of heat. Since the industry has converged on tokenization to solve the storage of credit card (CC) numbers, the general theme from both software vendors and internal implementation folks has been to apply the same hammer to this not-so-identical peg.
Well, when you have only one hammer, everything you see is just a NAIL.

PII is very different from PCI. PII is also loosely defined, with no clear demarcation of which fields constitute a linkage to a person. Candidates include:

Name - First Name and Last Name
Nickname
Email Address
Phone numbers
Residential address
Photo
Social profile - LinkedIn, Facebook, Twitter handle
National ID
SSN
Tax ID
Date of Birth
Place of Birth
Passport number
IMEI number of a cell phone
Fingerprint
Mother's Maiden Name
User's IP address
Medical
Financial
Educational
Utility Records
Driver's License Number
Vehicle Registration plate number

OK, this is a huge list. Most customer-facing applications have some or many of these pieces of information embedded in their data model, message model, object model and canonical models.
If they have to be treated like a CC number in a PCI domain, what does that really mean?
Does that statement even make sense?

Now flip the coin and think of the scenarios where you have to deal with the above fields in your select statements, where clauses, joins, order by clauses and filters. The most common answer should be: all the time.

If you compare how often you run a select with a complex where clause on the above fields with how often you do so on a CC number, the complexity behind tokenizing PII becomes clear. Even though storing SSNs in the clear is frowned upon, many organizations still do so, and sometimes these are multi-billion-dollar corporations.

PII data comes into an organization via online channels (user registration / interaction) or via offline channels (batch registrations, bulk imports etc.). It is used in online transaction processing and, after that, for a variety of reporting features within the organization.

So what is the right choice to deal with this problem?
How do you store this sensitive information securely?
How do you move this sensitive information from offline registrations to online transaction processing to online analytical processing?
How do you manage backups, tapes, decommissioning of storage etc.?
How do you make the storage of this sensitive information search friendly for use cases such as wildcard searches, LIKE searches, Soundex searches etc.?
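
To make the search question concrete, here is a minimal Python sketch of one commonly discussed idea: a keyed-hash "blind index" stored instead of the clear-text value. It is only an illustration under assumptions, not a recommendation or any vendor's method. The names `INDEX_KEY`, `blind_index`, `rows` and `find_by_email` are hypothetical, and the hard-coded key stands in for a real key management system.

```python
import hashlib
import hmac
from typing import List

# Hypothetical key for illustration only; a real key would come from a
# key management system, never from source code.
INDEX_KEY = b"demo-key-not-for-production"


def blind_index(value: str) -> str:
    """Return a deterministic keyed hash (HMAC-SHA256) of a normalized value."""
    normalized = value.strip().lower()
    return hmac.new(INDEX_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# A toy "table": each row stores the keyed hash, not the clear-text email.
rows = [
    {"id": 1, "email_idx": blind_index("Alice@Example.com")},
    {"id": 2, "email_idx": blind_index("bob@example.com")},
]


def find_by_email(email: str) -> List[int]:
    """Equality lookup works because equal inputs produce equal tokens."""
    target = blind_index(email)
    return [row["id"] for row in rows if hmac.compare_digest(row["email_idx"], target)]
```

An exact-match lookup such as `find_by_email("alice@example.com")` finds row 1, but a partial value such as `"ali"` matches nothing, because the hash hides all structure of the original. That is exactly why wildcard, LIKE, Soundex and order-by use cases remain the hard part.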

I have some solutions in play and from past experience; I'm keen to know how others are solving this today.
Please post your opinions.

Saturday, February 8, 2014

The opinion on NoSQL

A general impression among the cool kids on the block is that traditional RDBMSs really stink, and that organizing data as columnar structures or key-value pairs is incredible and provides functionality and performance that traditional RDBMSs have lacked.
In analyzing the NoSQL evolution, there are 3 main paradigms under which the products can be classified:

Key-value pair - e.g. Redis
Document style - e.g. MongoDB, CouchDB
Column family - e.g. Cassandra
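
As a rough illustration of the three paradigms, the sketch below shapes one made-up customer record three ways using nothing but Python dicts. It only mimics the data shapes and is not the API of Redis, MongoDB, CouchDB or Cassandra; every name in it (`key_value_store`, `document_store`, `column_family_store` and the three `city_from_*` helpers) is hypothetical.

```python
import json

# Key-value style: an opaque value looked up by a single key.
key_value_store = {
    "customer:42": '{"name": "Ada", "city": "London"}',
}

# Document style: a nested, self-describing document per record.
document_store = {
    "customers": [
        {"_id": 42, "name": "Ada", "addresses": [{"city": "London"}]},
    ],
}

# Column-family style: row key -> column name -> value.
column_family_store = {
    "customers": {
        42: {"name": "Ada", "city": "London"},
        43: {"name": "Bob"},  # rows need not carry the same columns
    },
}


def city_from_key_value(customer_id: int) -> str:
    """The store knows nothing about the value; the caller must parse it."""
    return json.loads(key_value_store[f"customer:{customer_id}"])["city"]


def city_from_document(customer_id: int) -> str:
    """Find the document by id, then walk its nested structure."""
    for doc in document_store["customers"]:
        if doc["_id"] == customer_id:
            return doc["addresses"][0]["city"]
    raise KeyError(customer_id)


def city_from_column_family(customer_id: int) -> str:
    """Address a single column inside a row by its row key."""
    return column_family_store["customers"][customer_id]["city"]
```

All three helpers answer the same question for customer 42, but how much the store itself understands about the data differs in each case, which is the real modeling decision.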

The underlying filesystem / storage structures span a wide variety, from the Bigtable variants and Dynamo, to BSON (Binary JSON, used in MongoDB), to column families as in Cassandra or B-trees as in CouchDB.

The main argument is that NoSQL is schemaless: there is no concept of a table with a set number of columns, and the data can be unstructured.

I say fair enough, it's a good fit for such data.

The problem I have is that this movement, which is called NoSQL for a reason, is marching very fast towards SQL.

The classic constructs of a SQL database, such as primary keys and a query language, are reappearing in the NoSQL products.

Right from CQL of Cassandra to Hive or even the dead UnQL, the NoSQL community is struggling to provide SQL-like interfaces to query the NoSQL stores.

Does that not reveal a systemic issue? Converting relational data modelers into NoSQL data model experts is no overnight task. Organizations that have done this right first take the RDBMS experts who understand the business, data and relationships and put them through a strong NoSQL journey. This includes formal training, in-house presentations, vendor demos, vendor talks, local user group participation, attending conferences, knowledge sharing with other organizations doing similar things etc.

This creates good energy at the grass roots of an organization and builds the appetite to take on workloads and model them in the NoSQL paradigm where it's an appropriate fit.

The most common failures in NoSQL implementations I have been seeing are in projects where an Oracle RDBMS expert is asked to build a Cassandra CF overnight or a J2EE development lead is asked to design a MongoDB or CouchDB schema.

Both are recipes for failure. What is needed is serious investment in hiring talent, cross-pollinating existing DBMS talent to appreciate the NoSQL model, and organically creating the transformation for new workloads, migration of old workloads etc.

One note though: the NoSQL paradigm is very interesting for good use cases, but I feel the NoSQL movement is in a state of denial, as the problems of traditional DBMSs are re-manifesting themselves in the NoSQL world.

Thursday, January 30, 2014

SOA BPM and More....

In a world of simplistic architecture deliveries, there exist business needs to create fairly sophisticated BPM architectures using very innovative technology, product and architecture trends. It's one of my new areas of interest, and apart from cloud, high-volume, high-scale applications I will be devoting my time to immersing myself in complex BPMS/SOA systems and will post research notes on them.
In the world of BPM and SOA, the very first thought leader to mention is Paul C. Brown, with his tremendous contributions to this community. He is inspiring and motivational. Kudos to him.
Paul has some very interesting material on his Total Architecture site about BPM and SOA and their inter-dependencies, commonly misunderstood terminology and design paradigms. It's a great read.
Extending how SOA and BPM can be implemented in a rapid delivery model, using delivery methodologies such as Agile / Scrum, combining modern concepts such as CI/CD and leveraging DevOps in the cloud with BPM cloud offerings, would be cool, challenging work.
Often these projects need very heavy upfront investment in setting up the foundation / plumbing and take a lot of time to get off the ground. The momentum gets lost in the heavy lifting involved in the upfront planning and foundational stages.
An interesting domain to explore would be how to stand up such delivery teams fairly quickly, automate the development needs and onboard resources to be productive in delivering BPM processes, SOA services, UI page flows, composite apps and composable services, just like traditional web apps are put together by end-to-end full-stack devs or by a distributed team that does UI, webapp, services and data access in separate teams.
This is going to be an interesting challenge to solve, and I will post what I find.

Wednesday, January 8, 2014

Wednesday, November 27, 2013

Using Cache to improve performance of web scale properties

I recently viewed a re:Invent session on caching and its importance in delivering a superior user experience in terms of response times and page loads.
The layered-cake, peel-the-onion approach of HTML5 browser cache + edge / CDN cache + web server / front-end LB cache + app server / application cache + DB cache is a powerful combination for producing fast end-user responses.
Add to that the independent analysis from HTTPArchive.org and WebPageTest.org, which provides great insight into your public-facing web properties.
In my current project I'm employing some of the caching constructs I could not use in release 1 due to constraints.
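
The layered lookup described above can be sketched in a few lines of Python. This is a toy model under assumptions, not how any particular browser, CDN or app server implements its cache: `CacheLayer` and `layered_get` are hypothetical names, each layer is just an in-memory dict, and expiry and invalidation are left out on purpose.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CacheLayer:
    """One layer of the cake (browser, CDN, app, DB...), modeled as a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def put(self, key: str, value: str) -> None:
        self.store[key] = value


def layered_get(
    key: str,
    layers: List[CacheLayer],
    load_from_origin: Callable[[str], str],
) -> Tuple[str, str]:
    """Check layers nearest-first; return (value, name of the layer that answered)."""
    for depth, layer in enumerate(layers):
        value = layer.get(key)
        if value is not None:
            # Backfill the layers in front of the hit so the next request
            # is answered closer to the user.
            for nearer in layers[:depth]:
                nearer.put(key, value)
            return value, layer.name
    # Every layer missed: pay the full cost once, then populate all layers.
    value = load_from_origin(key)
    for layer in layers:
        layer.put(key, value)
    return value, "origin"
```

With layers ordered browser, CDN, app, DB, the first request for a page goes all the way to the origin, and every later request is answered by the nearest layer that still holds it.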

Google - Cache is King from AWS Re:invent 2013

Sada

Identity and Access Management

IAM is a huge topic, as most architects know and deal with on a daily basis.
But there are some interesting trends in the IAM industry.
I classify the work involved in IAM into a few categories:

Traditional Provisioning Challenges - User Provisioning / de-provisioning, centralized user stores, Corporate LDAP / Directories.
Enhanced Provisioning challenges - User / Group / Role management, Learning/Skill/Cert/Attestation based role changes, attribute management and utilizing attributes in corporate and business applications
Extend IAM to a large end user population, solve the provisioning for scale, volume, high availability etc.
Entitlement/Permissions management - Move apps from managing permissions to more centralized permission management model
Declarative access control / resource protection - Reverse proxy model to protect web resources by a centralized policy store
Federation of identities and social integration
Support for standards-based identity integration using SAML and OAuth.
Identity management in Mobile devices - MDM and IAM integration story

As we all know, the large players in this space are Oracle, CA, IBM and Microsoft, plus a whole set of boutique niche players such as Okta, Ping, Courion, SailPoint, Hitachi ID and Symplified, and a variety of open source systems like OpenAM and JOSSO.

Recently Amazon introduced AWS IAM support for SAML to promote federation as well.

For an independent IT organization, centralized IAM and cross-domain SSO is a long-term goal, and realizing it requires strong vision, leadership, a product roadmap and an effective combination of best-of-breed products that stand out in their own realms.

Sada

Friday, November 22, 2013

Unconventional approaches to Architecture Views

As architects we have all had a variety of methods, tools, stencils and approaches for representing views that capture the essence of a system / software / application and address the concerns of various stakeholders' viewpoints.
Nothing new or fancy, I agree.
This is very true for enterprises and not for startups, as you may also know. Startups use techniques like business model generators, Leanstack, LeanCanvas etc. to represent the dimensions of the problems they are trying to solve, but the view is geared more to a VC who would like to fund the idea.

As I start working on new and unconventional projects (not your typical web apps, high-volume web-scale properties or content-managed apps, but backend, business-process-oriented systems that carry out the core of a business from behind the scenes with little to no UI), I have started to explore non-traditional ways to represent architecture.

Some interesting observations on this problem are provided below

http://www.codingthearchitecture.com/presentations/gotoams2013-effective-software-architecture-sketches

I also started exploring really convention-breaking ways to represent this.
For example, for the use case view / business process view / use case surveys we can use standard UML notation (bubbles, actors, interactions etc.), but how about taking the UPS whiteboard guy and using the UPS ad whiteboarding technique to represent every use case?

Check this out

http://www.iqagency.com/work/whiteboard-site-v3

http://portfolio.iqagency.com/ups/whiteboard08/

Similarly, for every view such as Logical, Context, Data, Deployment, Physical and Security there have to be unique ways to represent it, rather than the boring usual content that puts the audience to sleep.

I will post as I research and find new stuff.

Sada Rajagopalan