Tuesday, May 14, 2013

Security Tutorials on mHealth Security and Auditing - #FHIR

The two presentations that I gave at the HL7 meeting, at the Wednesday afternoon “Free Security Tutorial” and again at the Joint Security/EHR/FHIR/SOA meeting on Thursday, are posted on the HL7.org web site. They are:

Security Education: mHealth Security and FHIR

This presentation covers my current viewpoint on mHealth security basics, risk-assessment models, network communications security, and user identity and access management. This information is on the HL7 FHIR site, and will improve over the coming month. Front and center is the IHE Internet User Authorization (IUA) profile, a profiling of OAuth 2.0. Much of the material I cover is also covered on my blog.

Security Education: Security/Privacy Audit Logging and Reporting

Wednesday, May 1, 2013

De-Identification - Data Chemistry

The concept of de-identification is a recurring theme in my circles. I use the term de-identification in its broader sense, well beyond the constraints of HIPAA. I use it to refer to the process of reducing the risk of privacy or identity exposure by modifying the data. This includes using pseudonyms, known as pseudonymization, and also includes removing data elements, known as anonymization. Therefore De-Identification is made up of both Pseudonymization and Anonymization.

I am involved in much of the standards work in this space, actively working in IHE on a handbook and in ISO on updates to the core standard on the subject. In all of these cases we are trying to make the 'art' of de-identification more measurable, repeatable, and approachable. Too often it is seen as too hard; more often still it is seen as simple, and thus mistakes are made. The goal I have is to make it clear.

Why De-Identify?

First, one must understand that de-identification is just a method of lowering risk. The only way to get risk to zero is to have zero data. Even one data element that one might consider to be purely clinical does narrow down the population. Simply stating that the subject weighs 203lbs tells you much about the subject. If that value is 3lbs, you know the subject is a premature baby; if it is 403lbs, it is clear you have limited the population. The first point is that all data are potentially identifiable; some data are simply less so.

Second, one must recognize that some data are outright Direct Identifiers. These data are in no uncertain terms identifiers. Full name is the most obvious. A Direct Identifier is something that is publicly known (or knowable); therefore full addresses, phone numbers, credit-card numbers, and driver's-license numbers are Direct Identifiers as well. These items clearly can't be included in the de-identified data set, so each needs to be identified as a risk to be mitigated.

There is also a class of data that can be used in combination with other data in the data set to identify a subject, such as postal codes, sex, date of birth, hospital identifier, or date of procedure. These Indirect Identifiers are risky to leave in, so they need to be identified as potential risks to be mitigated.
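To make the "in combination" point concrete, here is a minimal sketch in Python. The records and field names are invented for illustration. It counts how many records share each combination of fields; a smallest group of 1 means at least one subject is unique on those fields alone. This is the idea behind what is commonly called k-anonymity, offered here as an illustration rather than a full risk assessment.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"postal_code": "53201", "sex": "F", "birth_year": 1950},
    {"postal_code": "53201", "sex": "F", "birth_year": 1950},
    {"postal_code": "53201", "sex": "M", "birth_year": 1950},
    {"postal_code": "53201", "sex": "M", "birth_year": 1982},
    {"postal_code": "53202", "sex": "F", "birth_year": 1982},
    {"postal_code": "53202", "sex": "M", "birth_year": 1982},
]


def group_sizes(rows, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def smallest_group(rows, fields):
    """Size of the rarest combination; 1 means some record is unique."""
    return min(group_sizes(rows, fields).values())


# Each field alone leaves every subject hidden in a group of at least 2...
for single in ("postal_code", "sex", "birth_year"):
    print(single, smallest_group(records, [single]))

# ...but the three fields together single out individual records.
print("combined", smallest_group(records, ["postal_code", "sex", "birth_year"]))
```

No single field here is identifying, yet the combination is, which is why each of these elements needs to be inventoried even when it looks harmless by itself.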

The task of De-Identification is much like chemistry, sometimes bio-chemistry. One must understand the elements and how they interact. One must use various tools to separate or modify the elements. Each chemical process results in something useful for the purpose it was created. Some combinations of chemicals are very volatile, others benign, but all must be given respect.

De-Identification Procedure

The procedure is simple. I'll include only the high-level steps; each is more involved than I indicate here:
  1. Identify what it is you want to do with the data. This is your use-case. What are the critical data attributes, and what are the acceptable tolerances for each data attribute? You need to justify each element you want. You must also identify the acceptable level of risk, which includes assessment of the authorizations you have.
  2. Identify ALL of the data elements that you have. This is the data set that has not been de-identified. It might be a database, it might be a stream. You must identify all of the data, not just the data you are worried about. You then classify each attribute: Direct, Indirect, or simple data. Note that any unstructured data, otherwise known as free-text, must be considered a Direct Identifier.
  3. Apply Mitigations, in theory. Given the use-case details you created in (1) and the data-element inventory you created in (2), apply the de-identification tools: (a) Redact - delete the element, (b) Fuzz - modify within tolerance, (c) Generalize - use broader terms, or (d) Replace - use a pseudonym. These are clearly not all the tools, but they are the large categories of tools.
  4. Assess risk, in theory. How correlated are the data to a subject? Is this level of risk acceptable to the policy identified in (1)? Don't change your policy; that is the easy way out. Continue to apply mitigations. If further mitigations result in data that are not useful to your use-case, then you might need to change something else.
  5. Apply Mitigations to the data-set and validate the results. As with any design-of-experiments, you must be able to prove your theory. Is the resulting data just as de-identified as you expected? Is the resulting data useful for your use-case?
However well you have de-identified, recognize that there is residual risk that needs to be managed. This risk is often significant, thus requiring good security practices. Just because you think your data are de-identified does not mean you don't need to protect them. Attacks against de-identified data only get better, they never get worse.
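The four categories of tools named in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names, the 5 lb tolerance, the three-character postal-code prefix, and the use of a keyed hash (HMAC-SHA256) for pseudonyms are all my choices for the example, not something prescribed by any standard mentioned here.

```python
import hashlib
import hmac
import random


def redact(record, field):
    """(a) Redact: return a copy of the record without the given element."""
    return {k: v for k, v in record.items() if k != field}


def fuzz(value, tolerance, rng):
    """(b) Fuzz: shift a numeric value randomly within +/- tolerance."""
    return value + rng.uniform(-tolerance, tolerance)


def generalize_postal_code(code, keep=3):
    """(c) Generalize: keep only the leading characters of a postal code."""
    return code[:keep] + "*" * (len(code) - keep)


def pseudonym(value, secret_key):
    """(d) Replace: derive a stable pseudonym using a keyed hash."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record, secret_key, rng):
    """Apply one mitigation per element, as decided for a hypothetical use-case."""
    out = redact(record, "full_name")
    out["patient_id"] = pseudonym(record["patient_id"], secret_key)
    out["postal_code"] = generalize_postal_code(record["postal_code"])
    out["weight_lbs"] = round(fuzz(record["weight_lbs"], 5.0, rng), 1)
    return out


# Hypothetical input; every value here is invented.
original = {
    "full_name": "Jane Example",
    "patient_id": "MRN-0001",
    "postal_code": "53201",
    "weight_lbs": 203.0,
}
key = b"hypothetical-secret-key"
result = de_identify(original, key, random.Random(0))
print(result)
```

The key used for pseudonyms is itself sensitive: anyone holding it can recompute the mapping, so it needs the same protection as the original data.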

De-Identification is Contextual

I have said exactly this (De-Identification is highly contextual) before. The de-identification algorithm you come up with for one use-case will not necessarily be useful for a different use-case, or a different data-set. It might be, but the assessment needs to be made. The context behind the needs of the use-case is critical. Take only the data, and only the fidelity of data, that you need.

Gross De-identification

There are use-cases for doing a gross de-identification into a large data set, followed by secondary use-cases with their own further de-identification analysis. This is often done in population-health analysis: gross de-identification is used to fill the population database, and the results of any sub-analysis of a specific population health epidemic are then re-assessed. Clearly the large database needs to be protected quite strongly; I might say it needs to be protected just as well as a full-fidelity database.

Summary

De-Identification is a technical tool. It is not a get-out-of-jail card. The resulting data set likely still requires some protection and safe handling.

Friday, April 26, 2013

Privacy Consent State of Mind

The space of Privacy Consent is full of trepidation. I would like to show that although there is complexity, there is also simplicity. The complexity comes in the fine details. The fundamentals, and the technology, are simple.

Privacy Consent can be viewed as a "State Diagram": by showing the current state of a patient's consent, we can show the changes in state. This is the modeling tool I will use here.

I will focus on how Privacy Consent relates to access to Health Information that is shared through some form of Health Information Exchange (HIE). The architecture of this HIE doesn't matter; it could be PUSH or PULL or anything else. The concepts I show can apply anywhere, but for simplicity think only about the broad use of healthcare information sharing across organizations.

There are two primary models for Privacy Consent, referred to as "OPT-IN" and "OPT-OUT".

Privacy Consent of OPT-OUT

At the left is the diagram for an OPT-OUT environment: one where the patient has the choice to OPT-OUT, that is, to stop the use of their data. This means that when there is no evidence of a choice by the patient, the presumption is that the data can be used.

This model is also referred to as "Implicit Consent". The USA HIPAA Privacy Regulation utilizes this model for Privacy Consent within an organization. It is not clear to me that this HIPAA Privacy Regulation 'Implicit Consent' is expected to be used outside the original Covered Entity. It is a model used by many states in the USA.

The advantage typically pointed to with this model is that many individuals don't want to be bothered with the choice; these individuals trust their healthcare providers. Another factor often brought up is that when health treatment is needed, the patient is often not in good health and therefore not well capable of making decisions. This, however, focuses on legitimate uses and ignores improper uses. Privacy worries about both proper and improper access.

Privacy Consent of OPT-IN

At the right is the diagram for an OPT-IN environment. In an OPT-IN environment the patient is given the opportunity to ALLOW sharing of their information. This means that there is a presumption that the patient does not want their health information shared. I would view it more as respect for the patient's right to make the decision.

This model is used in many regions, even within the USA. With an HIE this model will work for many use-cases quite nicely. Contrast this with the HIPAA Privacy use of Implicit Consent, which is likely a better model for use within an organization. The two models are not in conflict: one could use Implicit Consent within an organization, and OPT-IN (Explicit Consent) within the HIE.

Privacy Consent Policy

The above models seem simple with the word "YES" and "NO"; but this is not as clear as it seems. Indeed the meaning of "YES" and the meaning of "NO" are the hardest thing to figure out. It includes questions of "who" has access to "what" data for "which" purposes. It includes questions of break-glass, re-disclosure, and required-government reporting. The "YES" and the "NO" are indicators of which set of rules apply.

The important thing is that there are different rules. The state of "YES" doesn't mean that no rules apply, there are usually very tight restrictions.  The state of "NO" often doesn't truly mean no use at all. There is usually some required government reporting, such as for the purposes of protecting public health.

Privacy Consent: YES vs NO

The reality of privacy consent is that there will be a number of patients who change their mind. This is just human nature, and there are many really good reasons they might change their mind. A patient who has given OPT-IN authorization might revoke that authorization. A patient who has indicated they don't want their data to be shared might decide that they now do want to share their data. For example, as patients age they may recognize that they can be best treated if all their doctors can see all the other doctors' information.

Thus, although the state diagram for OPT-IN or OPT-OUT seems very simple, one must recognize the need to support transitions between "YES" and "NO".
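A minimal sketch of this two-state model in Python may help. The class and method names are hypothetical and chosen only for this illustration. The only difference between an OPT-OUT and an OPT-IN environment is the state presumed when no choice has been recorded; the transitions between "YES" and "NO" are the same in both.

```python
from enum import Enum


class Consent(Enum):
    YES = "YES"
    NO = "NO"


class ConsentRegistry:
    """Tracks each patient's consent state; names here are illustrative only."""

    def __init__(self, default_state):
        # The state presumed when there is no evidence of a patient choice.
        self._default = default_state
        self._recorded = {}

    def state_of(self, patient_id):
        return self._recorded.get(patient_id, self._default)

    def record_choice(self, patient_id, new_state):
        # Patients may change their mind, so any transition is accepted.
        self._recorded[patient_id] = new_state


# OPT-OUT (Implicit Consent): presume YES until the patient says otherwise.
opt_out_env = ConsentRegistry(default_state=Consent.YES)
# OPT-IN (Explicit Consent): presume NO until the patient says otherwise.
opt_in_env = ConsentRegistry(default_state=Consent.NO)

opt_out_env.record_choice("patient-1", Consent.NO)   # patient opts out
opt_in_env.record_choice("patient-1", Consent.YES)   # patient opts in
opt_in_env.record_choice("patient-1", Consent.NO)    # and later revokes
```

The sketch deliberately says nothing about what "YES" and "NO" permit; those rules are policy, and they are the hard part.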

Privacy Consent of Maybe

Lastly, we all recognize that the world is not made up of 'normal' people. There are those that have special circumstances that really require special handling. This I am going to show as another state "MAYBE". This state is an indicator, just like "YES" or "NO", but in this case the indicator indicates that there are patient-specific rules. These patient-specific rules likely start with a "YES" or a "NO" and then apply additional rules. These additional rules might be to block a specific time-period, block a specific report, block a specific person from access, allow a specific person access, etc. These special rules are applied against each access.
Note that the state diagram shows transitions between all three states. It is possible that one goes into the "MAYBE" state forever, or just a while.
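One way to picture the "MAYBE" state is as a base answer plus a list of patient-specific rules checked on every access. The Python sketch below is an assumption-laden illustration: the rule helpers, the "first rule with an opinion wins" ordering, and the simplification that "YES" and "NO" return a flat answer (ignoring the policy rules those states carry in practice) are all mine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class AccessRequest:
    requester: str
    document_id: str


# A rule returns True (allow), False (block), or None (no opinion).
Rule = Callable[[AccessRequest], Optional[bool]]


@dataclass
class PatientConsent:
    state: str                      # "YES", "NO", or "MAYBE"
    base: str = "YES"               # starting point used in the MAYBE state
    rules: List[Rule] = field(default_factory=list)


def block_person(name: str) -> Rule:
    return lambda req: False if req.requester == name else None


def allow_person(name: str) -> Rule:
    return lambda req: True if req.requester == name else None


def block_document(doc_id: str) -> Rule:
    return lambda req: False if req.document_id == doc_id else None


def is_allowed(consent: PatientConsent, req: AccessRequest) -> bool:
    """Evaluate one access; simplified so YES and NO give a flat answer."""
    if consent.state == "YES":
        return True
    if consent.state == "NO":
        return False
    # MAYBE: the first patient-specific rule with an opinion decides.
    for rule in consent.rules:
        verdict = rule(req)
        if verdict is not None:
            return verdict
    return consent.base == "YES"


# Hypothetical patient: share by default, except one person and one report.
mostly_yes = PatientConsent(
    state="MAYBE",
    base="YES",
    rules=[block_person("dr_smith"), block_document("report-7")],
)
# Hypothetical patient: share nothing, except with one named person.
mostly_no = PatientConsent(state="MAYBE", base="NO", rules=[allow_person("dr_jones")])
```

Time-period rules would follow the same pattern with a date on the request.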

Privacy Consent is a Simple State Diagram

I hope that I showed that Privacy Consent is simply state transitions. I really hope that I explained that each state has rules to be applied when a patient is in that state. Implicit (OPT-OUT) and Explicit (OPT-IN) are simply indicators of which state one starts in, that is, which state is presumed in the absence of a patient-specific decision. The rules within each state are the hard part. The state diagram is simple.

Other Resources


Patient Privacy controls (aka Consent, Authorization, Data Segmentation)

Access Control (Consent enforcement)



Wednesday, April 24, 2013

mHealth Solution

I have been involved in many efforts targeting the mHealth use-case. I have not been involved in all of them. I am sure no-one has been involved in all of them. Specifically I have been involved in the efforts in IHE, HL7, and DICOM. I have mostly spent my time on Security and Privacy; but also had my hand in some of the interoperability aspects. This means that I have a perspective, and know that it is only my perspective. This means I know that I don't know it all. This is my blog however, so you should already know that.

What is mHealth? 

mHealth is a highly overused term nowadays. It is overused because it is a term that is cool. It gets abused because the term is not defined. Because it is undefined, it gets to multiply the excitement that anyone has around the term, without focusing any progress. This means that 10 people who have 10 different perceptions of what the term means feed off of the excitement of the other 9, while getting none of the benefits of collaborative design. Thus this term is burning lots of excitement without making as much progress as it should. To show just how divergent these perceptions are, here are some I have heard:
  • mHealth means that the healthcare-data is highly movable and thus can flow to wherever it is needed
  • mHealth means that the way I access my health-data is through a mobile device
  • mHealth means that I as a patient can pull copies of my data and move them wherever I want
  • mHealth refers to sensors that I carry on my body all the time, such as fitbit
  • mHealth means that my consent automatically applies wherever my data is accessed

The mHealth Solution

You can see from these 5 that in some cases the data are 'mobile', in others the device used to access the data is 'mobile', in others the patient is 'mobile', and in others the sensors are 'mobile'. These are just four different viewpoints. YES, they could all be the same. BUT the solution spaces are not working on all of them, or even on more than one of them at a time. There are several solution spaces working on these issues, but not necessarily on the same ones.
These are not all the efforts, nor all the perspectives on mHealth. None of these perspectives are wrong, and all of them are proper things to be doing.

Consent portability

I do have to caution that consent moving with the data is the least mature of these, mostly because there are far too many moving parts being worked on. That is, the architectures for how data are moved and accessed are not yet stabilized. Some are moving data in e-mail, others using REST, others using SOAP, others using USB/CD-ROM, and others using proprietary means. Trying to come up with a single way to control access is hard, and trying to control all of those is futile at this point. This doesn't mean there is nothing going on; there is much going on.

mHealth is anything you want

This is not a fundamental problem. This is not a problem that will cause failure to meet mHealth expectations. I want to urge understanding that the term is not well defined, and thus the one you are talking to might be thinking something totally different. What they are thinking is not wrong. It is just important to be sure that you understand their perspective. Thus the mHealth solution is many, not one.


Friday, April 19, 2013

Direct incompatibility with off-the-shelf e-mail

Why choose a popular underlying standard if you are not going to leverage it? Surely you should not make explicit changes that break it.

The Direct Project chose to use e-mail, with S/MIME as the security layer. This choice was due to the widespread use of e-mail. That widespread use can be shown by the very fact that today e-mail is still the most used protocol on the internet. This is in the face of those who would like to consider "the Web" as synonymous with "the Internet". The statistics say it is closer to the truth that "e-mail" is synonymous with "the Internet". Actually the two combined make up most of the internet.

The Direct Project expectation was that healthcare should only need to specify the trust framework -- see DirectTrust.org for one organization trying really hard to make this factor a reality. This trust framework would allow a sender to be sure that what they are sending can only be seen by the one they are sending it to, and no-one in between. This trust framework would allow a receiver to know that the content absolutely came from the one indicated as the sender, and no-one in between. This trust framework is critical to success. But this trust framework is 99% policy. The technology portion of this trust framework is all standards based and embodied in the common use of S/MIME and the PKI that supports it.

Direct Specification is NOT leveraging common implementations of S/MIME e-mail!

I have written on this topic before. At that time it was about the specific rules on how one must DISCOVER the certificate of the recipient you are sending e-mail to: "MU2 - Why must healthcare use custom software when Thunderbird and Outlook would do?" In that article I explain that this requirement was overly restrictive. It forces a specific certificate distribution model that is unique to healthcare. It doesn't support the Trust Framework. It just gets in the way of using off-the-shelf software, thus forcing healthcare to use custom software.

Direct Specification forces case-sensitivity when none is necessary!

Now there is an effort to force case-sensitivity on Direct Project addresses. This technically is specified in the underlying standard, but it is not always implemented this way. Let me explain. The underlying e-mail specifications do indicate that the first part of the destination address shall be case-sensitive. This was because some destination systems are indeed case-sensitive. However, not all destination systems want to be case-sensitive.

It is true that case-insensitivity is ambiguous once you leave the classical ASCII character set. Therefore case-sensitivity is indeed more easily proven, and thus more interoperable. However, a destination system that wants to be case-insensitive should be allowed to be.

What is happening is that there are test tools being developed to test implementations of Direct. These test tools are being written strictly. This strict interpretation of the standards is a good thing for test tools to do. But in some cases systems need to be allowed to be more liberal in what they accept. Destination systems should not be forced to be so restrictive. This is an application of the Robustness Principle, also known as Postel's law, after Internet pioneer Jon Postel: "be conservative in what you do, be liberal in what you accept from others".

We MUST be reasonable. The case requirement is more focused on being case-preserving, so that an endpoint ‘can’ be case-sensitive. That is to say that senders and intermediaries must preserve the case. To require that the endpoint MUST be case-sensitive is overly restrictive. This would cause many common email infrastructures to be declared non-compliant. Most off-the-shelf e-mail treats the whole address as case insensitive. This declaration of non-compliance would come at no benefit, and would limit the market space available for healthcare use.
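The distinction between case-preserving and case-sensitive can be sketched in Python. The function names and the example.org addresses are hypothetical. The sketch assumes ASCII-only addresses, since (as noted above) case folding is ambiguous outside ASCII. The domain part is compared case-insensitively in both modes, as domain names are not case-sensitive; only the handling of the first part is left to the destination system.

```python
def split_address(address):
    """Split at the last '@' into (local part, domain)."""
    local, _, domain = address.rpartition("@")
    return local, domain


def same_mailbox(a, b, local_part_case_sensitive):
    """Decide, as a destination system would, whether two addresses match.

    Assumes ASCII-only addresses. The endpoint chooses whether the local
    part is compared case-sensitively; the domain never is.
    """
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    if domain_a.lower() != domain_b.lower():
        return False
    if local_part_case_sensitive:
        return local_a == local_b
    return local_a.lower() == local_b.lower()


# Senders and intermediaries never alter the address; they pass it as typed.
as_sent = "Dr.Bob@direct.example.org"
as_registered = "dr.bob@direct.example.org"

# A liberal endpoint accepts the message; a strict endpoint does not.
print(same_mailbox(as_sent, as_registered, local_part_case_sensitive=False))
print(same_mailbox(as_sent, as_registered, local_part_case_sensitive=True))
```

Because the sender preserved the case, either endpoint policy remains possible; had an intermediary lowercased the address, a case-sensitive endpoint could no longer be supported.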

Direct Continues to require custom software for healthcare.

This is absolutely against the values that the Direct Project included during the development. The reason to choose common e-mail transport was to leverage the large body of infrastructure and software already available. Using custom software increases costs, and makes healthcare re-develop tools that have been developed over decades of advancements in e-mail, and at no added value.

Thursday, April 18, 2013

Safety vs Privacy


What do you conclude when looking at this picture?

The solution is:
a) Make the wall shorter
b) Make the wall taller

Those with a strong privacy background recognize this as a Privacy violation. Very clearly the wall is not tall enough. Clearly the female is to be protected against the male actor. Clearly the wall is defective and needs to be taller.

Those with a strong safety background recognize this as a safety concern. Very clearly the wall is not short enough to enable safe conversation between these two. Indeed the safety assessment doesn't apply ethical characteristics to the female or male image.

My viewpoint is to understand the use-case. What are these two trying to do? Is this a case of (a) or (b). Just because the image is made up of the images used for bathrooms does not mean that the image is of a bathroom use-case. Knowing the use-case is the only way to understand if this is a privacy violation or if this is a legitimate discussion over a wall that is too high.

Indeed the solution might be BOTH. The wall is indeed there for privacy purposes, and it is failing. There is also a safety concern, as the wall is not tall enough to prevent someone from putting themselves and others at risk of harm. This shows that privacy and safety risks are not always at odds. Sometimes they can be solved harmoniously.

Thursday, April 11, 2013

Google creepy is not the same as Facebook creepy

Google NOW has brought a totally new form of data analytics to my fingertips, and I like it. I am actually handing more information to Google than I would have, just to get this capability. I like that driving routes are suggested, with start times that match my appointments. I like that it knows just the right sports to inform me about, even when that sport is Hockey which the news media seems to know nothing about. This level of leveraging all the information that Google can find on me to bring me value is fantastic. This is what differentiates Google creepy from Facebook creepy.

When Facebook indicates that they are going to start to pull some new form of information, I don’t get the feeling that they are going to do this for my benefit. It is very clear that Facebook is gathering more information for their own benefit. The same goes for their insistent plea to harvest my address book. I am not going to expose my friends to more advertising through Facebook.

The first evidence that I as a user get from the gathering of data is valuable in the case of Google, and punitive in the case of Facebook. I know that Google is using my data to make money; I am a strong believer that if you are not paying for something, then you are the product and not the customer. The fact that Google makes money with my data is less creepy because Google gives me value. In fact Google gives me so much value that I go out of my way to give it more data. Whereas Facebook creeps me out so much that I avoid telling it many things.

Perception is more powerful than reality. The perception of value, even if it is not truly valuable, is what is important. The fact that Google Now gives me driving directions automatically rather than me doing a Google Search is a small step, so the actual value is small. The actual spend by Google is small. The perception of value is big.

Goldilocks Governance

Healthcare can learn from this. The value of a Health Information Exchange is great (What is the benefit of an HIE), but the perception of creepy can also be great. Trust, and doing what the consumer ‘expects’, are the bridging factors. The patient wants their data to be available to those that can provide the patient value. The patient wants their data to be protected against those that provide the patient no value. I coined the term “Goldilocks Governance” for this. Not too tight, not too loose, but just right.

This is also consistent with the Privacy approach outlined by the USA White House consumer privacy principles that were published just last year. This privacy philosophy recognizes that the consumer understands the context of their interaction, as defined in "Privacy As Contextual Integrity" by Helen Nissenbaum. That work indicates that consumers do understand that their data will be used in specific ways: clearly in healthcare for treatment and billing, but also for public health benefits and other normal operational purposes. This is otherwise described as "The Consumer Should Not Be Surprised": they should not be surprised that their data is used in some specific ways, yet it is also right for them to be outraged at inappropriate uses of their data.