25 September 2013

SMotW #74: security governance maturity

Security Metric of the Week #74: information security governance maturity


We've covered a number of so-called maturity metrics previously on this blog. They usually score highly on the PRAGMATIC scale, meaning that (according to the fictional managers of ACME Enterprises Inc., at least) they are valuable metrics. 

This week's example security metric specifically concerns the governance of information security. Imagine that, for some business reason, your management was interested in/concerned about the way the organization governs information security, whether for their own purposes or perhaps at the behest of the regulators or auditors. How would you go about addressing that issue? Think about what you would do.

A sensible first step would be to clarify the requirement: what does management need to know, when, and why? Exploring what they actually mean by "governance" is a good way to tease out what they are on about. Don't be surprised if individual managers have somewhat different needs and priorities: that's all part of the metrics fun.

ACME's managers considered the maturity measurement approach laid out in an appendix to PRAGMATIC Security Metrics. They scored this metric at 87%:

P     R     A     G     M     A     T     I     C     Score
95    97    70    78    91    89    90    85    90    87%

95% for Predictiveness and 97% for Relevance to information security are both outstanding ratings. We are often asked about predictive security metrics, so this is a good'un.  
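
For anyone wanting to check the arithmetic, here is a minimal sketch. It assumes the overall score is simply the unweighted mean of the nine ratings, rounded to a whole percent. That assumption is inferred from the numbers in the table rather than stated in this post, and the function name pragmatic_score is purely illustrative.

```python
def pragmatic_score(ratings):
    """Unweighted mean of the nine PRAGMATIC ratings, rounded to a whole percent.

    Assumption: equal weighting of all nine criteria (inferred, not confirmed).
    """
    if len(ratings) != 9:
        raise ValueError("expected nine ratings, one per PRAGMATIC criterion")
    return round(sum(ratings) / len(ratings))


# Ratings in P-R-A-G-M-A-T-I-C order, copied from the table above.
governance_maturity = [95, 97, 70, 78, 91, 89, 90, 85, 90]

print(f"{pragmatic_score(governance_maturity)}%")  # 87%
```

If your organization cares more about some criteria than others, a weighted mean would be the obvious variation.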


18 September 2013

SMotW #73: Psychometrics

Security Metric of the Week #73: Psychometrics


We've cheated a bit with this week's example metric: 'psychometrics' such as OCAI, MBTI and MSCEIT are actually an entire class of metrics rather than one in particular. The discussion that follows concerns psychometrics in a general sense.

Many of us will have been invited to take psychometric tests during the job application and interview process. Psychometric testing is based on the science of psychology. The tests provide an additional source of information about candidates' psychological makeup - personalities, attitudes, aptitudes and so on - the 'soft stuff' that is important for almost all positions but which is hard to gauge from résumés or (unskilled/untrained) interviewers. 

Given that a substantial part of information security revolves around human behaviors and attitudes (such as ethics or compliance with policies), ACME's CISO wondered if psychometrics might have potential as security metrics. Using the PRAGMATIC method, ACME's managers gave a straight answer:

P     R     A     G     M     A     T     I     C     Score
40    24    0     79    15    55    10    42    5     30%

Maybe the managers were feeling distinctly cynical and jaundiced when they went through the process: it's unusual for them to assign a zero rating, in this case for Actionability. Their stated rationale was that a person's psychological profile is an inherent and immutable part of their personality, not amenable to being adjusted or managed in an active sense.  They were also concerned at escalating Costs if psychometric testing was extended beyond the recruitment process for information security purposes.

The CISO was tempted to counter that technically it is possible to influence someone's personality to some extent, or at least to influence their natural/preferred behavior patterns, for example through training, supervision and guidance. Furthermore, that is not the only way in which psychometrics might be acted upon. They are already commonly used to support hiring decisions, but there may be other opportunities to use psychometrics, for instance for annual appraisals and/or when considering whether to promote or transfer employees. Conceivably, some personality types or characteristics might be ideally suited to high-trust security-relevant roles, while others might be indicators of trouble ahead, but without further research, the CISO would definitely be stepping out on a limb if anyone asked for specifics.

However, rather than thrash out that point and various other issues with what appeared to be a tired and grumpy ACME management, the CISO decided discretion was the better part of valor. The low ratings and pathetic overall score meant this metric was very unlikely to fly, especially given that there were many other higher-scoring metrics already on the table. That's not to say psychometrics would never be a worthwhile security metric, nor that they are necessarily a poor choice for your organization, rather that ACME was simply not ready for them.  Yet.

[Whereas mostly we talk about the PRAGMATIC method being used to identify valuable metrics, weeding-out weak or problematic metrics is an equally worthwhile objective, especially if you subscribe to the view that the organization should stick with 'a few good metrics'. The method provides a rational, even-handed basis on which to analyze, compare and contrast metrics, then select or discard them. You might think of it as a quality filter for security metrics.]

14 September 2013

Draw your own conclusions

There's as much an art to interpreting metrics and statistics as there is to designing and presenting them. Take this exploded pie for instance:


I plucked the pie chart image from a survey by Forrester on behalf of Blue Coat - in other words, Blue Coat paid them for the survey (we have discussed vendor-sponsored surveys before on this blog). The survey, "Key Drivers, Why CIOs Believe Empowered Users Set The Agenda for Enterprise Security", was promoted by email via IDG Connect.

Before we continue, what conclusions do you draw from the figure above? I appreciate I have taken it out of the context of the report but take another look at the graphic. Imagine you are a busy business manager briefly pondering a graphic similar to this, whether in a commercial survey, an in-flight magazine, or an internal corporate report from Information Security. What does it say to you?  What's your impression?

I spy with my beady eye that the largest slice was for the response 'some of the time', accounting for more than half of the responses. If I mentally add that 54% proportion to the 10% 'rarely' slice, those two responses together account for a little under two thirds of the responses - a clear majority as far as I'm concerned. Consequently, I would conclude that most respondents chose 'rarely' or 'some of the time', in other words most were of the opinion that information security did not inhibit or slow down important business initiatives.

However, the reason I plucked this particular figure from the report is that the legend to the pie, presumably written by a Forrester analyst, implies a markedly different conclusion:


According to the analyst, the key message is that security is an inhibitor, and frequently at that. The headline message is diametrically opposed to my understanding of the data.  Lucky I bothered to check the data!

Perhaps the analyst arrived at that curious conclusion because the 'all', 'most' and 'some' categories together account for 90% of the responses? Perhaps the conclusion just happened to fit the brief from Blue Coat when they commissioned the survey for their marketing? Hmmm. 
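
The two readings are easy to put side by side. This is only a sketch using the figures quoted in this post: 54% for 'some of the time' and 10% for 'rarely'. The individual 'all' and 'most' percentages are not quoted here, so the code works only with their combined remainder, and the variable names are my own.

```python
# Percentages quoted in the post.
some_of_the_time = 54
rarely = 10

# 'All' and 'most' are not quoted individually, only implied as the remainder.
all_or_most = 100 - some_of_the_time - rarely

# My reading: security does not usually inhibit business initiatives.
not_usually_an_inhibitor = rarely + some_of_the_time

# The analyst's apparent reading: security inhibits at least some of the time.
inhibitor_at_least_sometimes = all_or_most + some_of_the_time

print(f"'Rarely' + 'some': {not_usually_an_inhibitor}%")
print(f"'All' + 'most' + 'some': {inhibitor_at_least_sometimes}%")
```

Both totals claim the same 54% 'some of the time' slice, which is why the same pie can be made to support opposite headlines.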

I notice also that the legend omits the words "slow down" and "important" from the question posed, on the not unreasonable assumption that the question in quotation marks was exactly as it was stated in the survey. I'll say no more on that point.

Anyway, digging a little deeper, there's still more insight to glean from figure 2 in the report.

Why do you think the pie has been exploded? The technique is often used to emphasize particular slices. In this case, my eye was drawn to the apparent balance between the (sum of the) three smaller slices and the main slice. Given that the main slice is labeled "Some of the time", it would be easy to infer that the other three therefore represent "Not some of the time", whereas they are not in fact a single category but opposite ends of the scale (one of the inherent drawbacks of any pie graph). In contrast, re-drawing the same data as a bar chart emphasizes the separation between the ends:


And what about the colors? Color can have a surprisingly important influence on the way we perceive metrics. We often use it to our advantage with RAG (red-amber-green) color coding. Just look at the visual impact if the pie was recolored thus:


In the same way that exploding the pie emphasized the 'some of the time' slice, I have deliberately called it out with an extreme red. If I could have figured out the HTML to make it flash, I might have done that too! It is patently biased. The original pie coloring was far more even-handed, but it's another potential issue to bear in mind.

Aside from the presentation style/format, we can glean yet more information from the original graphic. Two particular aspects caught my eye.

Firstly, the text below the pie appears to indicate that the sample size was 50 people, not just 50 randomly selected people but 50 "C-level and VP IT budget decision-makers at North American enterprises", a fairly specific demographic.

Presumably the 50 were already on Forrester's database due to some previous contact, but perhaps all or at least some of them were identified and contacted specifically for this study. Who knows, perhaps some of them were suggested by Blue Coat given that he who pays the piper calls the tune?

Although the caption mentions 50 people, we are not told how many were actually surveyed or responded. Maybe they asked 50 and only a dozen responded? Maybe they asked 1,000 but picked out the 50 for some unstated reason? I very much doubt that Forrester would pull stunts of that nature but the point remains that this is primarily a piece of marketing, not a scientific research paper. To give them their due, Forrester did incorporate a "Methods" section at the end of the report, stating:
"This Technology Adoption Profile was commissioned by Blue Coat Systems. To create this profile, Forrester leveraged its Forrsights Workforce Employee Survey, Q4 2012, and Forrsights Budgets And Priorities Tracker Survey, Q4 2012. Forrester Consulting supplemented this data with custom survey questions asked of 50 C-level and VP IT decision makers North American enterprises with 1000 employees or more. The auxiliary custom survey was conducted in March 2013."
That's nice to know, but not quite up to the standard of the materials and methods section in a typical paper in any mainstream scientific journal. A survey of just 50 people could be of questionable statistical value (depending on the assurance level required), and we're not told whether the survey was conducted online, through an automated survey tool, by telephone interview, face-to-face interview, or by some other means. Reproducing the actual survey form, with the actual questions posed, in the precise wording and sequence used, complete with any preamble, context or incentives, would have given me a lot more confidence ... but maybe that's just my scientific training showing through, my own bias. 
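
To put a rough number on "questionable statistical value", here is a sketch of the margin of error on the 54% slice. It leans on assumptions the report does not support: that the 50 respondents were a simple random sample, that a 95% confidence level is wanted, and that the normal approximation for a proportion is good enough at this sample size. The function name is illustrative.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(0.54, 50)
print(f"54% plus or minus {moe:.1%}")  # 54% plus or minus 13.8%
```

Under those assumptions the 'some of the time' figure could plausibly sit anywhere from about 40% to about 68%, and a sample that was not random would widen the doubt further.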

Secondly, I noticed that there are four categories in the pie chart, corresponding (presumably) to the four possible responses to the survey question. This implies the use of a Likert-like scale where there is no middle option, forcing respondents to choose options either above or below the notional center point of the scale. This was probably a deliberate choice on the part of the survey designer: it is commonly used to discourage people going for 'the easy option', the middle choice. I wonder what the results might have been if the survey had included a mid-point response, for instance "Very rarely", "Occasionally", "Some of the time", "A lot of the time" and "Almost all the time" ... which reminds me of the tricky issue of phrasing both the question stem and the answers. It would be easy to bias the responses, for example using "Never" as the lowest response, or indeed "All of the time" as the upper response - which, I note, was evidently one of the choices in Forrester's survey. "All of the time" leaves respondents almost no wiggle-room. It's similar to "Always".  It could be argued that it is not even a category but an end point to the notional scale.

Personally, I hate being shoe-horned into boxes. Sometimes I want to indicate a response that is at the upper or lower limit of a category, occasionally right on the boundary between categories, which isn't strictly possible with Likert-like scales. That's why I personally prefer continuous scales, percentage scales in particular. Measuring responses against a percentage scale generates more precise data, in my opinion, with hardly any extra effort on the part of subject or surveyor. The better automated survey tools allow the use of continuous scales, calculating percentage values from responses with no human effort at all (although I have yet to find a tool that allows responses that are below 0% or above 100% for those rare occasions when the respondent deems the scale range too limited!).

OK, enough already.  The take-home message from this rambling blog piece is to be aware of subtle and not-so-subtle biases in the way metrics are sampled, gathered, analyzed and presented. Bear this post in mind whenever you are giving or receiving statistical information. Better still, consult a trained statistician or survey engineer if the information is important, which it often is. My ramble has barely scratched the surface of an enormous topic.

Kind regards,
Gary Hinson  Gary@isect.com

PS  I have no ax to grind with Forrester or Blue Coat. The survey is worth reading, albeit with a hint of cynicism. I chose it simply as an example, typical of its kind, not a special case called out to embarrass anyone. Feel free to register for your own copy provided you don't mind disclosing your personal information ...

10 September 2013

SMotW #72: % of privileged/trusted users

Security Metric of the Week #72: Proportion of highly privileged/trusted users or functions

This metric is indicative of the organization's control over access to IT systems, networks and perhaps other information assets.  

The metric is measured by someone suitable (such as an IT auditor) systematically reviewing access permissions assigned to user IDs on (a suitable sample of) the organization's IT systems in order to determine the proportion that are privileged or have enhanced access rights.

The metric's PRAGMATIC ratings are not bad:

P     R     A     G     M     A     T     I     C     Score
86    80    51    40    65    39    55    95    60    63%

"Not bad" however needs to be taken in context, since there is a wide choice of metrics relating to access rights/permissions. Of the 17 examples classified as "IT security metrics" in the book, ACME managers scored this one in seventh place.  It has merit but is not necessarily going to be chosen for ACME's information security dashboard.

The metric could be a very granular and rich source of information provided someone has the time, energy and integrity to analyze and report the numbers. A simple illustration would be to compare the metric between business-critical and non-critical servers, or different classifications: it is not unreasonable to assume that access to highly-classified systems should be more carefully controlled than to the remainder. This is just one of many hypotheses that could be tested using metrics, an approach that auditors often use.
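
As a sketch of that comparison, the code below calculates the proportion of privileged user IDs separately for each class of system. The account records are entirely made up for illustration, and the names accounts and privileged_proportion are hypothetical. In practice the data would come from whatever access review or management tool you use.

```python
from collections import defaultdict

# Made-up review results: (system class, user ID, has privileged access?)
accounts = [
    ("critical", "alice", True),
    ("critical", "bob", False),
    ("critical", "carol", False),
    ("critical", "dave", False),
    ("non-critical", "erin", True),
    ("non-critical", "frank", True),
    ("non-critical", "grace", False),
    ("non-critical", "heidi", False),
]


def privileged_proportion(records):
    """Return the fraction of privileged user IDs for each system class."""
    totals = defaultdict(int)
    privileged = defaultdict(int)
    for system_class, _user_id, is_privileged in records:
        totals[system_class] += 1
        if is_privileged:
            privileged[system_class] += 1
    return {cls: privileged[cls] / totals[cls] for cls in totals}


for cls, proportion in privileged_proportion(accounts).items():
    print(f"{cls}: {proportion:.0%} of user IDs are privileged")
```

With this invented data the critical systems come out at 25% and the rest at 50%, which would fit the hypothesis. A result the other way round would be worth a closer look.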

Putting in more effort would increase both the costs and the benefits of the metric, but perhaps to differing extents.  With many metrics, there is (at least in theory) a sweet spot where the net value of the metric peaks.  It's not so easy to determine that in practice, especially as metrics audiences vary in their perceptions of value (PRAGMATIC notwithstanding).

Automating the data collection, analysis and even reporting/presentation for this metric requires some investment up-front with a payoff over the medium to long term.  It can be automated through remote security management software that interrogates IT systems on the network, and it's even easier if user IDs are centrally administered, provided the tools don't lie, and provided there are agents running on all applicable systems.  If this metric is important to the organization, someone ought to check both provisos manually, both initially when implementing the metric and periodically thereafter in case things change.  

Conversely, if the metric is not important enough to warrant those integrity checks, is it even worth reporting? This point applies to other security metrics too: since they are intended to support various strategic, management and operational decisions, missing, incomplete, out-of-date, erroneous or misleading data could have serious implications.  Metrics that cannot be relied upon could literally be worse than useless, giving the impression of sound management whereas in fact the information may be invalid.  

09 September 2013

Mexican book review


Thanks to Aztec-History.com for the Mexican flag!
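Elia Fernandez has enthused about PRAGMATIC Security Metrics in Spanish on her blog (part 1 and part 2), concluding: “Importante es determinar cómo la organización puede identificar las métricas de seguridad que vale la pena utilizar, y cómo se pueden evaluar los méritos de una métrica. A la fecha, el enfoque común ha sido informal y subjetivo. Por el contrario, el método pragmático permite medir y evaluar una métrica en forma estructurada; obliga a analizar la métrica en detalle.” [In English, roughly: "What matters is determining how the organization can identify the security metrics worth using, and how the merits of a metric can be evaluated. To date, the common approach has been informal and subjective. By contrast, the PRAGMATIC method allows a metric to be measured and evaluated in a structured way; it forces the metric to be analyzed in detail."]  Thanks Elia - and sorry for using the wrong flag!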




05 September 2013

Biological metrics

Biological metrics - commonly shortened to "biometrics" - comprise an interesting class of information security metrics that, unfortunately, we didn't have space to explore in our book. Biometrics are commonly used for strong authentication in situations where there is a genuine need to authenticate and distinguish legitimate people from impostors.

Take for example your heart rhythm. According to the blurb on the Bionym website "Bionym has developed the first wearable authentication device that utilizes a user's Electrocardiogram (ECG) to validate a person’s identity." Before reading that, I didn't even appreciate that heart rhythm was a reliable biometric. I presume the Nymi can cope with heart rate changes caused by stress, exercise, rest, drugs such as caffeine, and some medical conditions - it does at least have the advantage of collecting biometric data over a sustained period, but as with any biometric, there must surely be some important tolerance parameters in there due to natural variations and the accuracy constraints of measurement. I wonder if patients with pacemakers have less-unique ECGs? I wonder if Nymi sounds the alarm if the wearer suffers an obvious cardiac incident? The site mentions that it guards against electronic spoofing, but I wonder how the Nymi prevents replay of captured ECGs? As always with cryptography, many security concerns may arise from the implementation details, particularly the concessions made for practical reasons of cost and utility. I have no reason to doubt that Bionym have covered all the bases, but thanks to the 'security mindset' I'm naturally curious, dubious and perhaps even a touch cynical about their marketing claims.

Talking of being security-minded, Bionym's comparison to fingerprints ("Your cardiac rhythm is protected inside your body, making it almost impossible to steal, mimic or circumvent. In comparison, a fingerprint is left on every surface a user touches") reminds me that biometric data are sensitive by nature (terrible pun intended!). Should we now be concerned about protecting our ECG records at the surgery, in the same way that we perhaps ought to worry about our iris and retinal patterns at the optometrist, and our dental records at the dentist? Do powerful heads of state have teams of DNA flunkies following them wherever they go to secure all the cellular detritus they inevitably shed and leave behind?

In the same vein (yes, an even worse pun), many other information security metrics are themselves sensitive, valuable information that almost certainly deserves to be secured. Imagine the mischief that someone could cause if they ascertained your organization's risk catalog, its IT audit reports, or the results from vulnerability scans and penetration tests. Therefore, when specifying and designing an information security measurement system, don't forget to consider the associated information security risks and controls.

Cheers all,
Gary Hinson

PS I have no affiliation with or commercial interest in Bionym or Nymi. I hadn't even heard of them before reading Brian Honan’s Security Watch blog this morning. Thanks Brian!

04 September 2013

SMotW #71: QA in infosec processes

Security Metric of the Week #71: Extent to which quality assurance (QA) is incorporated in information security processes


This week's metric, randomly selected from the 150-odd examples discussed in chapter 7 of PRAGMATIC Security Metrics, doesn't appear very promising, with mediocre PRAGMATIC ratings as far as ACME management is concerned and an overall score of 58%:

P     R     A     G     M     A     T     I     C     Score
75    70    66    61    80    50    35    36    50    58%

The premise - the rationale behind the metric - is that the quality of various information security products (such as risk assessments, functional and technical specifications for security controls and security functions, architectures/designs, test plans, test scenarios, test results etc. in relation to application development projects, plus many other products in relation to other security activities) significantly influences (but does not entirely determine) the security achieved by the corresponding information systems, business applications and associated business processes. Therefore QA efforts to improve the quality of the security processes and products should have a positive effect on the security of the organization, and so measuring the QA has some relevance to information security.

The metric's PRAGMATIC score is held back by low ratings for Timeliness and Independence. Could anything be done to address management's obvious concerns on those two parameters?

The low rating for Timeliness was justified because it was originally proposed that the metric would be analyzed, reported and hence acted-upon only once or twice a year. It would involve someone retrospectively examining the records relating to various information security processes, looking for evidence of QA activities, checking compliance with the procedures, and somehow coming up with a value for the metric.  

Many of those QA activities could also generate metrics "live", feeding process information directly to management while the processes were running rather than months later. That way, the measured processes could be tweaked and quality-improved when doing so would have the most impact. With that approach in place, however, the proposed metric would be more or less redundant unless, for some reason, management really needed to double-check that QA was happening.

ACME managers were of the opinion that the proposed metric would be measured and reported by the people who had a vested interest in the outcome, hence its Independence (or Integrity) was in question. Could they be trusted to report the metric honestly if the expected QA activities were not, in fact, being performed routinely? Well, most QA activities generate evidence in the form of checklists, sign-offs/approvals and so on, so the metric's base data could be reviewed and verified independently (e.g. by Internal Audit or QA people) if there was a suspicion that management were being painted an unrealistically rosy picture. With hindsight, perhaps ACME's Independence rating was too low and should have been challenged, although on the other hand those audit and QA people's time is not free, so the additional checks would further depress the metric's Cost-effectiveness rating.

The upshot of this analysis is uncertain.  It would be possible to address at least some of the identified shortcomings of this metric, changing its definition, data collection, analysis and/or reporting, but it may not be worth the effort, especially if there are other higher-scoring metrics on the table covering similar aspects. Management's appreciation that perhaps they ought to promote/support the QA activities directly rather than periodically measuring and reporting on them effectively sealed the fate of this metric for ACME, although YMMV (Your Metrics May Vary).