A fascinating thread is being woven over on the ISO27k Forum, concerning information security risk analysis (RA) methods. Bob Ralph made a good point that set me thinking this morning:
"... 'unknown knowns' should be scored the very highest until you know otherwise ... Of course the 'unknown unknowns' will not even be on the RA. But they are still about somewhere."
Biologists have developed techniques for estimating the
unknown and answering awkward questions such as "How many Giant Green Land
Snails are there on this island?" The obvious technique is to try to catch and count them all, but that's
(a) costly, (b) disruptive for the snails and their ecosystem, and (c) not as
accurate as you might think (snails are well camouflaged and duck for cover
when biologists approach!). Capture-mark-recapture, also known as tag-and-release, is a more useful
technique: catch all the snails you can find in a given area, mark or uniquely identify
them in some way (preferably not with bright yellow blobs of paint that make
them even easier targets for the Little Blue Snailcatcher bird that preys on Giant
Green Land Snails!), then some time later repeat the exercise, this time noting
how many of the snails caught are already marked and how many are new. From the proportion that are recaptures, estimate the total
population of Giant Green Land Snails in the area, and extrapolate across the
entire island, taking various other factors into account (e.g. nesting areas
for the Little Blue Snailcatchers, quantity and quality of habitats for the snails, snail lifetime, foraging range etc.). There are statistical techniques supporting this
kind of method, and various other methods that give reasonable estimates,
sufficient to answer the original and related questions, such as "Is the population shrinking or expanding?". I'm sure there are similar approaches in other fields of science -
estimating the age/size of the universe for instance.
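To make the arithmetic concrete, here's a minimal sketch of that mark-and-recapture sum (the Lincoln-Petersen estimator, in the jargon). The snail counts are invented purely for illustration:

```python
# Minimal sketch of the capture-mark-recapture calculation
# (Lincoln-Petersen estimator). All snail counts are made up.

def lincoln_petersen(marked_first_visit, caught_second_visit, recaptures):
    """Estimate the total population from two capture occasions."""
    if recaptures == 0:
        raise ValueError("need at least one recapture to form an estimate")
    return marked_first_visit * caught_second_visit / recaptures

# 120 snails marked on the first visit; 90 caught on the second visit,
# of which 18 were already marked => roughly 600 snails in the area.
print(lincoln_petersen(120, 90, 18))  # 600.0
```

The intuition is simply that if 18 of the 90 snails caught the second time around carry a mark, the 120 marked snails presumably make up about a fifth of the whole population.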
Go back through the snail paragraph, swap "hackers"
for Giant Green Land Snails and "law enforcement" for biologists,
and you have a technique for estimating the size (or, in fact, other characteristics)
of the hacker population. It's
"only" an estimate, but that is better than a pure guess since it is
based on measuring/counting and statistics, i.e. it has a scientific, factual, reasonably
repeatable and accurate basis. Douglas
Hubbard's excellent book "How to Measure Anything" talks at length
about the value of estimation, and (in some circumstances) even what one might call WAGs
(wild-arse-guesses) - it's a very stimulating read.
So, I'm thinking about how to apply this to measuring information
security risks, threats in particular. We have partial knowledge of the threats Out There (and, to be accurate, In Here too) gleaned from identified
incidents that have been investigated back to the corresponding threats. There are other threats that are dormant or emerging, or
that are so clever/lucky as to have escaped detection so far (Advanced Persistent Threats and others). There are errors in our processes for
identifying and investigating incidents (meaning there are measurement risks - a
chance that we will materially miscalculate things and over- or under-estimate),
and a generalized secrecy in this field that makes it tricky to gather and share reliable
statistics although some information is public knowledge, or is shared within trusted
groups. But the overall lesson is that
the problem of the "known and unknown unknowns" is not intractable: there are data,
there are methods, there is a need to estimate threats, and it can be done.
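Purely as a sketch of how the analogy might play out in practice (the numbers and the sources are hypothetical, and the method assumes the two sources detect incidents independently of each other), the same estimator can be pointed at incidents rather than snails: treat two detection sources as the two capture occasions, and use their overlap to estimate how much neither of them sees.

```python
# Hypothetical sketch: Lincoln-Petersen applied to incident detection,
# treating two assumed-independent sources (e.g. IDS alerts and audit-log
# reviews) as the two "capture" occasions.

def estimate_total_incidents(found_by_a, found_by_b, found_by_both):
    """Estimate the total number of incidents, detected or not."""
    if found_by_both == 0:
        raise ValueError("no overlap between sources - estimate undefined")
    return found_by_a * found_by_b / found_by_both

# Source A flags 40 incidents, source B flags 25, and 10 appear in both:
total = estimate_total_incidents(40, 25, 10)   # about 100 incidents in all
undetected = total - (40 + 25 - 10)            # about 45 missed by both sources
print(total, undetected)
```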
One thing the scientists do but we information security bods
don't (usually) is to calculate the likely errors associated with their
numbers. So, in the snail example, the
study might estimate that "There are 2,500 Giant Green Land Snails on the
island, with a standard error of 850" or, equivalently, "We are 95% certain that the total
population of Giant Green Land Snails on the island is between about 830 and 4,170". There
are numerous situations in information security in which errors or confidence
limits could be calculated statistically from our data, but we very rarely (if ever!) see
them in print - for instance, in survey-type studies where there are sufficient people
or organizations in the pool for the statistics to work out (and with the right
statistics, a surprisingly low minimum sample size may be sufficient, less than
the 30 that used to be our rule of thumb).
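As a rough illustration of what that might look like (the figures are invented, and it assumes a simple random sample and the usual normal approximation to the binomial), quoting a survey result with its confidence limits attached is only a few lines of arithmetic:

```python
import math

def proportion_ci(successes, sample_size, z=1.96):
    """95% confidence interval for a surveyed proportion (normal approximation)."""
    p = successes / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p - z * se, p + z * se

# Say 18 of 60 surveyed organizations reported a serious incident last year:
low, high = proportion_ci(18, 60)
print(f"30% reported an incident (95% CI roughly {low:.0%} to {high:.0%})")
# -> 95% CI roughly 18% to 42%
```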
Speaking as a reformed (resting, latent, recuperating, ex-) scientist, I'd say current real-world
information security practice is largely unscientific, outside of academic
studies and journals anyway. Aside from the issue just mentioned, surveys and other data sources rarely explain their methods properly - for instance they may (if we're lucky) mention the sample size, or more often the number of respondents (a different parameter), but seldom are we told exactly how the sample was selected. With vendor-sponsored surveys, there is a distinct possibility that the sampling was far from random (e.g. they surveyed their own customers, who have patently expressed a preference for the vendor's products). Small, stratified and often self-selected samples are the norm, as are implicit or explicit extrapolations to the entire world.
Consequently, for risk analysis purposes, we are often faced with using a bunch of numbers of uncertain vintage and dubious origins, with all manner of biases and constraints. And, surprise surprise, we are often caught out by invalid assumptions. Ho hum.
Regards,
Gary
PS The snails and birds are pigments of my imagination.