07 January 2013

SMotW #39: access alert message rate

Security Metric of the Week #39:  rate of messages received at a centralized access logging/alarming/alerting system

This week's example metric scores surprisingly well using the PRAGMATIC criteria despite appearing, at first glance, to be a rather narrow technical metric.  

The premise for this metric is that ACME has already established a centralized system to collect and analyze access control messages from around the organization.  Such systems are not uncommon in organizations that have learnt that tracking and responding to IT security events without coordination is hopelessly inefficient and ineffective.  
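
To make that premise concrete, here is a minimal sketch in Python of how such a collector might tally per-system message rates in one-minute buckets. The tab-separated log format and field names are invented purely for illustration; real feeds (syslog, SIEM agents and so forth) will differ.

# Minimal sketch: tally access-control events per system per minute from a
# centralized log feed. The tab-separated format (ISO timestamp, host, event)
# is purely illustrative - real collectors differ.
from collections import Counter
from datetime import datetime

def tally_event_rates(log_lines):
    """Return a Counter keyed by (host, minute) with event counts."""
    rates = Counter()
    for line in log_lines:
        try:
            timestamp, host, _event = line.rstrip("\n").split("\t", 2)
            minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        except ValueError:
            continue  # skip malformed lines rather than crash the collector
        rates[(host, minute)] += 1
    return rates

if __name__ == "__main__":
    sample = [
        "2013-01-07T09:00:12\tfileserver1\tlogin_failure",
        "2013-01-07T09:00:45\tfileserver1\tlogin_failure",
        "2013-01-07T09:01:03\tmailserver\tlogin_success",
    ]
    for (host, minute), count in sorted(tally_event_rates(sample).items()):
        print(f"{host} {minute:%H:%M} -> {count} events")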

Imagine that, on an ordinary day, access control events tick along at a steady, fairly even rate on all the monitored systems but suddenly there’s a rash of events on one system.  What could it be?  Straight away, we have a possible incident in progress.  We know which system has the issue, when it happened, and we know from the nature of the messages broadly what kind of incident it is.  At this stage, if we are quick, there is a reasonable chance we can nip it in the bud, dealing with the situation before it causes any serious impact.  
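
A 'rash of events' like that can be spotted with very simple statistics. The rough sketch below compares the latest minute's count for a system against its recent baseline; the 60-minute window and three-standard-deviation threshold are arbitrary illustrative choices, not recommendations.

# Rough sketch: flag a sudden spike in per-minute event counts for one system.
# The baseline window and threshold are arbitrary illustrative choices.
from statistics import mean, pstdev

def spike_alert(counts_per_minute, baseline_window=60, threshold_sigmas=3.0):
    """Return True if the latest minute is well above the recent baseline."""
    if len(counts_per_minute) <= baseline_window:
        return False  # not enough history to judge yet
    baseline = counts_per_minute[-(baseline_window + 1):-1]
    latest = counts_per_minute[-1]
    mu, sigma = mean(baseline), pstdev(baseline)
    return latest > mu + threshold_sigmas * max(sigma, 1.0)

# Example: a steady trickle of about 5 events a minute, then a burst of 60.
history = [5, 4, 6, 5, 5] * 13 + [60]
print(spike_alert(history))  # True - worth a closer look

In practice the window and threshold would need tuning per system, and probably per time of day, but the principle really is that simple.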


If it turns out to have a simple explanation (maybe an automated system that is failing to connect), we can probably resolve it in minutes, and in the process it will become clear to those involved that we are actively watching the security logs, which broadcasts a powerful security awareness message in its own right.

Now imagine the situation where, instead of seeing events tick along as normal, we notice an unusual dip in the rate.  Looking at the data, we see that a system, or a group of related systems (perhaps all at one remote site), appears to have reported no events for some while.  Again we know which systems to investigate further, we probably know when the anomaly started (just after the last message received), and we have a clue about what might be going on (it looks like a site comms issue, or perhaps a deliberate attempt to conceal something untoward), so we are on top of it promptly.
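
Watching for the dip is equally straightforward, at least in principle. This sketch assumes we keep the timestamp of the last message seen from each system and (purely for illustration) treat fifteen minutes of silence as suspicious; acceptable silence obviously varies enormously between systems.

# Rough sketch: flag systems that appear to have gone quiet.
from datetime import datetime, timedelta

def silent_systems(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return hosts whose most recent message is older than max_silence."""
    return sorted(host for host, seen in last_seen.items()
                  if now - seen > max_silence)

now = datetime(2013, 1, 7, 10, 0)
last_seen = {
    "fileserver1": datetime(2013, 1, 7, 9, 58),   # still chatty
    "branch-gw":   datetime(2013, 1, 7, 9, 10),   # quiet for 50 minutes
    "branch-db":   datetime(2013, 1, 7, 9, 12),   # quiet for 48 minutes
}
print(silent_systems(last_seen, now))  # ['branch-db', 'branch-gw'] - a site comms issue, perhaps?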

According to ACME Enterprises' CISO, the metric's PRAGMATIC score hits 90%:
P     R     A     G     M     A     T     I     C     Score
87    88    94    93    93    94    97    89    79    90%

The rating is slightly reduced for Predictability because the events that we see, or don't see, have already passed.  On the other hand, we may be watching a hack unfold before us, or one that is shortly about to happen, e.g. when an automated brute-force password guesser finally hits pay-dirt.  Compared to wading through the logs on a daily or irregular basis, near-real-time monitoring and response takes things to a whole new level.

The Relevance rating takes a slight hit because what appear to be security incidents may in fact be the result of benign network, system or user issues.  A large part of the art of log monitoring is the skilled, experience-based analysis that identifies and prioritizes anomalies relating to probable serious incidents over those that appear quite innocuous.  Real-time log analysis is one area of information security that can definitely benefit from the application of expert or knowledge-based systems, leveraging the finite capabilities (and attention spans!) of security analysts.
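
To give a flavour of the kind of knowledge-based triage we have in mind, here is a deliberately simple rule-table sketch.  The rules, weights and field names are invented for illustration only; a genuine expert system would be considerably richer.

# Deliberately simple sketch of rule-based triage for access-log anomalies.
RULES = [
    # (predicate, priority points, rationale)
    (lambda a: a["event"] == "login_failure" and a["count"] > 50,
     40, "possible brute-force attack"),
    (lambda a: a["account"] in {"root", "Administrator"},
     30, "privileged account involved"),
    (lambda a: a["source"] == "external",
     20, "traffic from outside the network"),
    (lambda a: a["event"] == "login_failure" and a["count"] <= 5,
     -10, "probably just a mistyped password"),
]

def triage(anomaly):
    """Score an anomaly and list the reasons it was prioritized."""
    score, reasons = 0, []
    for predicate, points, rationale in RULES:
        if predicate(anomaly):
            score += points
            reasons.append(rationale)
    return score, reasons

print(triage({"event": "login_failure", "count": 120,
              "account": "root", "source": "external"}))
# (90, ['possible brute-force attack', 'privileged account involved',
#       'traffic from outside the network'])

Encoding the analysts' know-how explicitly like this also makes their prioritization reviewable rather than locked up in individual heads.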

Independence is of some concern in the sense that we envisage the metric being measured and reported by the log analysts themselves.  The metric is probably not an ideal way of tracking or managing the analysts' performance, unless ACME management is willing to trust that they will always report accurately and honestly.  On the other hand, the metric's Independence is far less of a concern if it is being used as an operational tool by the analysts themselves: they have nothing much to gain by meddling with the numbers, apart from fine-tuning the metric to suit their own purposes*.  

The lowest rating, for Cost-effectiveness, reflects the effort required to set up not just the centralized access reporting/control (which has many other benefits) but also the metric reporting and alerting.  Clearly it will take some effort to monitor message rates and to establish the alerting and response processes (especially expert systems) needed to identify and deal with possible issues, but the substantial benefits of this metric largely outweigh the costs.

In conclusion, this candidate metric shows a lot of potential, its high score putting it high up the wish list for ACME's security management.  BUT before you rush away to implement it yourself, remember that YMMV (Your Metrics May Vary).  It scores well within the imaginary context of a fictitious organization facing a specific set of security challenges and information needs.  Furthermore, it is not too hard to think of variants of this metric that may score even better.  For example, why not open up the metric to all types of security log messages, not just those concerning access?  Aside from the rate of arrival, would the nature or classification of the messages be useful additional information?  And how about measuring the results of the analyses and subsequent incident investigations, as a way of encouraging the analysts to focus more intently on significant security issues rather than being diverted by trivia?

In practice, ACME would be well advised to determine and compare the PRAGMATIC scores for such variants - the kind of comparison, by the way, that used to be a recipe for disaster in the dark pre-PRAGMATIC days.  Many a metrics selection activity has grown like Topsy into a metrics catalog project and hence totally lost its way.  With no rational basis for assessing and comparing metrics, management is faced with a confusing mess of conflicting demands, often resorting to choosing metrics that are deemed "popular" (remember, the plague was "popular" in the Middle Ages!) or that are someone's pet metric (often the person who 'invented' a given metric becomes a passionate advocate for it, regardless of its true merits).

OK, that's it for this blog piece but we will be announcing our third Security Metric of the Quarter shortly.  Feel free to hunt back through this blog to figure out the winner for yourself, or simply hold on for a day or three, if you can bear the excitement.

Gary & Krag.


By the way, all 150+ example metrics in the book are categorized as strategic, management and/or operational metrics, depending on which levels of the organization are most likely to be using them.  We discuss not only these particular categories and how they might be used, but also the value of categorizing metrics in various other ways for various purposes.
