
DUMB performance measures are better than SMART ones

The SMART acronym is one of the most popular in the field of organisational objective setting and performance measures. My preferred list of the SMART criteria is:
• Specific: measures one thing at a time – focused
• Measurable: can be interpreted either as a specific number that can be measured, or at least as a verbal description that is testable.
• Achievable/attainable: they should be achievable, but with the implication that they are not too easily achieved. Setting the level of an indicator is a difficult issue. For example, just because an indicator target has been achieved easily in the past, should the target be raised? If this is done, can it act as a disincentive to exceeding targets?
• Relevant: Is it significant; does it contribute materially to the overall objectives of the organisation?
• Time-bound: Need to specify the period over which the performance will be achieved.

There are, however, a number of problems with SMART. First is the ambiguity: the Wikipedia entry lists between three and thirteen meanings for each of the letters in the SMART acronym. For example, instead of 'specific' there could be significant, stretching, simple or sustainable.

I also think there are more problems and limitations, as per the list below:

  • Specific: Specific to what? If it means 'let's only measure one thing at a time', that is not very helpful. If it means that the goal should be specified, that is a sensible requirement, but it is not quite the same as defining a performance measure.
  • Measurable: A good criterion
  • Achievable: This only refers to targets, not to the measures themselves. It is also a highly judgemental characteristic.
  • Relevant: Yes, but there is a potential overlap with Specific.
  • Time-bound: Yes, but specifying the period over which something should be done is only one element of being well-defined.

One reason for the problems in using SMART criteria for performance measures is that they were not designed for that purpose. They make more sense for personnel evaluation, where, for example, targets for individual performance do need to be specific to that person and to be achieved in a set time frame. They might also be useful for setting objectives for organisations or organisational groups. But organisational performance measures are better if they are DUMB:


  • Defined: The measures should be unambiguously defined so that it is apparent what data is required, and how that data will be used in calculating the measure (a minimal sketch follows this list).
  • Useful and unbiased: Measures are more useful if they are closely correlated with the organisational objective being measured. Such correlation helps to guard against bias, distortion and gaming of performance measures.
    There is also a time and cost aspect to usefulness. It should be practicable to produce the measure frequently enough to track progress, and quickly enough for it still to be useful. And ideally, collecting and reporting the measure should require little effort beyond what management would have needed anyway, so that the measure is economical to produce.
  • Measurable: The measure should be independently verifiable and able to be reproduced accurately by a second assessor.
    Normally this means a numerical measure, and one that is based on a solid chain of evidence. However, establishing a sound rubric for non-numerical assessment is often sufficient to establish measurability.
  • Balanced and benchmarked: Taken as a whole, the set of measures should cover all key elements of performance over the organisation or activity being assessed. One way of assessing completeness is to consider whether the set of performance measures can reasonably assess how the program logic – the connection from resources to activities to outputs to outcomes – is working. A complete or balanced set of performance measures also helps to prevent gaming of individual measures.
    The ability to benchmark performance using performance measures is also desirable. For this reason, if there are industry standard measures, it makes sense to use them. When a measure is revised, it becomes harder to compare it with previous periods.
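To make the 'Defined' and 'Measurable' criteria concrete, here is a minimal sketch in Python of how a measure might be written down so that the required data and the calculation are unambiguous, and so that a second assessor working from the same records could reproduce the result. The on-time delivery measure and the data fields are hypothetical illustrations, not drawn from any particular organisation.

```python
from datetime import date

# Hypothetical raw records: one row per delivery, listing exactly the
# fields the measure definition says are required.
deliveries = [
    {"promised": date(2013, 6, 3),  "delivered": date(2013, 6, 2)},
    {"promised": date(2013, 6, 10), "delivered": date(2013, 6, 12)},
    {"promised": date(2013, 6, 17), "delivered": date(2013, 6, 17)},
]

def on_time_rate(records):
    """Defined measure: share of deliveries made on or before the promised date.

    The definition spells out the data required (promised and delivered dates)
    and exactly how it is used, so a second assessor working from the same
    records should reproduce the same figure.
    """
    if not records:
        return None  # no data for the period: report as not measurable
    on_time = sum(1 for r in records if r["delivered"] <= r["promised"])
    return on_time / len(records)

print(f"On-time delivery rate: {on_time_rate(deliveries):.0%}")  # 67% here
```

Because the calculation is fully specified, the 67 per cent figure is reproducible by anyone given the same three records, which is the essence of the Measurable criterion.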

Can DUMB be improved on? The acronym needs a bit of work – I have doubled up the load on ‘U’ and ‘B’ – and any suggestions for improvement are welcome. And there is some overlap in that a performance measure can’t be measurable unless it is defined. But I still think that, for organisational performance measures, it is better to be DUMB than SMART.

Graham.smith@numericaladvantage.com.au
© Numerical Advantage 2013
http://www.numericaladvantage.com.au


What is needed first: good measurement systems or good measures?

I am one of the first to confess that there are many problems with performance measures. In some cases, attaching incentives to specific targets can divert effort into making the numbers rather than achieving results for the organisation – commonly known as gaming – for example, minimising inventory at the expense of performance. On other occasions, performance targets may be weakly expressed, meaning little effort is required to achieve them, for example targets based on maintaining last year's performance. Sometimes performance measures are incomprehensible, and on other occasions they are meaningless, for example counting the number of pieces of advice a civil servant presents to a Minister. And often, performance measures are not in place long enough to establish a trend and permit benchmarking.

The obvious response is that someone, often the CEO or CFO, directs somebody to 'fix this' by devising better performance measures. And certainly, efforts by the person tasked with such an endeavour are welcome. But 'heroic action' is often not enough. As per the organisational maturity model, this represents the lowest level of maturity. Without understanding the forces that affect performance measurement, such reform can be both incomplete and temporary.

So we need to look a little deeper at the causes of defective measures. We need to consider not just the measures themselves, but the influences that drive behaviour at a personal and organisational level. The major influences I have seen that affect performance measurement at a systemic level are:

  • Organisational type – the nature of the products and services delivered. Some are more amenable to quantification than others.
  • Leadership style and organisational ethos.  This includes the nature of incentives, the weight put on process as opposed to outcome, and whether decision-making is more rational or judgemental.
  • Ownership of the performance measurement system – is this at the organisational level, the sub-organisational level, or is it imposed from the outside?
  • How performance measures are used – just to comply with imposed requirements, as a key driver of performance, or somewhere in between
  • Context: nature of the environment. For example, is the environment stable, or dynamic? Highly competitive or not? Exposed to public scrutiny and influence by stakeholders or not?
  • Internal mechanisms: How the system is designed, coordinated and resourced, including provision of training on the system. The fundamental issue here is whether there is central design and coordination, or a more bottom-up approach
  • How the measurement system is reviewed – e.g. not at all, internal review, external review.

The bad news is that many of these features are not easily changed, implying that it is difficult to improve performance measurement. But some can be adjusted, and by taking into account those that are fixed, you can focus on achievable change.

© Numerical Advantage 2013

http://www.numericaladvantage.com.au

Performance Management Conference, Queenstown, New Zealand

The Performance Management Association (Australasia) held its two-yearly conference in Queenstown, New Zealand, from 30 October to 1 November. This was a small conference of some 50 people, which enabled an intimate feel and good interchange among all participants. The majority of attendees were from New Zealand, but there was also a strong international presence, with participants from Australia, the UK, Denmark, Norway, Hungary, Indonesia and the UAE.


In the following very brief summary of highlights from my point of view, I emphasise the ‘pure’ performance measurement and management papers; there were also a number of interesting insights more generally in the fields of management and accounting, including personnel performance measurement, collaboration and networks, integrity and organisational behaviour. The summaries come from notes from the sessions I attended, and are necessarily an incomplete record.

Jacob Eskildsen from Aarhus University in Denmark presented on satisfaction surveys. He focused on statistical analysis to form segmented populations, but a fascinating sideline was the discussion of how overall life satisfaction feeds into satisfaction surveys. Women are more positive than men. Satisfaction starts high at age 18, declines until about 40, then increases. Satisfaction declines with higher education – people become better at criticism. Urbanisation leads to a decline in satisfaction, and those with more influence have higher satisfaction. There are also national differences – Denmark being high, Japan and Portugal being low.

Prof. Mike Bourne of Cranfield in the UK presented a keynote address on the design and conduct of performance measurement review meetings (of organisational units). The meetings need to include both feedback on performance, as well as feed forward (management direction). Prof. Bourne produced a hierarchy of the levels of measurement, starting with no measurement, then reporting of actuals, then various types of ratios for assessment at deeper levels, and finally reviews of the metrics and targets. The unanswered question, though, is whether the process makes a difference.

Derek Gill from the New Zealand Institute of Economic Research described some research on the use of performance information in NZ government agencies. Despite some robust quotes to the effect that 'performance information is crap', Dr. Gill's research found that managers do use organisational performance information more than they use informal feedback, but the dominant purpose was control. The higher in the hierarchy, the less the use of formal performance information; and regional and local offices were heavier users than head office. Organisations with direct services use it more, and if a 'power user' used it for one purpose, they were more likely to use it for other reasons, such as publicity. He found there was a high correlation between external and internal reporting. He also noted that, contrary to the governance assumption that ministers purchase outputs from departments in order to achieve government outcomes, bureaucrats are more interested in outcomes than ministers are.

Bernie Frey from Praxxis and Mark Le Comte from Auckland Council reported on the exercise to build a performance management system for the merged Auckland Council. Challenges included complex governance based on a multi-level structure, the existence of many plans (150 and growing), and the requirement for rapid reporting. A summary of their lessons learned: don't wait for all the ducks to be in a row; don't assume big is best; don't search for big data; don't leave responsibilities to each silo; and don't rely on a sponsor. Do pay attention to strategy (and change it if necessary); do involve the organisation, on a top-down basis; do create an Enterprise Metrics Framework; do populate the framework using a top-down and bottom-up approach; do build the system to support professional judgement; and do get a leader.

Richard Greatbanks from Otago University reported on work relating to grants to charities in NZ. He found most performance information was collected to report to trustees, with only 1/3 used to provide feedback to grantees.

Ishani Soyser of Massey University, in some work on Australasian non-profits, commented that a performance measurement system is easier to maintain than a balanced scorecard.

Adam Feeley, currently CE of Queenstown Lakes District Council and previously CEO of the NZ Serious Fraud Office, gave some practical advice based on his experience. Accountability documents are useful both to define purpose (enabling push-back against unreasonable public expectations) and to align political and public expectations (explaining to political masters how the objectives will be achieved and measured). Internally, they can give staff a common purpose. They should start with goals; the Serious Fraud Office goals had been process-based, but were changed to reflect overall objectives. Previously, quality standards had been vague, hard to measure and often not reported, but this has improved. The Queenstown Lakes District Council had 150 measures, some meaningless, and 1/3 not achieved. Some questions: does it make sense to ask customer opinions on everything (e.g. stormwater – a technical issue)? Are levels of use a proxy for quality? Is timeliness always a good measure? Do all businesses, including the minor ones, need to be measured? And there should be sanctions for failure to achieve targets.

Norbert Kiss of Corvinus University, Budapest talked on improving the performance of e-health networks. He commented that we look at the public policy cycle (the macro level) and the managerial cycle (the micro level), but miss out on the meso level: networks. Bottom-up development of performance measures often relates to voluntary, informal and market mechanisms; top-down development to mandated, formal and hierarchical ones. The network manager role can be shared, be taken by the lead organisation, or be a separate administrator. There can also be multi-level networks, e.g. at central, regional and local levels. Things work better if there is domain consensus, positive evaluation and work coordination at the local level, and requirements management and defence of the paradigm at the policy level.

This summary can only touch on selected highlights of a useful conference. The next conference of the Performance Management Association is in Aarhus, Denmark in June 2014, with abstracts due shortly. See http://www.performanceportal.org/

‘The numbers’: democracy and performance measures

One of the criticisms of performance measurement as used in large government organisations is that it is very much focused on internal structures – some might say bureaucratic – and does not adequately consider the perspectives of citizens.

One response to that is that the influence of citizens can be reflected in performance measures. Often, this is as simple as conducting satisfaction surveys among those who are affected by government services, and using these as performance measures. In principle, such measures could be given weight with senior management by also using them to influence performance pay, but this does not seem to be very common.

A more fundamental response is that the bureaucracy is responsible to an elected government, which needs to win elections to remain in power. The fundamental performance indicator is therefore votes gained.


Following on from that, other key indicators are the votes of members of parliament or of significant interest groups, within or outside the ruling parties. If a question of policy comes up, political operatives often ask: Do we have the numbers for that?

Indeed, there is a view that performance measurement (or evaluation) in government is not necessary. All that is needed is to have controls and enforcement to make sure that government organisations administer legislation properly (including budget legislation), and then hold elected governments responsible for the legislation and its administration at each election. This view had been associated with the nations of continental Europe, although I understand that even they are now moving a little closer to the view of the anglosphere, which has always been more inclined to the managerialist view, and therefore in favour of organisational performance measurement.

So, what do we make of this? I think a multilevel approach is appropriate. The fundamental performance measure does remain, in a democracy, the votes of the people who can get rid of a government that is not performing. But this is a coarse and infrequent measure: it is a yes/no to an overall package of policies and practices, and only occurs every few years. To support that, we need finer-grained control mechanisms, of which performance measures, including those that reflect public opinion, have a role to play.

© Numerical Advantage 2013
http://www.numericaladvantage.com.au

 

Big data and operations research

This post tries to bring together a fashion of the moment and an old technology that I think deserves to be rediscovered. The idea of big data seems, to use another of the fashionable phrases of the moment, to have reached a tipping point. The availability of data, and lots of it, is not that new or that radical. What technology has done in the last few years is make such data much cheaper to produce, quicker to obtain, and available to many more people. As is the case with many technologies, improving these parameters leads not just to the same uses being done better, but to a much wider spectrum of uses. So big data is no longer the preserve of governments and multi-national companies, but can be employed by a large range of researchers. These researchers can also, because the data is getting cheaper, link information from a wide array of sources. And because many researchers can explore many connections, this crowd-sourcing can help to make unexpected connections.

[Image: Gapminder chart]

Can we use the potential offered by large amounts of cheap data to reinvigorate operations research? Operations research is characterised by modelling, especially numerical modelling, of processes within government or industry in order to determine, in some sense, an optimum. It is used in areas that have lots of internal data, and significant problems for which it is worth spending significant effort to get an extra few per cent of optimisation; areas such as transportation and defence. But it has been of little or no application in areas of social policy.

Now that social media as well as official sources can yield big data, can these be used to help governments make key decisions on social investment? And if so, can operations research assist? Hans Rosling, on his Gapminder website (from which the picture is taken), talks about his illuminating charts of international trends helping not to prove anything statistically, but to form hypotheses. At the moment, the conventional way of testing such hypotheses has been to use the discipline of evaluation to examine available information and report on the impacts (good or bad) that a specific intervention has caused or contributed to.

Can operations research do a better job of providing practical advice to decision-makers by testing the sort of hypotheses that might be generated by big data? For example, in rural development, exactly how do income support, economic investment, health services, training and education, and housing supply interact with life expectancy, average income and satisfaction? Can we build a model of this and test, using real data, what is going on? Would this be the same if we applied it to remote communities? What about primarily Indigenous communities?

Such models have to pass two tests. Their structure has to make sense, that is, they have to use plausible mechanisms; and the results have to be consistent with reality. Construction and testing of such complex models raises significant challenges, at least as many as for major econometric models. But when the data is available, the task becomes possible. A toy illustration follows at the end of this post.

© Numerical Advantage 2013
http://www.numericaladvantage.com.au
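As the toy illustration promised above: the linear programme below is a deliberately tiny sketch of the kind of optimisation model operations research could bring to such questions. It allocates a hypothetical budget across two invented interventions (training and health services) to maximise an assumed effect on average income; all coefficients are made up, and in practice they would have to be estimated from the kind of data the post describes. It assumes SciPy is available.

```python
from scipy.optimize import linprog

# Toy allocation problem: split a budget of 100 (arbitrary units) between two
# invented interventions, training (x1) and health services (x2).
# Assumed effect on average income per unit spent: training 0.8, health 0.5.
# linprog minimises, so the objective coefficients are negated to maximise.
objective = [-0.8, -0.5]

A_ub = [[1, 1]]              # total spend no more than the budget
b_ub = [100]
bounds = [(0, 70), (0, 70)]  # neither programme may absorb more than 70 units

result = linprog(objective, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
training, health = result.x
print(f"Spend {training:.0f} on training and {health:.0f} on health services")
print(f"Modelled gain in average income: {-result.fun:.1f}")
```

A real model would include many more interventions, interactions and constraints, and would be judged by the two tests above: plausible mechanisms and consistency with reality.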

‘Sticky’ performance measures

Dan and Chip Heath have written about the concept of ‘stickiness’. They suggest that there are rules that determine which ideas survive and prosper, and which are doomed to fade away to oblivion. The characteristics of ideas that survive, they claim, can be summarised by the acronym SUCCES. My interpretations of the parameters are:

• Simple: extraneous detail removed; avoidance of jargon and complex language. The authors talk about ‘stripping an idea down to the core’

• Unexpected: Have an element of surprise that makes them memorable. Interesting stories tend to be those that have a twist to the tale.

• Concrete: Expressed in terms of real tangible objects that relate directly to the reader, not abstract concepts.

• Credible: Believable, makes inherent sense; has a degree of authority; and evidence is available to support the idea

• Emotional: Ideas that connect to fundamental human drives, that affect our emotions, will have more impact

• Story: A narrative is easier to remember and assimilate than a formal or statistical argument

How does this relate to performance measures? Performance measures are, or should be, used to communicate, to inform and even to persuade stakeholders about the level of performance and, by inference, the sort of action that needs to be taken. Too often, they just sit there in some sort of dry and dusty annual report; in other words, they have failed to be sticky. Provided that they remain valid and reliable – one of the critiques of the 'stickiness' approach is that it can be seen as more applicable to mere commercial marketing than to transmitting concepts of lasting social value – then maybe the SUCCES acronym can help. How might this list be applied to performance measures?

• Simple: It is preferable to measure one thing, not use a figure of merit that combines several factors in an opaque way.

• Unexpected: The main concept here is that it must be possible for a measurement to surprise; it must be possible to express a failure, and when a failure occurs, it must be recorded. Surprisingly often, this is not the case.

• Concrete: There must be a simple and defined method to calculate the measure, based on known or measurable facts.

• Credible: The measure must be testable; the base data must be available and valid, and a second person, using the same base data, should produce the same result (a minimal sketch follows this list).

• Emotional: In some cases, human perceptions are valid inputs and need to be recognised and measured, for example through opinion surveys.

• Story: Performance measures need to relate to the overall ‘story’ – i.e. vision or mission – of the organisation. If they provide direct evidence of the achievement of a desired outcome, they are very powerful.
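As a minimal sketch of the Concrete and Credible points above: a measure with a simple, defined calculation can be recomputed independently from the same base data, and the two results should agree. The complaint-rate measure and the figures below are invented purely for illustration.

```python
# Hypothetical base data: monthly complaints and transactions handled.
base_data = [
    {"month": "Jan", "complaints": 12, "transactions": 4000},
    {"month": "Feb", "complaints": 9,  "transactions": 3500},
    {"month": "Mar", "complaints": 15, "transactions": 5000},
]

def complaint_rate(rows):
    """Concrete: complaints per 1,000 transactions, calculated one defined way."""
    complaints = sum(r["complaints"] for r in rows)
    transactions = sum(r["transactions"] for r in rows)
    return 1000 * complaints / transactions

def complaint_rate_second_assessor(rows):
    """Credible: an independent re-calculation from the same base data."""
    total_complaints = 0
    total_transactions = 0
    for r in rows:
        total_complaints += r["complaints"]
        total_transactions += r["transactions"]
    return total_complaints / total_transactions * 1000

first = complaint_rate(base_data)
second = complaint_rate_second_assessor(base_data)
assert abs(first - second) < 1e-9, "the definition is not reproducible"
print(f"Complaints per 1,000 transactions: {first:.2f}")
```

The assertion is the 'second person' check: if an independent re-implementation of the definition gives a different answer, the definition is not yet concrete enough.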

That is not quite a perfect list of what makes a good set of performance measures (more of that in later posts). But I do consider that it is not a bad start.

Reference: Chip Heath and Dan Heath, Made to Stick: Why Some Ideas Survive and Others Die. Random House, 2007

© Numerical Advantage 2013
http://www.numericaladvantage.com.au

DO count your chickens before they are hatched

Contrary to the old saying, it is a good idea to assess how many chickens you are likely to wind up with before they are hatched. That way, you can figure out how to feed and look after them, and how much money you will make when they are eventually sold (assuming this is a commercial operation). The saying is, of course, a warning against undue optimism, and for this reason you need a factor to convert the number of apparently fertilised eggs into the expected number of chicks. But it is sensible to count the eggs and make that estimate.
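As a back-of-the-envelope sketch of that conversion, in Python and with invented numbers: the leading indicator is the count of apparently fertilised eggs, and a historical hatch rate converts it into an expected number of chicks.

```python
def expected_chicks(fertilised_eggs, hatch_rate):
    """Convert a leading indicator (eggs set) into an expected outcome (chicks).

    hatch_rate is the historical proportion of apparently fertilised eggs
    that actually hatch; it tempers the optimism the old saying warns about.
    """
    return fertilised_eggs * hatch_rate

# Invented figures: 120 apparently fertilised eggs, 85% hatched in past seasons.
print(expected_chicks(120, 0.85))  # 102.0 expected chicks, not 120
```

The same pattern applies to the examples later in this post: sales inquiries multiplied by a historical conversion rate give expected sales.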


What we are talking about is a leading indicator, and as such it can be really valuable. It has the virtues of timeliness and forewarning at the expense of some precision. Economists are keen on leading indicators because they help to predict movements in the economy. For example, they keep an eye on the number of housing approvals. Approval has to happen before houses are built, and then there is further economic activity in the outfitting of the house with furniture and whitegoods. So housing approvals help to predict the direction of the economy – not perfectly, but as a guide.

The same applies in business and government. Keeping an eye on precursors or enablers helps to predict what happens next. All other things being equal, increased sales inquiries will lead to more sales and higher profit. An increase in applications to enter a particular discipline at the nation's universities is likely to lead to an increase in graduates some years hence.

The newfangled term for this is prognostics (by comparison with diagnostics). With more sensors available on equipment like jet engines, GE can now have 'prognostic' information – being able to forecast what will happen – as opposed to the former availability of only diagnostic information: what is happening now, what is wrong now. In the New York Times of September 14, 2013, Beth Comstock, GE's chief marketing officer, noted that a company can assemble data so much more accurately to "observe performance, predict performance and change performance."

The ideal leading indicator measures something 'out there', among the clients and users of a program, rather than activity or effort inside the program. If there is a well-defined program, then adding more resources is likely to increase the outputs; but there is a danger in blindly using a measure of activity as a leading indicator. For example, it is risky to assume at the start of an untried program that its initiation will lead to the predicted results. Using a measure of resource usage as an indicator of success is, in this case, not a leading indicator but an example of counting your chickens before they are hatched.

© Numerical Advantage 2013.