
Reducing "Noise" in Evaluations

By Richard Pennington posted 06-21-2021 19:53

  

In 2021, three giants in the fields of psychology, business strategy, and public policy published a new book: Daniel Kahneman, Olivier Sibony, and Cass Sunstein, Noise: A Flaw in Human Judgment. For my money, this book belongs on the shelf of every procurement professional. It has very practical advice. Chapter 8, "How Groups Amplify Noise," is itself worth the price.
     The book is focused on professional judgments, starting with the premise that a judgment is a measurement. A judgment is an attempt to match a decision to reality as accurately as possible, whether the judgment is predictive or evaluative. Think about it: procurement evaluations are a combination of both. They involve evaluative judgments about the merits of a proposed approach and about the past experience needed to achieve a successful outcome. And if risk is assessed, there is a predictive element in forecasting an outcome.
     The authors do not use procurement evaluation decisions as examples, but they cover decisions that are analogous and instructive, including some directly relevant ones like employment performance ratings and hiring decisions. From their examples about judgments in law, medicine, and business, there is a lot to be learned. Perhaps the fundamental theme of the book is this: bias is one error in judgments; noise, the variation in judgments, is the other.


     This not-so-hypothetical chart is based on an evaluation spreadsheet I saw years ago. Was Evaluator 3 biased? Or was there unacceptable "noise" in the evaluation, like ill-defined standards for the scale? Noise is unwanted variation in judgments. The authors dispel the myth that averages take care of variation, that the random pattern of noise cancels itself out. Or said another way, just averaging scores on a spreadsheet doesn't solve the problem.
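     To make that concrete, here is a minimal sketch with invented numbers (not from the book or from that spreadsheet): two hypothetical evaluators whose scores average out identically even though they disagree on nearly every proposal.

    # Two evaluators score the same five proposals on a 0-6 scale (hypothetical numbers).
    # Their averages match, but they disagree proposal by proposal -- the average hides the noise.
    from statistics import mean, stdev

    evaluator_1 = [5, 5, 4, 5, 4]   # proposals A-E
    evaluator_3 = [6, 2, 6, 3, 6]   # same proposals, same scale

    print(mean(evaluator_1), mean(evaluator_3))    # 4.6 and 4.6 -- identical averages
    print(stdev(evaluator_1), stdev(evaluator_3))  # ~0.55 vs ~1.95 -- very different spread

    # Proposal-level disagreement is what the averages never report:
    gaps = [abs(a - b) for a, b in zip(evaluator_1, evaluator_3)]
    print(gaps)  # [1, 3, 2, 2, 2]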
     The book plainly discusses the types of bias and noise. Want one example? Confirmation bias, as in searching a proposal for information that confirms a preconceived, intuitive opinion about it. Sound familiar? We as procurement professionals spend a great deal of time addressing bias and avoiding it. We spend less time on variation in evaluations and the concept of "noise" as the authors explain it.
     I'll let you individually explore the ideas in the book, but here I'll adapt a few of the concepts that can be applied to proposal evaluations. These examples are not taken from the book but are derived from the principles the authors illuminate. Further, the authors don't cover the decision constraints that public procurement professionals operate under. For example, continuing to gather more and more information before the decision is constrained by rules requiring transparency, disclosure of significant evaluation factors, and limits on the outside information that can be considered in evaluating proposals. That said, here are some practices that would be consistent with the ideas in Noise.



  • Collaborate as a committee on the definition of evaluation criteria and their weights. Noise treats refining criteria iteratively through the decision process as a possibility; public procurement, as a matter of policy and equity, clarifies scales before evaluators see any proposals.

  • Training of evaluators is important so they understand the scales. Ambiguity in scales can cause variation in scores and noise. As Noise points out, "Words such as likely or numbers (e.g., '4 on a scale of 0 to 6') mean different things to different people." Discuss the evaluation factors and assessment criteria as a committee.

  • Using a structured evaluation approach that reaches finality on the technical evaluation before considering price helps eliminate the "noise" that knowledge of high or low bids can introduce.

  • Consider structuring evaluations so significant criteria are evaluated independently. It’s difficult to do in a procurement evaluation, but limiting price disclosure to the evaluation committee until later in the evaluation is an example. Some evaluation committees have a sub-group initially evaluate past experience independent of the evaluators looking at the technical proposal.  

  • Having truly independent initial evaluations can reduce the noise caused by early expressions of judgment in a group and by the varying influence that evaluators can exert on a group, which can lead less vocal members to stay silent and not express opinions. Aggregation of independent evaluations, like averaging initial scores, is then more accurate (see the sketch after this list).

  • Group discussion after independent initial evaluations can reduce noise caused by cognitive and emotional errors like overconfidence, confirmation bias, anchoring (e.g., the halo effect of a smart-looking cover on the first proposal reviewed), loss/risk aversion, or availability bias (like a recent bad experience with a supplier submitting a proposal). During those consensus discussions, initially focus on the facts used in the assessment and resist expressing "premature intuitions" that signal overall judgments. Initially hiding the total scores on spreadsheets can reduce premature judgments during discussions about technical merit.

  • Some noise reduction strategies can have downsides, as when a team member truly is the expert and group consensus has the effect of marginalizing the better judgment of the most competent and experienced member. Consider having an open discussion about overconfidence and certainty.

  • At some point later in the evaluation, take the outside view: look at the comparative results of the evaluations from the perspective of how the various choices compare with one another overall. A final consensus meeting and preparation of an award memorandum at the completion of evaluations can serve as a final check.
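     On the point about independent initial evaluations, here is a hedged simulation, my own illustration rather than the authors': the evaluator count, the noise level, and the anchoring weight are all invented. It shows that when evaluators lean on the first score voiced, their errors become correlated and averaging removes less of the noise.

    # Simulate a five-member committee scoring one proposal many times (hypothetical setup).
    import numpy as np

    rng = np.random.default_rng(0)
    true_merit = 4.0                    # assumed "true" score on a 0-6 scale
    n_evaluators, n_trials = 5, 10_000

    def committee_noise(anchor_weight):
        """Std. dev. of the committee's average error; anchor_weight = 0 means fully independent scoring."""
        errors = []
        for _ in range(n_trials):
            independent = true_merit + rng.normal(0, 1.0, n_evaluators)  # private judgments
            first_voiced = independent[0]
            # Later scores shift toward the first score voiced by anchor_weight.
            adjusted = (1 - anchor_weight) * independent + anchor_weight * first_voiced
            errors.append(np.mean(adjusted) - true_merit)
        return np.std(errors)

    print("noise in the average, independent scores:", round(committee_noise(0.0), 3))
    print("noise in the average, anchored scores   :", round(committee_noise(0.6), 3))
    # The anchored committee's average drifts noticeably more from one evaluation to the next.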


     The authors of Noise promote the use of a decision observer to monitor the group dynamics and potential sources of noise. This perhaps is a role best played by the procurement professional who evaluates pricing but does not vote on proposals.  In that role, the professional may be able to monitor the process for potential sources of errors, help spot noise, and lead discussions to reduce it.
     The authors conclude that comparative, relative scales and judgments involve less noise than absolute ones. In procurement terms, they would say that it is more accurate to compare proposals than to score them against an absolute scale of merit. The disadvantage of the comparative approach, though, is the limited ability of ordinal ratings to convey the magnitude of the differences, which is needed in best value judgments where price is a factor.
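     As a rough illustration of that limitation (my own example, with made-up scores), two very different technical outcomes can collapse into the same ordinal ranking:

    # Two hypothetical score patterns: A barely ahead of B, and A far ahead of B.
    scores_case_1 = {"A": 92, "B": 90, "C": 60}
    scores_case_2 = {"A": 92, "B": 65, "C": 60}

    def to_ranks(scores):
        """Convert raw scores to ordinal ranks, 1 = best."""
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {proposal: rank for rank, proposal in enumerate(ordered, start=1)}

    print(to_ranks(scores_case_1))  # {'A': 1, 'B': 2, 'C': 3}
    print(to_ranks(scores_case_2))  # {'A': 1, 'B': 2, 'C': 3} -- identical ranks, but only one
                                    # case shows an edge for A large enough to weigh against price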
    Some of the book's statistical ideas have less relevance, like using large studies to measure historical bias and adjust judgments; in procurement evaluations, that data doesn't exist. As a result, process occupies a predominant role in reducing noise in procurement evaluations. The book acknowledges the importance of process.

    Noise is a terrific read and directly relevant to decisions in the practice of public procurement. I welcome your perspectives on the book’s application to your practice. Do you have noise in your evaluations? 


Comments

12-06-2021 11:06

@Greg Anderson
I recently switched from a 1-5 to a 1-10. I have found that a 1-5 does not provide evaluators enough 'room' in scoring for a clear separation of proposals. A 1-10 gives evaluators more room to distinguish the better proposals from the others.

As for an evaluator controlling the outcome, I recommend you apply ordinal ranking to determine the final ranking. In this method all proposals are ranked ordinally (1, 2, 3, 4 . . .) for each evaluator. The ordinals are then added up, and the proposal with the lowest total ordinal rank is the top ranked, with the others ranked accordingly. This method eliminates the use of 'total points'.
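For illustration, a minimal sketch of this ordinal-ranking method with hypothetical point scores (the proposal names and numbers are invented):

    raw_scores = {               # each evaluator's point scores per proposal
        "Evaluator 1": {"A": 9, "B": 7, "C": 4},
        "Evaluator 2": {"A": 6, "B": 8, "C": 5},
        "Evaluator 3": {"A": 10, "B": 3, "C": 2},
    }

    def ordinal_totals(scores_by_evaluator):
        """Convert each evaluator's scores to ranks (1 = best), then sum the ranks per proposal."""
        totals = {}
        for scores in scores_by_evaluator.values():
            ordered = sorted(scores, key=scores.get, reverse=True)
            for rank, proposal in enumerate(ordered, start=1):
                totals[proposal] = totals.get(proposal, 0) + rank
        return totals

    totals = ordinal_totals(raw_scores)
    print(totals)                          # {'A': 4, 'B': 5, 'C': 9}
    print(min(totals, key=totals.get))     # A -- the lowest total ordinal rank is top ranked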

10-04-2021 14:57

I recently had a department take a hard look at our general guidance regarding scoring, and they are pressing to go from a 1-10 rating (the current system) to a 1-5 rating method. Their thought is that it would prevent one evaluator from "running away" with the award to get the supplier they want, or from ensuring that one supplier is not eligible for award. However, all these rating methods are weighted, and I always get comments from departments about allowing for decimal scores, etc.

The only benefit I can see here is a half measure (meaning that a "1" on a 1-5 scale would be relatively worth a "2" on a 1-10 scale). In my experience the only way to keep a determined member of an evaluation team from "running away" with an evaluation is through training or through taking the extreme step of elimination of the evaluator. 
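A quick arithmetic check of that "half measure" point, with invented weights and scores:

    # One criterion worth 30 of 100 total weighted points (hypothetical).
    weight = 30.0

    def weighted_points(scores, scale_max):
        """Committee's weighted points for the criterion, normalized to the scale maximum."""
        return weight * (sum(scores) / len(scores)) / scale_max

    # Three evaluators agree; a fourth tries to "run away" with a maximum score.
    ten_point  = [6, 6, 6, 10]   # on a 1-10 scale
    five_point = [3, 3, 3, 5]    # the same judgments expressed on a 1-5 scale

    print(weighted_points(ten_point, 10))   # 21.0
    print(weighted_points(five_point, 5))   # 21.0 -- same weighted outcome; the coarser
                                            # scale alone doesn't blunt the outlier's pull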

Additionally, we state that the final ranking will be based on consensus rather than scoring, but that independent scores and evaluation team review of strengths and weaknesses inform that consensus.