Kappa Statistic for Attribute MSA

The Kappa statistic is the main metric used to assess how capable an attribute measurement system is.

In the Measure phase of a Six Sigma project, the measurement system analysis (MSA) is one of the most important tasks to be performed. An MSA tells you whether you can trust the data that you are measuring. Before you design experiments and analyze any data, you want to be sure that the data is measured properly and can actually be trusted. This is the tool you need to test the capability of your measurement system.

It is important to note that this tool is used for attribute measurements (category, error type, ranking, etc.) and not variable measurements (time, distance, length, weight, temperature, etc.). To test the capability of a variable measurement system, you need to perform a Gage R&R.

Where would you use an attribute measurement system? Usually in service-type environments. For example, at a call center, you may have internal quality raters who rate each call on a scale of 1 to 5 depending on how well the call went. It is important to ensure a consistent measurement system: if one quality rater gave a particular call a rating of 4, all the other quality raters should give it the same rating. If not, there is some flaw, confusion, or inconsistency in the measurement system.

The Kappa statistic is used to summarize the level of agreement between raters after agreement by chance has been removed. It tests how well raters agree with themselves (repeatability) and with each other (reproducibility). For more information on repeatability and reproducibility, please see Gage R&R.

Kappa Statistic Formula

Kappa = (Pobserved - Pchance) / (1 - Pchance)

where:

Pobserved = Proportion of units on which the raters agreed

Pchance = Proportion of units for which one would expect agreement by chance

For example, if raters agree on 85% of units (Pobserved = 0.85) and agreement on half of the units would be expected by chance alone (Pchance = 0.50), then Kappa = (0.85 - 0.50) / (1 - 0.50) = 0.70.
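As a concrete illustration, here is a minimal Python sketch (with hypothetical call ratings) that computes Cohen's Kappa for two raters, estimating chance agreement from each rater's own category proportions:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters scoring the same units."""
    n = len(rater_a)
    # Pobserved: fraction of units where the two raters agree
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Pchance: agreement expected if each rater assigned categories
    # at random according to their own category proportions
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(freq_a) | set(freq_b))
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of 10 calls on a 1-5 scale
rater_1 = [4, 3, 5, 4, 2, 4, 3, 5, 1, 4]
rater_2 = [4, 3, 5, 3, 2, 4, 3, 5, 1, 5]
print(f"Kappa: {cohens_kappa(rater_1, rater_2):.2f}")  # Kappa: 0.74
```

Here the raters agree on 8 of 10 calls (Pobserved = 0.80) while chance agreement is 0.22, giving a Kappa of about 0.74.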
The Kappa statistic tells us how much better the measurement system is than random chance. If there is substantial agreement, the ratings may well be accurate; if agreement is poor, the usefulness of the ratings is extremely limited.

The Kappa statistic will always yield a number between -1 and +1. A value of +1 implies perfect agreement, a value of 0 implies agreement no better than random chance, and negative values imply agreement worse than chance (systematic disagreement). What Kappa value is considered good enough for a measurement system? That depends very much on the application of your measurement system. As a general rule of thumb, a Kappa value of 0.7 or higher should be good enough to use for investigation and improvement purposes.
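To see where the endpoints of that range come from, a quick sketch plugging proportions straight into the formula above shows the three regimes (note that Kappa reaches exactly -1 only when Pchance is 0.50; in general the lower bound depends on Pchance):

```python
def kappa(p_observed, p_chance):
    """Kappa computed directly from the two proportions."""
    return (p_observed - p_chance) / (1 - p_chance)

# With chance agreement at 50%:
print(kappa(1.0, 0.5))  # +1.0: perfect agreement
print(kappa(0.5, 0.5))  #  0.0: agreement no better than chance
print(kappa(0.0, 0.5))  # -1.0: raters never agree (systematic disagreement)
```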

Just like the Gage R&R, the Attribute MSA is set up like an experiment. Samples are randomly chosen for multiple operators to measure, and each operator measures each sample multiple times, in random order. The results of each measurement are then run through an Attribute MSA analysis (very easily done through statistical software like Minitab or SigmaXL). This gives us the Kappa statistic as output, and lets us know how much better than random chance our measurement system is.
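As a rough sketch of that setup, the hypothetical study below has two operators rating the same five samples twice each (pass/fail). Statistical packages such as Minitab typically report Fleiss' Kappa with confidence intervals for this analysis; here pairwise Cohen's Kappa is used just to illustrate repeatability and reproducibility:

```python
from collections import Counter

def cohens_kappa(x, y):
    """Cohen's Kappa between two lists of attribute ratings."""
    n = len(x)
    p_obs = sum(a == b for a, b in zip(x, y)) / n
    fx, fy = Counter(x), Counter(y)
    p_chance = sum((fx[c] / n) * (fy[c] / n) for c in set(fx) | set(fy))
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical study: 2 operators x 5 samples x 2 trials, pass/fail
ratings = {
    "op1": {"trial1": ["P", "F", "P", "P", "F"],
            "trial2": ["P", "F", "P", "F", "F"]},
    "op2": {"trial1": ["P", "F", "P", "P", "F"],
            "trial2": ["P", "F", "F", "P", "F"]},
}

# Repeatability: does each operator agree with themselves across trials?
for op, trials in ratings.items():
    k = cohens_kappa(trials["trial1"], trials["trial2"])
    print(f"{op} within-appraiser Kappa: {k:.2f}")

# Reproducibility: do the operators agree with each other?
k = cohens_kappa(ratings["op1"]["trial1"], ratings["op2"]["trial1"])
print(f"Between-appraiser Kappa (trial 1): {k:.2f}")
```

In a real study the samples would also span the range of good and bad items; a low within-appraiser Kappa points to a repeatability problem, while a low between-appraiser Kappa points to a reproducibility problem.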

