Confusion matrices


You can think of data as continuous, categorical, or ordinal (categorical but with an order). Confusion matrices are a means of assessing how well a categorical model performs. For context on how these matrices work, let's first refresh our knowledge about continuous data. Then, we can see how confusion matrices are simply an extension of the histograms we already know.

Continuous data distributions

When we want to understand continuous data, the first step is often to see how the data is distributed. Consider the following histogram:

Histogram showing label distribution.

We can see that the label is, on average, about zero, and most datapoints fall between -1 and 1. It appears symmetrical, with a nearly even count of values smaller and larger than the mean. If we wanted, we could use a table rather than a histogram, but it could be unwieldy.
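If you'd like to reproduce a plot like this one, here's a minimal sketch using NumPy and Matplotlib. The data is simulated by drawing from a normal distribution, an assumption made purely for illustration rather than the module's own dataset.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical continuous label: centred on zero, symmetric, and with most
# values between -1 and 1 (roughly two standard deviations of 0.5).
rng = np.random.default_rng(seed=0)
label = rng.normal(loc=0.0, scale=0.5, size=1000)

plt.hist(label, bins=30)
plt.xlabel("Label value")
plt.ylabel("Count")
plt.title("Distribution of a continuous label")
plt.show()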

Categorical data distributions

In some respects, categorical data aren't so different from continuous data. We can still produce histograms to assess how often each label value appears. For example, a binary label (true/false) might appear with frequency like so:

Bar plot showing more false labels than true.

This plot tells us that there are 750 samples with false as a label, and 250 with true as the label.

A label for three categories is similar:

Bar plot showing more animal labels than person and tree labels.

This plot tells us that there are 200 samples that are person, 400 that are animal, and 100 that are tree.

As categorical labels are simpler, we can often show these labels and values as simple tables. The two preceding graphs would appear like so:

Label False True
Count 750 250

And:

Label Person Animal Tree
Count 200 400 100
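If your labels live in a pandas Series, counting them takes one call to value_counts. The sketch below builds the Series directly from the counts in the tables above; a real project would load the labels from a dataset instead.

import pandas as pd

# Binary label: 750 false, 250 true (matching the first table).
binary_labels = pd.Series([False] * 750 + [True] * 250)
print(binary_labels.value_counts())
# False    750
# True     250

# Three-category label: 200 person, 400 animal, 100 tree (second table).
category_labels = pd.Series(["person"] * 200 + ["animal"] * 400 + ["tree"] * 100)
print(category_labels.value_counts())
# animal    400
# person    200
# tree      100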

Looking at predictions

We can look at predictions that the model makes just like we look at the ground-truth labels in our data. For example, we might see that in the test set our model predicted false 700 times and true 300 times.

Model Prediction Count
False 700
True 300

This result provides direct information about the predictions our model is making, but it doesn’t tell us which of these predictions are correct. We can use a cost function to understand how often the correct responses are given. But the cost function doesn't tell us which kinds of errors are being made. For example, the model might correctly guess all true values, but also guess true when it should guess false.
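As a quick illustration of why a single score isn't enough, the sketch below compares two hypothetical sets of predictions that achieve the same accuracy while making very different kinds of mistakes. The arrays are made up for this example.

import numpy as np

actual = np.array([False, False, False, True])

model_a = np.array([False, False, False, False])  # never predicts true
model_b = np.array([False, False, True,  True])   # sometimes predicts true

# Both models get 3 of 4 predictions right, so both score 0.75, even though
# model_a misses every true and model_b wrongly flags a false as true.
print((model_a == actual).mean())  # 0.75
print((model_b == actual).mean())  # 0.75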

The confusion matrix

The key to understanding the model performance is to combine the table for model prediction with the table for ground-truth data labels:

Diagram of the confusion matrix with total numbers added.

The grid of squares that isn't yet filled in is called the confusion matrix.

Each cell in the confusion matrix tells us one thing about the model’s performance. These things are True Negatives (TN), False Negatives (FN), False Positives (FP), and True Positives (TP).

Let’s explain these acronyms one by one and replace them with actual values. Blue-green squares mean the model made a correct prediction, and orange squares mean the model made an incorrect prediction.

True Negatives (TN)

The top-left value indicates how many times the model predicted false, and the actual label was also false. In other words, TN indicates how many times the model correctly predicted false. Let’s say, for our example, that this happened 500 times:

Diagram of the confusion matrix without total numbers, showing true negatives only.

False Negatives (FN)

The top-right value tells us how many times the model predicted false, but the actual label was true. We know now that the FN value is 200. How? Because the model predicted false 700 times, and 500 of those times it did so correctly. Thus, 200 times it predicted false when it shouldn't have.

Diagram of the confusion matrix showing false negatives only.

False Positives (FP)

The bottom-left value holds false positives. It tells us how many times the model predicted true, but the actual label was false. We know that the FP value is 250, because there were 750 times that the correct answer was false, and 500 of those appear in the top-left cell (TN):

Diagram of the confusion matrix showing false positives also.

True Positives (TP)

Finally, we have true positives. This value is the number of times the model correctly predicted true. We know that the TP value is 50 for two reasons. Firstly, the model predicted true 300 times, but 250 of those were incorrect (bottom-left cell). Secondly, there were 250 times that true was the correct answer, but 200 of those times the model predicted false.

Diagram of the confusion matrix showing true positives also.

The final matrix

We normally simplify our confusion matrix slightly, like so:

Diagram of the simplified confusion matrix.

We colored the cells here to highlight when the model made correct predictions. From this matrix, we know not only how often the model made certain types of predictions, but also how often those predictions were correct or incorrect.
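If you want to check these numbers in code, the sketch below rebuilds the example with scikit-learn's confusion_matrix. The label arrays are constructed directly from the totals in this unit rather than produced by a real model. Note that scikit-learn places actual labels on the rows and predicted labels on the columns, which is the transpose of the layout drawn above.

import numpy as np
from sklearn.metrics import confusion_matrix

# 750 actual false (500 predicted false, 250 predicted true),
# 250 actual true  (200 predicted false,  50 predicted true).
actual    = np.array([False] * 750 + [True] * 250)
predicted = np.array([False] * 500 + [True] * 250 +
                     [False] * 200 + [True] * 50)

cm = confusion_matrix(actual, predicted, labels=[False, True])
print(cm)
# [[500 250]
#  [200  50]]

tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 500 250 200 50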

Confusion matrices can also be constructed when there are more labels. For example, for our person/animal/tree example, we might get a matrix like so:

Diagram of the expanded confusion matrix with three labels: person, animal, and tree.

When there are three categories, metrics like True Positives no longer apply, but we can still see exactly how often the model made certain kinds of mistakes. For example, we can see that the model predicted person 200 times when the actual correct result was animal.
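The same scikit-learn call handles any number of categories. The counts in the sketch below are invented purely for illustration and don't match the figure above.

from sklearn.metrics import confusion_matrix

# Small hypothetical example with three categories.
actual    = ["person", "person", "animal", "animal", "animal", "tree"]
predicted = ["person", "animal", "animal", "animal", "person", "tree"]

labels = ["person", "animal", "tree"]
cm = confusion_matrix(actual, predicted, labels=labels)
print(cm)
# Rows are actual labels, columns are predicted labels:
# [[1 1 0]
#  [1 2 0]
#  [0 0 1]]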