What is a confusion matrix?
A confusion matrix is a table that shows how a classifier’s predictions compare to the true labels. It helps you see not just how often a model is right, but how it is wrong—which classes it confuses and where errors concentrate.
Binary confusion matrix (two classes)
For a yes/no model, the matrix breaks every prediction into one of four counts (tallied in the short sketch after this list):
- True Positive (TP): predicted positive, actually positive
- True Negative (TN): predicted negative, actually negative
- False Positive (FP): predicted positive, actually negative
- False Negative (FN): predicted negative, actually positive
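As a rough illustration, the four counts can be tallied directly from paired lists of true and predicted labels. The data and variable names below are made up for this sketch:

```python
# Hypothetical label lists: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Tally each of the four outcomes by comparing predictions to true labels.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted positive, actually positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted negative, actually negative
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # predicted positive, actually negative
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # predicted negative, actually positive

print(tp, tn, fp, fn)  # 3 3 1 1
```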
From these four counts, teams compute common metrics (calculated in the sketch after this list):
- Precision: when the model predicts positive, how often is it correct?
- Recall: of all actual positives, how many did the model catch?
- F1 score: the harmonic mean of precision and recall, a single number that balances the two
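Continuing the counting sketch above, the three metrics follow from the standard formulas (handling of empty denominators is omitted for brevity):

```python
# precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two.
precision = tp / (tp + fp)                          # 3 / 4 = 0.75
recall = tp / (tp + fn)                             # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # 0.75

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```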
Multi-class confusion matrix
For many classes, the matrix becomes a grid where each row is a true class and each column is a predicted class (see the sketch after this list). This grid is one of the fastest ways to spot:
- Two classes that are consistently mixed up
- A class the model almost never predicts
- A class with high false positives because it “looks like everything”
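As a sketch, assuming scikit-learn is available, a multi-class matrix can be built with confusion_matrix; the class names and labels below are invented for the example:

```python
from sklearn.metrics import confusion_matrix

labels = ["cat", "dog", "fox"]
y_true = ["cat", "dog", "dog", "fox", "cat", "fox", "dog"]
y_pred = ["cat", "dog", "fox", "fox", "cat", "dog", "dog"]

# Rows are true classes, columns are predicted classes, ordered by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[2 0 0]
#  [0 2 1]
#  [0 1 1]]
```

Reading the grid, the off-diagonal cells show that "dog" and "fox" are occasionally mixed up, while "cat" is never confused with anything.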
Why it matters for data teams
Confusion matrices are practical because they often point directly to data actions (the sketch after this list shows how to surface these signals from the matrix itself):
- Add more examples for a weak class
- Split a confusing class into clearer definitions
- Improve labeling guidelines
- Collect “hard negatives” that look similar but are different
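One way to turn the matrix into those actions, sketched with NumPy on the hypothetical grid from the previous example: row-normalized recall highlights the class that needs more (or better-labeled) examples, and the largest off-diagonal cell points at the pair of classes whose definitions or hard negatives need attention.

```python
import numpy as np

# The hypothetical 3x3 matrix from the multi-class sketch (rows = true, columns = predicted).
cm = np.array([[2, 0, 0],
               [0, 2, 1],
               [0, 1, 1]])
labels = ["cat", "dog", "fox"]

# Per-class recall: diagonal over row sums, i.e. how much of each true class the model catches.
recall_per_class = cm.diagonal() / cm.sum(axis=1)
weakest = labels[int(np.argmin(recall_per_class))]
print(dict(zip(labels, recall_per_class.round(2))), "-> weakest class:", weakest)

# Most-confused pair: the largest off-diagonal count.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"most confused: true '{labels[i]}' predicted as '{labels[j]}' ({off_diag[i, j]} time(s))")
```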