Confusion Matrix

Rohit Dhore
5 min read · Nov 19, 2021

What Is a Confusion Matrix?

A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making. The confusion matrix is one of the easiest and most intuitive tools for evaluating a classification model, where the output can be of two or more categories, and it is commonly used to evaluate classifiers such as logistic regression.

A confusion matrix is a table that allows you to visualize the performance of a classification model. You can also use the information in it to calculate measures that can help you determine the usefulness of the model.

Here is how you would arrange a 2×2 confusion matrix in abstract terms:

                      Predicted: Positive    Predicted: Negative
  Actual: Positive    True Positive (TP)     False Negative (FN)
  Actual: Negative    False Positive (FP)    True Negative (TN)

The confusion matrix itself is quite simple, but the related terminology can be a bit confusing. Let us understand these terms with the help of an example.

Let us say we have a data set containing records of all patients in a hospital. We build a logistic regression model to predict whether a patient has cancer. There are four possible outcomes. Let us look at each of them.

True Positive

A true positive is the case where both the actual value and the predicted value are true: the patient has cancer, and the model also predicted that the patient has cancer.

False Negative

In a false negative, the actual value is true but the predicted value is false: the patient has cancer, but the model predicted that the patient did not have cancer. This is also known as a Type 2 error.

False Positive

This is the case where the predicted value is true but the actual value is false: the model predicted that the patient had cancer, but in reality the patient does not have cancer. This is also known as a Type 1 error.

True Negative

This is the case where the actual value is false and the predicted value is also false. In other words, the patient is not diagnosed with cancer and our model predicted that the patient did not have cancer.
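These four outcomes can be tallied directly from lists of actual and predicted labels. Here is a minimal sketch in plain Python (the labels below are made-up toy data, not the hospital data set from the example):

```python
# Toy labels (hypothetical): 1 = has cancer, 0 = does not
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Compare each actual value with its prediction and tally the four outcomes
tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)
tn = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 0)
print(tp, fn, fp, tn)  # 3 1 1 3
```

Libraries such as scikit-learn provide the same tally via `sklearn.metrics.confusion_matrix`, but the counting logic is exactly this.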

Understanding Various Performance Metrics

We will now look at the various performance metrics that can be computed from a confusion matrix.

Alright, let us start with accuracy:

Accuracy or Classification Accuracy:

  • What: In classification problems, ‘accuracy’ is the proportion of correct predictions made by the model out of all predictions made: (TP + TN) / (TP + TN + FP + FN).
  • When to use: When the target variable classes in the data are nearly balanced
  • When not to use: When one class makes up the large majority of the data
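As a sketch, using hypothetical outcome counts (chosen only for illustration; they are consistent with the percentages quoted in the later sections, not taken from a real data set):

```python
# Hypothetical outcome counts, for illustration only
tp, fn, fp, tn = 76, 19, 24, 38

# Accuracy = correct predictions / all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy, 2))  # 0.73
```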

Precision

  • What: ‘Precision’ is the proportion of the positive predictions made by our model that are actually true.
  • How: Precision = TP / (TP + FP)
  • It means that when our model predicts that a patient has cancer, it is correct 76 percent of the time.
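Continuing with the same hypothetical counts (illustrative numbers, not real data):

```python
# Hypothetical outcome counts, for illustration only
tp, fp = 76, 24

# Precision = true positives / all positive predictions
precision = tp / (tp + fp)
print(precision)  # 0.76
```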

Recall or Sensitivity:

  • What: ‘Recall’ is the measure that tells what proportion of patients who actually had cancer were also predicted as having cancer. It answers the question, “How sensitive is the classifier in detecting positive instances?”
  • How: Recall = TP / (TP + FN)
  • It means that 80 percent of all cancer patients are correctly predicted by the model to have cancer.
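With the same hypothetical counts (illustrative numbers only):

```python
# Hypothetical outcome counts, for illustration only
tp, fn = 76, 19

# Recall = true positives / all actual positives
recall = tp / (tp + fn)
print(recall)  # 0.8
```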

Specificity:

  • What: It answers the question, “How specific or selective is the classifier in predicting negative instances?” — that is, what proportion of actual negatives are identified correctly.
  • How: Specificity = TN / (TN + FP)
  • A specificity of 0.61 means 61 percent of all patients who didn’t have cancer are predicted correctly.
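Again with the same hypothetical counts (illustrative numbers only):

```python
# Hypothetical outcome counts, for illustration only
tn, fp = 38, 24

# Specificity = true negatives / all actual negatives
specificity = tn / (tn + fp)
print(round(specificity, 2))  # 0.61
```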

F1 Score

  • What: This is the harmonic mean of precision and recall.
  • How: F1 = 2 × (Precision × Recall) / (Precision + Recall)
  • A high F1 score indicates that both the precision and the recall of the classifier are good.
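Using the hypothetical precision and recall values from the sections above (illustrative numbers only):

```python
# Hypothetical precision and recall, for illustration only
precision, recall = 0.76, 0.8

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.78
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values, so it only scores high when precision and recall are both high.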

Benefits of Confusion Matrix

  1. It gives information not only about how many errors the classifier makes but also about the types of errors being made.
  2. It shows where the classification model gets “confused” while making predictions, i.e., which classes it mixes up.
  3. This helps overcome the limitations of relying on classification accuracy alone.
  4. It is useful in situations where the classification problem is heavily imbalanced and one class predominates over the others.
  5. The confusion matrix is the basis for calculating recall, precision, specificity, accuracy, and the AUC-ROC curve.

Conclusion

A confusion matrix is a valuable tool for evaluating a classification model. It provides clear insight into which classes the model has classified correctly for the data it was fed, and which classes it has misclassified.
