Evaluations For A Classifier In Machine Learning

Sailaja Karra
4 min read · Oct 27, 2020


This blog is all about the various ways to evaluate a classification model: the confusion matrix, evaluation metrics such as precision, recall, accuracy, and F1 score, and ROC-AUC curves.

Confusion Matrix:

Confusion Matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of the target classes. The matrix compares actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kind of errors it is making.

For a binary classification problem we would have a 2 x 2 matrix as shown below:

                    Predicted Positive     Predicted Negative
Actual Positive     True Positive (TP)     False Negative (FN)
Actual Negative     False Positive (FP)    True Negative (TN)

True Positives(TP): The predicted value matches the actual value. The actual value was positive and the model predicted as positive.

True Negatives (TN): The predicted value matches the actual value. The actual value was negative and the model predicted as negative.

False Positives (FP): The predicted value does not match the actual value. The actual value was negative but the model predicted it as positive. Also known as a Type I error.

False Negatives (FN): The predicted value does not match the actual value. The actual value was positive but the model predicted it as negative. Also known as a Type II error.

Evaluation metrics can be calculated by using TP, TN, FP, FN.

Confusion Matrix using sklearn:

from sklearn.metrics import confusion_matrix

# Returns the confusion matrix
confusion_matrix(y_test, y_predictions)
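As a quick illustration (the labels below are hypothetical, not from the original example), the four counts can be pulled out of sklearn's 2 x 2 matrix with ravel(), which returns them in the order TN, FP, FN, TP:

import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, purely for illustration
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_predictions = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# sklearn lays out the binary matrix as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_test, y_predictions).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3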

Evaluation Metrics:

Evaluation metrics measure the quality of a machine learning model. Using them is critical to ensuring that your model is operating correctly and optimally.

Let’s work through these evaluation metrics to understand what each one tells us.

Precision & Recall:

Precision:

Precision measures how many of the model’s positive predictions are actually positive. The following formula shows how to use the values in the confusion matrix to calculate the precision of a model:

Precision = TP / (TP + FP)

Recall:

Recall indicates what percentage of the actual positives (the class we are interested in) were captured by the model. The following formula shows how to use the values in the confusion matrix to calculate the recall of a model:

Recall = TP / (TP + FN)

There is generally an inverse relationship between precision and recall: tuning a model so that precision goes up tends to push recall down, and vice versa.
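As a minimal sketch (reusing the hypothetical tp, fp, fn counts and label arrays from the confusion matrix example above), precision and recall can be computed either directly from the counts or with sklearn's helpers:

from sklearn.metrics import precision_score, recall_score

# From the confusion matrix counts
precision = tp / (tp + fp)   # 3 / (3 + 1) = 0.75
recall = tp / (tp + fn)      # 3 / (3 + 1) = 0.75

# The equivalent sklearn helpers
precision_sk = precision_score(y_test, y_predictions)
recall_sk = recall_score(y_test, y_predictions)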

Accuracy & F1 Score:

Accuracy:

Accuracy is one of the most commonly used metrics for measuring the performance of a model. The formula for accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Accuracy answers the question: out of all the predictions made by the model, what percentage are correct?

F1 Score:

F1 Score is another informative metric, like accuracy. It represents the harmonic mean of precision and recall. The formula for the F1 Score is:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
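A short sketch (again reusing the hypothetical counts and label arrays from the earlier example) showing that sklearn's functions agree with the formulas above:

from sklearn.metrics import accuracy_score, f1_score

# Accuracy and F1 from the confusion matrix counts
accuracy = (tp + tn) / (tp + tn + fp + fn)             # 6 / 8 = 0.75
f1 = 2 * (precision * recall) / (precision + recall)   # harmonic mean

# The equivalent sklearn helpers
accuracy_sk = accuracy_score(y_test, y_predictions)
f1_sk = f1_score(y_test, y_predictions)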

Evaluation Metrics in sklearn:

from sklearn.metrics import accuracy_score

# Returns the accuracy score
accuracy_score(y_test, y_predictions)

from sklearn.metrics import classification_report

# Returns precision, recall, F1-score and support for each class
print(classification_report(y_test, y_predictions))

ROC-AUC Curves:

The ROC-AUC curve (Receiver Operating Characteristic - Area Under the Curve) helps analyze the performance of a classifier at various threshold settings.

The ROC curve plots the True Positive Rate against the False Positive Rate. The True Positive Rate is another name for recall. The formula for the True Positive Rate is:

True Positive Rate (TPR) = TP / (TP + FN)

The False Positive Rate is the ratio of false positive predictions to all values that are actually negative. The formula for the False Positive Rate is:

False Positive Rate (FPR) = FP / (FP + TN)
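Using the same hypothetical counts as before, both rates fall straight out of the confusion matrix:

# True Positive Rate (identical to recall)
tpr = tp / (tp + fn)   # 3 / (3 + 1) = 0.75

# False Positive Rate
fpr = fp / (fp + tn)   # 1 / (1 + 3) = 0.25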

The best performing models will have an ROC curve that hugs the upper left corner of the graph.

The ROC curve gives us a graph of the trade-off between the true positive rate and the false positive rate. The AUC reduces this to a single number for comparing models: an AUC of 1 corresponds to a perfect classifier, while an AUC of 0.5 corresponds to a classifier that performs no better than random guessing.

ROC-AUC using sklearn's built-in functions:

from sklearn.metrics import roc_curve, auc

# Fit the model and get decision scores for the test set
y_score = logreg.fit(X_train, y_train).decision_function(X_test)

# Compute the ROC curve and the area under it
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)  # renamed so it does not shadow the auc function
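To visualize the curve, here is a minimal matplotlib sketch (assuming the fpr, tpr, and roc_auc values computed in the snippet above):

import matplotlib.pyplot as plt

plt.plot(fpr, tpr, label='ROC curve (AUC = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], linestyle='--', label='Random guess')  # diagonal baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()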

I will go through a detailed example in my next blog to show how to use each of these metrics and understand them better.

Happy Reading !!!
