ROC Curves
The purpose of the post-processing is to convert the true and predicted labels on the data into some meaningful summary statistic.
The simplest possibility for doing this would be a basic sum-of-squares error. This does not scale well with data size, and it strongly penalizes indecisiveness on the part of the network (a value near 0.5) even when the result consistently leans in the correct direction.
Since we're ultimately building a classifier, it makes sense to use a classifier-specific summary statistic. This ensures that we capture the information we actually have: the network's outputs are, in some sense, probabilities.
The first step in evaluating a classifier is constructing a confusion matrix. A classifier that outputs only a class label has a single confusion matrix. Since we know more than a label (we have the full neural network output: a real value), we can instead treat the network as a family of classifiers: each possible threshold (each distinct neural network output) defines one classifier. Doing this leaves us with a large number of confusion matrices, one per threshold.
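The threshold sweep described above can be sketched as follows. This is a minimal illustration, not the project's actual code; the function and variable names are assumptions.

```python
def confusion_matrices(scores, labels):
    """For each distinct score used as a threshold, count TP/FP/TN/FN.

    A sample is predicted positive when its score >= threshold.
    Returns a list of (threshold, tp, fp, tn, fn) tuples, one
    confusion matrix per member of the classifier family.
    """
    matrices = []
    for threshold in sorted(set(scores), reverse=True):
        tp = fp = tn = fn = 0
        for score, label in zip(scores, labels):
            predicted = score >= threshold
            if predicted and label:
                tp += 1
            elif predicted and not label:
                fp += 1
            elif not predicted and not label:
                tn += 1
            else:
                fn += 1
        matrices.append((threshold, tp, fp, tn, fn))
    return matrices

# Example: four samples, three positives and one negative.
for row in confusion_matrices([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 1]):
    print(row)
```

Each distinct network output yields one row, so the number of confusion matrices grows with the number of distinct scores.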
We can go a step further in summarizing by picking two specific useful statistics: the true positive rate (hereafter TPR) and the false positive rate (FPR). These are equal to TP / (TP + FN) and 1 - (TN / (TN + FP)) respectively.
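Computed from a single confusion matrix, the two rates look like this. The helper names here are illustrative, not from the original code.

```python
def tpr(tp, fn):
    """True positive rate: TP / (TP + FN), the fraction of
    actual positives that the classifier recovers."""
    return tp / (tp + fn)

def fpr(tn, fp):
    """False positive rate, written as 1 - (TN / (TN + FP))
    to match the formula in the text."""
    return 1 - tn / (tn + fp)

print(tpr(tp=8, fn=2))   # 0.8
print(fpr(tn=9, fp=1))   # approximately 0.1
```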
Plotting them against each other gives an ROC curve. The "goodness" of the classifier is then the area between that curve and the "no-discrimination line": the line where FPR = TPR. A classifier with no knowledge of the data will approximate that line. Doubling the signed area between the curve and the no-discrimination line gives the Gini coefficient, a quantity that ranges from zero (no better than chance) to one (perfect classification) and quantifies the "goodness" that we want. This gives us a single number with which to evaluate success.
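The whole pipeline can be sketched end to end: sweep thresholds to get (FPR, TPR) points, integrate the ROC curve with the trapezoidal rule, and use the identity Gini = 2 × AUC − 1, which is the doubled signed area between the curve and the diagonal. Function names are assumptions for illustration.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points from the strictest threshold
    (predict nothing positive) to the loosest (predict everything)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Walk thresholds from high score to low; each step
    # predicts one more sample as positive.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def gini(scores, labels):
    """Trapezoidal AUC, then Gini = 2 * AUC - 1."""
    points = roc_points(scores, labels)
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return 2 * auc - 1

# A perfect classifier separates the classes completely:
print(gini([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
# A partially discriminating classifier lands in between:
print(gini([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0]))  # 0.5
```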