I was surprised to find that a model can produce the same ROC curve but very different precision (and therefore a very different precision/recall curve) depending on the data set it is applied to. The reason is that the ROC curve is invariant to the ratio of positives to negatives, while precision is not.
I was recently working with an imbalanced data set D, where the positive minority class represented ~5% of the samples. I downsampled the negative class to create a balanced (50-50 split) data set D’. I trained the model on D’ and, to get an idea of its performance, generated the ROC curve and the precision/recall on both D and D’. While the ROC curve was the same for both, the precision dropped from 80% on D’ to 10% on D. What is the interpretation?
The ROC curve is the True Positive Rate (TPR) plotted against the False Positive Rate (FPR). The former is the fraction of actual positives that the model predicts as positive, while the latter is the fraction of actual negatives that the model wrongly predicts as positive. Each rate is computed within a single class, so if the rate of prediction within each class remains the same for D and D’, the ROC curve will look similar.
On the other hand, precision measures the fraction of positive predictions that are correct, so it mixes both classes. Going from D’ to D increases the number of negative samples. If the rate of prediction within each class remains the same, giving a similar ROC curve, then the number of false positives grows with the number of negatives while the number of true positives stays fixed, and the precision drops. Essentially, the model calls the same share of the additional negative samples positive.
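To make the argument above concrete, here is a minimal sketch that holds TPR and FPR fixed and only changes the number of negatives. The function name `precision_from_rates`, the operating point (TPR = 0.8, FPR = 0.2), and the class sizes are all assumptions chosen for illustration, not values from my experiment.

```python
def precision_from_rates(tpr, fpr, n_pos, n_neg):
    """Precision implied by fixed per-class rates and class sizes."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    return tp / (tp + fp)


# Hypothetical operating point and class sizes, for illustration only.
tpr, fpr = 0.8, 0.2
balanced = precision_from_rates(tpr, fpr, n_pos=1000, n_neg=1000)     # 50-50 split
imbalanced = precision_from_rates(tpr, fpr, n_pos=1000, n_neg=19000)  # 5% positives

print(f"precision, balanced:   {balanced:.3f}")
print(f"precision, imbalanced: {imbalanced:.3f}")
```

With these made-up inputs the same point on the ROC curve gives a precision of 0.800 on the balanced set and 0.174 on the imbalanced one.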
- Assume that FPR and TPR remain the same
- TN, TP, FN, FP are the numbers of True Negatives, True Positives, False Negatives, and False Positives on D’
- The actual negative class is undersampled by a factor n, which only affects the number of TN and FP
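$$\mathrm{FPR}_{D'} = \frac{FP}{FP + TN}, \qquad \mathrm{FPR}_{D} = \frac{nFP}{nFP + nTN} = \frac{FP}{FP + TN} = \mathrm{FPR}_{D'}$$
$$\mathrm{Precision}_{D'} = \frac{TP}{TP + FP}, \qquad \mathrm{Precision}_{D} = \frac{TP}{TP + nFP}$$
$$\frac{\mathrm{Precision}_{D'}}{\mathrm{Precision}_{D}} = \frac{TP + nFP}{TP + FP}$$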
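The algebra above can be checked with exact arithmetic. This is only a sketch: the confusion-matrix counts and the factor `n = 4` are hypothetical, and the helper names `fpr` and `precision` are mine.

```python
from fractions import Fraction


def fpr(fp, tn):
    """False Positive Rate: FP / (FP + TN)."""
    return Fraction(fp, fp + tn)


def precision(tp, fp):
    """Precision: TP / (TP + FP)."""
    return Fraction(tp, tp + fp)


# Hypothetical counts on the undersampled set D' and a hypothetical factor n.
tp, fp, tn = 80, 30, 70
n = 4

# On the full set D, only the negative-class counts (FP and TN) scale by n.
fpr_sub, fpr_full = fpr(fp, tn), fpr(n * fp, n * tn)
prec_sub, prec_full = precision(tp, fp), precision(tp, n * fp)

ratio = prec_sub / prec_full
closed_form = Fraction(tp + n * fp, tp + fp)

print(fpr_sub == fpr_full)   # FPR is unchanged by the scaling
print(ratio == closed_form)  # ratio matches (TP + n*FP) / (TP + FP)
print(ratio)
```

Using `Fraction` avoids floating-point noise, so the equality checks are exact. For these counts the ratio is 20/11.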
Plug in numbers to see the impact. Assume 1) TP ≈ FP, so that Precision_D’ = 50%, and 2) n ≈ 5 (the majority negative class is 5x the minority positive class). Then Precision_D’ / Precision_D = (1 + n) / 2 = 3. The precision on the undersampled data set D’ will be 3x higher than on the full data set D.
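To see how the ratio grows with the undersampling factor, the same formula can be tabulated for a few values of n. The function name `precision_ratio` and the counts of 100 are assumptions for illustration. Only TP = FP matters here, not the count itself.

```python
def precision_ratio(tp, fp, n):
    """Precision on D' divided by precision on D: (TP + n*FP) / (TP + FP)."""
    return (tp + n * fp) / (tp + fp)


# With TP == FP the ratio reduces to (1 + n) / 2, whatever the shared count is.
for n in (1, 2, 5, 10, 19):
    print(n, precision_ratio(tp=100, fp=100, n=n))
```

For n = 5 this prints 3.0, matching the estimate above. n = 19 is the factor needed to bring a set with 5% positives to a 50-50 split, and under the same TP = FP assumption it gives a ratio of 10.0.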