I was surprised to find that a model can generate the same ROC curve but have a drastically different precision (and therefore a different precision-recall curve) depending on the data set the model is applied to. This happens because the ROC curve is invariant to rescaling the size of either class, while precision is not.

I was recently working with an imbalanced data set D, where the positive minority class represented ~5% of the samples. I down sampled the negative class to create a balanced (50-50 split) data set D’. I trained the model on D’ and in order to get an idea of model performance, I generated the ROC curve and the Precision/Recall using both D and D’. While the ROC curve was the same for both, the precision dropped from 80% on D’ to 10% on D. What is the interpretation?

The ROC curve is the True Positive Rate (TPR) plotted against the False Positive Rate (FPR). The former measures the success rate *within* the actual positive class, while the latter measures the false-alarm rate *within* the actual negative class. Thus, if the rate of prediction within each class remains the same for D and D’, the ROC curve will look similar.
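A minimal sketch of this point, using made-up confusion-matrix counts (the numbers and the factor `n` are illustrative assumptions, not values from my data): multiplying the negative class by any factor leaves both rates unchanged.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # fraction of actual positives predicted positive
    fpr = fp / (fp + tn)  # fraction of actual negatives predicted positive
    return tpr, fpr

# Hypothetical counts on a balanced set D' (illustrative only).
tp, fn, fp, tn = 80, 20, 20, 80

# Assumed factor by which the negative class grows going from D' to D.
n = 19

balanced = rates(tp, fn, fp, tn)
imbalanced = rates(tp, fn, n * fp, n * tn)  # only FP and TN are scaled

print("D' (TPR, FPR):", balanced)
print("D  (TPR, FPR):", imbalanced)
```

Each point on the ROC curve is one such (FPR, TPR) pair at a given threshold, so if every pair is preserved, the whole curve is preserved.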

On the other hand, precision measures the fraction of positive predictions that are actually positive. Going from D’ to D *increases* the number of negative samples. If the rate of prediction within each class remains the same, giving a similar ROC curve, then the number of false positives grows in proportion to the number of negatives while the number of true positives stays fixed, so the precision drops. Essentially, the model labels many of the new negative samples as positive.
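Here is a small simulation of that effect, as a sketch only. The "model" is a toy stand-in: scores are drawn from two Gaussians, and the means, the threshold, and the sample sizes are all assumptions chosen for illustration, not my actual model or data.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def make_scores(n_pos, n_neg):
    """Toy model scores: positives tend to score higher than negatives."""
    pos = [random.gauss(1.5, 1.0) for _ in range(n_pos)]
    neg = [random.gauss(0.0, 1.0) for _ in range(n_neg)]
    return pos, neg

def summarize(pos, neg, threshold):
    """Return (TPR, FPR, precision) when predicting positive at score >= threshold."""
    tp = sum(s >= threshold for s in pos)
    fp = sum(s >= threshold for s in neg)
    return tp / len(pos), fp / len(neg), tp / (tp + fp)

# D: about 5% positives. D': same positives, negatives down-sampled to a 50-50 split.
pos, neg = make_scores(n_pos=5_000, n_neg=95_000)
neg_down = random.sample(neg, len(pos))

threshold = 1.0  # arbitrary operating point
tpr_d, fpr_d, prec_d = summarize(pos, neg, threshold)
tpr_b, fpr_b, prec_b = summarize(pos, neg_down, threshold)

print(f"D : TPR={tpr_d:.3f} FPR={fpr_d:.3f} precision={prec_d:.3f}")
print(f"D': TPR={tpr_b:.3f} FPR={fpr_b:.3f} precision={prec_b:.3f}")
```

TPR is identical on both sets because the positives are untouched, FPR agrees up to sampling noise, and precision is far lower on the imbalanced set.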

Mathematically

- Assume that FPR and TPR remain the same on D and D’
- TN, TP, FN, FP are the numbers of True Negatives, True Positives, False Negatives and False Positives on D’
- The actual negative class in D’ is under-sampled by a factor n relative to D; this only affects the numbers of TN and FP

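$$\mathrm{FPR}_{D'} = \frac{FP}{FP + TN}$$

$$\mathrm{FPR}_{D} = \frac{nFP}{nFP + nTN} = \frac{FP}{FP + TN}$$

$$\mathrm{FPR}_{D'} = \mathrm{FPR}_{D}$$

$$\mathrm{Precision}_{D'} = \frac{TP}{TP + FP}$$

$$\mathrm{Precision}_{D} = \frac{TP}{TP + nFP}$$

$$\frac{\mathrm{Precision}_{D'}}{\mathrm{Precision}_{D}} = \frac{TP + nFP}{TP + FP}$$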
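The same relationship can be written in terms of rates rather than counts: with fixed TPR and FPR, precision depends only on the fraction of positives in the data. The sketch below assumes a hypothetical operating point (TPR = 0.8, FPR = 0.2); `precision_from_rates` and `prevalence` are names I introduce for illustration.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by fixed TPR/FPR at a given positive-class fraction."""
    true_pos = tpr * prevalence            # share of all samples that are TP
    false_pos = fpr * (1.0 - prevalence)   # share of all samples that are FP
    return true_pos / (true_pos + false_pos)

tpr, fpr = 0.8, 0.2  # hypothetical operating point, same on every data set

for prevalence in (0.5, 0.2, 0.05, 0.01):
    p = precision_from_rates(tpr, fpr, prevalence)
    print(f"positives = {prevalence:.0%}: precision = {p:.3f}")
```

At a 50-50 split this operating point has 80% precision; as the share of positives shrinks, precision falls even though the point on the ROC curve never moves.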

Plugging in numbers shows the impact. Assume 1) TP ~ FP, so that $\mathrm{Precision}_{D'} = 50\%$, and 2) n ~ 5 (the majority negative class is 5x the minority positive class). Then $\mathrm{Precision}_{D'} / \mathrm{Precision}_{D} = (1 + n)/2 = 3$. The precision on the under-sampled data set D’ will be 3x higher than on the full data set D.
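To double-check the arithmetic, here is a short sketch using exact fractions. The counts are hypothetical; n = 5 is the case above, and n = 19 is my own added assumption for what a 5%-positive data set down-sampled to 50-50 would imply (95/5 negatives per positive).

```python
from fractions import Fraction

def precision(tp, fp):
    """Exact precision TP / (TP + FP)."""
    return Fraction(tp, tp + fp)

def precision_ratio(tp, fp, n):
    """Precision on D' divided by precision on D, where D has n times the negatives."""
    return precision(tp, fp) / precision(tp, n * fp)

tp = fp = 100  # hypothetical counts with TP == FP, so precision on D' is 1/2

for n in (5, 19):
    ratio = precision_ratio(tp, fp, n)
    # The ratio should match the closed form (TP + n*FP) / (TP + FP).
    assert ratio == Fraction(tp + n * fp, tp + fp)
    print(f"n={n}: precision D'={precision(tp, fp)}, "
          f"precision D={precision(tp, n * fp)}, ratio={ratio}")
```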