Receiver Operating Characteristic


The receiver operating characteristic (ROC) curve connects consecutive (FPR, TPR) points in the 2-D plane, with FPR on the horizontal axis and TPR on the vertical axis. The points are obtained by

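  • ranking the testcases by their predicted probability of being positive, from high to low;

  • starting with every testcase labelled negative, which gives the point (0, 0);

  • labelling the next testcase in the ranking as positive and recomputing TPR and FPR, repeating until every testcase is labelled positive, which gives the point (1, 1).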
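The procedure above can be sketched in Python as follows. This is a minimal illustration, not a library implementation: `roc_points` is a hypothetical helper, labels are assumed to be 1 (positive) and 0 (negative), both classes are assumed present, and no two testcases are assumed to share a score (tied scores would have to be labelled together as a single step).

```python
from typing import List, Tuple

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return the (FPR, TPR) points of the ROC curve.

    Assumes labels are 1 (positive) / 0 (negative), both classes are
    present, and there are no tied scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Rank testcases by predicted probability of being positive, high to low.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # every testcase labelled negative
    for i in order:
        # Label the current testcase as positive and recompute the rates.
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

The curve steps up for each positive testcase and right for each negative one, so it always runs from (0, 0) to (1, 1).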

Here the true positive rate (TPR, also called recall) and the false positive rate (FPR) are

\[ \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}},\ \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} \]
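As a quick check of the two formulas, the hypothetical helper below computes both rates from confusion-matrix counts; the counts in the example are made up for illustration.

```python
from typing import Tuple

def tpr_fpr(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """True positive rate (recall) and false positive rate from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # fraction of actual positives predicted positive
    fpr = fp / (fp + tn)  # fraction of actual negatives predicted positive
    return tpr, fpr

# 10 actual positives (8 caught) and 10 actual negatives (3 false alarms).
print(tpr_fpr(tp=8, fn=2, fp=3, tn=7))  # (0.8, 0.3)
```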

As the threshold on the predicted probability for classifying a testcase as positive is lowered from \(1\) to \(0\), which is equivalent to walking down the ranking above, TPR and FPR both increase monotonically (they never decrease) from \(0\) to \(1\). The area under the ROC curve (AUC) equals the probability that a randomly chosen positive testcase receives a higher predicted probability of being positive than a randomly chosen negative testcase (assuming no tied scores).
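The sketch below illustrates the monotone behaviour on a small hypothetical dataset by sweeping the threshold downward. It assumes a testcase is predicted positive when its score is at least the threshold; `rates_at_threshold` is an illustrative name, not an existing API.

```python
from typing import List, Tuple

def rates_at_threshold(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (TPR, FPR) when testcases with score >= threshold are predicted positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n_pos, fp / n_neg

# Hypothetical toy data: labels are 1 (positive) / 0 (negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

# Lower the threshold from 1 to 0 and record (TPR, FPR) at each step.
thresholds = [1.0, 0.75, 0.5, 0.25, 0.0]
sweep = [rates_at_threshold(scores, labels, t) for t in thresholds]
for t, (tpr, fpr) in zip(thresholds, sweep):
    print(f"threshold={t:.2f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

Each lower threshold predicts a superset of the previous positives, which is why neither rate can decrease.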

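To see why, note that in the procedure above, labelling a positive testcase as positive raises TPR (moves the curve up) by \(\frac{1}{\text{\# total positive samples}}\), while labelling a negative testcase as positive raises FPR (moves the curve right) by \(\frac{1}{\text{\# total negative samples}}\). Only the rightward steps add area. Suppose that at iteration \(i\) a negative testcase is labelled, so FPR increases: the curve moves right by \(\frac{1}{\text{\# total negative samples}}\) at height \(\text{TPR} = \frac{\text{\# current positive samples}_i}{\text{\# total positive samples}}\), which adds a rectangle of area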

\[ \frac{\text{\# current positive samples}_i}{\text{\# total negative samples} \times \text{\# total positive samples}} \]

where \(\text{\# current positive samples}_i\) is the number of positive testcases ranked before negative testcase \(i\). Summing these rectangles over all negative testcases gives

\[ \begin{aligned} \text{AUC-ROC} &= \frac{\text{\# positive-negative pairs where $+$ is ranked before $-$}}{\text{\# total negative samples} \times \text{\# total positive samples}} \\ &= \frac{\text{\# positive-negative pairs where $+$ is ranked before $-$}}{\text{\# total positive-negative pairs}} \end{aligned} \]
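To check the derivation numerically, the sketch below computes the area both ways on a hypothetical toy dataset: by accumulating one rectangle per negative testcase while walking the ranking, and by directly counting correctly ordered positive-negative pairs. Both helpers are illustrative and assume no tied scores.

```python
import math
from itertools import product
from typing import List

def auc_by_accumulation(scores: List[float], labels: List[int]) -> float:
    """Accumulate the ROC area while walking the ranking (no tied scores assumed)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    positives_so_far = 0
    area = 0.0
    for i in order:
        if labels[i] == 1:
            positives_so_far += 1  # vertical step: TPR rises, no area added
        else:
            # Horizontal step of width 1/N- at height positives_so_far/N+.
            area += positives_so_far / (n_neg * n_pos)
    return area

def auc_by_pairs(scores: List[float], labels: List[int]) -> float:
    """Fraction of positive-negative pairs in which the positive is ranked first."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1 for p, n in product(pos, neg) if p > n)
    return correct / (len(pos) * len(neg))

# Hypothetical toy data (labels: 1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
a = auc_by_accumulation(scores, labels)
b = auc_by_pairs(scores, labels)
print(a, b, math.isclose(a, b))
```

On this data both give 8/9: the negative ranked third has two positives ahead of it and the other two negatives have three each, so 2 + 3 + 3 = 8 of the 3 × 3 = 9 pairs are correctly ordered. In practice, scikit-learn's `sklearn.metrics.roc_auc_score` is a tested implementation that also handles tied scores.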

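AUC-ROC therefore measures how well the model ranks positive testcases above negative ones, that is, its ability to tell positive testcases from negative testcases, without committing to any single threshold.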
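Because AUC-ROC depends only on the ordering of the scores, a few hypothetical score vectors make the point concrete: perfect separation gives 1, a fully reversed ranking gives 0, and a strictly increasing transformation of the scores (here, cubing probabilities in [0, 1]) leaves the AUC unchanged. The `auc` helper is an illustrative pair-counting sketch that assumes no ties.

```python
from itertools import product
from typing import List

def auc(scores: List[float], labels: List[int]) -> float:
    """AUC-ROC as the fraction of positive-negative pairs ranked correctly (no ties assumed)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(p > n for p, n in product(pos, neg)) / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]
separated = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]  # every positive above every negative
reversed_ = [0.1, 0.2, 0.9, 0.3, 0.8, 0.7]  # every negative above every positive
mixed = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(auc(separated, labels))  # 1.0
print(auc(reversed_, labels))  # 0.0
# Cubing is strictly increasing on [0, 1], so the ranking and hence the AUC are unchanged.
squashed = [s ** 3 for s in mixed]
print(auc(mixed, labels) == auc(squashed, labels))  # True
```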

