Precision

The typical definitions of precision and recall in binary classification are

$$
\text{precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}, \qquad \text{recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}
$$

Precision measures, among all the samples predicted as positive, how many are truly positive. Recall measures, among all the truly positive samples, how many are successfully identified.
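
For concreteness, here is a tiny sketch of these two formulas in Python, with made-up confusion counts:

```python
# Hypothetical confusion counts, for illustration only.
TP, FP, FN = 8, 2, 4

precision = TP / (TP + FP)  # 8 / 10 = 0.8
recall = TP / (TP + FN)     # 8 / 12 ≈ 0.667
print(f"precision={precision:.3f}, recall={recall:.3f}")
```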

A binary classification model assigns each sample a score that reflects how confident the model is that the sample is indeed positive. We can arrange all the samples in descending order of this score. Then, beginning with an empty set, we add the samples one by one into this set. Every time we add a new sample, we can calculate the precision and the recall within this set, treating every sample in the set as predicted positive.

During this process, recall never decreases (it increases only when the newly added sample is a true positive), but precision may go up and down. Plotting these two numbers against each other gives the precision-recall curve (PRC). The area under the PRC is a common indicator of how good the model is; for a perfect model it equals 1.
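
Below is a minimal sketch of this procedure in Python with NumPy; the function name and the toy `scores` and `labels` are assumptions made for illustration (for real work, scikit-learn's `sklearn.metrics.precision_recall_curve` implements the same idea):

```python
import numpy as np

def precision_recall_curve(scores, labels):
    """Compute (precision, recall) after each sample is added in score order."""
    order = np.argsort(-scores)     # sort samples by descending confidence
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)          # true positives among the first k samples
    fp = np.cumsum(1 - labels)      # false positives among the first k samples
    precision = tp / (tp + fp)      # denominators are simply 1, 2, 3, ...
    recall = tp / labels.sum()      # denominator is the total number of positives
    return precision, recall

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3])
labels = np.array([1,   1,   0,   1,   0,   1  ])
p, r = precision_recall_curve(scores, labels)
print(np.round(p, 2))  # precision may go up and down
print(np.round(r, 2))  # recall never decreases
```

Sorting once and using cumulative sums avoids recomputing the counts from scratch every time a sample is added.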

Average Precision

The term “average precision” usually appears in document retrieval and object detection scenarios. For a document retrieval task,

$$
\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}
$$

The question is where the “average” comes from. Similar to the plotting of the PRC, we may rank the retrieved documents according to their predicted relevance. Starting from an empty set of documents predicted as relevant, we add the retrieved documents one by one. This gives a series of precision values to average; in the usual information-retrieval definition, the precision is recorded at the ranks where a relevant document appears.
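
A minimal sketch of that averaging, assuming the retrieved documents are already ranked and `relevant` flags which of them are relevant (the function name and data are illustrative; the standard IR definition divides by the total number of relevant documents, which coincides with the code below whenever every relevant document appears in the ranked list):

```python
def average_precision(relevant):
    """relevant: list of 0/1 flags for the ranked retrieved documents."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)   # precision at rank k
    return sum(precisions) / len(precisions) if precisions else 0.0

# Ranked retrieval result: 1 = relevant, 0 = not relevant.
print(average_precision([1, 0, 1, 1, 0]))  # (1/1 + 2/3 + 3/4) / 3 ≈ 0.806
```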

On the other hand, in an object detection task, the predicted anchors are first ranked according to their predicted objectness scores. After that, each anchor is assigned an objectness label. The “positivity” of an anchor is mainly determined by its intersection over union (IoU) with the ground-truth bounding boxes. The rules for determining positivity are complex, but as a result we end up with positive anchors, negative anchors, and anchors that are neither positive nor negative. During training, only positive and negative anchors contribute to the gradient. When computing precision, however, non-positive anchors are treated as “negative”. For a detailed discussion of positivity in object detection, please refer here.
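
Since positivity hinges on IoU, here is a minimal sketch of the IoU of two axis-aligned boxes in `(x1, y1, x2, y2)` format; the example coordinates are made up:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```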

We can rank the anchors according to their objectness scores. Then, similarly beginning with an empty set, we add the anchors one by one into this set. Every time we add a new anchor, we can calculate the precision. By doing so, we obtain a series of precisions to average.
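
Putting the previous two sketches together, the following hedged sketch ranks detections by objectness score, marks each as a true or false positive by greedy IoU matching against the ground-truth boxes (the 0.5 threshold is an assumption), and averages the resulting precisions as described above. Real evaluators such as Pascal VOC or COCO instead compute the area under an (interpolated) precision-recall curve, so treat this as an illustration of the idea rather than a benchmark implementation. It reuses `iou` and `average_precision` from the sketches above.

```python
def detection_average_precision(detections, gt_boxes, iou_thr=0.5):
    """detections: list of (score, box); gt_boxes: list of ground-truth boxes."""
    matched = set()   # indices of ground-truth boxes already claimed by a detection
    tp_flags = []
    for score, box in sorted(detections, key=lambda d: -d[0]):
        # Greedily match against the best still-unmatched ground-truth box.
        best_iou, best_gt = 0.0, None
        for gi, gt in enumerate(gt_boxes):
            if gi in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, gi
        if best_iou >= iou_thr:
            matched.add(best_gt)
            tp_flags.append(1)   # true positive
        else:
            tp_flags.append(0)   # false positive
    return average_precision(tp_flags)  # reuses the retrieval sketch above
```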

Mean Average Precision

We can also obtain an average precision for each class of objects; averaging these over the classes yields the mean average precision (mAP).
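
A minimal sketch of the class-wise averaging, with made-up per-class AP values:

```python
# Hypothetical per-class average precisions, for illustration only.
ap_per_class = {"person": 0.72, "car": 0.65, "dog": 0.58}
mAP = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {mAP:.3f}")  # 0.650
```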

“Average Mean Average Precision”

Moreover, we may vary the IoU threshold to obtain a series of mean average precisions and average over them as well; in some sense, this is the “average mean average precision”.
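
For instance, the COCO benchmark averages mAP over the IoU thresholds 0.50, 0.55, …, 0.95. A minimal sketch of that outer average, with made-up mAP values at each threshold:

```python
# Hypothetical mAP measured at each IoU threshold (values are made up).
map_at_iou = {0.50: 0.61, 0.55: 0.58, 0.60: 0.55, 0.65: 0.51, 0.70: 0.46,
              0.75: 0.41, 0.80: 0.34, 0.85: 0.26, 0.90: 0.16, 0.95: 0.05}
average_map = sum(map_at_iou.values()) / len(map_at_iou)
print(f"mAP@[.50:.95] = {average_map:.3f}")  # 0.393
```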