Warping Error, Rand Error and Pixel Error in Semantic Segmentation

Jul 27, 2023

In this post, I’ll review three error metrics that are commonly used to evaluate semantic segmentation.

Suppose we are given K binarized feature maps of class predictions, where K is the number of classes. Our discussion focuses on the kth map; for notational convenience, we call it a, and its corresponding ground-truth binary mask a*. Segmentation error metrics quantify how far the prediction is from the ground truth.

Pixel Error

Pixel error is the most straightforward of the three metrics. Let N be the number of pixels whose predicted value equals the ground-truth value, and M the total number of pixels in the ground-truth map. The pixel error is (M − N) / M, i.e., the fraction of pixels whose predicted values disagree with the ground-truth labels.
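A minimal sketch of the (M − N) / M formula in plain Python, assuming the prediction and the ground truth are equally sized 2D lists of 0/1 values. The function name `pixel_error` is hypothetical.

```python
def pixel_error(pred, truth):
    """Fraction of pixels where the binary prediction disagrees with the ground truth."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("pred and truth must have the same shape")
    total = sum(len(row) for row in truth)  # M: number of pixels
    agree = sum(p == t for prow, trow in zip(pred, truth)
                for p, t in zip(prow, trow))  # N: pixels that match
    return (total - agree) / total


pred = [[0, 1], [1, 1]]
truth = [[0, 1], [0, 1]]
print(pixel_error(pred, truth))  # prints 0.25 (one of four pixels differs)
```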

Warping Error

Next, let’s look at the metric with the fanciest name: warping error.

Pixel error treats every disagreement between the ground truth and the prediction as equally important. In practice, some minor errors, such as small discrepancies along object boundaries, may not be worth taking too seriously. For example, when segmenting cell images for medical diagnosis, as long as a reasonable approximation of each cell boundary is given, pixel errors near the boundary make no noticeable qualitative difference. Errors that change the topology of the segmentation, such as splitting one cell into two or merging two cells into one, matter much more. Warping error was introduced to focus on exactly these errors.

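Given two binary maps L and L*, we say that L is a warp of L* if L* can be transformed into L by a sequence of pixel flips (e.g., 0 to 1) in which every flip preserves the topology of the image and only pixels inside a mask M (not the pixel count M used above) are allowed to flip. A pixel whose flip leaves the topology unchanged is called a simple point.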

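The warping error of a binary prediction T with respect to a reference segmentation L* (typically the ground truth) is defined as:

$$D(T \,\|\, L^*) = \min_{L \triangleleft L^*} \lVert T - L \rVert^2$$

where L ◁ L* means that L is a warp of L*. For binary maps, ‖T − L‖² simply counts the pixels where T and L differ; dividing it by the number of pixels turns it into a pixel error.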

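In other words, the warping error between two segmentations is the minimum pixel error between the target segmentation and any topology-preserving warp of the source segmentation. Boundary disagreements that such a warp can absorb are not counted, so what remains largely reflects topological errors such as splits and merges.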
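Finding the exact minimum over all warps is expensive, so the sketch below uses a greedy approximation (an assumption here, not necessarily the procedure used by the metric's authors): start from L = L*, and repeatedly flip any pixel inside the mask where L disagrees with T, provided that pixel is a simple point. The simple-point test assumes 8-connectivity for foreground, 4-connectivity for background, and treats pixels outside the image as background. Because the result is itself a valid warp, the returned value is an upper bound on the true warping error. All function names are hypothetical.

```python
from collections import deque


def _components(cells, adjacent):
    """Split a collection of (row, col) cells into connected components."""
    remaining = set(cells)
    components = []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            current = queue.popleft()
            neighbours = [c for c in remaining if adjacent(current, c)]
            for cell in neighbours:
                remaining.remove(cell)
                component.add(cell)
                queue.append(cell)
        components.append(component)
    return components


def is_simple(img, i, j):
    """True if flipping img[i][j] leaves the topology unchanged.

    Assumes 8-connected foreground (1) and 4-connected background (0);
    pixels outside the image count as background.
    """
    h, w = len(img), len(img[0])

    def value(r, c):  # (r, c) indexes the 3x3 window centred on (i, j)
        y, x = i + r - 1, j + c - 1
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    ring = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    fg = [p for p in ring if value(*p) == 1]
    bg = [p for p in ring if value(*p) == 0]

    def adj8(a, b):
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

    def adj4(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

    edge_neighbours = {(0, 1), (1, 0), (1, 2), (2, 1)}  # 4-neighbours of the centre
    fg_count = len(_components(fg, adj8))
    bg_count = sum(1 for comp in _components(bg, adj4) if comp & edge_neighbours)
    return fg_count == 1 and bg_count == 1


def warping_error_greedy(pred, truth, mask=None):
    """Greedy upper bound on the warping error of pred (T) against truth (L*)."""
    h, w = len(truth), len(truth[0])
    warped = [list(row) for row in truth]  # L starts as a copy of L*
    changed = True
    while changed:
        changed = False
        for i in range(h):
            for j in range(w):
                allowed = mask is None or mask[i][j]
                if allowed and warped[i][j] != pred[i][j] and is_simple(warped, i, j):
                    warped[i][j] = pred[i][j]  # topology-preserving flip toward T
                    changed = True
    mismatches = sum(warped[i][j] != pred[i][j] for i in range(h) for j in range(w))
    return mismatches / (h * w)


truth = [[0] * 7, [0, 1, 1, 1, 1, 1, 0], [0] * 7]
shifted = [[0] * 7, [1, 1, 1, 1, 1, 1, 0], [0] * 7]  # bar grown by one pixel
split = [[0] * 7, [1, 1, 1, 0, 1, 1, 0], [0] * 7]    # grown, and cut in two
print(warping_error_greedy(shifted, truth), warping_error_greedy(split, truth))
```

On this toy bar, the shifted prediction has a pixel error of 1/21 but a warping error of 0, because growing the bar by one pixel is a simple-point flip. The split prediction has a pixel error of 2/21 but a warping error of 1/21: the boundary pixel is absorbed, while the pixel that would cut the bar in two cannot be flipped without changing the topology.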

Rand Error

The Rand Index (RI) was originally proposed to measure the similarity between two clusterings of the same data. Since segmentation can be viewed as clustering pixels into groups labeled with integer indices, the Rand Index and the Rand Error (RE) have more recently been adopted to measure segmentation performance.

RE is defined as 1 − RI.

Given a set of n elements S = {o1, o2, …, on}, suppose we want to compare two clusterings X and Y of S, where X partitions the elements into r subsets {x1, x2, …, xr} and Y partitions them into s subsets {y1, y2, …, ys}. We define the following counts:

a, the number of pairs of elements in S that are in the same subset in X and in the same subset in Y

b, the number of pairs of elements in S that are in different subsets in X and in different subsets in Y

c, the number of pairs of elements in S that are in the same subset in X but in different subsets in Y

d, the number of pairs of elements in S that are in different subsets in X but in the same subset in Y

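These four cases are mutually exclusive and together cover every pair of elements in S, so the counts add up to the number of unordered pairs, i.e., the number of 2-combinations of n elements, C(n, 2) = n(n − 1)/2: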

a + b + c + d = C(n, 2)

The Rand Index is defined as:

RI = (a + b) / (a + b + c + d) = (a + b) / C(n,2)
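The definition translates directly into code. The sketch below, with hypothetical function names, counts the four pair types a, b, c, d by looping over all C(n, 2) pairs, assuming each clustering is given as a flat list holding one integer label per element (for segmentation, one label per pixel of the flattened map).

```python
from itertools import combinations


def rand_index(x, y):
    """Rand Index between two labelings x and y of the same n elements.

    Counts the four pair types a, b, c, d directly, which takes O(n^2) time.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two labelings of equal length n >= 2")
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        same_x, same_y = x[i] == x[j], y[i] == y[j]
        if same_x and same_y:
            a += 1  # together in X and in Y
        elif not same_x and not same_y:
            b += 1  # apart in X and in Y
        elif same_x:
            c += 1  # together in X, apart in Y
        else:
            d += 1  # apart in X, together in Y
    return (a + b) / (a + b + c + d)  # the denominator equals C(n, 2)


def rand_error(x, y):
    return 1.0 - rand_index(x, y)


x = [0, 0, 1, 1]
y = [0, 0, 0, 1]
print(rand_index(x, y), rand_error(x, y))  # prints 0.5 0.5
```

Here a = 1, b = 2, c = 1, d = 2, so RI = 3/6. Looping over every pair is fine for illustration but quadratic in the number of pixels, which is why practical implementations usually work from a contingency table of label co-occurrences instead.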

At first glance, RI looks similar to the Dice coefficient, so why do we need RI? The following figure illustrates the difference:

Fig 1: Two clusterings that are topologically similar

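In Fig 1, even without doing the calculation, we can tell that the Dice coefficient between the two clusterings would be quite small. Yet they classify the elements almost identically; they differ mainly in the indices assigned to the clusters. Dice compares regions that carry the same index, so it is penalized by such relabeling, whereas RI only asks whether each pair of elements is grouped together or apart. In fact, the RI between the two clusterings is 0.94.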

In unsupervised learning, where cluster indices carry no meaning of their own, clusterings that group the elements the same way (like the topologically similar pair in Fig 1) are equivalent, so it makes more sense to use RI to measure their similarity.
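The contrast between Dice and RI can be checked on a toy labeling where the grouping is identical but the cluster indices are swapped. This sketch uses hypothetical names (`dice`, `rand_index_contingency`). It computes RI from label co-occurrence counts, a standard equivalent of the pair-counting definition that runs in linear time instead of looping over all C(n, 2) pairs, and it evaluates Dice for a single label index, which is one common way (assumed here) to apply Dice to multi-cluster maps.

```python
from collections import Counter
from math import comb


def rand_index_contingency(x, y):
    """Rand Index from label co-occurrence counts instead of explicit pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two labelings of equal length n >= 2")
    total = comb(n, 2)
    a = sum(comb(k, 2) for k in Counter(zip(x, y)).values())  # together in both
    c = sum(comb(k, 2) for k in Counter(x).values()) - a       # together only in X
    d = sum(comb(k, 2) for k in Counter(y).values()) - a       # together only in Y
    b = total - a - c - d                                       # apart in both
    return (a + b) / total


def dice(x, y, label):
    """Dice coefficient of the elements carrying `label` in x and in y."""
    in_x = sum(1 for v in x if v == label)
    in_y = sum(1 for v in y if v == label)
    both = sum(1 for u, v in zip(x, y) if u == label and v == label)
    # Assumed convention: a label absent from both maps counts as a perfect match.
    return 2 * both / (in_x + in_y) if in_x + in_y else 1.0


truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 0]  # same grouping, cluster indices swapped
print(dice(truth, pred, 0), rand_index_contingency(truth, pred))  # prints 0.0 1.0
```

Dice for label 0 drops to zero because no element carries index 0 in both maps, while RI stays at 1.0 since every pair of elements is grouped the same way in both clusterings.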