Geometry & Transforms · #12 of 16

Contour Detection

Finding Shapes

A 3D reconstruction of a human femur built by stacking the contours segmented from a series of cross-sectional medical scans
A femur rebuilt in 3D by stacking the contours traced from many cross-sectional scans. The closed outlines on each slice are the raw material for the marching-cubes surface. — Newe A, Ganslandt T, CC BY 3.0

A radiologist scrolls through a stack of CT slices. On each one, a pale ring of bone floats against darker tissue. Her eye, without effort, closes that ring into a single thing: femur. Slice after slice, the rings stack, and a leg bone takes shape in her mind in three dimensions.

A computer cannot see the femur. It sees a grid of numbers. To make it perceive the bone, you have to teach it to do what her eye did instantly: walk along the edge of the bright region, never lifting the pen, and come back to where it started. That traced loop is a contour, and once you have it you can measure the bone, count the bones, match this bone against a library of bones.

A contour is the act of closing an edge into a shape: the moment a boundary stops being a scatter of pixels and becomes an object you can hold.

The previous chapters gave us edges: places where brightness changes sharply. But an edge is a local verdict, a yes-or-no at a single pixel. It does not know whether it belongs to the same coin as the edge two pixels over. Contour detection is the step that stitches those local edges into a complete, ordered, closed boundary, so that "the outline of the coin" becomes one connected list of points you can reason about as a whole.

Trace it yourself

The simulator below takes an image, thresholds it into pure black-and-white, then walks the borders of every white region and draws each traced loop in its own color. Drag the threshold slider and watch the outlines snap to the shapes. Then tap Camera and point it at something with strong contrast (a dark pen on white paper works beautifully). Look for the moment when a single blob splits into two as the threshold crosses a critical value: that is the algorithm rediscovering the boundary in real time.

input
extracted contours

What you just watched is the whole pipeline in miniature. The threshold turns a continuous image into a binary mask of "object" and "not object." The tracer then follows the staircase frontier between the two, recording each turn as a point, until it returns home. Notice that the result is fundamentally different from an edge map: an edge map is a fog of bright pixels, while a contour is an ordered sequence with a beginning, an end (which equals the beginning), and an inside.

From threshold to closed loop

The governing idea is simple to state. A binary image BB assigns every pixel a label,

B(x,y)={1if I(x,y)T0otherwiseB(x, y) = \begin{cases} 1 & \text{if } I(x, y) \ge T \\ 0 & \text{otherwise} \end{cases}

where I(x,y)I(x, y) is the original intensity at column xx, row yy, and TT is the threshold you chose. A pixel is a boundary pixel when it is foreground (B=1B = 1) but has at least one background neighbor (B=0B = 0). A contour CC is then the ordered set of boundary pixels you reach by always keeping the foreground on the same side as you walk:

C={p0,p1,,pn1}withpn=p0C = \{\, p_0, p_1, \dots, p_{n-1} \,\} \quad \text{with} \quad p_n = p_0

Here each pi=(xi,yi)p_i = (x_i, y_i) is a boundary pixel, and the closure condition pn=p0p_n = p_0 says the walk ends where it began. That last line is the whole difference between an edge and a contour: the loop closes.

The Suzuki-Abe tracer

OpenCV's findContours implements an algorithm published in 1985 by Satoshi Suzuki and Keiichi Abe, "Topological structural analysis of digitized binary images by border following." Its insight is that you can trace every boundary in a single raster scan, top to bottom, left to right. When the scan hits a transition from background to foreground, it has found a new border. It then "follows" that border all the way around, marking each pixel with a unique border number, before resuming the scan. Because each border gets its own label, the algorithm recovers not just the shapes but how they nest.

A diagram of 4-connectivity: a center pixel linked to its top, bottom, left, and right neighbors
4-connectivity: a pixel touches only its edge neighbors (up, down, left, right). The tracer's notion of what counts as 'connected' decides which pixels belong to the same loop. — Dhull003, Public domain
A diagram of 8-connectivity: a center pixel linked to all eight surrounding neighbors including the diagonals
8-connectivity adds the four diagonals. Choose 8-connectivity for the foreground and a thin diagonal bridge will hold two blobs together that 4-connectivity would split apart. — Dhull003, Public domain

The choice between 4-connectivity and 8-connectivity is not academic. It decides whether two regions touching only at a corner count as one object or two, and it is a perennial source of off-by-one bugs in blob counting. The convention that keeps borders self-consistent is to treat foreground and background with opposite connectivities: if foreground pixels connect 8-ways, background connects 4-ways, and vice versa.

Hierarchy: shapes inside shapes

Contours are rarely lonely. A letter "O" has an outer ring and an inner ring; a washer has a disk and a hole; a face has eyes inside a head. Suzuki and Abe's real contribution was tracking this nesting as a hierarchy: every contour records its parent, its first child, and its siblings.

external contour internal contour (hole) object
Contours nest: the external boundary wraps the filled region, an internal boundary rings each hole.

OpenCV exposes this through retrieval modes. RETR_EXTERNAL keeps only the outermost loops. RETR_TREE reconstructs the full parent-child forest, so you can say with certainty that this hole lives inside that disk. The hierarchy is stored as four indices per contour: next sibling, previous sibling, first child, and parent, with 1-1 marking "none."

A 3D CT reconstruction of a human thorax with bone and tissue rendered in distinct regions
A 3D CT of the thorax. Segmentation labels each voxel; the contours extracted from those labels are what let surgeons and software treat 'rib' and 'lung' as separate, measurable objects. — Mikael Häggström, CC0

What a contour tells you

Once a boundary is a closed point list, it stops being a picture and becomes geometry you can measure. The key descriptors:

The area and centroid both fall out of image moments. The raw moment of order (p,q)(p, q) for a region RR is

Mpq=(x,y)RxpyqM_{pq} = \sum_{(x,y)\,\in\,R} x^{p}\, y^{q}

so M00M_{00} is simply the pixel count (the area), and the centroid is the first moments normalized by the zeroth:

xˉ=M10M00,yˉ=M01M00\bar{x} = \frac{M_{10}}{M_{00}}, \qquad \bar{y} = \frac{M_{01}}{M_{00}}

Each symbol earns its keep: M00M_{00} totals the region, M10M_{10} weights every pixel by its column, M01M_{01} weights by its row, and dividing cancels the size so only the position survives.

For the advanced reader → How seven numbers recognize a shape regardless of how it is moved, sized, or spun

To match shapes you want descriptors that ignore where the shape sits, how big it is, and which way it points. The path runs in three steps.

First, recenter. The central moments subtract the centroid so the descriptor no longer cares about position:

μpq=(x,y)R(xxˉ)p(yyˉ)q\mu_{pq} = \sum_{(x,y)\,\in\,R} (x - \bar{x})^{p}\, (y - \bar{y})^{q}

Second, rescale. Dividing by a power of the area removes the dependence on size, giving the normalized central moments:

ηpq=μpqμ00γ,γ=p+q2+1\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1

Third, combine them so rotation cancels too. In 1962 Ming-Kuei Hu showed that seven specific polynomial combinations of the ηpq\eta_{pq} are invariant to translation, scale, and rotation. The first two are

h1=η20+η02h_1 = \eta_{20} + \eta_{02}h2=(η20η02)2+4η112h_2 = (\eta_{20} - \eta_{02})^2 + 4\,\eta_{11}^2

These Hu moments are why OpenCV's matchShapes can decide that a logo photographed from across a room, tilted and shrunk, is the same logo it has in its catalog. Two shapes with nearly equal Hu vectors are geometrically the same shape, full stop.

A related simplification is the Douglas-Peucker algorithm, which thins a contour of hundreds of points down to a handful of corners by recursively discarding points that lie close to the line between their neighbors. Run it on a quadrilateral and you get back exactly four vertices: the basis of every "this contour has 4 sides, it might be a document" detector.

Key takeaways

The radiologist's eye closed each pale ring into a bone without her ever naming the act. We have now named it, and broken it into steps a machine can follow: set a threshold, keep the foreground on your left, walk until you come home. Do that on every slice and the loops stack into a femur; do it on a page and the loops resolve into letters; do it on a hand and the loops count the fingers.

A contour is the smallest unit of seeing a thing as a thing. Everything downstream, every measurement, every match, every reconstruction, begins the instant the pen returns to where it started.