Image Filtering · #05 of 16

Edge Detection

Finding Boundaries

input

edges

edge threshold = 80

🖼 Upload

A young girl holding pink flowers, beside the same photo reduced to a white-on-black line drawing produced by an edge detector. — A photograph (left) and its edges (right). Almost everything you needed to recognize the scene survives the throw-away. — JonMcLoone, CC BY-SA 3.0

A child holds out a fistful of pink flowers. Now throw away the color. Throw away the soft shading on her cheek, the warm light in her hair, the green blur of the garden behind her. Keep only the places where the picture changes suddenly: the rim of a petal, the edge of a sleeve, the line of a smile.

What is left is a tangle of thin white lines on black. And yet you still see a girl, still see flowers, still read the whole scene in an instant. You threw away almost every number in the image and lost almost no meaning.

An edge is where the image changes sharply, and edges carry the lion's share of what a picture means.

The trick a vision system pulls here is older than computers. An edge is just a place where brightness jumps: dark fabric against a bright face, a petal against shadow. Our eyes are wired to chase exactly these jumps, which is why a four-line cartoon of a face is unmistakably a face. Edge detection is the act of teaching a machine to find those jumps and nothing else, turning a dense grid of pixels into a sparse sketch of boundaries.

The demo at the top of this page builds that sketch in front of you. Drag the threshold slider, flip the output between Gradient and Edges, then tap 📷 Camera and watch your own outline appear live beside the real feed. Look for the moment when raising the threshold erases the noise and faint texture first, leaving only the strongest contours standing. That is the central tension of edge detection in one slider: too low and you drown in spurious edges, too high and real boundaries vanish. The continuous Gradient view shows the raw evidence; the Edges view shows the verdict after a cutoff.

An edge is a steep slope

Picture a single row of the image as a landscape, where bright pixels are high ground and dark pixels are valleys. Across a flat wall the landscape is level. At the boundary of an object it drops off a cliff. The steepness of that cliff is the gradient, and a steep gradient means an edge.

In calculus the steepness of a function is its derivative. An image has two directions to vary in, so it has two partial derivatives, and together they form the gradient vector:

$\nabla I = \left( \frac{\partial I}{\partial x},\ \frac{\partial I}{\partial y} \right)$

Here $I$ is the image brightness at a pixel. $\frac{\partial I}{\partial x}$ is how fast brightness changes as you step sideways (it lights up at vertical edges). $\frac{\partial I}{\partial y}$ is how fast it changes as you step downward (it lights up at horizontal edges). The two together point in the direction of steepest increase. We usually care about two summary numbers built from them:

$|\nabla I| = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \operatorname{atan2}(G_y, G_x)$

The magnitude $|\nabla I|$ is how strong the edge is (the height of the cliff), written with the shorthand $G_x = \partial I / \partial x$ and $G_y = \partial I / \partial y$ . The direction $\theta$ is which way the cliff faces, an angle pointing across the edge rather than along it. The demo at the top computes exactly this magnitude on a synthetic scene, an uploaded photo, or your live camera, then keeps every pixel above the threshold.

Measuring the slope with Sobel

You cannot take a true derivative of a pixel grid (it has no smooth formula), so you estimate the slope from neighbors. The classic estimator is the Sobel operator: two small 3×3 kernels, slid across the image by convolution (the operation from the previous lesson), that approximate $G_x$ and $G_y$ .

Gx (vertical edges)      Gy (horizontal edges)
[-1  0  +1]              [-1  -2  -1]
[-2  0  +2]              [ 0   0   0]
[-1  0  +1]              [+1  +2  +1]

Notice the structure. The $G_x$ kernel subtracts the left column from the right column: if they differ, brightness is changing left-to-right, so there is a vertical edge. The middle row gets double weight, which quietly blurs along the edge while differencing across it, making the estimate less jittery in noise. Each kernel sums to zero, so a flat region (every neighbor equal) returns exactly zero. Only change survives.

A grayscale photograph of steam-engine valve gear and piping, the standard test image used in edge-detection papers. — The 'valve' test image, a staple of edge-detection demonstrations since the 1980s. Its mix of crisp metal rims, repeating bolt circles, and gentle shading makes it a brutal honesty test for any detector. — Simpsons contributor, CC BY-SA 3.0

Run Sobel on an image like the valve above and you get the raw Gradient view from the demo: bright where rims and rods cut against their background, dark across flat castings. But raw gradients are fat and fuzzy. A single physical edge produces a thick smear of high-gradient pixels because the cliff has finite width and the image has noise. A good detector still has work to do: it must decide which of those pixels are the edge.

Canny: turning evidence into a verdict

In 1986 a graduate student named John Canny asked a sharper question than "where are the gradients?" He asked what an optimal edge detector would even mean, and answered it with three goals: catch every real edge (good detection), place each edge exactly (good localization), and mark each edge once, not as a thick band (single response). From those criteria he derived an algorithm that is still the gold standard four decades later. The Canny edge detector runs five stages:

Gaussian blur to suppress noise, so the detector chases real cliffs, not speckle.
Sobel gradients to measure magnitude and direction at every pixel.
Non-maximum suppression to thin each fat ridge down to a one-pixel line, keeping only the pixel that is a local maximum across the edge.
Double threshold to sort pixels into strong, weak, and discarded by two cutoffs instead of one.
Hysteresis to keep a weak edge only if it touches a strong one, linking faint contours into the bold ones they belong to.

John Canny seated in his office at UC Berkeley. — John F. Canny · b. 1958 Australian computer scientist at **UC Berkeley**. As a graduate student in 1986 he published a *computational theory of edge detection*, deriving the optimal detector from first principles. His five-stage algorithm is still the default in OpenCV today.

Stage 3 is where edge detection gets clever, and the demo's threshold control gives you a feel for stages 4 and 5. To suppress non-maxima you need the gradient direction, $\theta$ , which you snap to the nearest of four orientations (horizontal, vertical, and the two diagonals) so you can ask the simple question: is this pixel brighter in gradient than its two neighbors along $\theta$ ? If not, it is on the shoulder of the cliff, not the peak, and gets zeroed. That is how a smear becomes a line.

A semicircle split into colored angular sectors labeled 45, 90, and 135 degrees, illustrating how gradient angles are binned into four directions. — Gradient directions are snapped into four bins (0, 45, 90, 135 degrees) before non-maximum suppression, so each pixel is compared against the two neighbors that lie across its edge. — Kaba Misha, CC BY-SA 3.0

The double threshold then replaces the single slider with two. Pixels above the high threshold are certainly edges. Pixels below the low threshold are certainly not. The interesting ones lie between: weak candidates that hysteresis keeps only if a chain of pixels connects them back to a strong, confident edge. This is why a faint contour that trails off a bold one survives, while an isolated speck of noise of the same strength does not. Drag the demo's threshold and you are watching a one-dimensional version of this decision: every pixel of evidence judged against a cutoff.

Reading the edges back out

A clean edge map is rarely the final product; it is raw material. Stack subpixel fitting on top and you can locate a boundary to a fraction of a pixel, which matters enormously in measurement.

A medical X-ray angiogram of blood vessels in the center, with two zoomed crops on either side showing yellow contour lines traced precisely along a vessel boundary. — Subpixel edge detection on an angiogram. The zoomed crops show contour lines fitted between pixels to trace a blood-vessel boundary more precisely than the pixel grid alone allows. — Machanguillo1, CC BY-SA 4.0

That precision is why edge detection quietly powers so much: lane lines for self-driving cars, the paper boundary your phone snaps to when it scans a document, the outline of a tumor in a radiograph, the silhouette that a robot arm uses to grasp a part. In each case the pipeline is the same shape you have been dragging in the demo: find the gradient, decide on a threshold, keep the boundaries, discard the rest.

For the advanced reader → Why a Gaussian first, and how blur and differentiation fold into one kernel

Differentiation amplifies noise. A derivative measures change, and noise is change, so running Sobel on a raw image makes every speck of sensor grain shout. Canny's first stage answers this by smoothing with a Gaussian of standard deviation $\sigma$ :

$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$

The beautiful part is that you never need two passes. Convolution is associative and the derivative is linear, so differentiating the blurred image equals convolving once with the derivative of the Gaussian:

$\frac{\partial}{\partial x}\big(G_\sigma * I\big) = \left(\frac{\partial G_\sigma}{\partial x}\right) * I$

So blur-then-difference collapses into a single oriented kernel applied once. Choosing $\sigma$ trades detection against localization: a large $\sigma$ smooths away noise but rounds off corners and shifts edges, while a small $\sigma$ pins edges precisely but lets noise through. That trade-off is exactly the tension you feel in the demo's threshold, pushed one stage earlier into the blur.

Marr and Hildreth took the second-derivative route instead. Their operator is the Laplacian of Gaussian,

$\nabla^2 G_\sigma = \frac{\partial^2 G_\sigma}{\partial x^2} + \frac{\partial^2 G_\sigma}{\partial y^2},$

and edges are the zero crossings of $\nabla^2 G_\sigma * I$ . An edge is a peak in the first derivative, which is the same as a zero crossing in the second derivative, so both schools are looking at the same cliff from two sides of the calculus.

Key takeaways

An edge is a steep gradient. Brightness changing sharply over a short distance is the entire signal; everything else can be thrown away with surprisingly little loss of meaning.
The gradient has a magnitude and a direction. $|\nabla I| = \sqrt{G_x^2 + G_y^2}$ says how strong the edge is; $\theta = \operatorname{atan2}(G_y, G_x)$ says which way it faces.
Sobel estimates the slope by convolution. Two zero-sum 3×3 kernels difference across the edge while averaging along it, so flat regions vanish and only change survives.
Canny turns evidence into a verdict. Blur, gradient, non-maximum suppression, double threshold, and hysteresis together catch real edges, place them precisely, and mark each one exactly once.
Thresholds are a trade-off, not a setting. Too low drowns you in noise, too high erases real boundaries; hysteresis softens the choice by linking weak edges to strong neighbors.

Edge detection is, in the end, an act of forgetting done well. The girl with her flowers survives being reduced to a fistful of white lines because the lines were never decoration; they were where the meaning lived all along.

Every detector since, from Sobel's lab folklore to Canny's optimal criteria, is a refinement of one ancient instinct your own eyes obey before you are even aware of looking: find where the world changes, and there you will find its shape.