Recipe18.17.Computing the Convex Hulls and Diameters of 2D Point Sets

Recipe 18.17. Computing the Convex Hulls and Diameters of 2D Point Sets

Credit: David Eppstein, Dinu Gherman

Problem

You have a list of 2D points, represented as pairs (x, y), and need to compute the convex hull (i.e., upper and lower chains of vertices) and the diameter (the two points farthest from each other).

Solution

We can easily compute the hulls by the classic Graham's scan algorithm, with sorting based on coordinates rather than radially. Here is a function to perform this task:

def orientation(p, q, r):     ''' >0 if p-q-r are clockwise, <0 if counterclockwise, 0 if colinear. '''     return (q[1]-p[1])*(r[0]-p[0]) - (q[0]-p[0])*(r[1]-p[1]) def hulls(Points):     ' Graham scan to find upper and lower convex hulls of a set of 2D points '     U = [  ]     L = [  ]     # the natural sort in Python is lexicographical, by coordinate     Points.sort( )     for p in Points:         while len(U) > 1 and orientation(U[-2], U[-1], p) <= 0:             U.pop( )         while len(L) > 1 and orientation(L[-2], L[-1], p) >= 0:             L.pop( )         U.append(p)         L.append(p)     return U, L

Given the hulls, the rotating calipers algorithm provides all pairs of points that are candidates to be set's diameter. Here is a function to embody this algorithm:

def rotatingCalipers(Points):     ''' Given a list of 2d points, finds all ways of sandwiching the points         between two parallel lines that touch one point each, and yields the         sequence of pairs of points touched by each pair of lines.  '''     U, L = hulls(Points)     i = 0     j = len(L) - 1     while i < len(U) - 1 or j > 0:         yield U[i], L[j]         # if all the way through one side of hull, advance the other side         if i == len(U) - 1:             j -= 1         elif j == 0:             i += 1         # still points left on both lists, compare slopes of next hull edges         # being careful to avoid divide-by-zero in slope calculation         elif (U[i+1][1]-U[i][1]) * (L[j][0]-L[j-1][0]) > \              (L[j][1]-L[j-1][1]) * (U[i+1][0]-U[i][0]):             i += 1         else: j -= 1

Given all the candidates, we need only to scan for the max on pairwise point-point distances of candidate pairs of points to get the diameter. Here is a function that performs exactly this task:

def diameter(Points):     ''' Given a list of 2d points, returns the pair that's farthest apart. '''     diam, pair = max( [((p[0]-q[0])**2 + (p[1]-q[1])**2, (p,q))                       for p,q in rotatingCalipers(Points)] )     return pair

Discussion

As function hulls shows, we can apply Graham's scan algorithm without needing an expensive radial sort as a preliminary step: Python's own built-in sort (which is lexicographical, meaning, in this case, by x coordinate first, and by y coordinate when the x coordinates of two points coincide) is sufficient.

From hulls, we get the upper and lower convex hulls as distinct lists, which, in turn, helps in the rotatingCalipers function: that function can maintain separate indices i and j on the lower and upper hulls and still be sure to yield all pairs of sandwich boundary points that are candidates to be the set's diameter. Given the sequence of candidate pairs, function diameter's task is quite simpleit boils down to one call to built-in max on a list comprehension (a generator expression would suffice in Python 2.4) that associates the pairwise point distance to each pair of candidate points. We use the squared distance, in fact. There's no reason to compute a costly square root to get the actual non-squared distance: we're comparing only distances, and for any non-negative x and y, x < y and sqrt(x) < sqrt(y) always have identical truth values. (In practice, however, using math.hypot(p[0]-q[0], p[1]-q[1]) in the list comprehension gives us just about the same performance.)

The computations in this recipe take care to handle tricky cases, such as pairs of points with the same x coordinate, multiple copies of the same point, colinear triples of points, and slope computations that, if not carefully handled, would produce a division by zero (i.e., again, pairs of points with the same x coordinate). The set of unit tests that carefully probe each of these corner cases is far longer than the code in the recipe itself, which is why it's not posted on this cookbook.

Some of the formulas become a little simpler and more readable when we represent points by complex numbers, rather than as pairs of reals:

def orientation(p, q, r):     return ((q - p) * (r - p).conjugate( )).imag ...         # still points left on both lists, compare slopes of next hull edges         # being careful to avoid divide-by-zero in slope calculation         elif ((U[i+1] - U[i]) * (L[j] - L[j-1]).conjugate( )).imag > 0:             i += 1         else: j -= 1 ... def diameter(Points):     diam, pair = max([(abs(p-q), (p,q)) for p,q in rotatingCalipers(Points)])     return pair

If we represent points by complex numbers, of course, we cannot just use Points.sort( ) any more because complex numbers cannot be compared. We need to "pay back" some of the simplification by coding our own sort, such as:

    aux = [ (p.real, p.imag) for p in Points ]     aux.sort( )     Points[:] = [ complex(*p) for p in aux ]     del aux

or equivalently, in Python 2.4:

    Points.sort(key=lambda p: p.real, p.imag)

Moreover, under the hood, a complex numbers-based version is doing more arithmetic: finding the real as well as imaginary components in the first and second formula, and doing an unnecessary square root in the third one. Nevertheless, performance as measured on my machine, despite this extra work, turns out to be slightly better with this latest snippet than with the "Solution"'s code. The reason I've not made the complex-numbers approach the "official" one, aside from the complication with sorting, is that you should not require familiarity with complex arithmetic just to understand geometrical computations.

If you're comfortable with complex numbers, don't mind the sorting issues, and have to perform many 2D geometrical computations, you should consider representing points as complex numbers and check whether this provides a performance boost, as well as overall simplification to your source code. Among other advantages, representing points as complex numbers lets you use the Numeric package to hold your data, saving much memory and possibly gaining even more performance, when compared to the natural alternative of representing points as (x, y) tuples holding two floats.

Recipe18.17.Computing the Convex Hulls and Diameters of 2D Point Sets