Monday, October 6, 2008

Tuples are compared according to a lexicographical comparison of their elements in order

If you have a list of tuples, sort() will sort them by lexicographical order.

Example:

[(3, 'zzzzd'), (2, 'hello'), (3, 'whatever')]

calling .sort() on this object will result in 

[(1, 'whatever'), (2, 'hello'), (3, 'zzzzd')]

This is because python evaluates tuples in comparisons according to pairs of indexes in order. If there is a difference between the 0-index of the tuples, it will be evaluated according to that first pair; if the first indexes are the same, it will move on the second index, and so on.

(1, 1) < (2, 0) # True -- based on first elements
(1, 1) < (-1, 0) # False -- based on first elements
(1, 1) < (1, 2) # True -- first elements the same, comparison decided based on the second elements

To make the comparison explicit and to force it to use the first and only the first index in the comparison, use the cmp kwarg to sort:

[(1, 'whatever'), (2, 'hello'), (3, 'zzzzd')].sort(cmp=lambda x, y: cmp(x[0], y[0]))

(note: sort will sort the list in-place, so the above won't actually evaluate to anything. sorted() returns a sorted copy of whatever it's run on, and accepts an identical cmp parameter.)

This will make how the comparison is being executed explicit, and will also allow you to sort using any index of the tuple or any aspect of the objects in the list.

list.sort() and sorted() also accept a reverse kwarg that accepts a boolean that will return the list in the opposite order -- though a custom cmp kwarg could do the same implicitly. 

No comments: