Thursday, October 9, 2008

The python set object

An oft-overlooked builtin python type is the set type, and its immutable equivalent, frozenset. A set is like a list or tuple except every member can only appear once; more accurately, it is like a dict that only has keys and no values.

Have you ever found yourself doing something like this?

found = {}
for item in someiterator():
if not found.has_key(item):
found[item] = 1
...
else:
...

The more optimal way of doing this in python is to use a set

found = set()
for item in someiterator():
if not item in found:
found.add(item)
...
else:
...

The 'in' operator used with sets appears to be as fast as lookup for a key in a dict, yet does not carry the memory overhead and inelegance of using a dict where you don't actually care about the values.

Another useful use of set is when you want to make the members of a list or tuple unique; just cast the list to a set, and cast it back  if and when you need to do so.

Frozenset does not appear to have any speed advantages over a set but probably takes up less memory.

No comments: