The Secret Life of Bears: October 2008

Wednesday, October 29, 2008

The evolution of coolness: string interpolation in SQL in python

Phase 1, tuples:

c.execute("update person set name=%s, address=%s where id=%s", ('Bob Smith', '3 Red Rock Rd', 1))

Phase 2, dicts:

c.execute("update person set name=%(name)s, address=%(addr)s where id=%(id)s", {'name':'Bob Smith', 'addr':'3 Red Rock Rd', 'id':1})

Phase 3, locals():

name = 'Bob Smith'

addr = '3 Red Rock Rd'

id = 1

c.execute("update person set name=%(name)s, address=%(addr)s where id=%(id)s", locals())

Most people don't seem to be aware of dict string interpolation, which is much more convenient when you're using a large number of variables. It is cool, but it can seem annoying because you now need to build a large dict, which requires more typing. However, locals() returns a dict of all of the variables that exist in the local scope. Since we probably have everything we want to stick into an SQL statement sitting in local variables anyway, combining dict interpolation with locals() lets us stick variables into SQL much more conveniently.
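Here is a minimal runnable sketch of Phase 3, assuming Python 3 and the standard-library sqlite3 module standing in for the MySQL driver used above. sqlite3 spells named placeholders as :name rather than %(name)s, and the person table layout and the rename_person function are made up for illustration.

```python
import sqlite3

def rename_person(conn, id, name, addr):
    # locals() is a dict holding conn, id, name and addr; the driver only
    # looks up the keys that are named in the statement.
    conn.execute(
        "update person set name=:name, address=:addr where id=:id",
        locals(),
    )

conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer primary key, name text, address text)")
conn.execute("insert into person values (1, 'nobody', 'nowhere')")

rename_person(conn, 1, 'Bob Smith', '3 Red Rock Rd')
row = conn.execute("select name, address from person where id = 1").fetchone()
conn.close()
```

Because the values are handed to the driver as parameters rather than pasted into the SQL text, quoting is handled by the driver.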

Tuesday, October 28, 2008

apply a custom sort order

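In MySQL, the best way to apply a custom sort order to a field is to use the field() function in the order by clause, e.g. order by field(status, 'new', 'open', 'closed') for a hypothetical status column.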

Via this comment thread.

Monday, October 27, 2008

How to catch more than one kind of exception in the same except statement

Example:

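import urllib

def open_source(source):  # hypothetical wrapper; a bare return is only valid inside a function
    try:
        return urllib.urlopen(source)  # Python 2 API
    except (IOError, OSError):  # a tuple of exception classes catches any of them
        pass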

Example borrowed from the always excellent Dive Into Python.
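The same pattern as a self-contained sketch, assuming Python 3; the to_int helper is hypothetical and not from Dive Into Python. The parentheses matter: in Python 2, writing except IOError, OSError: without them catches only IOError and binds the exception to the name OSError.

```python
def to_int(value):
    """Return int(value), or None if value cannot be converted."""
    try:
        return int(value)
    except (ValueError, TypeError) as err:
        # int('abc') raises ValueError; int(None) raises TypeError.
        print("could not convert %r: %s" % (value, err))
        return None

results = [to_int('42'), to_int('abc'), to_int(None)]
```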

speed up temporary table inserts

Temporary tables use the MyISAM storage engine by default (in fact, you can specify any storage engine for a temporary table, a fact that few people realize).

You can therefore speed up bulk inserts into a temporary table just as you would for any other table. With the default MyISAM temporary tables, I was able to roughly halve the execution time of a big bulk insert by surrounding the insert operations with alter table index_builder disable keys and alter table index_builder enable keys.

Another tip: if the temporary table is reasonably small, you can specify engine=Memory to create the table in memory rather than as MyISAM. Just be aware of the implications of memory/heap tables.

replacement apply syntax

The useful apply() function is deprecated.

The alternative is the "extended call syntax". It looks identical to the *args and **kwargs syntax in function definitions, but works in the inverse direction: you use it to pass arguments to functions rather than to define their parameters.

def foo(a, b):
    pass

foo(*(1, 2))

foo(**{'a': 1, 'b': 2})

Either of these will raise a TypeError:

foo(**{'v':3})

foo(*(1,2,3,))
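A runnable sketch of the calls above, assuming Python 3. This foo is a variant that returns its arguments so that the result of each call is visible; the errors list is only there to record what the two bad calls raise.

```python
def foo(a, b):
    return (a, b)

args = (1, 2)
kwargs = {'a': 1, 'b': 2}

positional = foo(*args)          # same as foo(1, 2)
keyword = foo(**kwargs)          # same as foo(a=1, b=2)
mixed = foo(*(1,), **{'b': 2})   # the two forms can be combined

errors = []
for bad_call in (lambda: foo(**{'v': 3}), lambda: foo(*(1, 2, 3))):
    try:
        bad_call()
    except TypeError as err:
        # unexpected keyword argument / too many positional arguments
        errors.append(type(err).__name__)
```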

Friday, October 24, 2008

Tell if a table has been modified

Want to tell if two tables are exactly the same?

Use CHECKSUM TABLE

Or, for more advanced methods, use maatkit's mk-table-checksum.

Thursday, October 23, 2008

Quick way to interpolate local variables into a string

x = "variable 1"

y = "variable 2"

"%(x)s %(y)s" % locals()

This is so easy it hurts.

We can use the dict syntax to interpolate variables into a string, which is often more convenient, and we can use locals() to get a dict of all locally bound variables.

Wednesday, October 22, 2008

How to import a bunch of local variables into a dict by name

Here's a cool little function I wrote:

def _vars2dict(vars, *vars_wanted):
    vars_wanted = set(vars_wanted)
    return dict(filter(lambda i: i[0] in vars_wanted, vars.iteritems()))

Call it with locals() followed by the names of the variables you want, and those variables will be returned in a dict keyed by name.

Here's an example, using a variable we create and a built-in function (log, from the math module):

>>> from math import *
>>> x = 1
>>> _vars2dict(locals(), 'log', 'x')
{'x': 1, 'log': <built-in function log>}

This is useful if you have a lot of local variables that you want to interpolate into a string using the more intelligible dict format rather than the tuple format.
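For reference, a sketch of the same idea for Python 3, where dict.iteritems() no longer exists. The names vars2dict and describe are illustrative, and the dict comprehension is a swapped-in replacement for the filter() call above.

```python
def vars2dict(variables, *names_wanted):
    """Return the subset of `variables` whose keys are in `names_wanted`."""
    wanted = set(names_wanted)
    return {name: value for name, value in variables.items() if name in wanted}

def describe():
    x = 1
    y = 'two'
    z = 3.0  # not requested below, so it is left out of the dict
    return "%(x)s and %(y)s" % vars2dict(locals(), 'x', 'y')

message = describe()
subset = vars2dict({'a': 1, 'b': 2}, 'a', 'c')  # 'c' is absent, so it is skipped
```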

Monday, October 13, 2008

The defaultdict object

In the collections module, there's an object called defaultdict. It is a useful (and supposedly faster) alternative to calling setdefault() for every insert into a regular dict.

This is especially useful for nested dicts: rather than testing for existence or calling setdefault() at every level of the dict, you can write code like the following, which runs as-is:

import collections

bidict = collections.defaultdict(dict)

bidict['a']['b'] = 1

Both levels of the dict will be initialized correctly automatically.

A more pedestrian usage might be to initialize every value to zero, which can be accomplished simply, since calling int() will always return 0:

termfreq = collections.defaultdict(int)

termfreq['unknownterm'] == 0 # True
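A slightly fuller sketch of the term-frequency case, assuming Python 3; the sample sentence is made up. One thing worth knowing is that merely reading a missing key inserts it with the default value.

```python
import collections

words = "the quick brown fox jumps over the lazy dog the end".split()

termfreq = collections.defaultdict(int)
for word in words:
    termfreq[word] += 1   # no setdefault() or existence test needed

before = 'unknownterm' in termfreq   # the key is absent at this point
value = termfreq['unknownterm']      # the lookup creates the key with int() == 0
after = 'unknownterm' in termfreq    # the key now exists
```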

Sunday, October 12, 2008

itertools.groupby

dave-squared has a post with some examples of how to use the interesting itertools.groupby() function, which works kind of like the SQL group by clause.
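A small sketch of my own (not taken from the linked post), assuming Python 3; the rows are made-up sample data. Unlike SQL's group by, groupby() only groups consecutive items with equal keys, so the input is sorted by the same key first.

```python
import itertools
import operator

rows = [
    ('fruit', 'apple'),
    ('veg', 'carrot'),
    ('fruit', 'banana'),
    ('veg', 'leek'),
]

get_kind = operator.itemgetter(0)

grouped = {}
# Sort by the grouping key so that equal keys end up next to each other.
for kind, group in itertools.groupby(sorted(rows, key=get_kind), key=get_kind):
    grouped[kind] = [name for _, name in group]
```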

Friday, October 10, 2008

Example usage of the glob module

The python glob module in the standard library is a simple, useful way of expanding wildcard matches to filenames. It dovetails nicely with the os, os.path, and shutil modules.

Example usage:

import os, glob

TEMP_DIR = '/tmp/whatever'

for file in glob.glob(os.path.join(TEMP_DIR, '*.tmp')):
    os.remove(file)

The glob module contains just two functions: glob() and its generator-returning equivalent, iglob().
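A self-contained sketch of both functions, assuming Python 3. It works in a throwaway directory created with tempfile, and the file names are made up.

```python
import glob
import os
import shutil
import tempfile

temp_dir = tempfile.mkdtemp()
for name in ('a.tmp', 'b.tmp', 'keep.txt'):
    with open(os.path.join(temp_dir, name), 'w') as handle:
        handle.write('x')

pattern = os.path.join(temp_dir, '*.tmp')

# iglob() yields matches one at a time instead of building a list.
matched = sorted(os.path.basename(path) for path in glob.iglob(pattern))

# glob() returns the full list of matches up front.
for path in glob.glob(pattern):
    os.remove(path)

remaining = os.listdir(temp_dir)
shutil.rmtree(temp_dir)  # clean up the throwaway directory
```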

Thursday, October 9, 2008

How to convert all tabs into spaces in vim

One of the sometimes annoying things about python is the IndentationErrors you get from the conflicting appearance of spaces and tab characters, the difference between which is invisible to the naked eye.

In vim, to convert tabs to spaces throughout a file, first run :set expandtab and then use the command :%retab!

How to automatically include modules in your python shell

1. Create an environment variable, PYTHONSTARTUP, in your shell config that contains the full path to a python script which we're about to create. For bashrc, this looks like:

export PYTHONSTARTUP="/home/nick/.pythonstart.py"

This file will be executed every time you start a python shell (but not every time you execute a script).

2. In the python file we specified, put import statements for whatever modules you want. Mine looks like:

import os, sys, re

from pprint import pprint

from datetime import datetime

from math import *

Every time you start an interactive session these modules will be automatically included and you won't need to bother importing them.

You can see the official docs on the PYTHONSTARTUP variable in the manual.

The python set object

An oft-overlooked builtin python type is the set type, and its immutable equivalent, frozenset. A set is like a list or tuple except every member can only appear once; more accurately, it is like a dict that only has keys and no values.

Have you ever found yourself doing something like this?

found = {}

for item in someiterator():
    if not found.has_key(item):
        found[item] = 1
        ...
    else:
        ...

A better way of doing this in python is to use a set:

found = set()

for item in someiterator():
    if item not in found:
        found.add(item)
        ...
    else:
        ...

The 'in' operator used with sets appears to be as fast as lookup for a key in a dict, yet does not carry the memory overhead and inelegance of using a dict where you don't actually care about the values.

Another use of set is to make the members of a list or tuple unique: just cast the list to a set, and cast it back if and when you need to. Note that a set does not preserve the original order.

Frozenset does not appear to have any speed advantages over a set but probably takes up less memory.
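A short sketch of the uniqueness trick, assuming Python 3; the sample list is made up. Since a set has no order, the first variant sorts the result and the second keeps first-seen order with the membership-test loop shown earlier. The last two lines show one practical difference between the two types: a frozenset is hashable, so it can be used as a dict key.

```python
items = ['b', 'a', 'b', 'c', 'a']

# Cast to a set to drop duplicates, then sort to get a predictable order.
unique = sorted(set(items))

# Keep the order in which items were first seen.
seen = set()
first_seen = []
for item in items:
    if item not in seen:
        seen.add(item)
        first_seen.append(item)

# A frozenset can be a dict key; element order does not matter for lookup.
groups = {frozenset(['a', 'b']): 'pair'}
lookup = groups[frozenset(['b', 'a'])]
```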

Tuesday, October 7, 2008

Expanding wildcard matches on files

Use the glob module.

See also the fnmatch module.

Um, all of them

Monday, October 6, 2008

2 snakes enter, 1 snake leaves

Valuable slashdot discussion on python 2.6 and 3.0

Slashdot: Python 2.6 to clear the way for 3.0, coming next month

tuples are always greater than strings

Any tuple is greater than any string.

(1,2) > 'ab' # True

I'm told this is for archaic reasons.

In python 3.0, attempting this will raise an exception.
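A quick sketch of the Python 3 behaviour, assuming it is run under Python 3; the variable names are illustrative.

```python
try:
    result = (1, 2) > 'ab'
except TypeError as err:
    # Ordering comparisons between unrelated types are rejected in Python 3.
    result = None
    message = str(err)
```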

Tuples are compared according to a lexicographical comparison of their elements in order

If you have a list of tuples, sort() will sort them by lexicographical order.

Example:

[(3, 'zzzzd'), (2, 'hello'), (1, 'whatever')]

Calling .sort() on this list will result in

[(1, 'whatever'), (2, 'hello'), (3, 'zzzzd')]

This is because python evaluates tuples in comparisons according to pairs of indexes in order. If there is a difference between the 0-index elements of the tuples, the comparison is decided by that first pair; if the first elements are the same, it will move on to the second index, and so on.

(1, 1) < (2, 0) # True -- based on first elements

(1, 1) < (-1, 0) # False -- based on first elements

(1, 1) < (1, 2) # True -- first elements the same, comparison decided based on the second elements

To make the comparison explicit and to force it to use the first and only the first index in the comparison, use the cmp kwarg to sort:

[(1, 'whatever'), (2, 'hello'), (3, 'zzzzd')].sort(cmp=lambda x, y: cmp(x[0], y[0]))

(note: sort will sort the list in-place, so the above won't actually evaluate to anything. sorted() returns a sorted copy of whatever it's run on, and accepts an identical cmp parameter.)

This will make how the comparison is being executed explicit, and will also allow you to sort using any index of the tuple or any aspect of the objects in the list.

list.sort() and sorted() also accept a reverse kwarg, a boolean that, when true, produces the list in the opposite order -- though a custom cmp kwarg could do the same implicitly.
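The cmp kwarg is Python 2 only; it was removed in Python 3. A sketch of the equivalent using the key kwarg, which is also available in Python 2.4 and later, with made-up sample data:

```python
import operator

pairs = [(3, 'zzzzd'), (2, 'hello'), (1, 'whatever'), (2, 'apple')]

# Sort on the first element only; ties keep their original relative order.
by_first = sorted(pairs, key=operator.itemgetter(0))

# Sort on any other aspect of the items, here the second element.
by_second = sorted(pairs, key=lambda pair: pair[1])

# reverse=True flips the order but still keeps ties in their original order.
by_first_desc = sorted(pairs, key=operator.itemgetter(0), reverse=True)
```

A key function is called once per item, whereas a cmp function is called once per comparison, so key is usually the cheaper choice.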

String interpolation using dicts

Everyone knows about the string interpolation operator, %:

"(%s, %s)" % (1, 2)

You rarely see the often more useful form of interpolation using dicts. Not only is it much more readable once the number of variables you're trying to interpolate grows large, it also lets you insert the same element repeatedly if need be:

"(%(key)s, %(key)s)" % {'key': 1}

The single most valuable thing I've learned about python

... pydoc and help()

pydoc is an executable that's installed with python. Pass it the name of a module to see a generated man page for that module (e.g. pydoc string). You can also pass it a file path, but for a local python file to work you have to use the full path or precede the filename with './' (e.g. pydoc ./whatever.py).

help() runs within a python shell. Run it without arguments to get general help with python; pass a string containing the name of a module to get documentation for that module, or pass it any existing module or object reference and it will generate docs for that object.

This will save you a lot of lookups to the manual. For instance, to remind yourself of the methods of dict objects, you can just run help() on any dict object, or just an empty such object (e.g. help({})).

Sunday, October 5, 2008

Um, all of them

Ah, that new blog smell.

This blog will be a place where I store useful tidbits that I come across as I come across them. And yes, the occasional distraction.