Monday, November 10, 2008

The hidden debug flag in the re module

The compile function of the re module, as I'm sure you already know, takes a second parameter of flags, such as re.I or re.X.

There is an undocumented flag, however, that will print out the parse tree of the regex. This is the re.DEBUG flag.

Here's some trivial output:

>>> re.compile(u'ell2\u1f4c', re.DEBUG)
literal 101
literal 108
literal 108
literal 50
literal 8012
The 'literal' statements are followed by the ascii dec values of the characters in the pattern. I threw in a unicode character and you can see the unicode code point as the last item.

This will also tell you whether the pattern is being compiled for the first time or if it is being pulled out of the re module's cache.

No comments: