Monday, November 17, 2008

Get subpattern matches balanced even if they're optional

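In Python, this gives you a mysterious "unmatched group" error:

re.sub('.(.)?', r'aaa\1', text)

Because the parenthesized group is optional, it does not take part in every match. For example, it sits out when the last match is a single character with nothing after it. Referring to it through the \1 backreference in the replacement then raises an error. (Python 3.5 and later substitute an empty string for an unmatched group instead.)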
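A quick way to see what goes wrong is to inspect the group directly. This is a minimal sketch that assumes Python 3.5 or later, where the substitution no longer raises. The input "abc" is just an illustrative string.

```python
import re

# With "(.)?" the group can sit out the match entirely: on a lone "c"
# there is no second character, so group 1 is None rather than "".
last_group = re.match(r".(.)?", "c").group(1)
print(last_group)  # None

# Python 3.5+ substitutes "" for the unmatched \1 in the final match;
# older versions raised an "unmatched group" error at this point.
result = re.sub(r".(.)?", r"aaa\1", "abc")
print(result)  # aaabaaa
```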

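The solution is to replace the ? operator with an empty alternative inside the parentheses:

re.sub('.(.|)', r'aaa\1', text)

This gives the group the option of matching nothing, so it always takes part in the match and the \1 backreference always exists. It just holds an empty string when there is no second character, which is what you would expect anyway. Put the empty alternative last: alternatives are tried left to right, so '.(|.)' would always take the empty branch and never capture a character.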
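The variants can be compared side by side with a small sketch, assuming Python 3 and using "abc" as an arbitrary sample string. It also shows `(.?)`, an equivalent way to get a group that always participates, by making the group's contents optional rather than the group itself.

```python
import re

text = "abc"  # arbitrary sample input

# Empty alternative placed last: greedy like (.)?, but the group always
# participates, so \1 is "" instead of missing on the final "c".
greedy_null_alt = re.sub(r".(.|)", r"aaa\1", text)

# Equivalent: make the contents optional instead of the group.
optional_contents = re.sub(r".(.?)", r"aaa\1", text)

# Empty alternative placed first: alternatives are tried left to right,
# so the empty branch always wins and \1 never captures anything.
empty_first = re.sub(r".(|.)", r"aaa\1", text)

# The group exists even when it matched nothing.
always_there = re.match(r".(.|)", "c").group(1)

print(greedy_null_alt)     # aaabaaa
print(optional_contents)   # aaabaaa
print(empty_first)         # aaaaaaaaa
print(repr(always_there))  # ''
```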
