Parsing Python: chouyu

chouyu_31

Parsing Python

Apr 11, 2009 23:42

As work continues on the next version of PyPE, I've discovered a few cases where having the AST for the module being edited is actually the most convenient way to discover information about the document.

In the past, any time I've needed some new nifty functionality, I've written a custom parser. I've got one to pull tags, another to look for function definitions and docstrings when my parser.suite()-based compiler fails, another compiler.ast-based unrepr function such that unrepr(repr(obj)) == obj, one for LaTeX, and even one for C/C++. One of the (many) reasons why I don't add Java support to PyPE is that I don't want to write another parser.

If I remove the need to be able to display information about a document when its syntax isn't correct, then I can remove one of my custom parsers. If I swap to using the AST directly (rather than doing a token match over parser.suite() output), then basically all of my other needs are covered by taking a pass (or 3) through the AST of a parsed document...which is *really* tempting. Why don't I do it straight out? Parsing pype.py using compiler.parse() brings the memory of a Python console from 4M to 25M, and it stays there even after deleting the output. Thankfully, it doesn't leak when I parse/delete repeatedly in a loop (hanging around 25-30M), but enabling psyco leaks roughly 3M/cycle. parser.suite() only brings memory use to 11M, and with psyco, actually uses less (9M after parsing finishes).
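To give a rough idea of what "a pass through the AST" could look like, here is a minimal sketch. It assumes a modern Python 3 interpreter and the standard ast module in place of compiler.parse() (the compiler package does not exist in Python 3). The function name outline is made up for this example; it is not anything in PyPE.

```python
import ast

def outline(source):
    """Return (kind, name, lineno, first docstring line) for each def/class.

    Hypothetical helper for illustration only; raises SyntaxError if the
    document does not parse, which is exactly the trade-off discussed above.
    """
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            doc = ast.get_docstring(node) or ""
            first_line = doc.splitlines()[0] if doc else ""
            found.append((kind, node.name, node.lineno, first_line))
    # ast.walk makes no promise about order, so sort by source position.
    found.sort(key=lambda item: item[2])
    return found
```

A single walk like this covers the "function definitions and docstrings" parser; pulling other information would be another isinstance check in the same loop, or another pass.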

At this point, I'm basically damned if I do (memory use goes up), damned if I don't (I have to write *even more* custom parsers to add functionality). This wouldn't be that big of a deal, but I've got a feature request to auto-parse imported files to be able to autocomplete and show calltips. I'm "meh" on that one, but one of my coworkers suggested something that is *quite* useful: autocompleting on local variable names (not their methods, except for self). How many times do you scroll up and down to try to find the right variable name? Ditto for globals and imported modules.
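Here is a sketch of how local-name completion might fall out of the AST. Again this assumes Python 3.8 or later and the standard ast module, not the compiler package; local_names is a hypothetical helper, and it deliberately ignores imports inside functions, global declarations, and comprehension scopes.

```python
import ast

FUNCS = (ast.FunctionDef, ast.AsyncFunctionDef)

def local_names(source, lineno):
    """Names bound in the innermost function containing `lineno` (1-based)."""
    tree = ast.parse(source)
    best = None
    for node in ast.walk(tree):
        if isinstance(node, FUNCS):
            end = getattr(node, "end_lineno", None)  # set by ast.parse on 3.8+
            if end is not None and node.lineno <= lineno <= end:
                # Of the functions that contain the line, the one that
                # starts latest is the most deeply nested.
                if best is None or node.lineno >= best.lineno:
                    best = node
    if best is None:
        return []  # cursor is at module level
    names = set()
    args = best.args
    for arg in args.posonlyargs + args.args + args.kwonlyargs:
        names.add(arg.arg)
    if args.vararg:
        names.add(args.vararg.arg)
    if args.kwarg:
        names.add(args.kwarg.arg)
    stack = list(best.body)
    while stack:
        node = stack.pop()
        if isinstance(node, FUNCS + (ast.ClassDef,)):
            names.add(node.name)  # the def/class statement binds a local name
            continue              # its body is a separate scope, so skip it
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        stack.extend(ast.iter_child_nodes(node))
    return sorted(names)
```

An editor would call this with the buffer text and the cursor's line, then filter the result by whatever prefix has been typed so far.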

I'm thinking about only supporting this new and nifty functionality for people with Python 2.6 or later, as then I could use multiprocessing to just pass the parsing off to another process, cycling that secondary process every once in a while.
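A minimal sketch of that idea, under some assumptions: it is written for Python 3, it uses ast where the post talks about compiler.parse(), and it leans on the maxtasksperchild argument of multiprocessing.Pool to do the "cycling" (that argument arrived in Python 2.7, so on 2.6 the worker would have to be restarted by hand). The names parse_summary and make_parser_pool are invented for this example.

```python
import ast
import multiprocessing

def parse_summary(source):
    """Runs in the worker process: parse, then return only small plain data.

    The tree itself never crosses the process boundary, so whatever memory
    parsing costs stays in the worker.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return {"ok": False, "error_line": exc.lineno, "names": []}
    names = sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    )
    return {"ok": True, "error_line": None, "names": names}

def make_parser_pool(parses_per_worker=20):
    """One worker, replaced with a fresh process after `parses_per_worker` tasks."""
    return multiprocessing.Pool(processes=1, maxtasksperchild=parses_per_worker)
```

The editor would create the pool once at startup (under an if __name__ == "__main__": guard on platforms that spawn rather than fork), hand each buffer to pool.apply_async(parse_summary, (text,)), and pick up the result when it is ready. The value 20 is an arbitrary placeholder, not a measured number.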

It's late, I'm tired. We'll see.

algorithms, software, python