dburrows/ blog/ entry/ Sometimes, tools are useful.

I generally try not to get caught up in writing auxiliary tools for my free software projects. My experience in the past has been that given how little spare time I have each week, a tool that takes more than an hour or two to write quickly ends up becoming a time sink that prevents me from making progress on the main project. But occasionally it's worth the effort.

screenshot

I spent the last couple weeks putting together a tool to extract information about a dependency search from a log trace of aptitude's resolver. The search is displayed in a log interface that shows both the steps of the search in order, and the tree structure of the search. It took a long time to write, but after working with it for just a few hours, I was able to track down the bugs in the resolver modifications I was trying to write; without a tool like this, it would have taken a lot of painful grepping through log files to get all the information I needed.

In other news, the resolver now stores the choices it has made as a single set of choices, instead of using two different sets for the two types of choices it can make. There's no user-visible change (although I did fix another performance problem that affected safe-upgrade), but this will pave the way for adding more types of choices in a cleaner way: in particular, the resolver should get explicit support for package replacements in the near future.

The code for this is available in the tools directory of aptitude's experimental Mercurial repository, http://hg.debian.org/hg/aptitude/post-lenny. When I get around to merging it this with the main repository, it will also be available at http://hg.debian.org/hg/aptitude/head.


A brief digression regarding the implementation language

The tool is implemented in Haskell, a decision that I'm not sure was the right one. Haskell's data model was perfect for this task: having algebraic datatypes allows me to precisely represent cases where parts of the structure can't be inferred from the log file, or can only be inferred incompletely. And of course there are the other well-known benefits of Haskell (pure, a strong type system, good support for higher-order programming, etc). But all those benefits were wiped out by the time I spent fighting the runtime.

The thing is, Haskell is a lazy language. In theory, this means that it never evaluates anything you don't need and that you can work with arbitrarily large (even infinite) values and only allocate the parts you actually use. In practice, this means that you have to pay excruciatingly close attention to every single expression you write, or as soon as you run your program on non-trivial input, it will either run very slowly, eat all your memory, die with a stack overflow, or all of the above. Oh, and did I mention that laziness means that you can't have a runtime debugger, that trace statements are mostly-useless, because your program doesn't run in any predictable order, and that it's impossible to get backtraces when there's a fatal error?

I think if I were doing this over again, I'd use O'Caml. I'm less familiar with it and its syntax is clunkier, but I have used it productively in the past and it provides the core stuff I wanted from Haskell (functional programming with algebraic datatypes). If someone could make a language that was pure, had Haskell's nice syntax and type features, but was also strict, and that had decent tool support and development resources, I think I'd be in nerd heaven.