dburrows/ blog/ entry/ Package Management Sudoku

Update: There are more notes on how aptitude performs as a Sudoku solver here.

Russell Coker recently wrote a post entitled Ownership of the Local SE Linux Policy. This post has nothing to do with the substance of his post, which is a discussion of how distributions should configure SELinux by default. I know nothing about SELinux, but something that Russell said in passing caught my attention:

I am not aware of the Debian package dependencies (or those of any other distribution) being about to represent that the postfix package depends on selinux-policy-default-postfix if and only if the selinux-policy-default package is installed. Please note that I am not suggesting that we add support for such things, a package management system that can solve Sudoku based on package dependency rules is not something that I think would be useful or worth having.

(emphasis added)

As it happens, a little-known fact about the Debian packaging system is that you can, in fact, describe Sudoku puzzles in it!

Note for the impatient: If you want to skip all this talk and jump to the code, see debsudoku.py.

For those of you who, like me, are not Sudoku-heads, here's a quick summary of the rules (for the common case of a game with 3 blocks):

  1. The game is played by filling in a 9-by-9 grid with numbers between 1 and 9, inclusive. The board is divided into 9 3-by-3 blocks of grid cells in the obvious way.
  2. All the cells in a column must contain different numbers.
  3. All the cells in a row must contain different numbers.
  4. All the cells in a block must contain different numbers.

There are many ways that you could convert a Sudoku problem into a Debian package archive, but here's a particularly simple one that is close to my summary of the rules. Create 9 * 9 * 9 = 729 packages, named cellR.C-V where R stands for row, C for column, and V for value. Each of these packages represents writing the value V in the cell (R, C). To describe the relationships of each cell to the other cells in the problem, we can create several virtual packages:

Converting these requirements to package dependencies is quite straightforward. For the package cellR.C-V, in block row BR and block column BC, we add these lines to the package's definition:

Provides: cellR.C, blockBR.BC-V, rowR-V, colC-V
Conflicts: cellR.C, blockBR.BC-V, rowR-V, colC-V

For instance, the cell at (5, 8) will get these provides/conflicts for V=2:

Provides: cell5.8, block2.3-2, row5-2, col8-2
Conflicts: cell5.8, block2.3-2, row5-2, col8-2

This relies on the fact that self-conflicts are ignored (see Debian Policy section 7.4).

To actually describe the puzzle we want to solve, we need to add another package that requires two things:

  1. Every cell contains a value.
  2. The cells that the problem fixes have their fixed values.

For instance,

Package: puzzle
Depends: cell1.1, cell1.2, ..., cell5.9-3, cell6.1, ....

if we want to require that the cell at (5, 9) contains the value 3.

Proof-of-concept

As a proof-of-concept, I have written a hacky Python script, named debsudoku.py, that can convert ksudoku saved games into Packages files suitable for use with apt-get or aptitude. In fact, it has two modes: conflicts, which implements the conversion described above, and depends, which implements a different but logically equivalent conversion using only Depends instead of Conflicts. To use the script, save a ksudoku board without doing any work on it, then run

python debsudoku.py /path/to/the/ksudoku/board

I've only done minimal testing on this, so don't use it to fly airplanes (in fact, please don't use any Sudoku player to fly airplanes), but I did use it, in combination with aptitude, to solve a single ksudoku puzzle.

Real-world evaluation

Interestingly, in a test with an easy puzzle generated by ksudoku, aptitude was able to solve the puzzle in just a few seconds when it was expressed with the conflicts reduction, but failed to find a solution at all using depends (I stopped it after several minutes).

I think this is due to how the depends reduction is performed. Take rows: instead of generating conflicts between cells in a row, it requires that each row contain every number between 1 and 9. The problem with this is that aptitude's resolver functions poorly when there are non-obvious conflicts in the dependency structure. It will decide to try a combination of packages that can't be installed together, but it won't realize it. As a result of this, it ends up trying many possible combinations of packages. For instance, it might try to place the number 3 in both (2, 5) and (5, 5). There is no dependency outlawing this in the depends reduction, so aptitude will happily continue trying to resolve dependencies. Since it has used 3 twice in the fifth column, it will not be able to find any way to resolve all the dependencies that state that all the numbers between 1 and 9 appear in the 5th column, but it lacks the higher-level reasoning that would allow it to break out; it will end up trying many or most of the possible ways of filling cells in before it gives up on the offending re-use of 3.

Luckily, real-world package archives (what the aptitude resolver is designed to run on) don't usually look like this.