Zero Install SAT Solver
In 2007, OSNews ran an article about OPIUM, showing how to cast apt-get installation problems (choosing which of several possible dependencies to install) as a set of pseudo-boolean constraints which could then be solved mathematically to give the optimal solution. I this post, I'll describe how we've recently adapted this technique to Zero Install, addressing some problems experienced by the Sugar environment (One Laptop Per Child) and allowing better integration with distribution packages.
With apt-get, if two programs depend on different versions of the same library, they can't both be installed at once. When you try to install one, apt-get will first uninstall the other. Sometimes this is unavoidable, but in other cases there is a choice of dependency. The choices apt-get makes determine which other packages must be uninstalled, but it doesn't always find the best solution (or even any solution at all). OPIUM solved this problem by ensuring that the best available choice was always made.
Zero Install has a slightly different problem. In Zero Install, every package is unpacked to its own directory. Libraries are shared when possible (two programs depend on the same version), and installed in parallel otherwise. Therefore, installing one program with Zero Install never requires uninstalling another. However, it is still possible for the dependencies within a single program to conflict with each other. For example, a Python program may depend on "python2.5 or python2.6", but a library it uses may only work with one of them. When running that particular program, Zero Install must choose a version of Python and a version of the library that are compatible.
While apt-get has to look for conflicts across a very large number of packages (every package installed or being considered for installation), Zero Install only has to consider the packages needed for the program being run, but it must look at a large number of different possible versions for each package.
Zero Install managed quite well with a simple non-backtracking solver in the past because most packages had only a few dependencies. Because exactly the same set of versions was available to everyone, if a program was installable by its author then it would be installable everywhere, and even if some dependencies did conflict it was easy enough to list them in some order so that Zero Install always got a solution. But two things have made life more complicated...
A typical modern 64-bit system is also capable of running 32-bit code. However a single program must be either entirely 64-bit or entirely 32-bit. We can't, for example, select a 64-bit version of libgtk and a 32-bit version of Python for a single application. In other words, every 64-bit binary conflicts with every 32-bit binary, which means we have to deal with a lot more conflicts these days.
More complex packages
According to the site leaderboard on the main mirror site, the biggest producer of Zero Install feeds is Sugar Labs. Sugar is a learning environment originally developed for the One Laptop per Child XO-1 netbook, but now available for many Linux distributions. It is an open system, encouraging children to write and distribute their own software, and seems that Zero Install and Sugar may be a good combination.
The Sugar developers want their Zero Install packages to depend on existing distribution packages in many cases. The normal way to do this is to provide a Zero Install download of the dependency (e.g Python 2.6) yourself, but tell Zero Install it can use a distribution package instead if available. However, the Sugar developers (quite reasonably) want to avoid packaging Python at all and depend only on the distribution package. Because Python isn't binary-compatible across versions, they publish separate builds of their software for each version of Python and rely on Zero Install to choose the one that will work with their users' distributions.
These kinds of problems can't be solved using older versions of Zero Install. The solver would choose the Python 2.6 version of the main program (for example) and then try to select a version of Python 2.6 to go with it. If the distribution only had Python 2.5, this would fail. Aleksey Lim from the Sugar project worked around this problem by adding backtracking to the solver. However, this becomes very slow when there are many possible combinations of versions to consider.
Adapting the OPIUM algorithm
Initially, I tried following the OPIUM paper closely. Here, they represent each possible version of each package as a variable which is 1 if the package is to be installed and 0 if not. They make a list of constraints (expressions that must be true). For example, if we want to select a version of Firefox (either 3.5 or 3.6, but not both at once) we would write:
firefox3.5 + firefox3.6 = 1
If Firefox 3.6 depends on GTK >= 2.18 then we can express that dependency as:
gtk2.18.0 + gtk2.18.1 + gtk2.18.2 - firefox3.6 >= 0
This expression can be satisfied by either not choosing Firefox 3.6 (firefox3.6 = 0) or by choosing a compatible version of GTK to go with it.
After writing out all these expressions, we pass them all to a pseudo-boolean constraint solver (I tested with minisat+). We also give it a cost function to minimise (e.g. selecting older versions "costs" more, so it chooses newer versions where possible). The solver then tells us which variables should be 1, and these are the versions to use.
This scheme worked, but it turned out to be quite slow. The problem was the cost function: most combinations have a similar cost, which makes it difficult for the solver to narrow the search down quickly. There could even be several solutions with the same cost, so you might get a different set of versions every time you ran it!
The solution was to optimise one component at a time:
- Solve, optimising for the "best" version of Firefox that could be part of a valid selection.
- Then solve again, for the "best" version of GTK that can be part of a valid combination with the previously selected version of Firefox.
- ... and so on recursively until we have selected a version of every component we need.
In fact, we don't need to use a cost function at all. We can just ask whether there is any valid combination involving the best version of Firefox. If not, we ask again for the second best version, etc. The problem can then be simplified to plain old Boolean satisfiability and implemented efficiently. A DPLL-based algorithm with conflict-driven learning turned out to be very fast, even implemented in pure Python.
As a simple test, Tricky has two versions. The "best" one depends on a library that can't be installed (which could be because it's a distribution-only package that your distribution doesn't have). The old solver would try to use the newer version and get stuck. The new solver uses the older version that works.
As I don't know which version of Zero Install your distribution carries, we'll select the old and new versions explicitly (yes, conflict-free installation is great for testing!):
$ ZI="http://0install.net/2007/interfaces/ZeroInstall.xml" $ alias 0launch-old='0launch --before=0.46 $ZI' $ alias 0launch-sat='0launch --not-before=0.46 $ZI'
The old 0launch selects Tricky v0.2 and fails to find the required library:
$ 0launch-old -c http://0install.net/tests/Tricky.xml Can't find all required implementations: - http://0install.net/tests/Tricky.xml -> sha1new=5cbf4... - http://0install.net/tests/libtricky.xml -> None
The SAT-solver version selects v0.1:
$ 0launch-sat -c http://0install.net/tests/Tricky.xml Hello from Tricky v0.1!
Of course, depending on packages from your users' distributions means your program might not work if they don't have any of the possible required versions installed. The next step is to merge Sugar's PackageKit support so that we can prompt the user to install missing distribution packages too.