Making ASDF more magic by making it less magic: fare

fare

Making ASDF more magic by making it less magic

Jul 30, 2018 06:46

This short essay describes some challenges I leave to the next maintainers of ASDF, the Common Lisp build system, related to the "magic" involved in bootstrapping ASDF. If you dare to read further, grab a chair and your favorite headache-fighting brewage.

ASDF is a build system for Common Lisp. In the spirit of the original Lisp DEFSYSTEM, it compiles and loads software (mostly but not exclusively Common Lisp code) in the same image as it's running. Indeed, old time Lisp machines and Lisp systems did everything in the same world, in the same (memory) image, the same address space, the same (Unix-style) process - whatever you name it. This notably distinguishes it from traditional Unix build, which happens in multiple processes each with its own address space. The upside of the Lisp way is extremely low overhead, which allows for greater speed on olden single-processor machines, but also richer communication between build phases (especially for error reporting and handling), interactive debugging, and more. The upside of the Unix way is greater parallelizability, which allows for greater speed on newer multi-processor machines, but also fewer interactions between build phases, which makes determinism and distributed computation easier. The upside of the Lisp way is still unduly under-appreciated by those ignorant of Lisp and other image-based systems (such as Smalltalk). The Lisp way feels old because it is old; it could be updated to integrate the benefits of the Unix way, possibly using language-based purity and effect control in addition to low-level protection; but that will probably happen with Racket rather than Common Lisp.

One notable way that ASDF is magic is in its support for building itself - i.e. compiling and/or loading a new version of itself, in the very same Lisp image that is driving the build, replacing itself in the process, while it is running. This "hot upgrade" isn't an idle exercise, but an essential feature without which ASDF 1 was doomed. For the explanation why, see my original post on ASDF, Software Irresponsibility, or the broader paper on ASDF 2, Evolving ASDF: More Cooperation, Less Coordination.

Now, despite the heroic efforts of ASDF 2 described in the paper above, self-build could not be made to reliably happen in the middle of the build: subtle incompatibilities between old and new version, previously loaded extensions being clobbered by redefinitions yet expected to work later, interactions with existing control stack frames or with inlining or caching of methods, etc., may not only cause the build to fail, but also badly corrupt the entire system. Thus, self-build, if it happens, must happen at the very beginning of the build. However, the way ASDF works, it is not predictable whether some part of the build will later depend on ASDF. Therefore, to ensure that self-build happens in the beginning if it happens at all, ASDF 3 makes sure it always happens, as the very first thing that ASDF does, no matter what. This also makes ASDF automatically upgrade itself if you just install a new source repository somewhere in your source-registry, e.g. under ~/common-lisp/asdf/ (recommended location for hackers) or /usr/share/common-lisp/source/cl-asdf/ (where Debian installs it). This happens as a call to upgrade-asdf in defmethod operate :around in operate.lisp (including as now called by load-asd), so it is only triggered in side-effectful operations of ASDF, not pure ones (but since find-system can call load-asd, such side-effects can happen just to look at a not-previously-seen, new or updated system definition file). Special care is taken in the record-dependency method in find-system.lisp so this autoload doesn't cause circular dependencies.

Second issue since ASDF 3: ASDF was and is still traditionally delivered as a single file asdf.lisp that you can load in any Common Lisp implementation (literally any, from Genera to Clasp), and it just works. This is not the primary way that ASDF is seen by most end-users anymore: nowadays, every maintained implementation provides ASDF as a module, so users can (require "asdf") about anywhere to get a relatively recent ASDF 3.1 or later. But distributing a single file asdf.lisp is still useful to initially bootstrap ASDF on each of these implementations. Now, by release 2.26, ASDF had grown from its initial 500-line code hack to a 4500-line mess, with roughly working but incomplete efforts to address robustness, portability and upgradability, and with a deep design bug (see the ASDF 3 paper). To allow for further growth, robustification and non-trivial refactoring, ASDF 3 was split into two sets of files, one for the portability library, called UIOP (now 15 files, 7400 lines) and one for ASDF itself (now 25 files, 6000 lines as of 3.3.2.2), the latter set also specifically called asdf/defsystem in this context. The code is much more maintainable for having been organized in these much more modular smaller bites. However, to still be able to deliver as a single file, ASDF implemented a mechanism to concatenate all the files in the correct order into the desired artifact. It would be nice to convince each and every implementation vendor to provide UIOP and ASDF as separate modules, like SBCL and MKCL do, but that's a different challenge of its own.

There is another important reason to concatenate all source files into a single deliverable file: upgrading ASDF may cause some functions to be undefined or partially redefined between source files, while being used in the building ASDF's call stack, which may cause ASDF to fail to properly compile and load the next ASDF. Compiling and loading ASDF in a single file both makes these changes atomic and minimizes the use of functions being redefined while called. Note that in this regard, UIOP could conceivably be loaded the normal way, because it follows stricter backward compatibility restrictions than ASDF, and can afford them because it has a simpler, more stable, less extensible API that doesn't maintain as much dynamic state, and its functions are less likely to be adversely modified while in the call stack. Still, distributing UIOP and ASDF in separate files introduces opportunities for things to go wrong, and since we need a single-file output for ASDF, it's much safer to also include UIOP in it, and simpler not to have to deal with two different ways to build ASDF, with or without a transcluded UIOP.

As a more recent issue, ASDF 3.3 tracks what operations happen while loading a .asd file (e.g. loading triggered by defsystem-depends-on), and uses that information to dynamically track dependencies between multiple phases of the build: there is a new phase each time ASDF compiles and loads extensions to ASDF into the current image as a prerequisite to process further build specifications. ASDF 3.3 is then capable of using this information to properly detect what to rebuild or not rebuild when doing a incremental compilation involving multiple build phases, whereas previous versions could fail in both ways. But, in light of the first issue, that means that trying to define ASDF or UIOP is special, since everything depends on them. And UIOP is even more special because ASDF depends on it. The "solution" I used in ASDF 3.3 was quite ugly - to prevent circular dependency between a define-op asdf and define-op uiop, I made asdf.asd magically read content from uiop.asd without loading uiop.asd, so as to transclude its sources in the concatenated file asdf.lisp. This is a gross hack that ought to be replaced by something better - probably adding more special cases to record-dependency for uiop as well as asdf along the way.

Finally, there is a way that ASDF could be modified, that would displace the magic of bootstrap such that no special case is needed for ASDF or UIOP - by making that magic available to all systems, potentially also solving issues with cross-compilation. (UIOP remains slightly special: it must be built using the standard CL primitives rather than the UIOP API.) But that would require yet another massive refactoring of ASDF. Moreover, it would either be incompatible with existing ASDF extensions or require non-trivial efforts to maintain a backward-compatible path. The problem is the plan made by ASDF is executed by repeatedly calling the perform method, itself calling other methods on objects of various classes comprising the ASDF model, while this model is being redefined. The solution is that from the plan for one phase of execution, ASDF would instead extract a non-extensible, more functional representation of the plan that is impervious to upgrade. Each action would thus define a method on perform-forms instead of perform, that would (primarily) return a list of forms to evaluate at runtime. These forms can then be evaluated without the context of ASDF's object model, actually with a minimal context that even allows them to be run in a different Lisp image on a different Lisp implementation, allowing for cross-compilation, which itself opens the way for parallelized or distributed deterministic backends in the style of XCVB or Bazelisp. Such changes might justify bumping the ASDF version to 4.0.

As usual, writing this essay, which was prompted by a question from Eric Timmons, forced me to think about the alternatives and why I had rejected some of them, to look back at the code and experiment with it, which ultimately led to my finding and fixing various small bugs in the code. As for solving the deeper issues, they're up to you, next ASDF maintainers!

software, lisp, build, bootstrapping, asdf, en