19. Beyond Exception Handling: Conditions and Restarts

    One of Lisp’s great features is its condition system. It serves a similar purpose to the exception handling systems in Java, Python, and C++ but is more flexible. In fact, its flexibility extends beyond error handling—conditions are more general than exceptions in that a condition can represent any occurrence during a program’s execution that may be of interest to code at different levels on the call stack. For example, in the section “Other Uses for Conditions,” you’ll see that conditions can be used to emit warnings without disrupting execution of the code that emits the warning while allowing code higher on the call stack to control whether the warning message is printed. For the time being, however, I’ll focus on error handling.

    To start, I’ll introduce some terminology: errors, as I’ll use the term, are the consequences of Murphy’s law. If something can go wrong, it will: a file that your program needs to read will be missing, a disk that you need to write to will be full, the server you’re talking to will crash, or the network will go down. If any of these things happen, it may stop a piece of code from doing what you want. But there’s no bug; there’s no place in the code that you can fix to make the nonexistent file exist or the disk not be full. However, if the rest of the program is depending on the actions that were going to be taken, then you’d better deal with the error somehow or you will have introduced a bug. So, errors aren’t caused by bugs, but neglecting to handle an error is almost certainly a bug.

    Because each function is a black box, function boundaries are an excellent place to deal with errors. Each function (low, for example) has a job to do. Its direct caller (medium in this case) is counting on it to do its job. However, an error that prevents it from doing its job puts all its callers at risk: medium called low because it needs the work done that low does; if that work doesn’t get done, medium is in trouble. But this means that medium’s caller, high, is also in trouble, and so on up the call stack to the very top of the program. On the other hand, because each function is a black box, if any of the functions in the call stack can somehow do its job despite underlying errors, then none of the functions above it needs to know there was a problem; all those functions care about is that the function they called somehow did the work expected of it.

    Consider the hypothetical call chain of high, medium, low. If low fails and medium can’t recover, the ball is in high’s court. For high to handle the error, it must either do its job without any help from medium or somehow change things so calling medium will work and call it again. The first option is theoretically clean but implies a lot of extra code: a whole extra implementation of whatever it was medium was supposed to do. And the further the stack unwinds, the more work needs to be redone. The second option, patching things up and retrying, is tricky; for high to be able to change the state of the world so a second call into medium won’t end up causing an error in low, it’d need an unseemly knowledge of the inner workings of both medium and low, contrary to the notion that each function is a black box.
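
    To make the call chain concrete, here is a minimal sketch of the first option, in which high handles the error only after the stack has unwound out of medium and low. The condition type low-failure, its reason slot, and the function bodies are hypothetical and chosen purely for illustration; only the names high, medium, and low come from the discussion above.

```lisp
;; Hypothetical condition type, invented for this illustration.
(define-condition low-failure (error)
  ((reason :initarg :reason :reader reason)))

(defun low (input)
  ;; LOW cannot do its job without input, so it signals an error.
  (when (null input)
    (error 'low-failure :reason "nothing to work on"))
  (length input))

(defun medium (input)
  ;; MEDIUM depends on LOW and does not handle the error itself.
  (* 2 (low input)))

(defun high (input)
  ;; By the time this handler clause runs, the stack has already
  ;; unwound out of MEDIUM and LOW, so HIGH has to come up with
  ;; a result entirely on its own.
  (handler-case (medium input)
    (low-failure (c)
      (format t "low failed: ~a~%" (reason c))
      0)))
```

    Calling (high '(a b c)) goes through medium and low normally, while (high nil) ends up in the handler clause. The fallback value 0 stands in for the "whole extra implementation" that high would need in a real program, which is exactly the cost this paragraph describes.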