Software processes are software too
An oft-cited paper in software engineering is “Software Processes Are Software Too”, by Leon Osterweil. The central thesis of the paper is:
“Our suggestion is that we describe software processes by ‘programming’ them much as we ‘program’ computer applications. We refer to the activity of expressing software process descriptions with the aid of programming techniques as process programming, and suggest that this activity ought to be at the center of what software engineering is all about.”
Today this seems like a rather banal observation, but remember that this paper was published in 1987. Now we have source repositories integrated with code review tools, testing rigs, continuous builds, bug databases and the works. Most of the “process” of industrial software engineering is automated, and is software itself. A code patch is created by an author, and then is acted upon by a large cast, both human and mechanical. Humans review it. Machines test and analyze it. Finally, the finished “product”, the reviewed patch, is submitted. Checking in a single patch can require a dozen or more systems, running on at least as many machines. Most developers take this completely for granted.
Was this realistic in 1987? Not with the computing power of the day. A deluge of CPU cycles and disk space has made automated processes possible. A continuous build machine. A submit queue machine. And on and on.
(Sidebar: the computer science literature from the 70s and 80s is chock-full of ideas that were held back by the expense (at the time) of computing power, storage, and networking. Another big example: just-in-time compilation. No, the Java Virtual Machine was not the first to use JIT techniques — there are papers going back to the 1980s that describe similar techniques. But back then, a JIT would consume 50% of a CPU, and now it's barely a blip.)
What will automated engineering processes look like in another decade? To answer that, let's look at some of the areas where humans are currently needed: system design, actually writing the code, and reviewing the code.
The underlying “input” that humans provide to the system is understanding. Over time, however, the domain of what computers understand has steadily expanded. For example, modern IDEs are now so sophisticated that sometimes I think my job title should be “Eclipse operator”, not software engineer. Our tools are now at the stage where they can compute, store and navigate complex relationships and properties of source code. But today this data is still mostly used by humans, for example, to perform automated refactoring, or to navigate source code. It is not hard to imagine the computer taking over the role of actually manipulating the source code with the relationships it has computed, presenting the proposed modifications to humans, and then humans acting in a purely advisory role.
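To make “relationships and properties of source code” concrete, here is a toy version of the kind of structural fact an IDE computes: a map from each function definition to the functions it calls. This uses Python's standard `ast` module; real IDEs build far richer cross-reference indexes, but the principle is the same.

```python
import ast

# Parse a source snippet and map each function definition to the
# names it calls -- a toy cross-reference index of the kind IDEs
# maintain to power navigation and automated refactoring.

source = """
def helper(x):
    return x + 1

def main():
    return helper(41)
"""

tree = ast.parse(source)
calls = {}
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        callees = [n.func.id for n in ast.walk(node)
                   if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]
        calls[node.name] = callees

print(calls)  # {'helper': [], 'main': ['helper']}
```

Once a tool holds this index, a rename of `helper` is a mechanical rewrite of every call site; the human's role shrinks to approving the proposed change.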
On the reviewing side as well, computers have been inexorably marching forward. Many reviewing activities are already automated. Syntax and style checking have been performed by tools for some time now, and sophisticated semantic and static analysis now reaches even further, offering programmers suggestions on how to improve their code.
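A small example of such an automated check, again sketched with Python's standard `ast` module: flag imports that a module never uses. Production linters handle far more cases (scopes, `from` imports, re-exports), so treat this as an illustration of the idea rather than a usable tool.

```python
import ast

# Toy static check: report top-level imports whose names are never
# referenced anywhere in the module.

def unused_imports(source: str) -> list[str]:
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add(alias.asname or alias.name.split(".")[0])
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)

print(unused_imports("import os\nimport sys\nprint(sys.argv)"))  # ['os']
```

Checks like this run today on every patch in many code review systems, with no human in the loop until the results are posted as review comments.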
What is the difference between a process and an algorithm? To me, a process is something that relies heavily on human input. On the other hand, an algorithm is fully automated, once started. As human input to a process is reduced, it approaches an algorithm. How far is software engineering from becoming an algorithm?