Plans for Noweb 3

April, 1999
Norman Ramsey

Don Knuth coined the term ``literate programming'' to describe the art of programming primarily for the human reader, and only secondarily for the machine. Literate programming is supported by many tools, all of which provide some way for authors to interleave program source code with well typeset documentation. Most tools also support automatic or semi-automatic cross-referencing of source code. Only four or five literate-programming tools are widely used, and noweb may be the most widely used of all. It is certainly the most widely used literate-programming tool that is independent of the target programming language, and it was the first such tool.

Noweb emphasizes simplicity, extensibility, and language-independence. Noweb has the simplest markup of any literate-programming tool, making it easy for authors to understand the tool and to create literate programs. Noweb uses a pipelined architecture, which lets expert users extend the system without recompiling it, using the programming language of their choice. Users write extensions as Unix programs and use command-line options to insert them into the noweb pipeline. Users of noweb have written extensions for prettyprinting, conditional compilation, language-dependent cross-reference, and other purposes. The pipelined architecture also makes it easy to support multiple styles of documentation; noweb is unique in supporting plain TeX, LaTeX, HTML, and troff.
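
To make the idea of an extension concrete, here is a minimal sketch of a pipeline stage written as an ordinary filter program. It assumes that the stream passed between stages is line-oriented text in which each line begins with an @-keyword (for example @begin code, @text, @end code); that representation is not described in this document, and the function names below are hypothetical. The filter replaces tabs with spaces in the text of code chunks and passes every other line through unchanged.

```python
import sys


def replace_tabs_in_code(lines, spaces=4):
    """Yield pipeline lines, replacing tabs in the text of code chunks.

    Assumption: each input line starts with an @-keyword, and code chunks
    are bracketed by '@begin code' and '@end code' lines.
    The replacement is deliberately simple and not column-aware.
    """
    in_code = False
    prefix = "@text "
    for line in lines:
        if line.startswith("@begin code"):
            in_code = True
        elif line.startswith("@end code"):
            in_code = False
        elif in_code and line.startswith(prefix):
            line = prefix + line[len(prefix):].replace("\t", " " * spaces)
        yield line


def main():
    """Act as a Unix filter: read the stream on stdin, write it to stdout."""
    for out in replace_tabs_in_code(sys.stdin):
        sys.stdout.write(out)
```

To use the sketch as a stand-alone stage, call main() at the bottom of the script. Because a stage only reads standard input and writes standard output, the same extension could equally be written in awk, Icon, or any other language.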

Noweb is structured as a collection of C programs, shell scripts, awk scripts, and Icon programs, connected together by Unix pipelines. Noweb can be difficult to install; installers may have to work around bugs in vendors' implementations of awk, and installers must get Icon [Available for free from the University of Arizona] to exploit all of the capabilities of the system. Porting Noweb to the DOS or Windows platform requires either some effort to replace shell scripts or the purchase of a commercial shell.

Noweb's main competitor in the market for language-independent literate-programming tools is nuweb, whose design was inspired by noweb, but which is structured as a monolithic C program. As a result, nuweb is not extensible, but it is easy to port, and it runs quickly. Noweb can run slowly when it is necessary to fork many pipeline stages, some of which run in interpreted languages. [As hardware speeds have continued to increase, this speed disadvantage is less important today than in 1997, when the plans for Noweb 3 were being laid.] Noweb can process nuweb files, but nuweb users continue to prefer nuweb because of its speed and ease of installation.

Noweb's cross-referencing capability extends to HTML; a reader of a literate program can use a Web browser to click on an identifier and jump to the identifier's definition (and documentation). This capability has proven very useful, but it is limited to single documents. When large programs are composed of many separately compiled modules, it is awkward, to say the least, to process the entire program as a single document. (Such documents may run to hundreds of pages, even for a program of modest size, say 10,000 lines.) Users would much prefer to browse one document per module, and to be able to follow references between documents, but noweb does not currently support this model.

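In sum, the three improvements that noweb's users would most like to see implemented are cross-reference that spans multiple documents, easier installation and porting (especially to DOS and Windows), and faster execution.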

The Noweb 3 effort is focusing on the last two improvements---the first improvement awaits an able student or collaborator with an interest in thinking deeply about cross-reference.

I intend to realize these improvements by replacing the shell scripts and Icon programs with code written in the embedded language Lua. I have chosen Lua primarily because the implementation is small, clear, and simple (about 6000 lines of C code), and it works on both Unix and Microsoft systems. To avoid working with a moving target, I have cloned Lua version 2.5. Extending it to support case statements has resulted in the language ``Lua2.5+nw,'' which I expect to be able to maintain indefinitely. The language itself is quite clean, and it can readily be extended to support special types and operators as needed for functionality and performance.

Thus, to run a Noweb 3 program, you call the binary named ``no'', which is written in C and which in turn calls a Lua script to weave or tangle. This Lua script builds and executes a pipeline, which may include C stages, Lua stages, and external stages.
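
The following sketch illustrates the general shape of a script that builds and runs a pipeline that mixes in-process stages with external stages. It is written in Python as a stand-in for Lua and C, and it is not Noweb 3's actual interface: the names external and run_pipeline are hypothetical, and a stage is assumed to be a function from a list of lines to a list of lines.

```python
import subprocess
import sys


def external(argv):
    """Wrap an external command as a stage that filters stdin to stdout.

    Running the stage costs one process creation, which is the overhead
    that in-process stages avoid.
    """
    def stage(lines):
        result = subprocess.run(
            argv,
            input="".join(lines),
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.splitlines(keepends=True)
    return stage


def run_pipeline(stages, lines):
    """Feed the lines through each stage in order and return the result."""
    for stage in stages:
        lines = stage(lines)
    return lines


def mark_text(lines):
    """An in-process stage: prefix every line with a hypothetical keyword."""
    return ["@text " + line for line in lines]


# An external stage: here, a child Python process that upper-cases its input.
shout = external(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)

result = run_pipeline([mark_text, shout], ["hello\n", "world\n"])
```

In this arrangement only the external stage starts a new process; stages that live inside the driver run as plain function calls, which is the kind of saving that motivates moving stages out of shell scripts and into a single binary.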

Garret Prestwood has contributed substantially to the implementation of Noweb 3.