noweb Index and Identifier Cross-Reference

To noweb, any string of non-whitespace characters can be an identifier. A human being or a language-dependent tool must mark definitions of identifiers; noweb then finds the uses with a language-independent algorithm. The algorithm relies on an idea taken from the lexical conventions of Standard ML. Characters are divided into three classes: alphanumerics, symbols, and delimiters. If an identifier begins with an alphanumeric, it must be delimited on the left by a symbol or a delimiter. If it begins with a symbol, it must be delimited on the left by an alphanumeric or a delimiter. If it begins with a delimiter, there are no restrictions on the character immediately to the left. Similar rules apply on the right, to the identifier's last character and the character that follows it. The default classifications are chosen to make sense for commonly used programming languages, so that noweb will not recognize `zip' when it sees `zippy', or `++' when it sees `++:='. This trick works surprisingly well, but it does not prevent noweb from spotting identifiers in comments or string literals.
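The delimiting rule can be sketched in a few lines of Python. This is only an illustration, not noweb's own code: the names `char_class`, `boundary_ok`, and `find_uses` are hypothetical, and the character classification below is an assumption chosen for the example, so noweb's real defaults may differ. The sketch also assumes that the start or end of a line counts as a delimiter.

```python
import string

# Assumed classification, for illustration only; noweb's defaults may differ.
ALNUM = set(string.ascii_letters + string.digits + "_")
SYMBOLS = set("!%^&*-+:=|~<>./?")


def char_class(c):
    """Return the class of a single character."""
    if c in ALNUM:
        return "alnum"
    if c in SYMBOLS:
        return "symbol"
    return "delimiter"  # whitespace, parentheses, and everything else


def boundary_ok(edge, neighbor):
    """Check one side of a candidate match.

    edge is the identifier's first (or last) character; neighbor is the
    adjacent character in the text, or None at the start or end of the line.
    """
    if neighbor is None:
        return True  # assumption: a line boundary delimits anything
    edge_class = char_class(edge)
    if edge_class == "delimiter":
        return True  # no restriction on the neighbor
    # An alphanumeric edge needs a symbol or delimiter next to it;
    # a symbol edge needs an alphanumeric or delimiter next to it.
    return char_class(neighbor) != edge_class


def find_uses(ident, line):
    """Return the start offsets of properly delimited uses of ident in line."""
    if not ident:
        return []
    hits = []
    start = line.find(ident)
    while start != -1:
        end = start + len(ident)
        left = line[start - 1] if start > 0 else None
        right = line[end] if end < len(line) else None
        if boundary_ok(ident[0], left) and boundary_ok(ident[-1], right):
            hits.append(start)
        start = line.find(ident, start + 1)
    return hits
```

Under these assumed classes, `find_uses("zip", "zippy")` and `find_uses("++", "++:=")` both find nothing, while `find_uses("zip", "x = zip(a)")` reports the use at offset 4. As in noweb, the sketch knows nothing about comments or string literals.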

The basic assumption in noweb is that a human being will identify definitions using the

    @ %def mumble foo quux

construct. I have, however, found it very useful to write simple filters that attempt to identify global definitions automatically. Filters for Icon, TeX, and yacc all take about 30 lines of Icon code and are included in the noweb distribution. Recognizing definitions in C is somewhat more complicated, but a reasonable recognizer for C is also included. Contributions for other languages are encouraged. If you write a filter of your own, you can put it in the $LIB directory with a name like `autodefs.pascal'. To use the filters, see the noweave man page for descriptions of the -autodefs and -showautodefs options.
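To give a feel for what such a filter does, here is a rough Python sketch of a recognizer for Pascal-style definitions. It is not one of the distributed filters, which are written in Icon. It assumes the pipeline representation described in the noweb Hacker's Guide: code chunks bracketed by `@begin code` and `@end code` lines, chunk text carried on `@text` lines, and definitions announced with `@index defn` lines. The function name `autodefs` and the regular expression are hypothetical and deliberately naive.

```python
import re

# Hypothetical pattern: treats "procedure NAME" or "function NAME" as a
# global definition. A real recognizer would need to be more careful.
DEFN = re.compile(
    r"\b(?:procedure|function)\s+([A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)


def autodefs(lines):
    """Copy pipeline lines through, adding an index entry per definition.

    lines is an iterable of pipeline lines without trailing newlines.
    """
    in_code = False
    for line in lines:
        yield line  # every input line is passed through unchanged
        if line.startswith("@begin code"):
            in_code = True
        elif line.startswith("@end code"):
            in_code = False
        elif in_code and line.startswith("@text "):
            # Only text inside code chunks is searched for definitions.
            for match in DEFN.finditer(line[len("@text "):]):
                yield "@index defn " + match.group(1)
```

A real filter would read the pipeline from standard input and write the result to standard output. Text in documentation chunks is passed through untouched, so a `procedure` mentioned in prose is not indexed.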

noweave -index works well for short programs; for large multi-file programs, nodefs, noindex, and noweave -indexfrom are provided instead. See the noindex man page for details.