noweb Index and Identifier Cross-Reference
To noweb, any string of nonwhite characters can be an identifier. A
human being or a language-dependent tool must mark definitions of
identifiers; noweb finds the uses with a language-independent
algorithm. The algorithm relies on an idea taken from the lexical
conventions of Standard ML. Characters are divided into three
classes: alphanumerics, symbols, and delimiters. If an identifier
begins with an alphanumeric, it must be delimited on the left by a
symbol or a delimiter. If it begins with a symbol, it must be
delimited on the left by an alphanumeric or a delimiter. If it begins
with a delimiter, there are no restrictions on the character
immediately to the left. Similar rules apply on the right-hand side.
The default classifications are chosen to make sense for commonly used
programming languages, so that noweb will not recognize `zip' when it
sees `zippy', or `++' when it sees `++:='. This trick works
surprisingly well, although nothing prevents noweb from finding
`uses' of identifiers inside comments or string literals.
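To make the delimiting rules concrete, here is a minimal sketch of the
check, written in Python. It is not noweb's actual code, and the
character classes below are only plausible guesses at the real tables:

  # Sketch only, not noweb's code; the character classes are guesses.
  ALNUM = set("abcdefghijklmnopqrstuvwxyz"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "0123456789_'")
  SYMBOLS = set("!%^&*-+:=|~<>./?")
  # every other character (space, parentheses, comma, ...) is a delimiter

  def cclass(c):
      if c in ALNUM:   return "alnum"
      if c in SYMBOLS: return "symbol"
      return "delim"

  def is_use(line, ident, start):
      """Does ident, found at line[start:start+len(ident)], count as a use?"""
      end = start + len(ident)
      left = line[start - 1] if start > 0 else None
      right = line[end] if end < len(line) else None
      # A neighboring character must not be in the same class as the
      # identifier's boundary character, unless that boundary is a delimiter.
      for boundary, neighbor in ((ident[0], left), (ident[-1], right)):
          k = cclass(boundary)
          if k != "delim" and neighbor is not None and cclass(neighbor) == k:
              return False
      return True

  # 'zip' inside 'zippy' is rejected; '++' at the start of '++:=' is rejected
  assert not is_use("zippy = 3", "zip", 0)
  assert not is_use("n ++:= 1", "++", 2)
  assert is_use("zip(3)", "zip", 0)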
The basic assumption in noweb is that a human being will identify
definitions using the
@ %def mumble foo quux
construct. I have, however, found it very useful to write simple
filters that attempt to identify global definitions automatically.
Filters for Icon, TeX, and yacc all take about 30 lines of Icon code
and are included in the noweb distribution.
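For comparison with the automatic recognizers, this is how a
definition might be marked by hand in a noweb source file; the chunk
name and the little function are invented for the example:

  <<hash function>>=
  def hash31(s):
      h = 0
      for c in s:
          h = 31 * h + ord(c)
      return h
  @ %def hash31

Every later use of hash31 will then be cross-referenced back to this
chunk.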
Recognizing definitions in C is somewhat more complicated, but a
reasonable C recognizer is also included.
Contributions for other
languages are encouraged. If you write a filter of your own, you can
put it in noweb's $LIB directory under a name like `autodefs.pascal'.
To use the filters, see the noweave man page for descriptions
of the -autodefs and -showautodefs options.
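A typical invocation looks something like this (foo.nw is a
placeholder file name):

  noweave -showautodefs                        # list the recognizers this installation provides
  noweave -autodefs c -index foo.nw > foo.tex  # use the C recognizer while weaving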
noweave -index works well for short programs; nodefs, noindex, and
noweave -indexfrom support large, multi-file programs. See the
noindex man page for details.
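One plausible arrangement for a document woven from two files,
assuming a master doc.tex that \input's a.tex and b.tex; the file
names are illustrative, and cpif (part of noweb) rewrites all.defs
only when its contents have actually changed:

  nodefs a.nw > a.defs
  nodefs b.nw > b.defs
  sort -u a.defs b.defs | cpif all.defs
  noweave -n -indexfrom all.defs a.nw > a.tex   # -n omits the LaTeX wrapper; doc.tex provides it
  noweave -n -indexfrom all.defs b.nw > b.tex
  latex doc; noindex doc; latex doc             # noindex collects index entries from the .aux files into doc.nwi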