This document sketches a proposal for a C-like approach to providing high-level services (I/O, text decompression, dictionary lookup) in a virtual machine for the support of interactive fiction. The essence of the approach is that instead of providing special bytecodes (VM instructions) for these services, we arrange for them to be provided by ordinary procedures. Hence, the ``procedure machine'' or ``P-machine.'' A crucial aspect of the proposal is that procedures may be referred to by name---this means the set of procedures can easily be extended. The names are resolved to individual procedures when the story file is loaded, just before it starts to run---hence ``dynamic linking.''
I assume that arguments are passed and results are returned on a stack of ``VM values.'' It's probably a good idea if all VM values are the same size, e.g., 32 bits.
<type definitions>= (U->) [D->] typedef unsigned int vmvalue; /* supposed to be 32 bits */
Definesvmvalue
(links are to index).
I'm also going to assume that there's some virtual-machine state in a
struct vmstate
.
I won't bother to specify what this is, but you can imagine memory,
program counter, a value stack, whatever you like.
<type definitions>+= (U->) [<-D->] typedef struct vmstate *Vmstate;
DefinesVmstate
(links are to index).
Now, let's imagine an arbitrary procedure called from the VM.
It needs a pointer to the VM state.
I'm also going to assume an explicit specification of the arguments
passed, so that a procedure that just messes with arguments and
results can be compiled without any knowledge of the VM state.
Finally, I'll assume that the procedure takes an extra argument that
is somehow associated with the procedure itself.
This extra argument can support an object-oriented or functional style
from within C.
In various different places, you'll see it called self
, or a
closure, or a cookie, or a rock.
We'll use the term ``closure.''
A VM procedure is therefore represented as a value of type Vmproc
:
<type definitions>+= (U->) [<-D] typedef struct vmproc { unsigned (*f)(Vmstate vm, int argc, vmvalue *argv, void *closure); void *closure; } Vmproc;
DefinesVmproc
(links are to index).
The unsigned
return value from f
should indicate how many
results were left on the stack.
At run time, procedures are identified by small integers, which we'll call procedure IDs. Let us assume our VM has a stack. Just before a call, the VM code should push all the arguments on the stack. Then, the very last value pushed should be a procedure ID identifying the procedure to be called. [The reason for putting the procedure ID on the stack instead of in the instruction is to make it possible for games to create new procedures on the fly. This feature is probably useless in a VM intended for adventure games, and if you could do without it you could make the procedure ID part of the call instruction itself. You'd want at least 16 bits for procedure IDs.] The call instruction uses the ID to find the procedure and call it.
Here's some hypothetical code. In real life, making the stack
and
sp
global variables would probably be a mistake---they should be
part of that struct vmstate
.
<hypothetical call example>= (U->) vmvalue stack[]; /* the VM value stack */ vmvalue *sp; /* pointer to the first empty slot on the stack */ Vmproc *known_procedures; /* pointer to all known procedures */ static void vm_call(void) { /* implements the call opcode */ extern struct vmstate *current_state; /* pointer to VM state (bogus) */ Vmproc *p; <advance VM's program counter past the call instruction> assert(sp > 0); p = &known_procedures[sp[-1]]; sp = stack + p->f(current_state, sp - stack - 1, stack, p->closure); }
Definesknown_procedures
,vm_call
(links are to index).
In a real VM, of course, vm_call
wouldn't be a procedure; it
would be inlined.
Assume there are N procedures, with IDs ranging from 0 to N-1. M of these procedures are compiler-generated, and the rest are provided by the interpreter. Let's call the procedure IDs from 0 to M-1 the low procedure IDs, and the procedure IDs from M toN-1 the high procedure IDs. The convention is that low procedure IDs refer to procedures generated by the compiler, and high procedure IDs refer to procedures provided as part of the interpreter. The compiler gets to pick N and M, and it gets to assign procedure IDs any way it likes. But how does it tell the interpreter about them? By putting two tables into the story file.
The first table, or the bytecode procedure table, has M entries, one for each low procedure ID. Each entry points (somehow) to the information for that procedure in the story file. Such information probably includes the number of local variables used by a procedure, possibly the number of arguments it expects, and certainly the bytecodes that implement the procedure.
The second table, or the native procedure table, has
N-M entries, one for each high procedure ID.
Each entry is a string that gives the standard name for some
procedure.
The interpreter should have a list of all the standard procedures it
implements, together with their names.
The names are part of the VM standard; every name has to have a
well-defined semantics, so that the procedure with that name does the
same thing on every platform, no matter what.
The name will probably encode at least a subsystem (e.g., glk
),
a procedure, and a version number.
The interpreter uses
both tables to build the known_procedures
table before
the game begins.
Note that if the native procedure table asks the interpreter for a name that is not
known to the interpreter, that's OK---the interpreter should bind in a
procedure that will issue an error message if never called.
This way, a game can link to a special procedure but not call it if
the procedure isn't there.
[A good standard procedure would be System.provides
, to
ask the interpreter if it provides a procedure with a given name.]
Here's an example error procedure for a Unix interpreter:
<sample error procedure>= (U->) static unsigned vm_unknown_proc(Vmstate vm, int argc, vmvalue *argv, void *closure) { fprintf(stderr, "Standard procedure `%s' is not available in this interpreter\n", (char *)closure); exit(1); }
Definesvm_unknown_proc
(links are to index).
A better procedure would try to roll the VM back to its last consistent state, so the user could have a chance to save the game.
<vm example>= (U->) [D->] struct vmstate { unsigned char *memory; vmvalue *locals; /* local vars of currently active procedure */ unsigned num_locals; /* number of local variables */ vmvalue *stack; /* the value stack */ vmvalue *sp; /* next available slot in stack */ struct frame *suspended; /* states for suspended bytecode procs */ unsigned char *pc; /* pointer to next bytecode to interpret */ };
The struct frame
contains the information needed to implement return
.
It just restores the VM state as directed:
<vm example>+= (U->) [<-D->] struct frame { vmvalue *locals; unsigned num_locals; unsigned char *pc; };
Imagine this is the representation of a bytecode procedure:
<procedure example>= (U->) struct bytecode_proc { unsigned num_locals; /* number of local variables */ unsigned char *bytecodes; /* pointer to bytecodes */ };
(For performance reasons, you probably also want each procedure annotated with its maximum stack usage. This means you can check for stack overflow only on call and return, instead of at every push. The stack usage could be computed for every procedure at startup time, by abstract intrepretation, or you could make the compiler do it.)
Now we can write call and return. This call initializes local variables either to the arguments or to zero.
<vm example>+= (U->) [<-D->] static unsigned vm_bytecode_call(Vmstate vm, int argc, vmvalue *argv, void *closure) { struct bytecode_proc *p = closure; /* save current state */ vm->suspended++; vm->suspended->locals = vm->locals; vm->suspended->num_locals = vm->num_locals; vm->suspended->pc = vm->pc; /* allocate new frame */ vm->locals += vm->num_locals; vm->num_locals = p->num_locals; /* initalize locals to arguments or zero */ assert(argc <= p->num_locals); memcpy(vm->locals, argv, argc * sizeof (*argv)); memset(vm->locals + argc, 0, (p->num_locals - argc) * sizeof (*vm->locals)); vm->pc = p->bytecodes; /* set pc */ return 0; /* start new proc with empty stack */ }
Definesvm_bytecode_call
(links are to index).
A more clever scheme would be to use the same space to represent both local variables and the value stack; that would make it possible to start a procedure just by shifting a pointer, and it wouldn't be necessary to copy the arguments.
In a return, results are left on the stack by the bytecodes, so we simply indicate the values that are already there. Return doesn't actually have to be done as a procedure; it could be done inline as a bytecode.
<vm example>+= (U->) [<-D] static unsigned vm_bytecode_return(Vmstate vm, int argc, vmvalue *argv, void *closure) { /* restore previous procedure */ vm->locals = vm->suspended->locals; vm->num_locals = vm->suspended->num_locals; vm->pc = vm->suspended->pc; vm->suspended--; return argc; }
Definesvm_bytecode_return
(links are to index).
known_procedures
table as follows:
<sample startup code>= (U->) extern struct vmproc *lookup_procedure(char *procname); static void init_procs(unsigned M, struct bytecode_proc *low_procs, unsigned N, char **high_procs) { int i; known_procedures = malloc(N * sizeof (*known_procedures)); for (i = 0; i < M; i++) { known_procedures[i].f = vm_bytecode_call; known_procedures[i].closure = low_procs+i; } for ( ; i < N; i++) { Vmproc *p = lookup_procedure(high_procs[i-M]); if (p) known_procedures[i] = *p; else { known_procedures[i].f = vm_unknown_proc; known_procedures[i].closure = high_procs[i-M]; } } }
Definesinit_procs
,lookup_procedure
(links are to index).
It's a nuisance to specify the behavior of each standard procedure in
terms of a list of vmvalue
s.
It would really be simpler if we could write most standard procedures
in C, then have a standard way of invoking such procedures from
VM code.
We can do this by stealing some technology from the implementation of
Remote Procedure Call.
Unmarshalling is the process of taking a sequence of
vmvalue
s and converting them into arguments for an ordinary
C call.
Marshalling is the process of taking a result from a C function
and converting it into a sequence of vmvalue
s.
With luck and good management, marshalling and unmarshalling code can
be generated automatically by a stub generator.
Depending on the application, the stub generator can operate on a raw
.h file or a description that provides a little more information.
Here's an example. Suppose we have the following declarations:
<sample C procedures to be converted>= (U->) typedef struct window_st *window_t; typedef struct event_struct { unsigned type; window_t win; unsigned val1, val2; } event_t; void ick_request_char_event(window_t win); event_t ick_wait_for_event(void);
Definesevent_t
,ick_request_char_event
,window_t
(links are to index).
The stub generator might produce this:
<sample output from stub generation>= (U->) static unsigned my_ick_request_char_event (Vmstate vm, int argc, vmvalue *argv, void *closure) { assert(argc == 1); ick_request_char_event((window_t) argv[0]); return 0; } static unsigned my_ick_wait_for_event (Vmstate vm, int argc, vmvalue *argv, void *closure) { event_t e; assert(argc == 0); e = ick_wait_for_event(); argv[0] = e.type; argv[1] = (vmvalue) e.win; argv[2] = e.val1; argv[3] = e.val2; return 4; } struct predefined_binding { char *name; struct vmproc proc; } procedures[] = { { "ick.request-char-event.1", { my_ick_request_char_event, 0} }, { "ick.wait-for-event.1", { my_ick_wait_for_event, 0} }, { 0, {0, 0} } };
Definesmy_ick_request_char_event
,my_ick_wait_for_event
,procedures
(links are to index).
(Actually, rather than compile in the assertions, you might have the
stub generator create two versions---one with no checking, and one
with extensive debugging, e.g., to make sure every value claimed to be
a window_t
was actually returned as a window_t
. Then the
interpreter could pick the normal version or the debugging version at
startup time.)
Of course, if a procedure is really in the inner loop, you probably
don't want the minor overhead of calling it through a stub, and
so you write a Vmproc
directly.
struct bytecode_proc
from VM memory and assign it a procedure ID
on the fly.
This would be easy to implement as another procedure.
<*>= #include <stdio.h> #include <stdlib.h> #include <string.h> #include <assert.h> <type definitions> <hypothetical call example> <sample error procedure> <procedure example> <vm example> <sample startup code> <sample C procedures to be converted> <sample output from stub generation>
This is necessary, but it can't do anything, because we haven't actually defined the format of the call instruction.
<advance VM's program counter past the call instruction>= (<-U) /* do nothing */