(This module is also available in PDF, in two-column and one-column formats.)
New learning outcome: can compile a call to a function with no arguments.
ListUtil.foldri
considered helpful (step where environment is established).
Introduction
This week you’ll complete your code generator by adding support for functions and calls. You’ll also practice for the course finale.
What am I doing?
Generate code for functions and function calls, including optimized tail calls.
Demonstrate, via tests, that functions work correctly.
Why am I doing it?
- To be able to run Scheme functions, not just assembly-language functions.
How?
Before lab you’ll develop equations of translation for function bodies (tail position).
During lab you’ll implement instructions for calls (including tail calls) and returns.
After lab, you’ll clean up any remaining issues with primitives, and you’ll test your system.
On Monday night, you’ll submit a UFT and SVM that can translate and run Scheme code in K-normal form with function calls, including optimized tail calls. You’ll also submit tests.
The module step by step
Before lab
Download updates. Update your git repository in two steps:
Be sure all your own code is committed. Confirm using
git status
.Update your working tree by running
git pull
(or you might possibly needgit pull origin main
). Expect merge conflicts in the SVM code:- Conflicts in
disasm.c
in the implementation offprintfunname
- Conflicts in
instructions.c
in the implementation ofisgetglobal
- Possible conflicts in
vmstate.c
around functioninitialize_global
If you have have merge conflicts around
fprintfunname
orisgetglobal
, then quite likely you have implemented them correctly, in which case you will want to keep your versions.You may also encounter a few merge conflicts in
asumutil.sml
, but if so, these should be easy to resolve.- Conflicts in
Study tail position. In the translation handout from module 6, study these two sections:
If you want to know more, “tail position” is not a useful search term. A better search term is “optimized tail calls.” There is a nice blog post by Richard Wild.
Write equations of translation for tail position. The section on translating expressions in tail position describes a translation informally. Define that translation formally by writing a set of translation equations, which you will bring to lab.
Make sure you write an equation for every syntactic form of K-normal form.
Make sure that every bullet point in the section is implemented by one or more of your equations.
Lab
You will extend your code generator to support functions and function calls. To help you, I have added a few utility functions to file asmutil.sml
: A.call
, A.areConsecutive
, A.return
, and A.tailcall
. Relevant functions are mentioned at the end of each step.
Check your equations for tail position. Working in groups of three, check your equations from step 3, in two steps:
Reach consensus in your group about what the equations should be.
Once you have reached consensus, share your conclusions with me. Or if you can’t reach consensus, share your disagreements with me.
Prepare to code function funcode (x₁, …, xₙ) => e in
. Replace the body of with the expression below, changing the value constructor as needed so it matches your definition of the formknf.sml
:(case e of K.FUNCODE _ => Impossible.exercise "codegen for funcode" _ => Impossible.exercise "toReturn")
Generate code for calls. In file
codegen.sml
, in your functions , , and , fill in translations of the function-call form x(x₁, …, xₙ).Do call instruction! In the instruction
first, because there you have a destination register. And be aware of the limitations in encoding ther := call x(x₁, …, xₙ)
The destination register r and the function register x are arbitrary, but the other registers are strictly constrained. If any parameters are present, it must always be true that \(x_1 = x+1\) and for every \(i\), \(x_{i+1} = x_i + 1\). That is, the register numbers x through xₙ must be consecutive. If they are not, there is a bug in your UFT, and your code generator should call .
Take advantage of the new utility functions
and , which are defined in fileasmutil.sml
.Using your equations, extend x(x₁, …, xₙ).
with a case that generates code for the function-call formTake advantage of the new utility functions
and , which are defined in fileasmutil.sml
.The call instruction needs a destination register.1 If the call is being executed only for side effect, but the return is going to update a destination register, what destination register should be updated? Pick any register that the call could kill, and use it as the destination register.
case is a little weird. The SVM
Generate code that returns values. Using your equations, add cases to
to handle all the remaining value constructors of K-normal form. In some cases, you may be able to use as a helper function.After this step, only the funcode (x₁, …, xₙ) => e form will remain to be implemented.
Fun fact: code generated by
always exits the current procedure; it never “falls through.” So when anif
expression occurs in tail position, it is not necessary to label the end of theif
expression or to include agoto
instruction targeting that label.Take advantage of the new utility function
, which is defined in fileasmutil.sml
.Generate code for function bodies. Finish your code generator by adding cases that translate function bodies (funcode (x₁, …, xₙ) => e).
Update funcode (x₁, …, xₙ) => e, it calls on the function’s body.
so that givenSimilarly update
. You may be able to use as a helper function.Confirm that your mutual recursion passes this sanity check:
- Function calls itself, and .
- Function calls itself, , and .
- Function calls itself and but not .
After lab
Wrap up your code generator
Generate good code for misused primitives. In any language implementation, bad code must not take down the translator. As long as the bad code is not run, a user should be able to translate and load the program as usual. Therefore, if a primitive is used in the wrong context or with the wrong number of arguments, nothing bad should happen unless the primitive is actually run.
To understand the context in which a primitive is used, you can pattern-match on it:
If a primitive matches
, then an application of the primitive makes sense only in .If a primitive matches
, then an application of the primitive makes sense only in .
Your code generator should handle user code in which primitives are misused in these ways:
A register-setting primitive is used for effect. No problem! By design, register-setting primitives don’t have any effect, so the primitive is implemented by an empty sequence of instructions.
A effectful primitive is used to set a register. This is what some language definitions call “an unchecked run-time error” and others call “undefined behavior.” We can do as we like. Here are some choices:
- Evaluate the primitive for its effect and leave the register unset.
- If the primitive takes any arguments, set the result register to equal the first argument.
- Set the result register to
nil
. - Replace the primitive with a call to
error
.
A primitive is used with the wrong number of arguments (optional). Each primitive has an arity which can be queried with function A case where arity doesn’t match might look like this:2
. That arity should match the number of actual parameters.(println (if #f (+ 0) 'fooled-you!))
This code runs just fine under
vscheme
oruscheme
, but if your UFT doesn’t check the arity of primitives, it will generate.vo
code that will not load.In cases like these, the most sensible thing to do is probably to convert the primitive application to
error
, with a literal error message. That change will shift the problem from load time to run time, where it belongs. But the UFT is not a production compiler, and if you want to pass these forms through so they fail at load time, that’s OK with me.
Tie up any remaining loose ends. At this point, your
codegen.sml
should be complete and should handle all cases. The only use of or any other uncaught exception should be in a case where the registers in a call are not numbered consecutively—which indicates a bug in your UFT.- Make sure every syntactic form has a well-defined translation in each of the three contexts (destinies).
Test your code generator
Lightweight testing. Get your system to pass three test cases.
Factorial. I’ve hand-translated a factorial function into K-normal form. I’ve also translated the unit test (check-expect (factorial 4) 24).
(let ([$r0 (lambda ($r1) (let* ([$r2 0] [$r2 (= $r1 $r2)]) (if $r2 1 (let* ([$r2 factorial] [$r3 1] [$r3 (- $r1 $r3)] [$r2 ($r2 $r3)]) (* $r1 $r2)))))]) (set factorial $r0)) (begin (let* ([$r0 factorial] [$r1 4] [$r0 ($r0 $r1)]) (check $r0 '4-factorial)) (let* ([$r0 24]) (expect $r0 '24)))
Copy this code into file
fact.kn
and confirm that the test passes with both my interpreter and your UFT/SVM:$ vscheme < fact.kn The only test passed. $ uft kn-vo fact.kn | svm The only test passed.
Tail calls. Translate your
overtail.vs
file from module 7 into K-normal form as fileovertail.kn
. Confirm that it translates and runs without overflow.Tail calls with parameters. Translate the
times-plus
test from module 7 into K-normal form as filetailm.kn
. (You can either translate forward from thetailm.scm
file or translate backward from assembly codetailm.vs
, both in the module 7 tests handout.) Confirm that the test translates, loads, runs, and passes.
What and how to submit
On Monday, submit the homework. In the
src/uft
directory you’ll find a fileSUBMIT.08
. That file needs to be edited to answer the same questions you answer every week.To submit, you’ll need to copy your working tree to the department servers. We recommend using
rsync
, butscp
also works.Now log into a department server, change to your working tree, and submit your entire
src
directory:provide cs106 hw08 src
or if you keep an additional
tests
directory,provide cs106 hw08 src tests
On Tuesday, submit your reflection. Create a plain text file
REFLECTION
, which will hold your claims for project points and depth points.For each project point you claim, write the number of the point, plus whatever is called for in the section “How to claim the project points”—usually a few sentences.
Now copy your
REFLECTION
file to a department server and submit it usingprovide
:provide cs106 reflection08 REFLECTION
Learning outcomes
Outcomes available for points
Learning outcomes available for project points:
Tail position. You can identify what expressions occur in tail position.
Calling conventions. You understand how a call can kill registers.
Undefined behavior. You have an idea how a translator can exploit undefined behavior.
Translation in tail position. You understand the point of having special rules for translation expressions that appear in tail position.
Totality of code generation. You know how to ensure that your translator generates code for any input—as long as there are no bugs in the compiler, code generation always succeeds.
Course design and compiler design. You can comment on the experience of splitting code generation across two modules.
This module does not include any new opportunities for depth points.
How to claim the project points
Each of the numbered learning outcomes 1 to N is worth one point, and the points can be claimed as follows:
To claim this point, identify every line of code in
codegen.sml
that pattern matches on a K-normal form expression of the form if x then e₁ else e₂. At each such line, analyze the roles of subexpressionse₁
ande₂
, and answer these two questions:At that line, are
e₁
ande₂
always in tail position, never in tail position, or potentially some of each?How do you know?
To claim this point, point to your code for
part Bpart C of step 6 (forEffect
applied to a function-call form). In a sentence or two, explain which register you use as the destination register and why you chose that register.To claim this point, identify the lines of your code that handle the case where an effectful primitive is used to set a register (part G in step 9). Explain what behavior you chose and why.
Suppose that
does not use any special rules and simply delegates to as follows:fun toReturn e = toReg returnReg e o S (A.return returnReg)
where
can be any register that won’t be overwritten bytoReg
. To claim this point,Say what changes in the generated code.
Say what Scheme program you would write to observe the effect of the change at run time.
To claim this point, identify one thing you changed in step 10 to ensure that your code generator is total. If all you had to do was add calls, refer back to the code you submitted for module 6 and note that the only uncovered cases were for calls.
This term we split code generation over two modules (6 and 8), with the implementation of call instructions (module 7) in between. I’m considering alternatives:
The course as it is:
- K-normal form
- Code generation without calls or returns
- Function calls in the SVM
- Code generation for calls and returns
Do SVM calls first:
- Function calls in the SVM
- K-normal form
- Code generation, including calls and returns
- Depth points (or other use of a free week)
Do UFT calls first:
- K-normal form
- Code generation, including calls and returns
- Function calls in the SVM
- Depth points (or other use of a free week)
To claim this point, you must say which of the potential alternatives might improve on your current experience and why. And if you prefer either of the alternatives that free up a week, you must say where in the semester that free week should be placed—and why.
You must explain your reasoning in enough depth that I get some insight into your experience.