NULL BITMAP by Justin Jaffray

January 19, 2026

Delayed Code Generation

Last week I talked about my forays into Compiler Construction (1996). This week, I have worked through more of the book and I'm going to share some of my explorations on Wirth's approach to constant folding (and some other optimizations).

Part of the guiding principle of how this book wants you to design a compiler is that we should act locally to the extent that we can: no materializing syntax trees in memory, just walk the parse tree as we parse and shove instructions out from there as directly as we can.

This is cool, and can lead to some very short, coupled, and more-or-less locally understandable code that I like a lot. I think it's a great choice for a toy compiler. I've extended the calculator from last week into a slightly more real compiler: I can write real programs now. Here's one of my test cases:

let x = 4;
while x != 0 {
    x = x - 1;
    print(x);
}

Which runs and outputs the expected 3 2 1 0.
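
The loop itself compiles into the standard single-pass shape: evaluate the condition, conditionally jump past the body, and jump back at the end. Roughly (a sketch of the shape, not the exact listing; the real addresses depend on where the loop sits in the program):

top:  <condition>   leaves 0 or 1 on the stack
      jmpz end      condition is false, exit the loop
      <body>
      jmp top       go re-test the condition
end:  ...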

Our stack machine has grown a few new instructions:

#[derive(Debug, Copy, Clone)]
enum Instruction {
    Num(i64),
    Add,
    AddI(i64), // the *I variants take their right operand from the instruction
    Sub,
    SubI(i64),
    Mul,
    MulI(i64),
    Div,
    DivI(i64),
    Neq,
    Eq,
    And,
    Or,
    Ld(usize),  // load from a memory address onto the stack
    St(usize),  // store the top of the stack to a memory address
    Jmp(usize), // jumps: unconditional, and conditional on the top of the stack
    JmpNZ(usize),
    JmpZ(usize),
    Swap, // exchange the top two values on the stack

    Dummy,
    DebugPrint,
}
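
Schematically, the machine that runs these is the obvious loop over a stack, a flat memory, and a program counter. Here's a minimal sketch of a few of the cases, assuming (this part is incidental) that addresses are byte offsets into 8-byte slots, hence the /8, and that jumps and debug_print pop their operand:

fn run(program: &[Instruction]) {
    let mut stack: Vec<i64> = Vec::new();
    let mut mem = vec![0i64; 256];
    let mut pc = 0;
    while pc < program.len() {
        let mut next = pc + 1;
        match program[pc] {
            Instruction::Num(v) => stack.push(v),
            Instruction::Sub => {
                // rhs sits on top: the stack is [lhs, rhs].
                let rhs = stack.pop().unwrap();
                let lhs = stack.pop().unwrap();
                stack.push(lhs - rhs);
            }
            // Immediate variants read their right operand out of the
            // instruction instead of the stack.
            Instruction::AddI(v) => {
                let lhs = stack.pop().unwrap();
                stack.push(lhs + v);
            }
            Instruction::Ld(addr) => stack.push(mem[addr / 8]),
            Instruction::St(addr) => mem[addr / 8] = stack.pop().unwrap(),
            Instruction::Jmp(target) => next = target,
            Instruction::JmpZ(target) => {
                if stack.pop().unwrap() == 0 {
                    next = target;
                }
            }
            Instruction::Swap => {
                let n = stack.len();
                stack.swap(n - 1, n - 2);
            }
            Instruction::DebugPrint => println!("{}", stack.pop().unwrap()),
            _ => todo!("the rest look much the same"),
        }
        pc = next;
    }
}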

Before, we generated code directly: walking the parse tree and emitting instructions as we encountered each operand and operator. We can see this when we compile an expression like (x - y) * z:

let x = 1;
let y = 2;
let z = 3;
print((x - y) * z);
Compiled code:

  0 num 1
  1 st 0
  2 num 2
  3 st 8
  4 num 3
  5 st 16
  6 ld 0
  7 ld 8
  8 sub
  9 ld 16
 10 mul
 11 debug_print

Execution result:
===
-3
===

This compiled program stores x at address 0, y at address 8, and z at address 16. Then we walk the parse tree, visiting the operands and operators in postorder and emitting their corresponding code:

x -> ld 0
y -> ld 8
- -> sub
z -> ld 16
* -> mul

We can generate this with effectively the same code we had last week, modified slightly to be able to load variables (stuff like addr_for is uninteresting and you can probably guess what it does):

fn factor(&mut self) -> Result<(), Error> {
    match self.tokens.next() {
        Some(Ok(Token::Lparen)) => {
            self.expr()?;
            self.expect(Token::Rparen)?;
        }
        Some(Ok(Token::Num(v))) => {
            self.builder.num(v);
        }
        Some(Ok(Token::Ident(id))) => {
            self.builder.ld(self.addr_for(&id)?);
        }
        _ => return Err(Error::ParseError),
    }

    Ok(())
}

fn term(&mut self) -> Result<(), Error> {
    self.factor()?;
    while self.tokens.peek() == Some(&Ok(Token::Mul))
        || self.tokens.peek() == Some(&Ok(Token::Div))
    {
        let token = self.tokens.next().unwrap()?;
        self.factor()?;
        match token {
            Token::Mul => self.builder.mul(),
            Token::Div => self.builder.div(),
            _ => unreachable!(),
        }
    }

    Ok(())
}

fn expr(&mut self) -> Result<(), Error> {
    self.term()?;
    while self.tokens.peek() == Some(&Ok(Token::Add))
        || self.tokens.peek() == Some(&Ok(Token::Sub))
    {
        let token = self.tokens.next().unwrap()?;
        self.term()?;
        match token {
            Token::Add => self.builder.add(),
            Token::Sub => self.builder.sub(),
            _ => unreachable!(),
        }
    }

    Ok(())
}
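
I also haven't shown the builder; there's no magic there, just an instruction buffer with one convenience method per instruction. A sketch:

// A sketch of the builder: an instruction buffer plus one
// convenience method per instruction.
struct Builder {
    instructions: Vec<Instruction>,
}

impl Builder {
    fn push(&mut self, ins: Instruction) {
        self.instructions.push(ins);
    }

    fn num(&mut self, v: i64) {
        self.push(Instruction::Num(v));
    }

    fn ld(&mut self, addr: usize) {
        self.push(Instruction::Ld(addr));
    }

    // ...and so on for add, sub, mul, div, swap, etc.
}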

However, we might notice that for certain patterns, we emit bad code:

let x = 1;
print(x + 1);
Compiled code:

  0 num 1
  1 st 0
  2 ld 0
  3 num 1
  4 add
  5 debug_print

Execution result:
===
2
===

This code is correct, but inefficient. We now have an addi instruction that should let us simplify this to:

Compiled code:

  0 num 1
  1 st 0
  2 ld 0
  3 addi 1
  4 debug_print

Execution result:
===
2
===

Rather than explicitly putting the 1 onto the stack and then performing an add (of course, with a fully optimizing compiler we could constant-fold the whole thing, but for now we'd be happy just to get the addi version).

Even worse, we don't have any kind of ability to constant-fold with this structure:

print(2 + 3);
Compiled code:

  0 num 2
  1 num 3
  2 add
  3 debug_print

Execution result:
===
5
===

Rather than directly emitting 5.

We could fix this by doing a pass over the code after-the-fact and looking for these kinds of opportunities to fix up, or we could fix it by reifying the entire tree and looking for patterns, and then using something like the Kimballizer. But remember we're following the structure of Compiler Construction, which means we want to make exactly one pass over the code, and immediately shove out instructions to the finished program. This means we don't want to emit something bad and then fix it up later, and we also don't mind if we emit something that's suboptimal. We just want to get pretty good code on the first try.

Wirth uses a technique called delayed code generation to do this.

Currently, the contract is that if we call a parsing function like factor, we'll parse a factor and generate code to place its result on the stack. The observation is that this is too eager: the value might be a constant that we're about to add to some other value, in which case it would be better to use an addi instruction.

Consider the expression x + 1. Using straight code generation, our parser will visit these terms in the order x 1 +. The execution will look like:

expr()
  term()
    factor()
      x -> ld x
    factor()
      1 -> num 1
    +   -> add

The idea behind delayed code generation is to not immediately emit code upon calling factor, but to return a value that represents where that value is. Wirth calls this kind of value an Item:

enum Item {
    Var(usize),
    Const(i64),
    Stack,
}

When we parse a factor, we might place the value in one of three different places: it might just be a variable living at some memory address, or it might be a constant, in which case we haven't placed it anywhere yet, or it might be on the stack.

We can force an Item to be on the stack with load:

fn load(&mut self, x: Item) {
    match x {
        Item::Var(var) => self.builder.ld(var),
        Item::Const(v) => self.builder.num(v),
        Item::Stack => {}, // Nothing to do.
    }
}
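
The other half of the contract is that somebody eventually has to call load when a value genuinely must be on the stack, like at a print. That ends up looking something like this (a sketch of the statement handler, eliding the surrounding statement parsing):

fn print_stmt(&mut self) -> Result<(), Error> {
    self.expect(Token::Lparen)?;
    // expr hands back an Item describing where its value lives; we
    // only force it onto the stack here, where we actually need it.
    let item = self.expr()?;
    self.expect(Token::Rparen)?;
    self.load(item);
    self.builder.push(Instruction::DebugPrint);
    Ok(())
}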

We can update factor to reflect this:

fn factor(&mut self) -> Result<Item, Error> {
    match self.tokens.next() {
        Some(Ok(Token::Lparen)) => {
            let item = self.expr()?;
            self.expect(Token::Rparen)?;
            Ok(item)
        }
        Some(Ok(Token::Num(v))) => Ok(Item::Const(v)),
        Some(Ok(Token::Ident(id))) => Ok(Item::Var(self.addr_for(&id)?)),
        v => Err(Error::ParseError(format!(
            "unexpected {:?} while parsing factor",
            v
        ))),
    }
}

Let's go through each of these branches to see what they do.

When we see a (, we parse an expr. expr also returns an Item telling us where to find its value, so we can just return that.

When we see a number, we don't emit any code or anything, we just say "the value I parsed is fully expressed by this constant," and return Item::Const.

When we see an identifier, we do the same thing, but return the address for that variable.

Note that none of these branches has emitted any code yet. We have to go one level up to see that happen. Let's look at term:

fn term(&mut self) -> Result<Item, Error> {
    let mut lhs = self.factor()?;
    while self.tokens.peek() == Some(&Ok(Token::Mul))
        || self.tokens.peek() == Some(&Ok(Token::Div))
    {
        let token = self.tokens.next().unwrap()?;
        let rhs = self.factor()?;
        lhs = self.put_op(lhs, rhs, token)
    }

    Ok(lhs)
}

For the term

a * b * c

we are computing:

put_op(put_op(a, b, *), c, *)
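
expr gets exactly the same treatment one level up, threading Items through and deferring to put_op (and, like term, calling one level down in its loop):

fn expr(&mut self) -> Result<Item, Error> {
    let mut lhs = self.term()?;
    while self.tokens.peek() == Some(&Ok(Token::Add))
        || self.tokens.peek() == Some(&Ok(Token::Sub))
    {
        let token = self.tokens.next().unwrap()?;
        let rhs = self.term()?;
        lhs = self.put_op(lhs, rhs, token);
    }

    Ok(lhs)
}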

put_op is where we actually emit instructions. It takes two Items and an operator and returns a new Item that contains the result of combining those two items with the operator. The power here is that we can look at the different cases to do some light optimization.

We can implement it like this (NOTE: this code has a subtle bug that we'll fix in a moment):

fn put_op(&mut self, lhs: Item, rhs: Item, op: Token) -> Item {
    match (lhs, rhs, op) {
        // Two constants, we can just compute the result and store the
        // result in another constant.
        (Item::Const(a), Item::Const(b), op) => {
            Item::Const(op.apply(a, b).expect("valid instruction"))
        }
        // Constant applied directly to an item, we can use an immediate instruction if one exists.
        (lhs, Item::Const(v), op) => {
            self.load(lhs);
            if let Some(ins) = op.immediate_instruction(v) {
                self.builder.push(ins);
            } else {
                // No immediate instruction for this operator, just use
                // the normal one.
                self.builder.num(v);
                self.builder
                    .push(op.instruction().expect("valid instruction"))
            }
            Item::Stack
        }
        // General case, we have to load both items.
        (lhs, rhs, op) => {
            self.load(lhs);
            self.load(rhs);
            self.builder
                .push(op.instruction().expect("valid instruction"));

            Item::Stack
        }
    }
}

And we have the corresponding methods in Token:

fn apply(&self, a: i64, b: i64) -> Option<i64> {
    Some(match self {
        Token::Add => a + b,
        Token::Sub => a - b,
        Token::Mul => a * b,
        // NB: constant-folding a division by zero will panic at compile
        // time here; a real compiler would want to report an error instead.
        Token::Div => a / b,
        Token::Eq => {
            if a == b {
                1
            } else {
                0
            }
        }
        Token::Neq => {
            if a != b {
                1
            } else {
                0
            }
        }
        Token::And => {
            if a != 0 && b != 0 {
                1
            } else {
                0
            }
        }
        Token::Or => {
            if a != 0 || b != 0 {
                1
            } else {
                0
            }
        }
        _ => return None,
    })
}

fn instruction(&self) -> Option<Instruction> {
    Some(match self {
        Token::Add => Instruction::Add,
        Token::Sub => Instruction::Sub,
        Token::Mul => Instruction::Mul,
        Token::Div => Instruction::Div,
        Token::Eq => Instruction::Eq,
        Token::Neq => Instruction::Neq,
        Token::And => Instruction::And,
        Token::Or => Instruction::Or,
        _ => return None,
    })
}

fn immediate_instruction(&self, v: i64) -> Option<Instruction> {
    Some(match self {
        Token::Add => Instruction::AddI(v),
        Token::Sub => Instruction::SubI(v),
        Token::Mul => Instruction::MulI(v),
        Token::Div => Instruction::DivI(v),
        _ => return None,
    })
}

And this almost works. Let's focus on what it does before we jump into what's broken (hint: it's not a problem with Wirth's version, which uses fixed registers but treats them as a stack).

Understanding what's actually going on with delayed code generation was initially a bit tricky for me, so I think it will be instructive to look at a handful of cases to see the code that gets generated.

Example 1: Constant Folding

print(1 + 1);
Compiled code:

  0 num 2
  1 debug_print

Execution result:
===
2
===

Production:

call expr()
  call term()
    call factor() -> Item::Const(1)
    call factor() -> Item::Const(1)
    call put_op(Item::Const(1), Item::Const(1), +)
      -> return Item::Const(2)
call load() to put Item::Const(2) on the stack, which emits `num 2`.

Example 2: Immediate Instruction

let x = 1;
print(x + 1);
Compiled code:

  0 num 1
  1 st 0
  2 ld 0
  3 addi 1
  4 debug_print

Execution result:
===
2
===

Production:

call expr()
  call term()
    call factor() -> Item::Var(x)
    call factor() -> Item::Const(1)
    call put_op(Item::Var(x), Item::Const(1), +)
      -> call load(Item::Var(x)) which emits ld 0
         emit addi 1
         returns Item::Stack
call load(), which is a no-op since the result is already on the stack.

Example 3: Variables

let x = 1;
let y = 2;
print(x + y);
Compiled code:

  0 num 1
  1 st 0
  2 num 2
  3 st 8
  4 ld 0
  5 ld 8
  6 add
  7 debug_print

Execution result:
===
3
===

Production:

call expr()
  call term()
    call factor() -> Item::Var(x)
    call factor() -> Item::Var(y)
    call put_op(Item::Var(x), Item::Var(y), +)
      -> call load(Item::Var(x)) which emits ld 0
         call load(Item::Var(y)) which emits ld 8
         emit add
         returns Item::Stack
call load(), which is a no-op since the result is already on the stack.

Example 4: The buggy case

Let's look at the code generated by this program:

let a = 10;
let b = 3;
let c = 2;

print(a - (b + c));

The result should, of course, be 5. But if we look at the output, we'll see that's not the case:

Compiled code:

  0 num 10
  1 st 0
  2 num 3
  3 st 8
  4 num 2
  5 st 16
  6 ld 8
  7 ld 16
  8 add
  9 ld 0
 10 sub
 11 debug_print

Execution result:
===
-5
===

Inspecting the code, we can see the problem. In the second call to put_op, we hit the general case:

// General case, we have to load both items.
(lhs, rhs, op) => {
    // This code is assumed to put lhs on the stack, then rhs on the stack:
    self.load(lhs); // Item::Var, emits a load
    self.load(rhs); // Item::Stack, no-op

    // Now lhs is on the stack *above* rhs.

    self.builder
        .push(op.instruction().expect("valid instruction"));

    Item::Stack
}

This makes the state of our stack [rhs, lhs] instead of [lhs, rhs]. This is fine when we have commutative operations like add, but broken for non-commutative operations like sub.

The solution is simple: just check for this case and then swap the two values:

// There's a small problem here, which is that if rhs is already
// on the stack and lhs is not, we'll wind up with [rhs, lhs],
// instead of the expected [lhs, rhs], which matters for
// non-commutative operators.
if matches!(rhs, Item::Stack) && !matches!(lhs, Item::Stack) && !op.is_commutative()
{
    self.builder.swap();
}
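
I haven't shown is_commutative, but it's the Token method you'd expect:

// Of the operators we can emit, only sub and div care about
// operand order; everything else is safe to apply reversed.
fn is_commutative(&self) -> bool {
    matches!(
        self,
        Token::Add | Token::Mul | Token::Eq | Token::Neq | Token::And | Token::Or
    )
}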

This fixes the program:

Compiled code:

  0 num 10
  1 st 0
  2 num 3
  3 st 8
  4 num 2
  5 st 16
  6 ld 8
  7 ld 16
  8 add
  9 ld 0
 10 swap
 11 sub
 12 debug_print

Execution result:
===
5
===

This isn't a problem in Wirth's register version, since Items there record the specific register they live in, so the order in which they're loaded doesn't matter. In our case, there's a bit more we have to be careful about.
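
To make that concrete, the register flavor of Item looks roughly like this (a Rust sketch of Wirth's idea, not his actual Oberon):

// A materialized value remembers exactly which register holds it,
// so put_op can name both operands (e.g. emit sub r1, r2) no
// matter which side happened to be loaded first.
enum RegItem {
    Var(usize), // still sitting in memory
    Const(i64), // not materialized anywhere yet
    Reg(usize), // held in this specific register
}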

That's all I got for this issue. I'm just over here writing my goofy little compiler. I've always heard delayed code generation is a cool technique and I'm glad to have finally worked through it for myself. I think the original Wirth explanation is a bit hard to follow because of the mixture of input and output parameters in the Oberon code, but reworking it in a modern programming language makes things a lot clearer for me.
