
NULL BITMAP by Justin Jaffray

January 12, 2026

I deserve to write at least two or three more arithmetic expression parsers


Something I believe wholeheartedly is that writing simple little programs is good, and writing the same simple little programs many times is even better. I've been reading Niklaus Wirth's Compiler Construction recently and I think Wirth's whole deal really embodies this.

Wirth is notable for his belief that simple methods of implementing ideas are better than complex ones. This sounds like a standard programming platitude; "simple is better than complex" on its own doesn't really mean anything. But Wirth was interested in practicing it in a very specific way: what if we took fundamental ideas in programming and worked very hard at finding the simplest ways we can express them?

I think in a lot of ways this way of thinking about programming is not particularly common, at least in the circles I run in. People complain a lot about the sorts of programming that feel like solving the same problem over and over, or the same kind of problem over and over. It seems rare to me for someone to look at problems, maybe problems that have been optimized very aggressively, and ask what kinds of concessions could be made in the direction of a simpler solution.

I'm working through some of Compiler Construction and this post is documenting some of the stuff I've tinkered with so far.

Because the book is about "one-pass" compilers, its presentation of a lot of foundational ideas is a bit different from how, I think, people are often taught things like parsing today.

Parsing is often treated as the act of taking a string from some language and producing the syntax tree that corresponds to it. This almost always means producing some data structure that represents that syntax tree. Doing this is really good and useful for a lot of reasons:

  • it lets you easily hand that syntax tree off to a handful of different consumers,
  • it lets you print out that data structure in a way that makes debugging easier, and
  • you can try round-tripping that data structure as a way to enable property-based testing.

However, this is not an essential part of parsing. Many parsing algorithms can be thought of not as directly producing the parse tree, but as simply letting you walk the parse tree in-place. You can leverage that to build the parse tree, sure, but you could also do something else.

If you have these two processes:

input text -> walk implicit ast to produce explicit ast
walk explicit ast -> something else

You can actually fuse them and avoid ever constructing the interim AST. This notably complicates the code along some dimensions: your parsing code becomes tightly coupled to your output code, rather than the two speaking the common language of your syntax tree data structure. But eliminating the whole notion of a reified syntax tree is sort of nice, too. It should be noted that "back in the day" this was a valuable thing to do for performance reasons: never constructing the syntax tree saved a nontrivial amount of memory. That matters less now, so we can discuss the design merits from a purely aesthetic view.

Let's implement this idea. Here's a simple stack language that represents arithmetic expressions:

#[derive(Debug, Copy, Clone)]
enum Instruction {
    Num(i64),
    Add,
    Sub,
    Mul,
    Div,
}

#[derive(Debug, Default)]
struct Builder {
    program: Vec<Instruction>,
}

impl Builder {
    fn num(&mut self, v: i64) {
        self.program.push(Instruction::Num(v))
    }

    fn add(&mut self) {
        self.program.push(Instruction::Add)
    }

    fn sub(&mut self) {
        self.program.push(Instruction::Sub)
    }

    fn mul(&mut self) {
        self.program.push(Instruction::Mul)
    }

    fn div(&mut self) {
        self.program.push(Instruction::Div)
    }

    fn finish(self) -> Program {
        Program::new(self.program)
    }
}

#[derive(Debug)]
struct Program {
    code: Vec<Instruction>,
    pc: usize,
    stack: Vec<i64>,
}

impl Program {
    fn new(code: Vec<Instruction>) -> Self {
        Program {
            code,
            pc: 0,
            stack: Vec::new(),
        }
    }

    fn pop(&mut self) -> Result<i64, Error> {
        if let Some(v) = self.stack.pop() {
            Ok(v)
        } else {
            Err(Error::NotEnoughValues)
        }
    }

    fn step(&mut self) -> Result<(), Error> {
        let pc = self.pc;
        self.pc += 1;
        match self.code[pc] {
            Instruction::Num(v) => self.stack.push(v),
            Instruction::Add => {
                let a = self.pop()?;
                let b = self.pop()?;
                self.stack.push(a + b);
            }
            Instruction::Sub => {
                // the first pop is the top of the stack, which is the
                // right-hand operand, hence b - a
                let a = self.pop()?;
                let b = self.pop()?;
                self.stack.push(b - a);
            }
            Instruction::Mul => {
                let a = self.pop()?;
                let b = self.pop()?;
                self.stack.push(a * b);
            }
            Instruction::Div => {
                // same operand order as Sub; note this panics on division
                // by zero, which a real version would surface as an Error
                let a = self.pop()?;
                let b = self.pop()?;
                self.stack.push(b / a);
            }
        }
        Ok(())
    }

    fn run(mut self) -> Result<i64, Error> {
        while self.pc < self.code.len() {
            self.step()?;
        }
        if let Some(v) = self.stack.pop() {
            if !self.stack.is_empty() {
                return Err(Error::TooManyValues);
            }
            Ok(v)
        } else {
            Err(Error::NotEnoughValues)
        }
    }
}

This is a pretty typical Reverse Polish Notation (RPN) calculator: processing a Num pushes that number onto the stack, and processing an arithmetic operation pops the top two values off of the stack, combines them, and places the result back onto it. So an expression like (1 + 2) * 3 is written 1 2 + 3 *. Reading left to right: we push 1, push 2, pop them and add them, pushing back 3; then push the literal 3; then pop the two threes, multiply them, and push back 9.
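
One thing the snippet above leaves implicit is the Error type. Here's a minimal sketch of a definition that covers the variants the machine uses (plus ParseError, which the parser below needs; the original may define it differently), along with a by-hand run of 1 2 + 3 *:

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Error {
    NotEnoughValues,
    TooManyValues,
    ParseError,
}

fn rpn_demo() -> Result<(), Error> {
    // build the program 1 2 + 3 *
    let mut b = Builder::default();
    b.num(1);
    b.num(2);
    b.add();
    b.num(3);
    b.mul();
    // running it evaluates (1 + 2) * 3
    assert_eq!(b.finish().run()?, 9);
    Ok(())
}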

You might think of a parser for arithmetic expressions, in some sense, as a way to translate expressions in infix notation to RPN: parsing an expression like (1 + 2) * 3 gives us access to the operands together with the operator that combines them.

One way we could implement a compiler from infix notation to our stack language would be to build a parse tree and then walk it to output the RPN. Or we could just emit the RPN directly as we parse.
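
The parser consumes a fallible stream of tokens. The Token type isn't shown here; a minimal definition consistent with how the parser uses it (it needs PartialEq for the peek comparisons) would be something like:

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Token {
    Num(i64),
    Lparen,
    Rparen,
    Add,
    Sub,
    Mul,
    Div,
}

With that in hand, here's the parser: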

use std::iter::Peekable;

#[derive(Debug)]
struct Parser<I: Iterator<Item = Result<Token, Error>>> {
    builder: Builder,
    tokens: Peekable<I>,
}

impl<I: Iterator<Item = Result<Token, Error>>> Parser<I> {
    fn new(input: I) -> Self {
        Parser {
            builder: Builder::default(),
            tokens: input.peekable(),
        }
    }

    fn expect(&mut self, tok: Token) -> Result<(), Error> {
        let v = self.tokens.next();
        if v != Some(Ok(tok)) {
            Err(Error::ParseError)
        } else {
            Ok(())
        }
    }

    fn factor(&mut self) -> Result<(), Error> {
        match self.tokens.next() {
            Some(Ok(Token::Lparen)) => {
                self.expr()?;
                self.expect(Token::Rparen)?;
            }
            Some(Ok(Token::Num(v))) => {
                self.builder.num(v);
            }
            _ => return Err(Error::ParseError),
        }

        Ok(())
    }

    fn term(&mut self) -> Result<(), Error> {
        self.factor()?;
        while self.tokens.peek() == Some(&Ok(Token::Mul))
            || self.tokens.peek() == Some(&Ok(Token::Div))
        {
            let token = self.tokens.next().unwrap()?;
            self.factor()?;
            match token {
                Token::Mul => self.builder.mul(),
                Token::Div => self.builder.div(),
                _ => unreachable!(),
            }
        }

        Ok(())
    }

    fn expr(&mut self) -> Result<(), Error> {
        self.term()?;
        while self.tokens.peek() == Some(&Ok(Token::Add))
            || self.tokens.peek() == Some(&Ok(Token::Sub))
        {
            let token = self.tokens.next().unwrap()?;
            // note: term, not expr, so that subtraction is left-associative:
            // 1 - 2 - 3 must compile to 1 2 - 3 -, not 1 2 3 - -
            self.term()?;
            match token {
                Token::Add => self.builder.add(),
                Token::Sub => self.builder.sub(),
                _ => unreachable!(),
            }
        }

        Ok(())
    }

    fn run(mut self) -> Result<Program, Error> {
        self.expr()?;
        if self.tokens.peek().is_some() {
            return Err(Error::ParseError);
        }
        Ok(self.builder.finish())
    }
}
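
To run this end to end we also need a lexer, which isn't shown here either. A minimal whitespace-skipping tokenizer over this grammar, written against the Token and Error sketches above, could look like:

fn lex(input: &str) -> impl Iterator<Item = Result<Token, Error>> + '_ {
    let mut chars = input.chars().peekable();
    std::iter::from_fn(move || {
        // skip whitespace between tokens
        while chars.next_if(|c| c.is_whitespace()).is_some() {}
        let c = chars.next()?;
        Some(match c {
            '(' => Ok(Token::Lparen),
            ')' => Ok(Token::Rparen),
            '+' => Ok(Token::Add),
            '-' => Ok(Token::Sub),
            '*' => Ok(Token::Mul),
            '/' => Ok(Token::Div),
            '0'..='9' => {
                // accumulate a multi-digit number
                let mut n = (c as u8 - b'0') as i64;
                while let Some(d) = chars.next_if(|c| c.is_ascii_digit()) {
                    n = n * 10 + (d as u8 - b'0') as i64;
                }
                Ok(Token::Num(n))
            }
            _ => Err(Error::ParseError),
        })
    })
}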

I won't bother going too in-depth about how the parser works because it's pretty traditional, but I wanted to highlight that we don't actually have to reify the parse tree to make this work: the parser implicitly gives us a walk of the syntax tree as we read it, which is pretty cool.
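
Putting the pieces together with the sketched lexer, the emitted program is exactly the RPN from earlier:

fn main() -> Result<(), Error> {
    let program = Parser::new(lex("(1 + 2) * 3")).run()?;
    // prints [Num(1), Num(2), Add, Num(3), Mul]
    println!("{:?}", program.code);
    assert_eq!(program.run()?, 9);
    Ok(())
}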

I suspect this is probably not how you'd want to write a production parser in 2026. Memory is cheap (well, not really...) and we can afford to fit the whole syntax tree into memory if we want to, and the organizational benefits for larger programs are pretty good: the syntax tree is a nice object that you can standardize and pass around between different subcomponents. But if you're going to treat "a compiler" as a small little project that you can bang out in a few hundred lines of code, and not as a production object, it's a pretty compelling way to structure this sort of thing.

I might keep writing about this as I read the book more!
