Writing a Transpiler, Pt. 2
The long-awaited sequel to part 1!
If you missed the first one, you can view it here.
To give a short recap, we talked about why Rust to OCaml, how a compiler/transpiler works under the hood, and how the Rust syn crate handles the Abstract Syntax Tree (AST).
Now we get to go over how we've implemented it so far.
Starting the project
Since we knew what the AST looked like, we needed a way to convert it over. Before that, however, we needed a place to store our converted code.
We started with the simplest line we wanted to parse:
```rust
pub const AF_UNSPEC: ::c_int = 0;
```
We want to translate it to an OCaml let statement:
```ocaml
let AF_UNSPEC : c_int = 0
```
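Before we can do anything, we need to read in our Rust files so that we can eventually output them as OCaml files. For now, we read a file, parse it with syn, and print the resulting syntax tree:
```rust
use std::fs::File;
use std::io::Read;

fn main() {
    let filename = "src/empty.rs";
    let mut file = File::open(filename).expect("Unable to open file");
    let mut src = String::new();
    file.read_to_string(&mut src).expect("Unable to read file");
    // syn::parse_file requires syn's "full" feature in Cargo.toml.
    let syntax = syn::parse_file(&src).expect("Unable to parse file");
    // Debug output for syn types requires syn's "extra-traits" feature.
    println!("{:#?}", syntax);
}
```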
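The snippet above only prints the syntax tree. For the "output them as OCaml files" half, a minimal standard-library-only sketch could derive the .ml path from the .rs path and write the translated code there. The helper names ocaml_output_path and write_ocaml are hypothetical, not taken from the project.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: map `src/empty.rs` to `src/empty.ml`.
fn ocaml_output_path(rust_path: &Path) -> PathBuf {
    // with_extension replaces the existing `.rs` extension.
    rust_path.with_extension("ml")
}

/// Hypothetical helper: write the translated OCaml next to the Rust source
/// and return the path that was written.
fn write_ocaml(rust_path: &Path, ocaml_code: &str) -> io::Result<PathBuf> {
    let out = ocaml_output_path(rust_path);
    fs::write(&out, ocaml_code)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    // Use the temp directory so the demo does not touch the project tree.
    let rust_path = std::env::temp_dir().join("empty.rs");
    let out = write_ocaml(&rust_path, "let AF_UNSPEC = 0\n")?;
    println!("wrote {}", out.display());
    Ok(())
}
```

Keeping the path logic in its own function makes it easy to swap in a different output directory later without touching the translation code.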
Parsing to OCaml
To translate the syntax tree, we need to start building our own OCaml AST:
```rust
#[derive(Debug)]
enum OCaml {
    Let {
        name: String,
    },
}
```
We started with the simplest part. We asked ourselves "Can we just take the name of the constant and convert it?"
So we wanted this output:
```ocaml
let AF_UNSPEC = 0
```
By focusing on the simplest problem, we were able to come up with a couple of functions: one translates a syn item into our OCaml AST, and the other turns that OCaml AST into OCaml code, currently output as a string.
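```rust
use syn::{Item, ItemConst};

// Translate a syn item into our OCaml AST; items we don't handle yet are skipped.
fn rust_item_to_ocaml_item(item: Item) -> Option<OCaml> {
    match item {
        Item::Const(ItemConst { ident: name, .. }) => Some(OCaml::Let {
            name: name.to_string(),
        }),
        _ => None,
    }
}

// Turn the OCaml AST into OCaml source; the value is hard-coded to 0 for now.
fn ocaml_item_to_ocaml_code(ocaml: OCaml) -> String {
    match ocaml {
        OCaml::Let { name } => format!("let {} = 0", name),
    }
}
```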
Now all we had to do was collect our strings and output them:
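```rust
fn main() {
    // ...
    let syntax_items: String = syntax
        .items
        .into_iter()
        // Option is iterable, so flat_map drops the items that returned None.
        .flat_map(rust_item_to_ocaml_item)
        .map(ocaml_item_to_ocaml_code)
        .collect::<Vec<String>>()
        .join("\n");
    // {} prints the code itself; {:#?} would quote it and escape the newlines.
    println!("{}", syntax_items);
    // ...
}
```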
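To watch the whole pipeline run without pulling in syn, here is a minimal sketch that swaps syn::Item for a hypothetical RustItem stand-in with just two shapes: a const, and anything else. It uses filter_map, which has the same effect as flat_map over an Option but states the intent more directly.

```rust
/// Hypothetical stand-in for `syn::Item`: only the shape matters here.
enum RustItem {
    Const { ident: String },
    // Any item we don't translate yet (functions, structs, ...).
    Other,
}

#[derive(Debug)]
enum OCaml {
    Let { name: String },
}

fn rust_item_to_ocaml_item(item: RustItem) -> Option<OCaml> {
    match item {
        RustItem::Const { ident } => Some(OCaml::Let { name: ident }),
        RustItem::Other => None,
    }
}

fn ocaml_item_to_ocaml_code(ocaml: OCaml) -> String {
    match ocaml {
        OCaml::Let { name } => format!("let {} = 0", name),
    }
}

/// The same chain as in main: translate, render, join with newlines.
fn translate(items: Vec<RustItem>) -> String {
    items
        .into_iter()
        // Keep the Some(..) results and skip the Nones.
        .filter_map(rust_item_to_ocaml_item)
        .map(ocaml_item_to_ocaml_code)
        .collect::<Vec<String>>()
        .join("\n")
}

fn main() {
    let items = vec![
        RustItem::Const { ident: "AF_UNSPEC".to_string() },
        RustItem::Other,
        RustItem::Const { ident: "AF_INET".to_string() },
    ];
    println!("{}", translate(items));
}
```

Because unsupported items simply become None, the translator can run over a whole real file long before every item kind is handled.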
Adding layers
We had the basic flow down: figure out which token we need to parse, write the code to support that parsing, and output the result to our file.
We began adding layers to our syntax tree:
```rust
#[derive(Debug)]
enum OCaml {
    Let {
        name: String,
        ty: Option<String>,
        value: String,
    },
}
```
To make it easier on ourselves, we also began implementing the Display trait so we didn't have to convert the strings manually.
```rust
use std::fmt::{self, Display};

impl Display for OCaml {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OCaml::Let { name, ty, value } => {
                // Only emit the type annotation when we know the type.
                if let Some(ty) = ty {
                    write!(f, "let {} : {} = {}", name, ty, value)
                } else {
                    write!(f, "let {} = {}", name, value)
                }
            }
        }
    }
}
```
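As a quick check of the Display impl, here is a self-contained variant that moves the Option into the match patterns instead of using if let. The typed case renders exactly the let statement we set out to produce, and to_string() comes for free from the standard library's blanket ToString impl for any Display type.

```rust
use std::fmt::{self, Display};

#[derive(Debug)]
enum OCaml {
    Let {
        name: String,
        ty: Option<String>,
        value: String,
    },
}

impl Display for OCaml {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // With a type: `let NAME : TY = VALUE`
            OCaml::Let { name, ty: Some(ty), value } => {
                write!(f, "let {} : {} = {}", name, ty, value)
            }
            // Without a type: `let NAME = VALUE`
            OCaml::Let { name, ty: None, value } => write!(f, "let {} = {}", name, value),
        }
    }
}

fn main() {
    let items = [
        OCaml::Let {
            name: "AF_UNSPEC".to_string(),
            ty: Some("c_int".to_string()),
            value: "0".to_string(),
        },
        OCaml::Let { name: "AF_INET".to_string(), ty: None, value: "2".to_string() },
    ];
    // Display gives every item `to_string()` via the blanket ToString impl.
    let code: Vec<String> = items.iter().map(|item| item.to_string()).collect();
    println!("{}", code.join("\n"));
}
```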
One by one, we would look at the syntax tree syn gave us, pick a type to translate, and focus on writing the code to get that done.
For example, here is an OCaml literal, with the number stored as a String:
```rust
#[derive(Debug)]
enum OCamlLiteral {
    Number(String),
}

impl std::fmt::Display for OCamlLiteral {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            // The number is already stored as text, so print it unchanged.
            OCamlLiteral::Number(int) => write!(f, "{}", int),
        }
    }
}
```
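One wrinkle with storing numbers as strings: Rust integer literals can carry a type suffix, as in 0x10u32, and OCaml's plain int literals accept none of those suffixes. The helper below, rust_int_to_ocaml, is a hypothetical sketch that strips them before the text goes into a Number. syn's LitInt also exposes suffix() and base10_digits(), which may be the simpler route in the real project.

```rust
/// Rust integer type suffixes; OCaml's plain `int` literals accept none of them.
const INT_SUFFIXES: [&str; 12] = [
    "i128", "u128", "isize", "usize", "i64", "u64", "i32", "u32", "i16", "u16", "i8", "u8",
];

/// Hypothetical helper: turn Rust integer literal text such as `0x10u32`
/// or `1_000i64` into text OCaml accepts as an `int` literal.
fn rust_int_to_ocaml(text: &str) -> String {
    let mut digits = text;
    for suffix in INT_SUFFIXES.iter() {
        if let Some(stripped) = digits.strip_suffix(*suffix) {
            digits = stripped;
            break;
        }
    }
    // Rust allows `1_u8`; drop the separator left dangling by the suffix.
    // Other `_` separators and 0x/0o/0b prefixes are valid OCaml as-is.
    digits.trim_end_matches('_').to_string()
}

fn main() {
    for lit in ["0", "0x10u32", "1_000i64", "1_u8"].iter() {
        println!("{} -> {}", lit, rust_int_to_ocaml(lit));
    }
}
```

The sketch only handles integers. On 64-bit platforms OCaml's int is 63 bits wide, so very large u64 or usize constants would need separate handling.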
We would create a new enum for a type, represent it how we wanted it to look in OCaml, and let Display handle the string output.
However, the code quickly became a mess, since we had enums and their Display implementations scattered all over the place, each producing strings.
Instead, we refactored and went with the From and Into traits to turn Rust types into the OCaml types we define:
```rust
pub enum OCamlExpr {
    Literal(OCamlLiteral),
    Path(Vec<String>),
    Unary(Box<OCamlUnary>),
    Binary(Box<OCamlBinary>),
}

impl From<&syn::Expr> for OCamlExpr {
    fn from(value: &syn::Expr) -> Self {
        match value {
            syn::Expr::Lit(syn::ExprLit { lit, .. }) => OCamlExpr::Literal(lit.into()),
            // SynPath wraps &syn::Path: the orphan rule rejects
            // `impl From<&syn::Path> for Vec<String>`, since both types are foreign.
            syn::Expr::Path(syn::ExprPath { path, .. }) => OCamlExpr::Path(SynPath(path).into()),
            syn::Expr::Unary(unary) => OCamlExpr::Unary(Box::new(unary.into())),
            syn::Expr::Binary(expr) => OCamlExpr::Binary(Box::new(expr.into())),
            // Anything not translated yet panics and shows the offending expression.
            _ => todo!("{:#?} is not implemented", value),
        }
    }
}
```
This made it easier to separate the translation code from the output code.
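Binary expressions show why a separate translation step pays off: several Rust operators are spelled differently in OCaml (% is mod, & is land, << is lsl, != is <>), and OCaml gives lsl and land a different precedence than Rust gives << and &. Below is a minimal, standard-library-only sketch of a From conversion for operators. RustBinOp, OCamlBinOp, Expr and bin are hypothetical stand-ins for syn::BinOp and the project's OCamlBinary, and the sketch simply parenthesizes every binary expression instead of reasoning about precedence.

```rust
use std::fmt::{self, Display};

/// Hypothetical stand-in for a subset of `syn::BinOp`.
#[derive(Clone, Copy)]
enum RustBinOp {
    Add, Sub, Mul, Div, Rem, BitAnd, BitOr, BitXor, Shl, Shr, Eq, Ne,
}

/// How an operator is spelled in OCaml, for `int` operands.
#[derive(Debug, PartialEq)]
struct OCamlBinOp(&'static str);

impl From<RustBinOp> for OCamlBinOp {
    fn from(op: RustBinOp) -> Self {
        let spelling = match op {
            RustBinOp::Add => "+",
            RustBinOp::Sub => "-",
            RustBinOp::Mul => "*",
            RustBinOp::Div => "/",
            RustBinOp::Rem => "mod",
            RustBinOp::BitAnd => "land",
            RustBinOp::BitOr => "lor",
            RustBinOp::BitXor => "lxor",
            RustBinOp::Shl => "lsl",
            // Assumes signed operands (like c_int); unsigned `>>` would be `lsr`.
            RustBinOp::Shr => "asr",
            RustBinOp::Eq => "=",
            RustBinOp::Ne => "<>",
        };
        OCamlBinOp(spelling)
    }
}

/// Hypothetical mini expression tree; boxing lets binary expressions nest.
enum Expr {
    Num(u64),
    Binary(Box<Expr>, RustBinOp, Box<Expr>),
}

impl Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Num(n) => write!(f, "{}", n),
            Expr::Binary(lhs, op, rhs) => {
                // Implementing From gives us `.into()` for free.
                let op: OCamlBinOp = (*op).into();
                // Always parenthesize so OCaml's precedence can't change the meaning.
                write!(f, "({} {} {})", lhs, op.0, rhs)
            }
        }
    }
}

/// Small helper to keep the example readable.
fn bin(lhs: Expr, op: RustBinOp, rhs: Expr) -> Expr {
    Expr::Binary(Box::new(lhs), op, Box::new(rhs))
}

fn main() {
    // Rust `1 << 2 + 1` parses as `1 << (2 + 1)`.
    let e = bin(Expr::Num(1), RustBinOp::Shl, bin(Expr::Num(2), RustBinOp::Add, Expr::Num(1)));
    println!("{}", e); // (1 lsl (2 + 1))
}
```

The parentheses do real work: Rust evaluates 1 << 2 + 1 as a shift by 3, while OCaml reads 1 lsl 2 + 1 as (1 lsl 2) + 1.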
So, in a nutshell, our new workflow became:
- Choose a Rust syn type to parse
- Take the smallest part and parse it
- Create an enum for that type
- Turn the Rust type into the OCaml type using From
- Output it using Display on the enum type
- Move on to the next type
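To make the workflow concrete, here is one pass through it for boolean literals, as a standard-library-only sketch. RustLit is a hypothetical stand-in for syn's literal type (syn's own Lit enum does have Int and Bool variants), and the Bool variant on OCamlLiteral is an illustrative addition, not something from the project.

```rust
use std::fmt::{self, Display};

/// Steps 1-2: a hypothetical mirror of the syn literal we chose to parse.
enum RustLit {
    Int(String),
    Bool(bool),
}

/// Step 3: an enum describing how the value looks in OCaml.
#[derive(Debug, PartialEq)]
enum OCamlLiteral {
    Number(String),
    Bool(bool),
}

/// Step 4: convert with From, which also provides `.into()`.
impl From<&RustLit> for OCamlLiteral {
    fn from(lit: &RustLit) -> Self {
        match lit {
            RustLit::Int(digits) => OCamlLiteral::Number(digits.clone()),
            RustLit::Bool(b) => OCamlLiteral::Bool(*b),
        }
    }
}

/// Step 5: print with Display.
impl Display for OCamlLiteral {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OCamlLiteral::Number(n) => write!(f, "{}", n),
            // OCaml spells `true` and `false` the same way Rust does.
            OCamlLiteral::Bool(b) => write!(f, "{}", b),
        }
    }
}

fn main() {
    let lit = RustLit::Bool(true);
    let ocaml: OCamlLiteral = (&lit).into();
    println!("let enabled = {}", ocaml);
}
```

Each new syn type repeats the same three pieces (enum variant, From arm, Display arm), which is what keeps the translation and output code separate.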
What next?
We're still working our way through the Rust AST.
It's a lot of work that takes time, but we're taking it one small step at a time.
If you're interested in viewing the progress on it, you can find the repo here!
Hopefully this letter provided some insight into how we're parsing Rust to OCaml code.
It's not exhaustive by any means, since the project is large.
If you have any questions, feel free to let me know by email or on Twitter.
Until then, have a great week!
Glitchbyte