step x
in
titles.cargo new --bin arithmetic
Project layout is:
. └── arithmetic ├── Cargo.toml └── src └── main.rs
File src/arithmetic.rustemo
E: E '+' E
| E '*' E
| '(' E ')'
| Number
;
terminals
Number: /\d+/;
Add: '+';
Mul: '*';
POpen: '(';
PClose: ')';
cargo install rustemo-compiler
After the installation completes the compiler (rcomp
CLI tool) is available.
cd arithmetic/src/
rcomp arithmetic.rustemo
Generating parser for grammar "arithmetic.rustemo" Terminals: 6 Non-terminals: 1 Productions: 4 States: 10 CONFLICTS: In State 8:E 1: E: E Add E . {STOP, Add, Mul, PClose} 1: E: E . Add E {STOP, Add, Mul, PClose} 2: E: E . Mul E {STOP, Add, Mul, PClose} When I saw E and see token Add ahead I can't decide should I shift or reduce by production: E: E Add E In State 8:E 1: E: E Add E . {STOP, Add, Mul, PClose} 1: E: E . Add E {STOP, Add, Mul, PClose} 2: E: E . Mul E {STOP, Add, Mul, PClose} When I saw E and see token Mul ahead I can't decide should I shift or reduce by production: E: E Add E ... 4 conflict(s). 4 Shift/Reduce and 0 Reduce/Reduce. Error: Grammar is not deterministic. There are conflicts. Parser(s) not generated.
This grammar is ambiguous!
cd arithmetic/src
rcomp --dot arithmetic.rustemo
dot -Tpng -O arithmetic.dot
We get a PNG file with the DFA.
States colored red have conflicts.
1 + 2 * 3 + 4
1 + 2 * 3 + 4
While the grammar is ambiguous, the language is not!
A classic approach - grammar rework by introducing intermediate rules:
AddExpr: MulExpr AddExprRest*;
AddExprRest: Add MulExpr;
MulExpr: AtomExpr MulExprRest*;
MulExprRest: Mul AtomExpr;
AtomExpr: Number | '(' AddExpr ')';
terminals
Number: /\d+/;
Add: '+';
Mul: '*';
POpen: '(';
PClose: ')';
We stay in the CFG realm, but this quickly leads to unmaintainable grammar + produced trees are not what the user expects.
We’ll just let the grammar be without any disambiguation.
E: E '+' E
| E '*' E
| '(' E ')'
| Number
;
terminals
Number: /\d+/;
Add: '+';
Mul: '*';
POpen: '(';
PClose: ')';
Now, lets compile the grammar to a GLR parser.
rcomp --parser-algo glr arithmetic.rustemo
Generating parser for grammar "arithmetic.rustemo" Terminals: 6 Non-terminals: 1 Productions: 4 States: 10 Writing actions file "arithmetic_actions.rs" Writing parser file "arithmetic.rs"
Generated parser depends on rustemo
library. It is a runtime part of the Rustemo.
cargo add rustemo
Updating crates.io index Adding rustemo v0.6.0 to dependencies Features: + glr Updating crates.io index
Generated code also uses regex
, once_cell
and colored
libraries, so lets add those.
cargo add regex --no-default-features --features std,unicode-perl
cargo add once_cell colored
Now, your Cargo.toml
should look like this:
[package]
name = "arithmetic"
version = "0.1.0"
edition = "2021"
[dependencies]
colored = "2.1.0"
once_cell = "1.20.1"
regex = { version = "1.11.0", default-features = false, features = ["std", "unicode-perl"] }
rustemo = "0.6.1"
main.rs
Read expression from stdin.
fn main() {
let mut expression = String::new();
// Read the line from the input
println!("Expression:");
io::stdin()
.read_line(&mut expression)
.expect("Failed to read line.");
}
main.rs
Import generated parser modules and the parser type.
use rustemo::Parser;
use std::io;
// Use the generated parser
use crate::arithmetic::ArithmeticParser;
// Include generated modules
#[rustfmt::skip]
mod arithmetic;
#[allow(unused)]
#[rustfmt::skip]
mod arithmetic_actions;
fn main() {
let mut expression = String::new();
// Read the line from the input
println!("Expression:");
io::stdin()
.read_line(&mut expression)
.expect("Failed to read line.");
}
main.rs
(step 6)Parse the input and process the forest.
use rustemo::Parser;
use std::io;
// Use the generated parser
use crate::arithmetic::ArithmeticParser;
// Include generated modules
#[rustfmt::skip]
mod arithmetic;
#[allow(unused)]
#[rustfmt::skip]
mod arithmetic_actions;
fn main() {
let mut expression = String::new();
// Read the line from the input
println!("Expression:");
io::stdin()
.read_line(&mut expression)
.expect("Failed to read line.");
// Parse the line and get all results.
let forest = ArithmeticParser::new().parse(&expression).unwrap();
println!("Number of interpretations = {}", forest.solutions());
for (tree_idx, tree) in forest.iter().enumerate() {
println!("**** TREE {tree_idx}");
println!("{tree:#?}");
}
}
Run with:
cargo run
Expression: 2 + 3 * 5 Number of interpretations = 2 **** TREE 0 [ [ [ "2", ], "+", [ "3", ], ], "*", [ "5", ], ] **** TREE 1 [ [ "2", ...
File arithmetic_actions.rs
<snip>
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
token.value.into()
}
#[derive(Debug, Clone)]
pub struct EC1 {
pub e_1: Box<E>,
pub e_3: Box<E>,
}
<snip>
pub fn e_c1(_ctx: &Ctx, e_1: E, e_3: E) -> E {
E::C1(EC1 {
e_1: Box::new(e_1),
e_3: Box::new(e_3),
})
}
pub fn e_number(_ctx: &Ctx, number: Number) -> E {
E::Number(number)
}
This just creates an AST. But, we would like actions to perform actual calculation. Let’s do that!
E: E '+' E {Add}
| E '*' E {Mul}
| '(' E ')' {Paren}
| Number {Num}
;
terminals
Number: /\d+/;
Add: '+';
Mul: '*';
POpen: '(';
PClose: ')';
E: left=E '+' right=E {Add}
| left=E '*' right=E {Mul}
| '(' E ')' {Paren}
| Number {Num}
;
terminals
Number: /\d+/;
Add: '+';
Mul: '*';
POpen: '(';
PClose: ')';
rcomp
run.
In this case we don’t have manual changes so we just delete
arithmentic_actions.rs
and run rcomp
again.
rm arithmetic_actions.rs
rcomp --parser-algo glr arithmentic.rustemo
And, now our actions are more readable.
pub type Token<'i> = RustemoToken<'i, Input, TokenKind>;
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
token.value.into()
}
#[derive(Debug, Clone)]
pub struct Add {
pub left: Box<E>,
pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Mul {
pub left: Box<E>,
pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
Add(Add),
Mul(Mul),
Paren(Box<E>),
Num(Number),
}
And, now our actions are more readable.
pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E {
E::Add(Add {
left: Box::new(left),
right: Box::new(right),
})
}
pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E {
E::Mul(Mul {
left: Box::new(left),
right: Box::new(right),
})
}
pub fn e_paren(_ctx: &Ctx, e: E) -> E {
E::Paren(Box::new(e))
}
pub fn e_num(_ctx: &Ctx, number: Number) -> E {
E::Num(number)
}
We can now manually tune our actions to do calculations during the build. We don’t want to build the tree, we want to get the result of the calculation.
First, we change this:
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
token.value.into()
}
Into this:
pub type Number = i32;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
token.value.parse().unwrap()
}
to convert numbers from strings to integers.
We replace all these types.
#[derive(Debug, Clone)]
pub struct Add {
pub left: Box<E>,
pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Mul {
pub left: Box<E>,
pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
Add(Add),
Mul(Mul),
Paren(Box<E>),
Num(Number),
}
With just:
pub type E = i32;
Now, we replace actions bodies with a proper calculations.
We replace this:
pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E {
E::Add(Add {
left: Box::new(left),
right: Box::new(right),
})
}
pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E {
E::Mul(Mul {
left: Box::new(left),
right: Box::new(right),
})
}
pub fn e_paren(_ctx: &Ctx, e: E) -> E {
E::Paren(Box::new(e))
}
pub fn e_num(_ctx: &Ctx, number: Number) -> E {
E::Num(number)
}
Now, we replace actions bodies with a proper calculations.
With this:
pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E {
left + right
}
pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E {
left * right
}
pub fn e_paren(_ctx: &Ctx, e: E) -> E {
e
}
pub fn e_num(_ctx: &Ctx, number: Number) -> E {
number
}
Run it again:
cargo run
Change our main.rs
to build all trees from the forest using our new calculation
actions.
Change from this:
use rustemo::Parser;
use std::io;
// Use the generated parser
use crate::arithmetic::ArithmeticParser;
// Include generated modules
#[rustfmt::skip]
mod arithmetic;
#[allow(unused)]
#[rustfmt::skip]
mod arithmetic_actions;
fn main() {
let mut expression = String::new();
// Read the line from the input
println!("Expression:");
io::stdin()
.read_line(&mut expression)
.expect("Failed to read line.");
// Parse the line and get all results.
let forest = ArithmeticParser::new().parse(&expression).unwrap();
println!("Number of interpretations = {}", forest.solutions());
for (tree_idx, tree) in forest.iter().enumerate() {
println!("**** TREE {tree_idx}");
println!("{tree:#?}");
}
}
To this:
use rustemo::Parser;
use std::io;
// Use the generated parser and builder
use crate::arithmetic::{ArithmeticParser, DefaultBuilder};
// Include generated modules
#[rustfmt::skip]
mod arithmetic;
#[allow(unused)]
#[rustfmt::skip]
mod arithmetic_actions;
fn main() {
let mut expression = String::new();
// Read the line from the input
println!("Expression:");
io::stdin()
.read_line(&mut expression)
.expect("Failed to read line.");
// Parse the line and get all results.
let forest = ArithmeticParser::new().parse(&expression).unwrap();
println!("Number of interpretations = {}", forest.solutions());
// Evaluate each tree from the forest
let results = forest
.into_iter()
.map(|tree| {
let mut builder = DefaultBuilder::new();
tree.build(&mut builder)
})
.collect::<Vec<_>>();
println!("{results:?}");
}
To this:
// Parse the line and get all results.
let forest = ArithmeticParser::new().parse(&expression).unwrap();
println!("Number of interpretations = {}", forest.solutions());
// Evaluate each tree from the forest
let results = forest
.into_iter()
.map(|tree| {
let mut builder = DefaultBuilder::new();
tree.build(&mut builder)
})
.collect::<Vec<_>>();
println!("{results:?}");
But, this language is not ambiguous. We have priorities and associativities. Let’s disambiguate by using Rustemo declarative disambiguation so we can use LR instead of GLR.
E: left=E '+' right=E {Add}
| left=E '*' right=E {Mul}
| '(' E ')' {Paren}
| Number {Num}
;
terminals
Number: /\d+/;
Add: '+';
Mul: '*';
POpen: '(';
PClose: ')';
E: left=E '+' right=E {Add, 1, left}
| left=E '*' right=E {Mul, 2, left}
| '(' E ')' {Paren}
| Number {Num}
;
terminals
Number: /\d+/;
Add: '+';
Mul: '*';
POpen: '(';
PClose: ')';
Now, we can use LR.
rcomp arithmetic.rustemo
Generating parser for grammar "arithmetic.rustemo" Terminals: 6 Non-terminals: 1 Productions: 4 States: 10 Writing actions file "arithmetic_actions.rs" Writing parser file "arithmetic.rs"
main.rs
for LRNow, we don’t get a forest from the LR parser but the result of the default builder evaluation. Let’s fix that.
main.rs
for LR
We change main.rs
from this:
use rustemo::Parser;
use std::io;
// Use the generated parser and builder
use crate::arithmetic::{ArithmeticParser, DefaultBuilder};
// Include generated modules
#[rustfmt::skip]
mod arithmetic;
#[allow(unused)]
#[rustfmt::skip]
mod arithmetic_actions;
fn main() {
let mut expression = String::new();
// Read the line from the input
println!("Expression:");
io::stdin()
.read_line(&mut expression)
.expect("Failed to read line.");
// Parse the line and get all results.
let forest = ArithmeticParser::new().parse(&expression).unwrap();
println!("Number of interpretations = {}", forest.solutions());
// Evaluate each tree from the forest
let results = forest
.into_iter()
.map(|tree| {
let mut builder = DefaultBuilder::new();
tree.build(&mut builder)
})
.collect::<Vec<_>>();
println!("{results:?}");
}
main.rs
for LR (step 10)To this:
use rustemo::Parser;
use std::io;
// Use the generated parser
use crate::arithmetic::ArithmeticParser;
// Include generated modules
#[rustfmt::skip]
mod arithmetic;
#[allow(unused)]
#[rustfmt::skip]
mod arithmetic_actions;
fn main() {
let mut expression = String::new();
// Read the line from the input
println!("Expression:");
io::stdin()
.read_line(&mut expression)
.expect("Failed to read line.");
// Parse the line and get the result.
let result = ArithmeticParser::new().parse(&expression).unwrap();
println!("Result = {result}");
}
Thanks!
Q&A