Calculator
This tutorial assumes that you have Rustemo CLI properly installed. Refer to
the CLI section if you have trouble running rcomp
command.
In this tutorial we'll create a simple calculator with 4 arithmetic operations:
+
, -
, *
, and /
.
We would like to parse and calculate expressions like this:
8 + 4 / 2 - 3.2 * 2
This tutorial is rather lengthy as it covers all aspect of usage workflow, from creating Rust project, Rustemo grammar file to implementing types and actions. Also, this tutorial tries to be a gentle introduction providing all intermediate steps in details. Thus, this should serve as a full introduction and a prerequisite for the following tutorials as it explains the usual workflow in working with Rustemo. Following tutorials will assume that the user is familiar with the usual workflow.
Create the project
We assume that we are building a small demo program which accepts an expression from the user and prints the result or informative error if the expression doesn't conform to the grammar.
So, lets first build a new project:
cargo new --bin calculator
In this project we'll create a calculator.rustemo
cd calculator/src
touch calculator.rustemo
In the rest of the tutorial we assume that grammar is written in the above
calculator.rustemo
file.
The grammar
To parse such expressions we start with a grammar that describe the syntax and lexical structure of our expressions.
We start by a simple idea that each operation is binary, consisting of two operand and an infix operator:
<left operand> <operator> <right operand>
So, let's write that down in Rustemo grammar notation. First, we may say that our base expression is a single operation:
Expression: Operand Operator Operand;
Our operands are numbers. Let's define a lexical syntax for operands. We do that
in a grammar section that starts with the terminals
keyword. This section
should go after the main part (syntax part) of the grammar. For lexical
definitions, we either use plain strings if the content is fixed, or regular
expressions, if the content is different between lexemes. See the section on
terminals for the explanation.
In this case we have numbers which differs but we can encode their structure using regular expression:
terminals
Operand: /\d+(\.\d+)?/;
So, our operand should have at least one digit, after which optionally follows a dot and more digits. We could make this more elaborate but let's make it simple for now.
And what is the operator
? It can be any of the +
, -
, *
, /
so we can
encode that in a regex also:
Operator: /\+|-|\*|\//;
Symbols +
and *
have a special interpretation in regular expressions so we
must escape them. Symbol /
is used to start/end the regex in Rustemo so we
must escape that also.
Now, our full grammar is:
Expression: Operand Operator Operand;
terminals
Operand: /\d+(\.\d+)?/;
Operator: /\+|-|\*|\//;
The problem with our grammar is that the only thing we could ever parse with it
is a single operation, like 3 + 4
. If we add more operations our parser, built
with this grammar, won't work anymore. We'll see how to extend the parser later
but let's now move on.
Generating the parser
Let's run rcomp
command to generate the parser code from the grammar:
rcomp calculator.rustemo
If you get no output there were no errors. If you made an error in the grammar you will get a report with the line and column where the error was and what is expected at that location.
For example, if we forget colon after the rule name we get:
Error at calculator.rustemo:1:11:
...Expression -->Operand Operato...
Expected one of Colon, OBrace.
Parser(s) not generated.
After the parser generator is run successfully you should have files
calculator.rs
and calculator_actions.rs
generated.
File calculator.rs
is the parser while calculator_actions.rs
in the default
configuration contains deduced types for the AST (Abstract Syntax Tree)
together with functions/actions used by the builder during the parsing process
to construct the AST.
calculator.rs
is regenerated whenever you run rcomp
command.
Actions in file calculator_actions.rs
, on the other hand, are not fully
regenerated. Only missing actions/types will be regenerated. This enables you to
provide manual modifications of the actions/types as long as you retain the same
name. As we shall soon see, this is a nice feature which provides a quick way to
have a working parser with default AST output which can be later tuned as needed.
Adding dependencies
Our generated parser code calls Rustemo code so we must add rustemo
crate as a
dependency:
cargo add rustemo
Your Cargo.toml
should look like this:
[package]
name = "calculator1"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
# A relative path to rustemo crate is used here for usage in the rustemo project tree.
# In your projects you should just specify the version.
rustemo = { version = "0.7", path = "../../../../../rustemo" }
Running the parser
Now, let's open the main.rs
file and write this for the header:
#![allow(unused)] fn main() { use rustemo::Parser; use std::io; // Use the generated parser use crate::calculator::CalculatorParser; // Include generated modules #[rustfmt::skip] mod calculator; #[allow(unused)] #[rustfmt::skip] mod calculator_actions; }
and this as the main
function which will serve as entry point for our program:
fn main() { let mut expression = String::new(); // Read the line from the input println!("Expression:"); io::stdin() .read_line(&mut expression) .expect("Failed to read line."); // Parse the line and get the result. let result = CalculatorParser::new().parse(&expression); // Print the result using Debug formatter. println!("{:#?}", result); }
So, our initial program will accept the string from the console input, parse it
using the parser generated in the calculator
module and print the result.
Run the program:
cargo run
You should see a prompt which expect you to enter the expression. If we enter:
2 + 3
We get debug output from the parser and at the end you should see:
Action: Accept
Ok(
Expression {
operand_1: "2",
operator: "+",
operand_3: "3",
},
)
But if you make a mistake, like:
2 + / 3
You get:
Err(
Error {
message: "...2 + -->/ 3\n...\nExpected Operand.",
file: Some(
"<str>",
),
location: Some(
Location {
start: LineBased(
LineBased {
line: 1,
column: 4,
},
),
end: None,
},
),
},
)
Congrats! You have made your first Rustemo parser!
Although, it still have many issues which we'll fix in the rest of the tutorial.
Extending the grammar
Ok, let's see can we parse the example expression from the beginning of the tutorial:
8 + 4 / 2 - 3.2 * 2
Run the program and give it the expression:
$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Err(
Error {
message: "...8 + 4 -->/ 2 - 3.2 * 2\n...\nExpected STOP.",
file: Some(
"<str>",
),
location: Some(
Location {
start: LineBased(
LineBased {
line: 1,
column: 6,
},
),
end: None,
},
),
},
)
As we already discussed above, the grammar we created so far can only parse an expression consisting of a single operation. We can see that the parser reported the error at the occurrence of the second operation.
Let's extend our grammar to allow for expressions of arbitrary length.
First, we shall start from the idea the each expression can be represented as an Abstract Syntax Tree (AST):
These kinds of trees resemble the gist of the underlying expressions. We can observe two things:
- AST shows the priority of operations.
*
and/
bind stronger than+
and-
. We see that operations with higher priority will be lower in the tree. - Each node in the tree is a sub-expression. Numbers are the most basic expressions, each operation is a sub-expression consisting of left sub-tree/sub-expression, operation and right sub-tree/sub-expression.
So, we could write grammar rules as:
Add: Expression '+' Expression;
Sub: Expression '-' Expression;
...
Where Add
, Sub
... are all expressions, thus:
Expression: Add | Sub | Mul | Div;
The definition is recursive, e.g. Add
references Expression
through its
operands and Expression
references Add
.
Finally, we can write this in a more compact form as a single grammar rule
(contracting Expression
to E
):
E: E '+' E
| E '-' E
| E '*' E
| E '/' E
| Number;
terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';
We can read this as:
Expression is:
- Expression plus Expression or
- Expression minus Expression or
- ...
- or a number
We must add operations to the terminals
section for the sake of giving names
to symbols as the names are needed for the generated code. We can use those
terminals directly in the syntax part e.g. like:
E: E Plus E
but I find it more readable to use strings in this case.
Of course, you can use either references or strings, or even combine the two approaches.
Let's run rcomp
over our new grammar to generate the parser.
$ rcomp calculator.rustemo
In State 7:E
E: E Plus E . {STOP, Plus, Minus, Mul, Div}
E: E . Plus E {STOP, Plus, Minus, Mul, Div}
...
16 conflict(s). 16 Shift/Reduce and 0 Reduce/Reduce.
Error: Grammar is not deterministic. There are conflicts.
Parser(s) not generated.
Woops! What went wrong?
Rustemo is talking about some conflicts and that the grammar is not deterministic. This just means that LR parser cannot be produced with the grammar because LR is a deterministic style of parsing where parser must know exactly what operation to perform in each state based solely on the next token. With this grammar it is not possible.
The report also gives detailed information about problematic spots. The report talks about states, those are LR automaton states as the LR parsing is based on constructing a deterministic push-down automaton (PDA) which is used during parsing.
To learn more see the section on parsing and conflicts resolution.
In the report we see segments like this:
In State 10:E
E: E Div E . {STOP, Plus, Minus, Mul, Div}
E: E . Plus E {STOP, Plus, Minus, Mul, Div}
E: E . Minus E {STOP, Plus, Minus, Mul, Div}
E: E . Mul E {STOP, Plus, Minus, Mul, Div}
E: E . Div E {STOP, Plus, Minus, Mul, Div}
When I saw E and see token Minus ahead I can't decide should I shift or reduce by production:
E: E Div E
Each parser state represent a location where parser can be in the parsing
process. The location is depicted by the dot in the productions above. So in
this state parser saw E
or E Div E
before and if there is Minus
ahead it
won't be able to decide if it should construct a Div
sub-tree making Div
operation to bind tighter or to consume the Minus
symbol which follows in
anticipation to construct Minus
sub-tree first.
Obviously, we have an issue with our operation priorities. Indeed, this grammar is ambiguous as, if we forget about priorities, input expressions can yield many possible trees1. LR parsers must always produce only one tree as LR parsing is deterministic.
GLR parsing is non-deterministic so it can be used with ambiguous grammars.
This could be resolved by transforming our grammar to encode priorities but the process is tedious, the resulting grammar becomes less readable and the resulting trees are far from intuitive to navigate and process.
Luckily, Rustemo has declarative disambiguation mechanisms which makes these issues easy to solve. These disambiguation information are specified inside of curly braces per production or per grammar rule.
The priority is specified by an integer number. The default priority is 10. Productions with higher priority will be the first to be reduced. Think of the reduction as producing a tree node where the node type is determined by the left-hand side of the production while the children are right-hand side.
Ok, knowing this let's extend our grammar:
E: E '+' E {1}
| E '-' E {1}
| E '*' E {2}
| E '/' E {2}
| Number;
terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';
Nice, we now have priorities defined. +
and -
operations are of priority 1
while *
and /
are of priority 2
. So, in the above conflict, division will
be reduced before Minus
symbol (subtraction) is taken into consideration.
Let's run rcomp
again:
$ rcomp calculator.rustemo
...
8 conflict(s). 8 Shift/Reduce and 0 Reduce/Reduce.
Error: Grammar is not deterministic. There are conflicts.
Parser(s) not generated.
Hm... we still have ambiguities but we've cut them in half. Let's see what are those remaining issues:
In State 10:E
E: E Div E . {STOP, Plus, Minus, Mul, Div}
E: E . Plus E {STOP, Plus, Minus, Mul, Div}
E: E . Minus E {STOP, Plus, Minus, Mul, Div}
E: E . Mul E {STOP, Plus, Minus, Mul, Div}
E: E . Div E {STOP, Plus, Minus, Mul, Div}
When I saw E and see token Mul ahead I can't decide should I shift or reduce by production:
E: E Div E
Ok, now we don't have issues with priorities but we have issues with associativities. When the parser saw division and there is multiplication symbol ahead it doesn't know what to do. Should it favor division or multiplication? They are of the same priority. We know that for all 4 operation parser should favor the operation it has already seen, or to put it simply, it should go from left to right when the priorities are the same2.
Rustemo also have declarative associativity specification. Associativity is
specified by keywords left
(or reduce
) and right
(or shift
).
Let's fix our grammar:
E: E '+' E {1, left}
| E '-' E {1, left}
| E '*' E {2, left}
| E '/' E {2, left}
| Number;
terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';
And run rcomp
again.
$ rcomp calculator.rustemo
$
It seems that everything is fine. Let's check by running our project:
$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Action: Accept
Ok(
C2(
EC2 {
e_1: C1(
EC1 {
e_1: C5(
"8",
),
e_3: C4(
EC4 {
e_1: C5(
"4",
),
e_3: C5(
"2",
),
},
),
},
),
e_3: C3(
EC3 {
e_1: C5(
"3.2",
),
e_3: C5(
"2",
),
},
),
},
),
)
That's it! We have a working grammar. It wasn't that hard after all, was it? :)
But, the work is not over yet. We got a parser but the AST is not that pretty and how can we evaluate our expressions? If you want to learn how to do that read on.
Improving AST
When we run rcomp
for our grammar we got two files generated: calculator.rs
and calculator_actions.rs
. The first one is the parser that is regenerated
from scratch whenever we run rcomp
. The second contains AST nodes' types and
actions used by the parser to construct AST nodes during parsing. This file can
be manually modified and the modification will be retained when rcomp
is run
as we shall see in the next section.
At this point we shall focus on the form of AST types that Rustemo generated for
us. Open calculator_actions.rs
file and examine its content. We can see that
for each grammar rule production we got struct generated with the name of the
grammar rule followed by Cx
sufix (C
for Choice and x
is an ordinal
number).
#![allow(unused)] fn main() { /// This file is maintained by rustemo but can be modified manually. /// All manual changes will be preserved except non-doc comments. use ::rustemo::{Context, Token as BaseToken}; use super::calculator::{self, TokenKind}; pub type Input = str; pub type Ctx<'i> = super::calculator::Context<'i, Input>; #[allow(dead_code)] pub type Token<'i> = BaseToken<'i, Input, TokenKind>; pub type Number = String; pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.into() } #[derive(Debug, Clone)] pub struct EC1 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct EC2 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct EC3 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct EC4 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub enum E { C1(EC1), C2(EC2), C3(EC3), C4(EC4), Number(Number), } pub fn e_c1(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C1(EC1 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_c2(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C2(EC2 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_c3(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C3(EC3 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_c4(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C4(EC4 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_number(_ctx: &Ctx, number: Number) -> E { E::Number(number) } }
You can expand and see the whole code in these snippets by using "Show hidden lines" option in the upper right corner of the code box.
And for each grammar rule we have an enum combining the variant structs:
#![allow(unused)] fn main() { /// This file is maintained by rustemo but can be modified manually. /// All manual changes will be preserved except non-doc comments. use ::rustemo::{Context, Token as BaseToken}; use super::calculator::{self, TokenKind}; pub type Input = str; pub type Ctx<'i> = super::calculator::Context<'i, Input>; #[allow(dead_code)] pub type Token<'i> = BaseToken<'i, Input, TokenKind>; pub type Number = String; pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.into() } #[derive(Debug, Clone)] pub struct EC1 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct EC2 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct EC3 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct EC4 { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub enum E { C1(EC1), C2(EC2), C3(EC3), C4(EC4), Number(Number), } pub fn e_c1(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C1(EC1 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_c2(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C2(EC2 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_c3(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C3(EC3 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_c4(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::C4(EC4 { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_number(_ctx: &Ctx, number: Number) -> E { E::Number(number) } }
Notice also that recursive types are boxed (e.g. pub e_1: Box<E>;
) as they
should be.
Ok, that is nice but look at those struct names and fields. They don't give us clue about operations, operands etc. It would be pretty hard to use these types.
Let's improve it a bit. First we can specify production kinds which is just a nice name for each production which can be used in the code.
We could workaround the issue by making a separate grammar rule for each operation like suggested above:
Add: E '+' E;
Sub: E '-' E;
...
E: Add | Sub |...
But let's do it with production kinds to see that alternative.
#![allow(unused)] fn main() { E: E '+' E {Add, 1, left} | E '-' E {Sub, 1, left} | E '*' E {Mul, 2, left} | E '/' E {Div, 2, left} | Number; terminals Number: /\d+(\.\d+)?/; Plus: '+'; Minus: '-'; Mul: '*'; Div: '/'; }
Notice the additions of Add
, Sub
... in the curly braces.
Regenerate the parser, but first delete actions so they can be regenerated also.
$ rm calculator_actions.rs
$ rcomp calculator.rustemo
Now, if we open calculator_actions.rs
we'll see that structs are named after
productions kinds. Nice!
#![allow(unused)] fn main() { /// This file is maintained by rustemo but can be modified manually. /// All manual changes will be preserved except non-doc comments. use ::rustemo::{Context, Token as BaseToken}; use super::calculator::{self, TokenKind}; pub type Input = str; pub type Ctx<'i> = super::calculator::Context<'i, Input>; #[allow(dead_code)] pub type Token<'i> = BaseToken<'i, Input, TokenKind>; pub type Number = String; pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.into() } #[derive(Debug, Clone)] pub struct Add { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct Sub { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct Mul { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub struct Div { pub e_1: Box<E>, pub e_3: Box<E>, } #[derive(Debug, Clone)] pub enum E { Add(Add), Sub(Sub), Mul(Mul), Div(Div), Number(Number), } pub fn e_add(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::Add(Add { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_sub(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::Sub(Sub { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_mul(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::Mul(Mul { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_div(_ctx: &Ctx, e_1: E, e_3: E) -> E { E::Div(Div { e_1: Box::new(e_1), e_3: Box::new(e_3), }) } pub fn e_number(_ctx: &Ctx, number: Number) -> E { E::Number(number) } }
But, what about fields. It is certainly not nice to have those generic e_1, e_3
names. To fix these we can use
assignments which is a
mechanism to both define AST nodes' field names and specify to the parser what
to retain in the AST during parsing. Some parts of the input are syntax noise
and should not be kept in the AST.
To change field names add assignments for left
and right
operands in each
operation:
#![allow(unused)] fn main() { E: left=E '+' right=E {Add, 1, left} | left=E '-' right=E {Sub, 1, left} | left=E '*' right=E {Mul, 2, left} | left=E '/' right=E {Div, 2, left} | Number; terminals Number: /\d+(\.\d+)?/; Plus: '+'; Minus: '-'; Mul: '*'; Div: '/'; }
Delete actions and rerun rcomp
. Now you'll see that the generated structs have
nicely named fields.
#![allow(unused)] fn main() { /// This file is maintained by rustemo but can be modified manually. /// All manual changes will be preserved except non-doc comments. use ::rustemo::{Context, Token as BaseToken}; use super::calculator::{self, TokenKind}; pub type Input = str; pub type Ctx<'i> = super::calculator::Context<'i, Input>; #[allow(dead_code)] pub type Token<'i> = BaseToken<'i, Input, TokenKind>; pub type Number = String; pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.into() } #[derive(Debug, Clone)] pub struct Add { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub struct Sub { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub struct Mul { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub struct Div { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub enum E { Add(Add), Sub(Sub), Mul(Mul), Div(Div), Number(Number), } pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E { E::Add(Add { left: Box::new(left), right: Box::new(right), }) } pub fn e_sub(_ctx: &Ctx, left: E, right: E) -> E { E::Sub(Sub { left: Box::new(left), right: Box::new(right), }) } pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E { E::Mul(Mul { left: Box::new(left), right: Box::new(right), }) } pub fn e_div(_ctx: &Ctx, left: E, right: E) -> E { E::Div(Div { left: Box::new(left), right: Box::new(right), }) } pub fn e_number(_ctx: &Ctx, number: Number) -> E { E::Number(number) } }
Much easier to work with!
In general, it is a good practice to use assignments and production kinds from the start as they represent valuable information (a sort of docs). The AST types generator uses those information, as you have seen.
Let's run our calculator just to make sure it works:
$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Action: Accept
Ok(Sub(Sub { left: Add(Add { left: C5("8"),
right: Div(Div { left: C5("4"), right: C5("2") }) }),
right: Mul(Mul { left: C5("3.2"), right: C5("2") }) }))
Everything is fine. Let's move on.
Calculating expressions
The last piece of the puzzle is to make our calculator useful by doing the actual calculation.
When we are parsing input we are doing what we call "semantic analysis" which purpose is to transform the input to some other form which is of use in the domain we are working in.
Usually, the end result of the analysis is some form of AST or more often ASG (Abstract Semantic Graph). In our case, semantic analysis can directly produce the end result of the calculation so that the result of the full process of parsing is a single number which represents the result.
Semantic analysis is done by the actions in calculator_actions.rs
. The actions
are Rust functions which are called during reductions to reduce a production
(right-hand side) to the sub-result (left-hand side of the grammar rule). In our
case, each reduction should basically be a calculation of a sub-expression
result.
As you have seen in the previous section, you need to delete actions if you want them regenerated on each Rustemo run. You can also remove some parts of it and those parts will get regenerated. The logic is that Rustemo will only add missing parts (identified by its name) of the file (missing AST types and action functions) but will not modify already existing. This enables manual modification of parts we want to tune to our likings.
We will now change types and actions to calculate sub-results instead of building AST nodes.
First, we'll start with the Number
rule which is defined as a String
:
#![allow(unused)] fn main() { pub type Number = String; }
We know that our number should be float
, so change it:
#![allow(unused)] fn main() { pub type Number = f32; }
Just bellow the Number
type we see the first action which is called to
transform the parsed number to Number
. token.value
is a string slice
containing the number.
#![allow(unused)] fn main() { pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.into() } }
Previously our Number
was String
but now it is a float so we should change
this action to parse the string and produce a float:
#![allow(unused)] fn main() { pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.parse().unwrap() } }
Now, we see that E
type which represents sub-results is enum.
#![allow(unused)] fn main() { #[derive(Debug, Clone)] pub enum E { Add(Add), Sub(Sub), Mul(Mul), Div(Div), Number(Number), } }
A sub-result should be float also. So replace the enum with:
#![allow(unused)] fn main() { pub type E = f32; }
Also, structs for values of variants Add
, Sum
... are not needed anymore so
remove them from the file.
Okay, we have fixed generated types, now let's see our expressions' actions. They are of the form:
#![allow(unused)] fn main() { /// This file is maintained by rustemo but can be modified manually. /// All manual changes will be preserved except non-doc comments. use ::rustemo::{Context, Token as BaseToken}; use super::calculator::{self, TokenKind}; pub type Input = str; pub type Ctx<'i> = super::calculator::Context<'i, Input>; #[allow(dead_code)] pub type Token<'i> = BaseToken<'i, Input, TokenKind>; pub type Number = String; pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.into() } #[derive(Debug, Clone)] pub struct Add { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub struct Sub { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub struct Mul { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub struct Div { pub left: Box<E>, pub right: Box<E>, } #[derive(Debug, Clone)] pub enum E { Add(Add), Sub(Sub), Mul(Mul), Div(Div), Number(Number), } pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E { E::Add(Add { left: Box::new(left), right: Box::new(right), }) } pub fn e_sub(_ctx: &Ctx, left: E, right: E) -> E { E::Sub(Sub { left: Box::new(left), right: Box::new(right), }) } pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E { E::Mul(Mul { left: Box::new(left), right: Box::new(right), }) } pub fn e_div(_ctx: &Ctx, left: E, right: E) -> E { E::Div(Div { left: Box::new(left), right: Box::new(right), }) } pub fn e_number(_ctx: &Ctx, number: Number) -> E { E::Number(number) } }
So they produce AST node while they should do the calculation. So let's change all of them:
#![allow(unused)] fn main() { /// This file is maintained by rustemo but can be modified manually. /// All manual changes will be preserved except non-doc comments. use ::rustemo::{Context, Token as BaseToken}; use super::calculator::{self, TokenKind}; pub type Input = str; pub type Ctx<'i> = super::calculator::Context<'i, Input>; #[allow(dead_code)] pub type Token<'i> = BaseToken<'i, Input, TokenKind>; pub type Number = f32; pub fn number(_ctx: &Ctx, token: Token) -> Number { token.value.parse().unwrap() } pub type E = f32; pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E { left + right } pub fn e_sub(_ctx: &Ctx, left: E, right: E) -> E { left - right } pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E { left * right } pub fn e_div(_ctx: &Ctx, left: E, right: E) -> E { left / right } pub fn e_number(_ctx: &Ctx, number: Number) -> E { number } #[allow(dead_code)] type Dummy = u32; }
Very simple, each action is doing its calculation! Notice the last one which is
called to reduce Number
to E
. It just returns the number as it is already a
float produced by the number
action.
That's it. We have a fully working calculator!
Let's run it just to verify:
$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Action: Accept
Ok(3.6)
If you have made it through here, well done!
Now, just to make sure that you have connected all the dots try to solve exercises bellow.
Exercises
-
Extend the language to support power operator (
^
). What are priority and associativity of this operation? -
Support unary minus operation.
-
Support trigonometric operations in a function call style (e.g.
sin(34) + 2 * cos(12)
) -
Support variables before an expression. E.g:
a = 5 b = 2 * a a + 2 * sin(b)
Extend
main.rs
to accept input line-by-line. Whenever the user enter assignment to the variable the program will update its internal state by evaluating RHS of the assignment and keeping variable values in some mapping data structure of your choice. Each defined variable can be used in subsequent expressions. When the user enters just an expression without assignment the parser should evaluate and print the result.
Footnotes
The number of trees (interpretations) of an arithmetic expression can be calculated by the Catalan number sequence. For our example with 4 operations from the beginning the number of trees will be 14.
This doesn't hold for some other operations. See Exercises and think about associativity of the power operation.