# Grammar language

This section describes the grammar language, its syntax, and its semantics.

The Rustemo grammar specification language is based on BNF with optional syntactic sugar extensions built on top of pure BNF. Rustemo grammars are based on Context-Free Grammars (CFGs) and are written declaratively. This means you don't have to think about the parsing process as you do in e.g. PEGs. Ambiguities are dealt with explicitly (see the section on conflicts).
## The structure of the grammar

Each grammar file consists of two parts:

- derivation/production rules,
- terminal definitions, which are written after the keyword `terminals`.
Each derivation/production rule is of the form:

```
<symbol>: <expression>;
```

where `<symbol>` is a grammar non-terminal and `<expression>` is one or more sequences of grammar symbol references separated by the choice operator `|`.
For example:

```
Fields: Field | Fields "," Field;
```
Here `Fields` is a non-terminal grammar symbol and it is defined as either a single `Field` or, recursively, as `Fields` followed by the string terminal `","` and then by another `Field`. It is not shown here, but `Field` could also be defined as a non-terminal. For example:

```
Field: QuotedField | FieldContent;
```

Or it could be defined as a terminal in the `terminals` section:

```
terminals
Field: /[A-Z]*/;
```

This terminal definition uses a regular expression recognizer.
## Terminals

Terminal symbols of the grammar define the fundamental, atomic elements of your language: tokens or lexemes (e.g. keywords, numbers).

Terminals are specified at the end of the grammar file, after the production rules, following the keyword `terminals`.
Tokens are recognized from the input by a lexer component. Rustemo provides a string lexer out of the box which enables lexing based on recognizers provided in the grammar. If more control is needed, or if non-textual content is being parsed, a custom lexer must be provided. See the lexers section for more details.
Each terminal definition is of the form:

```
<terminal name>: <recognizer>;
```

where `<recognizer>` can be omitted if a custom lexer is used.
The default string lexer supports two kinds of terminal recognizers:

- String recognizer
- Regex recognizer
### String recognizer

A string recognizer is defined as a plain string inside single or double quotes. For example, in the grammar rule:

```
MyRule: "start" OtherRule "end";
```
`"start"` and `"end"` will be terminals with string recognizers that match exactly the words `start` and `end`. In this example the recognizers are inlined in the grammar rule.
For each string recognizer you must provide its definition in the `terminals` section in order to define a terminal name:

```
terminals
Start: "start";
End: "end";
```
You can reference the terminal from a grammar rule, like:

```
MyRule: Start OtherRule End;
```
or use the same string recognizer inlined in grammar rules, as we have seen before. It is your choice. Sometimes it is more readable to use string recognizers directly. Either way, you must always declare the terminal in the `terminals` section to provide the names used in the code of the generated parser.
### Regular expression recognizer

A regular expression recognizer, or regex recognizer for short, is a regex pattern written inside slashes (`/.../`).
For example:

```
terminals
Number: /\d+/;
```
This rule defines the terminal symbol `Number`, which has a regex recognizer that will recognize one or more digits from the input.
You cannot write regex recognizers inline as you can with string recognizers. This constraint exists because regexes are not that easy to write and don't add to readability, so it is always better to reference a regex terminal by name in grammar rules.
During regex construction, a `^` prefix is added to the regex from the grammar to make sure that the content is matched at the current input position. This can be an issue if you use a pattern like `A|B` in your regex, as it translates to `^A|B`, which matches either `A` at the current position or `B` anywhere in the rest of the input. The workaround for now is to use `(A|B)`, i.e. always wrap alternative choices in parentheses.
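The effect of the implicit `^` prefix can be sketched with Python's `re` module (an illustration of general regex semantics, not of Rustemo's actual lexer):

```python
import re

# "^A|B" groups as (^A)|(B): the "B" alternative may match anywhere.
assert re.search("^A|B", "xxB") is not None

# "^(A|B)" anchors both alternatives to the start position.
assert re.search("^(A|B)", "xxB") is None
assert re.search("^(A|B)", "Axx") is not None
```

The parenthesized form behaves as intended: both alternatives are tried only at the anchored position.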
## Usual patterns

This section explains how some common grammar patterns can be written using plain Rustemo BNF-like notation. Afterwards we'll see syntactic sugar extensions which can be used to write these patterns in a more compact and readable form.
### One or more

This pattern is used to match one or more things.

For example, the `Sections` rule below will match one or more `Section`s:

```
Sections: Section | Sections Section;
```
Notice the recursive definition of the rule. You can read this as: `Sections` is either a single `Section` or `Sections` followed by a `Section`.
Please note that you could do the same with this rule:

```
Sections: Section | Section Sections;
```

which gives a similar result, but the resulting tree will be different. Notice that the recursive reference is now at the end of the second production.

The previous example reduces sections early and then adds another section to the result, so the tree expands to the left. The rule in this note collects all the sections and then starts reducing from the end, building a tree that expands to the right. These are subtle differences that become important when you start writing your semantic actions. Most of the time you don't care, so use the first version, as it is more efficient in the context of LR parsing.
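To illustrate the difference in tree shape (a sketch, not actual parser output), parsing three sections `s1 s2 s3` with each variant yields:

```
Left-recursive  (Sections: Section | Sections Section):  ((s1 s2) s3)
Right-recursive (Sections: Section | Section Sections):  (s1 (s2 s3))
```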
### Zero or more

This pattern is used to match zero or more things.

For example, the `Sections` rule below will match zero or more `Section`s:

```
Sections: Section | Sections Section | EMPTY;
```

Notice the addition of the `EMPTY` choice at the end. This means that matching nothing is a valid `Sections` non-terminal. Basically, this rule is the same as one-or-more except that matching nothing is also a valid solution.
The same note from above applies here too.
### Optional

When we want to match something optionally we can use this pattern:

```
OptHeader: Header | EMPTY;
```

In this example `OptHeader` is either a `Header` or nothing.
## Syntactic sugar - BNF extensions

The previous section gave an overview of the basic BNF syntax. If you are used to various BNF extensions (like the Kleene star) you might find writing the patterns from the previous section awkward. Since some of these patterns are used frequently in grammars (zero-or-more, one-or-more etc.), Rustemo provides syntactic sugar for these common idioms using the well-known regular expression syntax.
### Optional

An optional match can be specified using `?`. For example:

```
A: 'c'? B Num?;
B: 'b';
terminals
Tb: 'b';
Tc: 'c';
Num: /\d+/;
```
Here, we will recognize `B`, which is optionally preceded by `c` and optionally followed by `Num`.
Let's see what the parser returns for optional inputs. In this test:

```rust
#[test]
fn optional_1_1() {
    let result = Optional1Parser::new().parse("c b 1");
    output_cmp!(
        "src/sugar/optional/optional_1_1.ast",
        format!("{result:#?}")
    );
}
```
for input `c b 1` the `result` will be:

```
Ok(
    A {
        tc_opt: Some(
            Tc,
        ),
        b: Tb,
        num_opt: Some(
            "1",
        ),
    },
)
```
If we leave the number out and try to parse `c b`, the parse will succeed and the result will be:

```
Ok(
    A {
        tc_opt: Some(
            Tc,
        ),
        b: Tb,
        num_opt: None,
    },
)
```
Notice that the returned type is the `A` struct with fields `tc_opt` and `num_opt` of `Optional` type. These types are auto-generated based on the grammar. To learn more see the section on AST types/actions code generation.
Syntax equivalence for the `optional` operator:

```
S: B?;
terminals
B: "b";
```

is equivalent to:

```
S: BOpt;
BOpt: B | EMPTY;
terminals
B: "b";
```
Behind the scenes Rustemo will create the `BOpt` rule. All syntactic sugar additions operate by creating additional rules in the grammar during parser compilation.
### One or more

A one-or-more match is specified using the `+` operator.

For example:

```
A: 'c' B+ Ta;
B: Num;
terminals
Ta: 'a';
Tc: 'c';
Num: /\d+/;
```
After `c` we expect to see one or more `B` (each matching a number) and at the end we expect `a`.

Let's see what the parser will return for input `c 1 2 3 4 a`:

```rust
#[test]
fn one_or_more_2_2() {
    let result = OneOrMore2Parser::new().parse("c 1 2 3 4 a");
    output_cmp!(
        "src/sugar/one_or_more/one_or_more_2_2.ast",
        format!("{result:#?}")
    );
}
```
The result will be:

```
Ok(
    [
        "1",
        "2",
        "3",
        "4",
    ],
)
```
We see in the previous example that the default AST building actions drop string matches, as fixed content is not interesting for analysis and usually represents syntax noise needed only for correct parsing. Also, we see that one-or-more is transformed to a `Vec` of matched values (using the `vec` annotation, see below). Of course, this is just the default. You can change it to fit your needs. To learn more see the section on builders.
Syntax equivalence for `one or more`:

```
S: A+;
terminals
A: "a";
```

is equivalent to:

```
S: A1;
@vec
A1: A1 A | A;
terminals
A: "a";
```
### Zero or more

A zero-or-more match is specified using the `*` operator.

For example:

```
A: 'c' B* Ta;
B: Num;
terminals
Ta: 'a';
Tc: 'c';
Num: /\d+/;
```

This syntactic sugar is similar to `+` except that it doesn't require the rule to match at least once. If there is no match, the resulting sub-expression will be an empty list.
Let's see what the parser based on the given grammar will return for input `c 1 2 3 a`:

```rust
#[test]
fn zero_or_more_2_1() {
    let result = ZeroOrMore2Parser::new().parse("c 1 2 3 a");
    output_cmp!(
        "src/sugar/zero_or_more/zero_or_more_2_1.ast",
        format!("{result:#?}")
    );
}
```
The result will be:

```
Ok(
    Some(
        [
            "1",
            "2",
            "3",
        ],
    ),
)
```
But, contrary to one-or-more, we may match zero times. For example, if the input is `c a` we get:

```
Ok(
    None,
)
```
Syntax equivalence for `zero or more`:

```
S: A*;
terminals
A: "a";
```

is equivalent to:

```
S: A0;
@vec
A0: A1 {nops} | EMPTY;
@vec
A1: A1 A | A;
terminals
A: "a";
```
So using `*` creates both the `A0` and `A1` rules. The action attached to `A0` returns a list of matched `a`s, or an empty list if no match is found. Please note the usage of `nops`. If the `prefer_shift` strategy is used, `nops` will cause both `REDUCE` and `SHIFT` to be performed during GLR parsing when what follows the zero-or-more might be another element in the sequence. This is most of the time what you need.
### Repetition modifiers

Repetitions (`+`, `*`, `?`) may optionally be followed by a modifier in square brackets. Currently, this modifier can only be used to define a separator. The separator is defined as a terminal rule reference.
For example, for this grammar:

```
A: 'c' B Num+[Comma];
B: 'b' | EMPTY;
terminals
Num: /\d+/;
Comma: ',';
Tb: 'b';
Tc: 'c';
```
We expect to see `c`, followed by an optional `B`, followed by one or more numbers separated by commas (`Num+[Comma]`).

If we give the input `c b 1, 2, 3, 4` to the parser:

```rust
#[test]
fn one_or_more_1_1_sep() {
    let result = OneOrMore1SepParser::new().parse("c b 1, 2, 3, 4");
    output_cmp!(
        "src/sugar/one_or_more/one_or_more_1_1_sep.ast",
        format!("{result:#?}")
    );
}
```

we get this output:

```
Ok(
    A {
        b: Some(
            Tb,
        ),
        num1: [
            "1",
            "2",
            "3",
            "4",
        ],
    },
)
```
Syntax equivalence of `one or more with separator`:

```
S: A+[Comma];
terminals
A: "a";
Comma: ",";
```

is equivalent to:

```
S: A1Comma;
@vec
A1Comma: A1Comma Comma A | A;
terminals
A: "a";
Comma: ",";
```
Making the name of the separator rule a suffix of the generated rule name ensures that only one additional rule will be added to the grammar for all instances of `A+[Comma]`, i.e. the same base rule with the same separator.
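For instance (an illustrative sketch, not from the Rustemo book), two different rules that repeat the same base with the same separator would share a single generated rule:

```
S: A+[Comma];
T: A+[Comma];    // both desugar to the same generated A1Comma rule
terminals
A: "a";
Comma: ",";
```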
### Parenthesized groups

You can use parenthesized groups at any place you can use a rule reference. For example:

```
S: a (b* a {left} | b);
terminals
a: "a";
b: "b";
```
Here, you can see that `S` will match `a` and then either `b* a` or `b`. You can also see that meta-data can be applied at a per-sequence level (in this case `{left}` applies to the sequence `b* a`).
Here is a more complex example which uses repetitions, separators, assignments and nested groups:

```
S: (b c)*[comma];
S: (b c)*[comma] a=(a+ (b | c)*)+[comma];
terminals
a: "a";
b: "b";
c: "c";
comma: ",";
```
Syntax equivalence of `parenthesized groups`:

```
S: c (b* c {left} | b);
terminals
c: "c";
b: "b";
```

is equivalent to:

```
S: c S_g1;
S_g1: b_0 c {left} | b;
b_0: b_1 | EMPTY;
b_1: b_1 b | b;
terminals
c: "c";
b: "b";
```
So using parenthesized groups creates additional `_g<n>` rules (`S_g1` in the
example), where `n` is a unique number per rule starting from `1`. All other
syntactic sugar elements applied to groups behave as expected.
### Greedy repetitions

The `*`, `+`, and `?` operators have greedy counterparts. To make a repetition operator greedy, add `!` (e.g. `*!`, `+!`, and `?!`). These versions consume as much input as possible before proceeding. You can think of greedy repetitions as a way to disambiguate a class of ambiguities which arises from a sequence of rules where an earlier constituent can match input of various lengths, leaving the rest for the next rule to consume.
Consider this example:

```
S: "a"* "a"*;
```

It is easy to see that this grammar is ambiguous, as for the input:

```
a a
```

we have 3 solutions:

```
1:S[0->3]
    a_0[0->1]
        a_1[0->1]
            a[0->1, "a"]
    a_0[2->3]
        a_1[2->3]
            a[2->3, "a"]
2:S[0->3]
    a_0[0->0]
    a_0[0->3]
        a_1[0->3]
            a_1[0->1]
                a[0->1, "a"]
            a[2->3, "a"]
3:S[0->3]
    a_0[0->3]
        a_1[0->3]
            a_1[0->1]
                a[0->1, "a"]
            a[2->3, "a"]
    a_0[3->3]
```
If we apply greedy zero-or-more to the first element of the sequence:

```
S: "a"*! "a"*;
```

we have only one solution, where all `a` tokens are consumed by the first part of the rule:

```
S[0->3]
    a_0[0->3]
        a_1[0->3]
            a_1[0->1]
                a[0->1, "a"]
            a[2->3, "a"]
    a_0[3->3]
```
## `EMPTY` built-in rule

There is a special `EMPTY` rule you can reference in your grammars. The `EMPTY` rule reduces without consuming any input and always succeeds, i.e. it is an empty recognition.
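For example, `EMPTY` makes an alternative that matches no input, which is how optional constructs are expressed in plain BNF (a sketch in the style of earlier examples):

```
// Matches a semicolon or nothing at all
OptSemi: Semi | EMPTY;
terminals
Semi: ';';
```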
## Named matches (assignments)

In the section on builders you can see that struct fields deduced from rules, as well as generated semantic action parameters, are named based on the `<name>=<rule reference>` part of the grammar. We call these `named matches` or `assignments`.

Named matches enable giving a name to a rule reference directly in the grammar.
In the calculator example:

```
E: left=E '+' right=E {Add, 1, left}
 | left=E '-' right=E {Sub, 1, left}
 | left=E '*' right=E {Mul, 2, left}
 | left=E '/' right=E {Div, 2, left}
 | base=E '^' exp=E {Pow, 3, right}
 | '(' E ')' {Paren}
 | Num {Num};
terminals
Plus: '+';
Sub: '-';
Mul: '*';
Div: '/';
Pow: '^';
LParen: '(';
RParen: ')';
Num: /\d+(\.\d+)?/;
```
we can see the usage of assignments to name the recursive references to `E` in the first four alternatives as `left` and `right`, since we are defining binary operations, while the fifth alternative, for the power operation, uses the more descriptive names `base` and `exp`.
Now, with this in place, the generated type for `E`, the types for two of the operations (`/` and `^`), and the semantic action for the `+` operation will be:
```rust
#[derive(Debug, Clone)]
pub struct Div {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Pow {
    pub base: Box<E>,
    pub exp: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
    Add(Add),
    Sub(Sub),
    Mul(Mul),
    Div(Div),
    Pow(Pow),
    Paren(Box<E>),
    Num(Num),
}
pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E {
    E::Add(Add {
        left: Box::new(left),
        right: Box::new(right),
    })
}
```
Notice the names of the fields in the `Div` and `Pow` structs, as well as the names of the parameters in the `e_add` action. They are derived from the assignments.
Without the usage of assignments, the same generated types and action would be:
```rust
#[derive(Debug, Clone)]
pub struct Div {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Pow {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
    Add(Add),
    Sub(Sub),
    Mul(Mul),
    Div(Div),
    Pow(Pow),
    Paren(Box<E>),
    Num(Num),
}
pub fn e_add(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::Add(Add {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
```
These names are based on the name of the referenced rule and its position inside the production.
## Rule/production meta-data

Rules and productions may specify additional meta-data that can be used to guide parser construction decisions. Meta-data is specified inside curly braces: right after the name of the rule for rule-level meta-data, or after the production body for production-level meta-data. Meta-data applied to a grammar rule is in effect for all productions of that rule, but if the same meta-data is defined on a production, the production-level definition takes precedence.
Currently, the kinds of meta-data used during parser construction are as follows:

- disambiguation rules
- production kinds
- user meta-data
### Disambiguation rules

These are special meta-data used by Rustemo during grammar compilation to influence decisions on LR automata states' actions. See the sections on parsing and resolving LR conflicts.

There are some differences in which rules can be specified at the production level versus the terminal level.
Disambiguation rules are the following:

- priority - written as an integer number. The default priority is 10. Priority defined on productions influences both reductions on that production and shifts of tokens from that production. Priority defined on terminals influences priority during tokenization: when multiple tokens can be recognized at the current location, those with higher priority are favored.
- associativity - `right`/`left` or `shift`/`reduce`. When there is a state where competing shift/reduce operations could be executed, this meta-data is used to disambiguate. It can be specified at both the production and the terminal level. If during grammar analysis there is a state where associativity is defined on both a production and a terminal, the terminal associativity takes precedence. See the [calculator tutorial](./tutorials/calculator/calculator.md) for an example of priority/associativity usage. There is also an example in the section on resolving LR conflicts.
- global shift preference control - `nops` and `nopse`. One of the standard techniques for resolving shift/reduce conflicts is to always prefer shift, which yields greedy behavior. This global setting can be altered during grammar compilation: `nops` (no prefer shift) can be used at the production level to disable this preference for the given production if it is enabled globally, while `nopse` (no prefer shift over empty) disables preferring shift over empty reductions only.
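As a sketch of how terminal priority could be used during tokenization (assuming curly-brace meta-data is accepted on terminals in the same way as on productions; `If` and `ID` are made-up names):

```
terminals
// Priority 15 beats the default 10, so 'if' wins over ID at the same position
If: 'if' {15};
ID: /\w+/;
```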
### Production kinds

These meta-data are introduced to enable better deduction of function/parameter names in the generated code. A production kind is written as an identifier in camel-case.
For example:

```
E: E '+' E {Add, 1, left}
 | E '-' E {Sub, 1, left}
 | E '*' E {Mul, 2, left}
 | E '/' E {Div, 2, left}
 | Number;
terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';
```
`Add`, `Sub`, `Mul` and `Div` are production kinds. These influence the names of the parameters, fields etc. in the generated code.

See the section on improving the AST in the calculator tutorial for more info.
### User meta-data

Arbitrary meta-data can be attached to rules or productions. The form of each is `<name>: <value>`, where `<name>` should be any valid Rust identifier and `<value>` can be any of:

- integer number
- float number
- string in double or single quotes
- keywords `true` or `false` for boolean values

These meta-data are supported syntactically but are not used at the moment. In the future, semantic actions will have access to these values, which could be used to alter the building process in a user-defined way.
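For example, user meta-data could be attached alongside the rule/production syntax shown above (the names `label` and `count` are made up for illustration):

```
A {label: 'my rule'}: B {count: 5} | B;
terminals
B: "b";
```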
### Example

This test shows various meta-data applied at both the rule and production level:
```rust
#[test]
fn productions_meta_data_inheritance() {
    let grammar: Grammar = r#"
    S {15, nopse}: A "some_term" B {5} | B {nops};
    A {bla: 10}: B {nopse, bla: 5} | B {7};
    B {left}: some_term {right} | some_term;

    terminals
    some_term: "some_term";
    "#
    .parse()
    .unwrap();

    assert_eq!(grammar.productions.len(), 7);

    assert_eq!(grammar.productions[ProdIndex(1)].prio, 5);
    // Inherited
    assert!(grammar.productions[ProdIndex(1)].nopse);
    assert_eq!(grammar.productions[ProdIndex(1)].meta.len(), 0);

    // Inherited
    assert_eq!(grammar.productions[ProdIndex(2)].prio, 15);
    assert!(grammar.productions[ProdIndex(2)].nops);
    // Inherited
    assert!(grammar.productions[ProdIndex(2)].nopse);

    assert_eq!(
        5u32,
        match grammar.productions[ProdIndex(3)].meta.get("bla").unwrap() {
            crate::lang::rustemo_actions::ConstVal::Int(i) => i.into(),
            _ => panic!(),
        }
    );
    assert_eq!(grammar.productions[ProdIndex(3)].meta.len(), 1);

    // Inherited
    assert_eq!(grammar.productions[ProdIndex(4)].prio, 7);
    assert_eq!(
        10u32,
        match grammar.productions[ProdIndex(4)].meta.get("bla").unwrap() {
            crate::lang::rustemo_actions::ConstVal::Int(i) => i.into(),
            _ => panic!(),
        }
    );

    assert_eq!(
        grammar.productions[ProdIndex(5)].assoc,
        Associativity::Right
    );
    // Inherited
    assert_eq!(grammar.productions[ProdIndex(6)].assoc, Associativity::Left);
}
```
## Rule annotations

Rule annotations are written before a grammar rule name using the `@action_name` syntax. Annotations are special built-in meta-data used to change the generated AST types and/or actions.
Currently, there is only one annotation available - `vec` - which is used to annotate rules that represent zero-or-more or one-or-more patterns. When this annotation is applied, the resulting AST type will be `Vec`. Automatically generated actions will take this into account if the default builder is used (see the section on builders).
The `vec` annotation is implicitly used by the `*` and `+` syntax sugar. See the relevant sections for the equivalent grammars using the `vec` annotation.

For example, you can use the `@vec` annotation in grammar rules that have the following patterns:
```
// This will be a vector of Bs. The vector may be empty.
@vec
A: A B | B | EMPTY;

// This is the same but the vector must have at least one element after
// a successful parse (and here we've changed the order in the first production)
@vec
A: B A | B;
```
This is just a convenience and a way to have a default type generated up-front. You can always change AST types manually.
## Grammar comments

In Rustemo grammars, comments are available as both line comments and block comments:

```
// This is a line comment. Everything from the '//' to the end of the line is a comment.

/*
  This is a block comment.
  Everything between `/*` and `*/` is a comment.
*/
```
## Handling keywords in your language

By default, the parser will match a given string recognizer even if it is part of some larger word, i.e. it will not require a match on a word boundary. This is not the desired behavior for language keywords. For example, let's examine this little grammar:

```
S: "for" name=ID "=" from=INT "to" to=INT;
terminals
ID: /\w+/;
INT: /\d+/;
```
This grammar is intended to match a statement like this one:

```
for a=10 to 20
```

But it will also match:

```
fora=10 to20
```

which is not what we wanted.
Rustemo allows the definition of a special terminal rule `KEYWORD`. This rule must define a regular expression recognizer. Any string recognizer in the grammar that can also be recognized by the `KEYWORD` recognizer is treated as a keyword, and its handling is changed during grammar construction to match only on word boundaries.
For example:

```
S: "for" name=ID "=" from=INT "to" to=INT;
terminals
ID: /\w+/;
INT: /\d+/;
KEYWORD: /\w+/;
```
Now:

```
fora=10 to20
```

will not be recognized, as the words `for` and `to` are recognized as keywords (they can be matched by the `KEYWORD` rule).

This will be parsed correctly:

```
for a=10 to 20
```

since `=` is not matched by the `KEYWORD` rule and thus is not required to be separated from the surrounding tokens.
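The word-boundary behavior can be sketched with Python's `re` module (an analogy to illustrate the idea, not Rustemo's actual implementation):

```python
import re

# Without a boundary, "for" also matches inside a larger word:
assert re.match("for", "fora=10") is not None

# With a word boundary, the keyword must end at a non-word character:
assert re.match(r"for\b", "fora=10") is None
assert re.match(r"for\b", "for a=10") is not None
```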
Rustemo uses an integrated scanner, so this example:

```
for for=10 to 20
```

will be correctly parsed: `for` in `for=10` is recognized as `ID` and not as the keyword `for`, i.e. there is no lexical ambiguity due to tokenizer separation.
## Handling whitespaces and comments (a.k.a. layout) in your language

The default string lexer skips whitespaces. You can take control over this process by defining a special grammar rule `Layout`. If this rule is found in the grammar, the parser will use it to parse layout before each token. This is usually used to parse whitespace, comments, or anything else that is not relevant for the semantic analysis of the language.
For example, given the grammar:

```
// Digits with some words in between that should be ignored.
S: Digit TwoDigits Digit+;
TwoDigits: Digit Digit;
Layout: LayoutItem+;
LayoutItem: Word | WS;
terminals
Digit: /\d/;
Word: /[a-zA-Z]+/;
WS: /\s+/;
```
we can parse an input consisting of numbers and words, but we will get only the numbers in the output:

```rust
let result = LayoutParser::new().parse("42 This6 should be 8 ignored 9 ");
```
If the default AST builder is used, the result will be:

```
Ok(
    S {
        digit: "4",
        two_digits: TwoDigits {
            digit_1: "2",
            digit_2: "6",
        },
        digit1: [
            "8",
            "9",
        ],
    },
)
```
You can see that all layout is by default dropped from the result. Of course, you can change that by changing the generated actions. The layout is passed to each action through the `Context` object (`ctx.layout`).
For example, the generic tree builder preserves the layout on the tree nodes. The result of the above parse, if the generic tree builder is used, will be:
```
Ok(
    NonTermNode {
        prod: S: Digit TwoDigits Digit1,
        location: [1,0-1,30],
        children: [
            TermNode {
                token: Digit("\"4\"" [1,0-1,1]),
                layout: None,
            },
            NonTermNode {
                prod: TwoDigits: Digit Digit,
                location: [1,1-1,8],
                children: [
                    TermNode {
                        token: Digit("\"2\"" [1,1-1,2]),
                        layout: None,
                    },
                    TermNode {
                        token: Digit("\"6\"" [1,7-1,8]),
                        layout: Some(
                            " This",
                        ),
                    },
                ],
                layout: None,
            },
            NonTermNode {
                prod: Digit1: Digit1 Digit,
                location: [1,19-1,30],
                children: [
                    NonTermNode {
                        prod: Digit1: Digit,
                        location: [1,19-1,20],
                        children: [
                            TermNode {
                                token: Digit("\"8\"" [1,19-1,20]),
                                layout: Some(
                                    " should be ",
                                ),
                            },
                        ],
                        layout: Some(
                            " should be ",
                        ),
                    },
                    TermNode {
                        token: Digit("\"9\"" [1,29-1,30]),
                        layout: Some(
                            " ignored ",
                        ),
                    },
                ],
                layout: Some(
                    " should be ",
                ),
            },
        ],
        layout: None,
    },
)
```
Here is another example that provides support for both line comments and block comments like the ones used in the grammar language itself:

```
Layout: LayoutItem*;
LayoutItem: WS | Comment;
Comment: '/*' Corncs '*/' | CommentLine;
Corncs: Cornc*;
Cornc: Comment | NotComment | WS;
terminals
WS: /\s+/;
CommentLine: /\/\/.*/;
NotComment: /((\*[^\/])|[^\s*\/]|\/[^\*])+/;
```