Parser parameters¶
There are several parameters you can pass during parser construction. The
mandatory first parameter is the Grammar instance. Other parameters are
explained in the rest of this section.
All parameters described here work for both the parglare.Parser and
parglare.GLRParser classes.
actions¶
This parameter is a dict of actions keyed by the name of the grammar rule.
layout_actions¶
This parameter is used to specify actions called when the rules of the layout sub-grammar are reduced. This is rarely needed, but there are times when you would like to process matched layout (e.g. whitespace, comments).
It is given in the same format as the actions parameter: a dict of callables
keyed by grammar rule names.
ws¶
This parameter specifies a string whose characters are considered to be
whitespace. By default its value is '\n\r\t '. It is used if the layout
sub-grammar (LAYOUT grammar rule) is not defined. If the LAYOUT rule is given
in the grammar it is used instead and this parameter is ignored.
build_tree¶
A boolean whose default value is False. If set to True the parser will call
implicit actions that build the parse tree.
call_actions_during_tree_build¶
By default, this parameter is set to False. If set to True, the parser will
call actions during the parse tree building process. The return value of each
action will be discarded, since they directly affect the parse tree building
process.
Warning
Use this parameter with special care when GLR is used, since actions will be called even on trees that can't be completed (unsuccessful parses).
consume_input¶
A boolean whose value is True by default. If True, the whole input must be
consumed for the parse to be considered successful. This is most of the time
what you want. If set to False, the LR parser will parse as much as possible
and leave the rest of the input unconsumed, while the GLR parser will produce
all possible parses with both completely and incompletely consumed input.
Warning
Be aware that setting this option to False for GLR usually leads to a high
level of ambiguity and multiple parses, as any substring from the beginning of
the input that parses will be considered a valid parse.
prefer_shifts¶
By default set to True for the LR parser and to False for the GLR parser. In
case of shift/reduce conflicts this strategy favors shift over reduce. You can
still use associativity rules to decide per production.
You can disable this rule on a per-production basis by using nops on the
production.
Warning
Do not use prefer_shifts if you don't understand the implications. Try to
understand conflicts and resolution strategies.
prefer_shifts_over_empty¶
By default set to True for the LR parser and to False for the GLR parser. In
case of shift/reduce conflicts on empty reductions this strategy favors shift
over reduce. You can still use associativity rules to decide per production.
You can disable this rule on a per-production basis by using nopse on the
production.
Warning
Do not use prefer_shifts_over_empty if you don't understand the implications.
Try to understand conflicts and resolution strategies.
error_recovery¶
By default set to False. If set to True, the default error recovery will be
used. If set to a Python function, the function will be called to recover from
errors.
For more information see Error recovery.
debug/debug_layout¶
If set to True, this parameter puts the parser in debug mode. In this mode the
parser prints detailed information about its actions to the standard output.
To put the layout subparser in debug mode use the debug_layout parameter. Both
parameters are set to False by default.
For more information see Debugging.
debug_colors¶
Set this to True to enable terminal colors in debug/trace output. False by
default.
tables¶
The value of this parameter is either parglare.LALR or parglare.SLR and it is
used to choose the type of LR tables to create. By default LALR tables are
used with a slight twist to avoid Reduce/Reduce conflicts that may happen with
pure LALR tables. This parameter should not be used in normal circumstances and
is provided more for experimentation purposes.
force_load_table¶
The LR table is loaded from the <grammar_file_name>.pgt file if the file exists
and is newer than all of the grammar files, root and imported. If the
modification time of any grammar file is later than the modification time of
the cached LR table file, the table is recalculated and persisted. If you are
deploying the parser in a way that changes file modification times, which would
trigger table calculation, you can set force_load_table to True. If this flag
is set, no modification check is performed and table calculation happens only
if the .pgt file doesn't exist.
table¶
You can pass a precomputed parsing table here. This is useful for implementing
custom parse table caching. A None value for this parameter (the default)
instructs the parser to build (or fetch from cache) its own tables internally.
A possible flow for custom caching is shown in an example.
Warning
Be careful to provide parse tables compatible with the parser type. Passing
tables containing conflicts to the Parser class will probably result in an
error, but passing tables with automatically resolved conflicts
(prefer_shifts=True) to GLRParser will result in a parser which may skip
proper parses.
parse and parse_file calls¶
The parse call is used to parse an input string or a list of objects. For
parsing a textual file, parse_file is used.
These two calls accept the following parameters:
- input_str - first positional and mandatory parameter, only for the parse call - the input string/list of objects.
- position - the start position to parse from. By default 0.
- extra - an object used for arbitrary user state kept during parsing. It will be accessible on context-like objects. If not given, an instance of dict will be created.
- file_name - first positional and mandatory parameter, only for the parse_file call - the name/path of the file to parse.
Token class¶
This class from parglare.parser is used to represent lookahead tokens. A Token
is a concrete matched terminal from the input stream.
Attributes:
- symbol (Terminal) - terminal grammar symbol represented by this token,
- value (list or str) - matched part of the input stream,
- additional_data (list) - additional information returned by a custom recognizer,
- length (int) - length of the matched input.