Parse trees

During shift/reduce operations, the parser calls actions. If the build_tree parser constructor parameter is set to True, the default actions for building parse tree nodes are called. In the case of the GLR parser, multiple trees can be built simultaneously (a parse forest).
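As a rough illustration of what tree-building actions do, the sketch below replays a hand-written sequence of shift/reduce operations for the input 1 + 2, assuming the rules E: E '+' E | number. On a shift it wraps the token in a leaf node. On a reduce it pops as many nodes as the production's right-hand side has symbols and makes them the children of a new non-terminal node. The Term/NonTerm classes and the build_tree function are hypothetical simplifications, not parglare's implementation, and the operation list is written by hand rather than produced by LR tables.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Term:
    """Hypothetical, simplified stand-in for a terminal (leaf) node."""
    symbol: str
    start_position: int
    end_position: int
    value: str


@dataclass
class NonTerm:
    """Hypothetical, simplified stand-in for a non-terminal node."""
    symbol: str
    children: List[Union["Term", "NonTerm"]] = field(default_factory=list)

    @property
    def start_position(self) -> int:
        # Assumes a non-empty production (at least one child).
        return self.children[0].start_position

    @property
    def end_position(self) -> int:
        return self.children[-1].end_position


def build_tree(input_str, operations):
    """Replay shift/reduce operations, building nodes the way default
    tree-building actions conceptually would."""
    stack = []
    for op in operations:
        if op[0] == "shift":
            _, symbol, start, end = op
            # Shift: wrap the recognized token in a terminal (leaf) node.
            stack.append(Term(symbol, start, end, input_str[start:end]))
        else:
            _, symbol, rhs_len = op
            # Reduce: pop the nodes matched by the right-hand side and
            # make them children of a new non-terminal node.
            children = stack[len(stack) - rhs_len:]
            del stack[len(stack) - rhs_len:]
            stack.append(NonTerm(symbol, children))
    assert len(stack) == 1, "a successful parse leaves exactly one root"
    return stack[0]


# Hand-written operations for "1 + 2" with the rules E: E '+' E | number.
ops = [
    ("shift", "number", 0, 1),
    ("reduce", "E", 1),
    ("shift", "+", 2, 3),
    ("shift", "number", 4, 5),
    ("reduce", "E", 1),
    ("reduce", "E", 3),
]
root = build_tree("1 + 2", ops)
print(root.symbol, len(root.children), root.start_position, root.end_position)
```

With a GLR parser, several such stacks may be alive at once, which is how more than one tree can come out of the same input.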

The nodes of a parse tree are instances of either NodeTerm for terminal nodes (the leaves of the tree) or NodeNonTerm for non-terminal nodes (intermediate nodes).

Each node of the tree has the following attributes:

  • start_position/end_position - the positions in the input stream where the node starts and ends, given as absolute 0-based offsets. For textual inputs, you can convert a position to line/column format with the parglare.pos_to_line_col(input_str, position) function, which returns a tuple (line, column). This conversion only makes sense when parsing textual content.

  • layout_content - the layout that precedes the given tree node. The layout consists of whitespace and comments.

  • symbol - the grammar symbol this node was created for.
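The position-to-line/column conversion can be sketched in plain Python. The helper offset_to_line_col below is hypothetical; it assumes 1-based line numbers and 0-based columns, which may not match the exact convention of parglare.pos_to_line_col, so prefer the library function in real code.

```python
def offset_to_line_col(input_str: str, position: int) -> tuple:
    """Convert an absolute 0-based offset into (line, column).

    Hypothetical helper. Assumption: lines are 1-based and columns are
    0-based; parglare's pos_to_line_col may use a different convention.
    """
    # The line number is one more than the newlines strictly before position.
    line = input_str.count("\n", 0, position) + 1
    # The current line starts right after the last newline before position
    # (rfind returns -1 when there is none, giving a line start of 0).
    line_start = input_str.rfind("\n", 0, position) + 1
    return line, position - line_start


text = "a = 1\nb = 22\n"
print(offset_to_line_col(text, 0))   # (1, 0)
print(offset_to_line_col(text, 10))  # (2, 4): the first '2' on line 2
```

A newline character itself is reported as the last column of the line it terminates, because only newlines strictly before the offset are counted.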

Additionally, each NodeTerm has:

  • value - the value (a part of input_str) which this terminal represents. It is equivalent to input_str[start_position:end_position].
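Given a terminal's positions, its value is simply a slice of the input. The sketch below uses the start positions from the example tree later in this section (end positions are derived from the token lengths). For illustration only, it also assumes that a terminal's preceding layout is exactly the input skipped between the previous token's end and this token's start. This is a simplifying assumption, not a statement about parglare's internals. Under that assumption, concatenating layout and value for every terminal reproduces the input.

```python
input_str = "1 + 2 * 3 -1"

# (symbol, start_position, end_position) for each terminal of the example
# input; end positions are derived from token lengths (an assumption).
terminals = [
    ("number", 0, 1), ("+", 2, 3), ("number", 4, 5),
    ("*", 6, 7), ("number", 8, 9), ("-", 10, 11), ("number", 11, 12),
]

values, layouts = [], []
prev_end = 0
for symbol, start, end in terminals:
    # value == input_str[start_position:end_position]
    values.append(input_str[start:end])
    # Illustrative assumption: layout is the text skipped before the token.
    layouts.append(input_str[prev_end:start])
    prev_end = end

print(values)   # ['1', '+', '2', '*', '3', '-', '1']
print(layouts)  # ['', ' ', ' ', ' ', ' ', ' ', '']
```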

Additionally, each NodeNonTerm has:

  • children - sub-nodes, which are also of NodeNonTerm/NodeTerm type. NodeNonTerm is iterable; iterating over it yields its children.

  • production - the grammar production whose reduction created this node.
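The children/iteration behaviour can be modelled with two small classes. Leaf and Inner are hypothetical stand-ins for NodeTerm and NodeNonTerm (the production is stored as a plain string here rather than a production object), and leaves() shows a typical recursive walk that relies on a non-terminal node being iterable.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Leaf:
    """Hypothetical stand-in for parglare's NodeTerm."""
    symbol: str
    start_position: int
    value: str


@dataclass
class Inner:
    """Hypothetical stand-in for parglare's NodeNonTerm."""
    symbol: str
    production: str  # e.g. "E: E '+' E"; a plain string in this sketch
    children: List[Union["Leaf", "Inner"]] = field(default_factory=list)

    def __iter__(self) -> Iterator[Union["Leaf", "Inner"]]:
        # Iterating over a non-terminal node yields its direct children.
        return iter(self.children)


def leaves(node):
    """Yield terminal nodes from left to right with a recursive walk."""
    if isinstance(node, Leaf):
        yield node
    else:
        for child in node:  # relies on Inner.__iter__
            yield from leaves(child)


tree = Inner("E", "E: E '+' E", [
    Inner("E", "E: number", [Leaf("number", 0, "1")]),
    Leaf("+", 2, "+"),
    Inner("E", "E: number", [Leaf("number", 4, "2")]),
])
print([leaf.value for leaf in leaves(tree)])  # ['1', '+', '2']
print([child.symbol for child in tree])       # ['E', '+', 'E']
```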

Each node has a tree_str() method that returns a string representation of the sub-tree rooted at that node. Called on the root node, it returns the string representation of the whole tree.
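A tree_str()-like rendering is a short recursive function. This hypothetical version indents two spaces per level and mimics the node formats visible in the example output in this section (non-terminals as symbol[start], terminals as symbol[start, value]); it is not parglare's code, and the real method's formatting may differ in details.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    """Hypothetical terminal node."""
    symbol: str
    start_position: int
    value: str


@dataclass
class Inner:
    """Hypothetical non-terminal node."""
    symbol: str
    start_position: int
    children: List[Union["Leaf", "Inner"]] = field(default_factory=list)


def tree_str(node, depth=0):
    """Render the sub-tree rooted at node, indenting two spaces per level."""
    indent = "  " * depth
    if isinstance(node, Leaf):
        return f"{indent}{node.symbol}[{node.start_position}, {node.value}]"
    lines = [f"{indent}{node.symbol}[{node.start_position}]"]
    # Each child renders its own sub-tree one level deeper.
    lines.extend(tree_str(child, depth + 1) for child in node.children)
    return "\n".join(lines)


tree = Inner("E", 0, [
    Inner("E", 0, [Leaf("number", 0, "1")]),
    Leaf("+", 2, "+"),
    Inner("E", 4, [Leaf("number", 4, "2")]),
])
print(tree_str(tree))
print(tree_str(tree.children[2]))  # any sub-tree renders on its own
```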

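For example, parsing the input 1 + 2 * 3 -1 with the expression grammar from the quick start produces the following output when printed with tree_str() (non-terminal nodes are shown as symbol[start_position], terminal nodes as symbol[start_position, value]):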

E[0]
  E[0]
    E[0]
      number[0, 1]
    +[2, +]
    E[4]
      E[4]
        number[4, 2]
      *[6, *]
      E[8]
        number[8, 3]
  -[10, -]
  E[11]
    number[11, 1]
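Once built, a tree like this can be processed with a post-order walk. The sketch below rebuilds the tree printed above with hypothetical Leaf/Inner classes and evaluates it, showing that the tree groups the input as (1 + (2 * 3)) - 1. It only handles the node shapes that occur in this tree (E: number and E: E <op> E); a rule such as E: '(' E ')' would need its own case.

```python
import operator
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    """Hypothetical terminal node (positions omitted for brevity)."""
    symbol: str
    value: str


@dataclass
class Inner:
    """Hypothetical non-terminal node."""
    symbol: str
    children: List[Union["Leaf", "Inner"]] = field(default_factory=list)


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(node):
    """Post-order evaluation: evaluate children first, then combine."""
    if isinstance(node, Leaf):
        return int(node.value)        # only 'number' leaves reach here
    if len(node.children) == 1:       # E: number
        return evaluate(node.children[0])
    left, op, right = node.children   # E: E <op> E
    return OPS[op.value](evaluate(left), evaluate(right))


def num(v):
    """Build the E: number sub-tree for a numeric literal."""
    return Inner("E", [Leaf("number", v)])


# The tree printed above for "1 + 2 * 3 -1".
tree = Inner("E", [
    Inner("E", [num("1"), Leaf("+", "+"),
                Inner("E", [num("2"), Leaf("*", "*"), num("3")])]),
    Leaf("-", "-"),
    num("1"),
])
print(evaluate(tree))  # 6
```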