'dynparser' is a python package that provides a runtime-configurable parser.
It is a true LL(n) parser and uses backtracking to achieve LL(n).
Terminal and Nonterminal symbols are represented as python classes.
To create a Nonterminal, derive from NTE
:
class Program(NTE):pass
A Terminal Symbol is derived from TE
and can create a regular
expression to direct to tokenizer:
class Value(TE): expression = re.compile("\d+")
TE
provides a defaul-re: \w+
.
The syntax is created by adding rules to a database:
add_rule(Program, [ "constant", Value ])
Such a syntax can be parsed easily:
parse_tree = parse("constant 123")
To add an additional rule in the syntax, inheritance can be utilized.
class Program(NTE):pass class Statement(NTE):pass class PrintStatement(Statement):pass class SetStatement(Statement):pass add_rule(PrintStatement, [ "print", Value ] ) add_rule(SetStatement, [ "set", Name, "=", Value] ) add_rule(Program, [ Statement ])
The parser will detect the inheritance and try to match
PrintStatement
as well as SetStatement
when a Statement
is
required. This reflects the 'isa' relationship.
This would represent the EBNF:
Program: Statement . Statement: PrintStatement | SetStatement . PrintStatement: "print" Value . SetStatement: "set" Name "=" Value .
However, the inheritance rule is only added when a rule for the
subclass is added via add_rule
.
Take this EBNF:
SimpleProgram: "{" SimpleProgram "}" | "x" .
To create choices, additional rules must be defined:
class SimpleProgram(NTE): pass add_rule(SimpleProgram, [ "{", SimpleProgram, "}"] ) add_rule(SimpleProgram, "x")
This is similar to the inheritance approach above, but not identical because an option implemented this way can have different productions, as inhertance-choices are purely forwarding rules.
Take this EBNF:
List: Value { "," Value } .
To parse lists, the following syntax is used:
class Rep1(NTE):pass class Value(NTE):pass class List(NTE):pass add_rule(List, [ Value, [Rep1] ]) add_rule(Rep1, [ ",", Value ])
But this will change in future, hopefully.
The function parse
returns the top production class
instantiated. All NTE classes have the field items
which contain
respective TE’s, NTE’s instantiated. TE classes have a member value
bound to the parsed value.
For constant TE’s ( like in this rule: ["is", "+"]), TE classes are
generated at runtime, containing a regular expression matching exactly
the given string. Those TE classes have their name set to
"TE_"+name
, where name is exactly the given constant string.
Since the parser returns a parse tree with the NTE/TE classes instantiated, semantic actions can be easily defined by implementing methods in the classes.
Python allows adding methods to classes at runtime, therefore it is no biggie to add methods directly to NTE and TE:
import new # add needed stuff to the base classes NTE.genpos = gendb.GEN_HERE NTE.render = new.instancemethod(render,None, NTE) NTE.template = Template(r"/* not implemented: ${nte.__class__} */")
However, to direct the parser while it is running, the following hooks are provided:
This method is called on an NTE/TE when it is about to be
parsed. Semantic checks can be implemented here. When the method
returns False
, the NTE/TE cannot be parsed.
Syntax rules can be added while parsing. This means one can extend the syntax arbitrarily.
The project hla
(https://github.com/mru00/hla) uses this. The syntax
is extended by uses
statements. These statements import new python
modules which in turn define new TE’s, NTE’s and call add_rule
. The
magic happens in the Uses.onparse method.
The tokenizer is as dynamic as the parser. Since TE’s can provide the regular expressions they match, the tokenizer can be configured at runtime.
Currently, no hierarchial namespaces are implemented. Probably this could be done outside the parser anyway.
Namespace is provided in the module namespace
. It provides simple
stores, indexed with strings and contain key-value-pairs.
# when reading a variable declaration add_symbol('variable', self.items[1].value, self)
class SimpleExpression(NTE): def calc(self): val = self.items[0].calc() for rep in self.items[1:]: val = rep.items[0].calc(val, rep.items[1].calc()) return val class Term(NTE): def calc(self): val = self.items[0].calc() for rep in self.items[1:]: val = rep.items[0].calc(val, rep.items[1].calc()) return val class Factor(NTE):pass # note the inheritance: class NumberFactor(Factor): def calc(self): return int(self.items[0].value) class ExpressionFactor(Factor): def calc(self): return self.items[1].calc() class AdditionOperator(TE): expression = r"\+|-" def calc(self, left, right): return { "+" : lambda: left+right, "-" : lambda: left-right }[self.value]() class MultiplicationOperator(TE): expression = r"\*|/|mod" def calc(self, left, right): return { "*" : lambda: left*right, "/" : lambda: left/right, "mod": lambda: left%right}[self.value]() class Number(TE): expression = r"\d+" class Sign(TE): expression = r"-|\+" class SignOpt(NTE):pass class Rep2(NTE):pass class Rep3(NTE):pass add_rule(SignOpt, (Sign,)) add_rule(SignOpt, ()) add_rule(Rep2, (AdditionOperator, Term )) add_rule(Rep3, (MultiplicationOperator, Factor)) add_rule(SimpleExpression, [ Term, [Rep2] ]) add_rule(Term, [ Factor, [Rep3] ]) add_rule(NumberFactor, [ Number ]) add_rule(ExpressionFactor, [ "(", SimpleExpression, ")" ]) self.assertEquals(parse("5+4", SimpleExpression).calc(), 5+4) self.assertEquals(parse("5+4+9+9+8+7", SimpleExpression).calc(), 5+4+9+9+8+7) self.assertEquals(parse("1*2+(5*3)+(10/2)", SimpleExpression).calc(), 1*2+(5*3)+(10/2)) self.assertEquals(parse("10/9*(10*(10))+1-1", SimpleExpression).calc(), 10/9*(10*(10))+1-1)
Have a look at the unit tests, in /test
. They show some features of
the parser, and real-world examples.