Skip to content

mru00/dynparser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dynparser - a dynamic parser that can extend the syntax during runtime

Abstract

'dynparser' is a python package that provides a runtime-configurable parser.

It is a true LL(n) parser and uses backtracking to achieve LL(n).

1. Overview

Terminal and Nonterminal symbols are represented as python classes.

To create a Nonterminal, derive from NTE:

class Program(NTE):pass

A Terminal Symbol is derived from TE and can create a regular expression to direct to tokenizer:

class Value(TE):
	expression = re.compile("\d+")

TE provides a defaul-re: \w+.

The syntax is created by adding rules to a database:

add_rule(Program, [ "constant", Value ])

Such a syntax can be parsed easily:

parse_tree = parse("constant 123")

2. Options through inheritance

To add an additional rule in the syntax, inheritance can be utilized.

class Program(NTE):pass
class Statement(NTE):pass
class PrintStatement(Statement):pass
class SetStatement(Statement):pass

add_rule(PrintStatement, [ "print", Value ] )
add_rule(SetStatement, [ "set", Name, "=", Value] )
add_rule(Program, [ Statement ])

The parser will detect the inheritance and try to match PrintStatement as well as SetStatement when a Statement is required. This reflects the 'isa' relationship.

This would represent the EBNF:

Program:
  Statement
  .
Statement:
  PrintStatement
  | SetStatement
  .
PrintStatement:
  "print" Value
  .
SetStatement:
  "set" Name "=" Value
  .

However, the inheritance rule is only added when a rule for the subclass is added via add_rule.

3. Choices

Take this EBNF:

SimpleProgram:
  "{" SimpleProgram "}"
  | "x"
  .

To create choices, additional rules must be defined:

class SimpleProgram(NTE): pass
add_rule(SimpleProgram, [ "{", SimpleProgram, "}"] )
add_rule(SimpleProgram, "x")

This is similar to the inheritance approach above, but not identical because an option implemented this way can have different productions, as inhertance-choices are purely forwarding rules.

4. Repetitions

Take this EBNF:

List: Value { "," Value }
  .

To parse lists, the following syntax is used:

class Rep1(NTE):pass
class Value(NTE):pass
class List(NTE):pass
add_rule(List, [ Value, [Rep1] ])
add_rule(Rep1, [ ",", Value ])

But this will change in future, hopefully.

5. Parse Tree

The function parse returns the top production class instantiated. All NTE classes have the field items which contain respective TE’s, NTE’s instantiated. TE classes have a member value bound to the parsed value.

For constant TE’s ( like in this rule: ["is", "+"]), TE classes are generated at runtime, containing a regular expression matching exactly the given string. Those TE classes have their name set to "TE_"+name, where name is exactly the given constant string.

6. Hooks, Semantic actions

Since the parser returns a parse tree with the NTE/TE classes instantiated, semantic actions can be easily defined by implementing methods in the classes.

Python allows adding methods to classes at runtime, therefore it is no biggie to add methods directly to NTE and TE:

import new

# add needed stuff to the base classes
NTE.genpos = gendb.GEN_HERE
NTE.render = new.instancemethod(render,None, NTE)
NTE.template =  Template(r"/* not implemented: ${nte.__class__} */")

However, to direct the parser while it is running, the following hooks are provided:

6.1. NTE.canparse, TE.canparse

This method is called on an NTE/TE when it is about to be parsed. Semantic checks can be implemented here. When the method returns False, the NTE/TE cannot be parsed.

6.2. NTE.onparse, TE.onparse

Called when an TE/NTE was successfully parsed. For example, this can be used to create entries in the Namespace.

7. Dynamic Syntax

Syntax rules can be added while parsing. This means one can extend the syntax arbitrarily.

The project hla (https://github.com/mru00/hla) uses this. The syntax is extended by uses statements. These statements import new python modules which in turn define new TE’s, NTE’s and call add_rule. The magic happens in the Uses.onparse method.

8. Tokenizer

The tokenizer is as dynamic as the parser. Since TE’s can provide the regular expressions they match, the tokenizer can be configured at runtime.

9. Namespaces

Currently, no hierarchial namespaces are implemented. Probably this could be done outside the parser anyway.

Namespace is provided in the module namespace. It provides simple stores, indexed with strings and contain key-value-pairs.

# when reading a variable declaration
add_symbol('variable', self.items[1].value, self)

10. Examples

10.1. Expressions

class SimpleExpression(NTE):
    def calc(self):
        val = self.items[0].calc()
        for rep in self.items[1:]:
            val = rep.items[0].calc(val, rep.items[1].calc())
        return val

class Term(NTE):
    def calc(self):
        val = self.items[0].calc()
        for rep in self.items[1:]:
            val = rep.items[0].calc(val, rep.items[1].calc())
        return val

class Factor(NTE):pass

# note the inheritance:

class NumberFactor(Factor):
    def calc(self):
        return int(self.items[0].value)

class ExpressionFactor(Factor):
    def calc(self):
        return self.items[1].calc()

class AdditionOperator(TE):
    expression = r"\+|-"
    def calc(self, left, right):
        return { "+" : lambda: left+right,
                 "-" : lambda: left-right }[self.value]()

class MultiplicationOperator(TE):
    expression = r"\*|/|mod"
    def calc(self, left, right):
        return { "*" : lambda: left*right,
                 "/" : lambda: left/right,
                 "mod": lambda: left%right}[self.value]()

class Number(TE):  expression = r"\d+"
class Sign(TE):    expression = r"-|\+"
class SignOpt(NTE):pass

class Rep2(NTE):pass
class Rep3(NTE):pass

add_rule(SignOpt, (Sign,))
add_rule(SignOpt, ())
add_rule(Rep2, (AdditionOperator, Term ))
add_rule(Rep3, (MultiplicationOperator, Factor))

add_rule(SimpleExpression, [ Term, [Rep2] ])
add_rule(Term, [ Factor, [Rep3] ])
add_rule(NumberFactor, [ Number ])
add_rule(ExpressionFactor, [ "(", SimpleExpression, ")" ])

self.assertEquals(parse("5+4", SimpleExpression).calc(),
                  5+4)
self.assertEquals(parse("5+4+9+9+8+7", SimpleExpression).calc(),
                  5+4+9+9+8+7)
self.assertEquals(parse("1*2+(5*3)+(10/2)", SimpleExpression).calc(),
                  1*2+(5*3)+(10/2))

self.assertEquals(parse("10/9*(10*(10))+1-1", SimpleExpression).calc(),
                          10/9*(10*(10))+1-1)

10.2. Unit Tests

Have a look at the unit tests, in /test. They show some features of the parser, and real-world examples.

10.3. Project 'hla'

About

a dynamic parser [python]

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages