dynparser - a dynamic parser that can extend the syntax during runtime

Abstract

'dynparser' is a Python package that provides a runtime-configurable parser.

It is a true LL(n) parser: the arbitrary lookahead is achieved through backtracking.

1. Overview

Terminal and Nonterminal symbols are represented as Python classes.

To create a Nonterminal, derive from NTE:

class Program(NTE):pass

A Terminal symbol is derived from TE and can provide a regular expression to direct the tokenizer:

import re

class Value(TE):
    expression = re.compile(r"\d+")

TE provides a default regular expression: \w+.

The syntax is created by adding rules to a database:

add_rule(Program, [ "constant", Value ])

Such a syntax can be parsed easily:

parse_tree = parse("constant 123")

2. Options through inheritance

To add alternative rules to the syntax, inheritance can be used.

class Program(NTE):pass
class Statement(NTE):pass
class PrintStatement(Statement):pass
class SetStatement(Statement):pass

add_rule(PrintStatement, [ "print", Value ] )
add_rule(SetStatement, [ "set", Name, "=", Value] )
add_rule(Program, [ Statement ])

The parser detects the inheritance and tries to match PrintStatement as well as SetStatement wherever a Statement is required. This reflects the 'is-a' relationship.

This would represent the EBNF:

Program:
  Statement
  .
Statement:
  PrintStatement
  | SetStatement
  .
PrintStatement:
  "print" Value
  .
SetStatement:
  "set" Name "=" Value
  .

However, the inheritance rule is only added when a rule for the subclass is added via add_rule.
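One way to picture this behaviour is a rule database in which add_rule also registers a forwarding rule on each base class. The following is only a minimal sketch under that assumption: NTE, rules and add_rule are stand-ins defined here for illustration, not dynparser's actual implementation.

```python
from collections import defaultdict

class NTE:
    """Stand-in base class for Nonterminals (illustrative only)."""

# Hypothetical rule database: symbol class -> list of productions.
rules = defaultdict(list)

def add_rule(symbol, production):
    """Store a production and forward it to the symbol's NTE ancestors."""
    rules[symbol].append(list(production))
    for base in symbol.__mro__[1:]:
        if base is NTE:
            break
        # Forwarding rule: the base class can be parsed as this subclass.
        if [symbol] not in rules[base]:
            rules[base].append([symbol])

class Statement(NTE): pass
class PrintStatement(Statement): pass
class SetStatement(Statement): pass

add_rule(PrintStatement, ["print", "Value"])
print(rules[Statement])   # only PrintStatement is an alternative so far
add_rule(SetStatement, ["set", "Name", "=", "Value"])
print(rules[Statement])   # now both subclasses are alternatives
```

In this sketch a subclass that never appears in an add_rule call stays invisible to the base class, which matches the restriction described above.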

3. Choices

Take this EBNF:

SimpleProgram:
  "{" SimpleProgram "}"
  | "x"
  .

To create choices, additional rules must be defined:

class SimpleProgram(NTE): pass
add_rule(SimpleProgram, [ "{", SimpleProgram, "}"] )
add_rule(SimpleProgram, [ "x" ])

This is similar to the inheritance approach above, but not identical: a choice implemented this way can have arbitrary productions, whereas inheritance choices are purely forwarding rules.
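Trying several rules for the same Nonterminal is where backtracking comes in. The idea can be sketched as a plain recursive recognizer for the SimpleProgram grammar above. This is a self-contained illustration, not dynparser's code: RULES, match and recognizes are names introduced here, and the sketch commits to the first alternative that succeeds.

```python
# Hypothetical rule table: nonterminal name -> list of alternative productions.
RULES = {
    "SimpleProgram": [
        ["{", "SimpleProgram", "}"],
        ["x"],
    ],
}

def match(symbol, tokens, pos):
    """Return the position after matching symbol at pos, or None on failure."""
    if symbol not in RULES:                   # constant terminal
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None
    for production in RULES[symbol]:          # try each choice in order
        cur = pos
        for item in production:
            cur = match(item, tokens, cur)
            if cur is None:
                break                         # backtrack: next choice restarts at pos
        else:
            return cur                        # every item of this choice matched
    return None

def recognizes(text):
    """True if the whole text is a SimpleProgram (one character per token)."""
    tokens = list(text)
    return match("SimpleProgram", tokens, 0) == len(tokens)
```

Here recognizes("{{x}}") succeeds through the first choice twice and the second choice once, while recognizes("{x") fails because neither choice can consume the whole input.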

4. Repetitions

Take this EBNF:

List: Value { "," Value }
  .

To parse lists, the repeated part is moved into a helper Nonterminal, and that Nonterminal is wrapped in a nested list inside the rule:

class Rep1(NTE):pass
class Value(TE):pass
class List(NTE):pass
add_rule(List, [ Value, [Rep1] ])
add_rule(Rep1, [ ",", Value ])

This notation will hopefully change in the future.
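How a nested list might be interpreted as "zero or more times" can be sketched with a small sequence matcher. This is an assumption about the semantics, not the library's code: match_sequence, is_value and is_comma are hypothetical names, and for brevity the repeated items are written inline instead of going through a helper Nonterminal such as Rep1.

```python
def is_value(token):
    return token.isdigit()

def is_comma(token):
    return token == ","

def match_sequence(production, tokens, pos=0):
    """Return the position after the match, or None if a required item fails.

    A nested list inside the production is matched zero or more times.
    """
    for item in production:
        if isinstance(item, list):
            while True:
                nxt = match_sequence(item, tokens, pos)
                if nxt is None or nxt == pos:   # stop when nothing more matches
                    break
                pos = nxt
        elif pos < len(tokens) and item(tokens[pos]):
            pos += 1
        else:
            return None
    return pos

LIST = [is_value, [is_comma, is_value]]   # List: Value { "," Value }
```

With this sketch, match_sequence(LIST, ["1", ",", "2"]) consumes all three tokens, while a trailing comma is left unconsumed because the repeated group only matches as a whole.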

5. Parse Tree

The function parse returns an instance of the top production class. Every NTE instance has a field items, which contains the instantiated TEs and NTEs of the matched rule. Every TE instance has a member value bound to the parsed value.

For constant TEs (such as "is" and "+" in the rule ["is", "+"]), TE classes are generated at runtime, each containing a regular expression that matches exactly the given string. The name of such a class is "TE_"+name, where name is exactly the given constant string.
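Generating such a class at runtime is straightforward with Python's built-in type(). The sketch below shows one way it could be done: TE is a stand-in base class and constant_te is a hypothetical helper name, so this is not necessarily how dynparser does it.

```python
import re

class TE:
    """Stand-in Terminal base class with a default expression."""
    expression = re.compile(r"\w+")

def constant_te(text):
    """Create a TE subclass that matches exactly the given constant string."""
    return type("TE_" + text, (TE,),
                {"expression": re.compile(re.escape(text))})

Plus = constant_te("+")
print(Plus.__name__)                          # TE_+
print(bool(Plus.expression.fullmatch("+")))   # True
```

re.escape matters here because constants like "+" or "(" are regular expression metacharacters.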

6. Hooks, Semantic actions

Since the parser returns a parse tree with the NTE/TE classes instantiated, semantic actions can be easily defined by implementing methods in the classes.

Python allows adding methods to classes at runtime, so methods can be added directly to NTE and TE:

# add the needed attributes to the base classes
# (render, gendb and Template are assumed to be defined elsewhere)
NTE.genpos = gendb.GEN_HERE
NTE.render = render  # a function assigned to a class acts as a method
NTE.template = Template(r"/* not implemented: ${nte.__class__} */")

However, to direct the parser while it is running, the following hooks are provided:

6.1. NTE.canparse, TE.canparse

This method is called on an NTE/TE when it is about to be parsed. Semantic checks can be implemented here. When the method returns False, the NTE/TE cannot be parsed.

6.2. NTE.onparse, TE.onparse

Called when a TE/NTE was successfully parsed. For example, this can be used to create entries in the Namespace.
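The interplay of the two hooks can be sketched with a toy driver. The hook signatures (no arguments, canparse returning a bool), the availability of value inside canparse, and the names declared, Declaration, Reference and accept are all assumptions made for this illustration; the real hooks may differ.

```python
declared = set()   # hypothetical symbol table used by the hooks

class TE:
    """Stand-in Terminal base class; hook signatures are assumptions."""
    def __init__(self, value):
        self.value = value

    def canparse(self):
        return True     # default: no semantic restriction

    def onparse(self):
        pass            # default: nothing to do

class Declaration(TE):
    def onparse(self):
        declared.add(self.value)        # remember the declared name

class Reference(TE):
    def canparse(self):
        return self.value in declared   # reject undeclared names

def accept(cls, text):
    """Toy driver: consult canparse first, call onparse on success."""
    node = cls(text)
    if not node.canparse():
        return None
    node.onparse()
    return node
```

In this sketch a Reference to "a" is rejected until a Declaration of "a" has been accepted, which is the kind of semantic check the hooks are meant for.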

7. Dynamic Syntax

Syntax rules can be added while parsing. This means one can extend the syntax arbitrarily.

The project hla (https://github.com/mru00/hla) uses this feature. Its syntax is extended by uses statements: each one imports a new Python module, which in turn defines new TEs and NTEs and calls add_rule. The magic happens in the Uses.onparse method.
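The effect of a syntax that grows while the input is being read can be shown with a deliberately simplified, line-based sketch. It does not use dynparser or hla: EXTENSIONS and parse_program are hypothetical, and the set of known keywords stands in for the rule database that a uses statement would extend.

```python
# Hypothetical extension table: module name -> keywords it contributes.
EXTENSIONS = {"io": {"print"}, "math": {"sqrt"}}

def parse_program(lines):
    """Accept statements whose keyword is known at the time they are read."""
    known = {"uses"}                 # the syntax starts with one statement
    accepted = []
    for line in lines:
        keyword, _, argument = line.partition(" ")
        if keyword not in known:
            raise SyntaxError("unknown statement: " + keyword)
        if keyword == "uses":        # plays the role of Uses.onparse
            known |= EXTENSIONS[argument]
        accepted.append((keyword, argument))
    return accepted
```

A program consisting of "uses io" followed by "print 1" is accepted, while "print 1" alone is rejected because the statement has not been introduced yet.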

8. Tokenizer

The tokenizer is as dynamic as the parser. Since TEs can provide the regular expressions they match, the tokenizer can be configured at runtime.
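A tokenizer driven by the regular expressions of the currently known TE classes could look roughly like the sketch below. It is an illustration under assumptions, not the library's tokenizer: TE is a stand-in base class, and tokenize and the terminals list are names introduced here.

```python
import re

class TE:
    """Stand-in Terminal base class (illustrative only)."""
    expression = re.compile(r"\w+")

class Number(TE):
    expression = re.compile(r"\d+")

class Plus(TE):
    expression = re.compile(r"\+")

def tokenize(text, terminals):
    """Split text using whatever terminal classes are known right now."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for te in terminals:
            m = te.expression.match(text, pos)
            if m and m.end() > pos:
                tokens.append((te.__name__, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError("no terminal matches at position %d" % pos)
    return tokens

terminals = [Number, TE]
# tokenize("1 + a", terminals) raises ValueError here: "+" is still unknown.
terminals.append(Plus)            # extend the tokenizer at runtime
print(tokenize("1 + a", terminals))
```

Because the list of terminals is consulted on every call, a terminal added between two calls takes effect immediately.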

9. Namespaces

Currently, no hierarchical namespaces are implemented. This could probably be done outside the parser anyway.

Namespaces are provided by the module namespace. It offers simple stores that are indexed by strings and contain key-value pairs.

# when reading a variable declaration
add_symbol('variable', self.items[1].value, self)
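Such string-indexed stores can be pictured as a dictionary of dictionaries. The sketch below is an assumption about the layout, not the contents of the namespace module: the argument order of add_symbol follows the call above, while _stores and lookup are hypothetical.

```python
from collections import defaultdict

# Hypothetical layout: store name -> {key: value}.
_stores = defaultdict(dict)

def add_symbol(store, key, value):
    """Register value under key in the named store."""
    _stores[store][key] = value

def lookup(store, key):
    """Hypothetical helper: return the stored value, or None if missing."""
    return _stores[store].get(key)

add_symbol("variable", "x", 42)
print(lookup("variable", "x"))   # 42
print(lookup("variable", "y"))   # None
```

Keeping one flat dictionary per store is enough as long as no hierarchical scopes are needed.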

10. Examples

10.1. Expressions

class SimpleExpression(NTE):
    def calc(self):
        val = self.items[0].calc()
        for rep in self.items[1:]:
            val = rep.items[0].calc(val, rep.items[1].calc())
        return val

class Term(NTE):
    def calc(self):
        val = self.items[0].calc()
        for rep in self.items[1:]:
            val = rep.items[0].calc(val, rep.items[1].calc())
        return val

class Factor(NTE):pass

# note the inheritance:

class NumberFactor(Factor):
    def calc(self):
        return int(self.items[0].value)

class ExpressionFactor(Factor):
    def calc(self):
        return self.items[1].calc()

class AdditionOperator(TE):
    expression = r"\+|-"
    def calc(self, left, right):
        return { "+" : lambda: left+right,
                 "-" : lambda: left-right }[self.value]()

class MultiplicationOperator(TE):
    expression = r"\*|/|mod"
    def calc(self, left, right):
        return { "*" : lambda: left*right,
                 "/" : lambda: left/right,
                 "mod": lambda: left%right}[self.value]()

class Number(TE):  expression = r"\d+"
class Sign(TE):    expression = r"-|\+"
class SignOpt(NTE):pass

class Rep2(NTE):pass
class Rep3(NTE):pass

add_rule(SignOpt, (Sign,))
add_rule(SignOpt, ())
add_rule(Rep2, (AdditionOperator, Term ))
add_rule(Rep3, (MultiplicationOperator, Factor))

add_rule(SimpleExpression, [ Term, [Rep2] ])
add_rule(Term, [ Factor, [Rep3] ])
add_rule(NumberFactor, [ Number ])
add_rule(ExpressionFactor, [ "(", SimpleExpression, ")" ])

# inside a unittest.TestCase test method
self.assertEqual(parse("5+4", SimpleExpression).calc(),
                 5+4)
self.assertEqual(parse("5+4+9+9+8+7", SimpleExpression).calc(),
                 5+4+9+9+8+7)
self.assertEqual(parse("1*2+(5*3)+(10/2)", SimpleExpression).calc(),
                 1*2+(5*3)+(10/2))

self.assertEqual(parse("10/9*(10*(10))+1-1", SimpleExpression).calc(),
                 10/9*(10*(10))+1-1)

10.2. Unit Tests

Have a look at the unit tests, in /test. They show some features of the parser, and real-world examples.

10.3. Project 'hla'

See https://github.com/mru00/hla
