jerryyeezus/nlp-summarization

RST Parser

Basic Description

RST parser for document-level discourse parsing. The parsing algorithm is shift-reduce parsing, and the parsing model is an offline-trained multi-class classifier.
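As a rough illustration of how a shift-reduce parser and a classifier fit together, here is a minimal, self-contained sketch. It is not the code in this repository: the names `sr_parse_sketch` and `toy_predict`, the tuple-based tree representation, and the two-element action format are assumptions made for the example, and `toy_predict` stands in for a trained classifier.

```python
from typing import Callable, List, Optional, Tuple, Union

# Assumed representation: a tree is an EDU string (leaf)
# or a (label, left, right) triple (internal node).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
Action = Tuple[str, Optional[str]]  # ("Shift", None) or ("Reduce", label)


def sr_parse_sketch(edus: List[str],
                    predict: Callable[[List[Tree], List[str]], Action]) -> Tree:
    """Greedy shift-reduce parsing over a non-empty list of EDU texts."""
    if not edus:
        raise ValueError("need at least one EDU")
    stack: List[Tree] = []
    queue: List[str] = list(edus)
    # Stop when the queue is empty and a single tree remains on the stack.
    while queue or len(stack) > 1:
        name, label = predict(stack, queue)
        # Fall back to the only legal action if the prediction is illegal.
        if name == "Shift" and not queue:
            name, label = "Reduce", label or "span"
        elif name == "Reduce" and len(stack) < 2:
            name = "Shift"
        if name == "Shift":
            # Move the next EDU from the head of the queue onto the stack.
            stack.append(queue.pop(0))
        else:
            # Merge the top two stack items into one labeled subtree.
            right = stack.pop()
            left = stack.pop()
            stack.append((label, left, right))
    return stack[0]


def toy_predict(stack: List[Tree], queue: List[str]) -> Action:
    """Stand-in for a trained classifier: reduce whenever possible."""
    if len(stack) >= 2:
        return ("Reduce", "elaboration")
    return ("Shift", None)


tree = sr_parse_sketch(["e1", "e2", "e3"], toy_predict)
```

In a real parser the `predict` callable would extract features from the stack and queue and pass them to the trained model; the fallback to a legal action is one simple way to keep the loop from stalling on an invalid prediction.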

To improve performance, you can:

  • add more features to the feature generator (in feature.py)
  • tune the parameters of the parsing model (in model.py). For now, I simply use LinearSVC with the default parameter settings.
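To give a sense of what adding a feature might look like, the sketch below yields a few status and lexical features from the parser state. Everything here is hypothetical: the function name `features_sketch`, the feature names, and the representation of stack spans as plain strings are assumptions for the example and may differ from what feature.py actually does.

```python
from typing import Iterator, List, Tuple

Feature = Tuple[str, ...]


def features_sketch(stack: List[str], queue: List[str]) -> Iterator[Feature]:
    """Yield simple status and lexical features from the parser state.

    `stack` holds the text of each partial span (top of stack last);
    `queue` holds the text of the EDUs not yet shifted.
    """
    # Status features: coarse description of the stack and queue.
    yield ("stack-size", str(min(len(stack), 2)))
    yield ("queue-empty", str(not queue))
    # Lexical features: first and last word of the top two stack spans.
    for depth, span in enumerate(reversed(stack[-2:]), start=1):
        words = span.lower().split()
        if words:
            yield (f"top{depth}-first-word", words[0])
            yield (f"top{depth}-last-word", words[-1])
    # Lexical feature: first word of the next EDU in the queue.
    if queue:
        words = queue[0].lower().split()
        if words:
            yield ("queue-first-word", words[0])


feats = list(features_sketch(["The bank failed", "because rates rose"],
                             ["so investors left"]))
```

A new feature is then just one more `yield` of a tuple; discourse connectives such as "because" at the start of a span are a common cue, which is why the first word is used here.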

Demo

Start from "main.py" for a demo

Modules

  • tree: all operations on an RST tree are in this module. For example:
    • Build a general/binary RST tree from an annotated file
    • Binarize a general RST tree (original trees in the RST treebank may not be in binary form)
    • Generate a bracketing sequence for evaluation
    • Write an RST tree to a file (not implemented yet)
    • Generate shift-reduce parsing action examples
    • Get all EDUs from the RST tree
  • parser: an implementation of the shift-reduce parsing algorithm, including the following functions:
    • Initialize the parsing status from a sequence of texts
    • Change the status according to a specific parsing action
    • Get the status of the stack/queue
    • Check whether parsing should stop
  • model: a parsing model module, in which a trained parsing model predicts parsing actions. This module includes:
    • Batch training on the data generated by the data module
    • Predict parsing actions for a given feature set
    • Save/load the parsing model
  • feature: a feature generator, which generates features from the current stack/queue status.
  • data: generates training data for offline training
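The binarization step mentioned for the tree module can be sketched as follows. This is only an illustration under stated assumptions: the `Node` class is hypothetical, and the right-branching scheme (a node with children c1, c2, c3 becomes c1 and a new node over c2, c3 that reuses the parent's label) is one common choice, not necessarily the one used in tree.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a leaf carries EDU text, an internal node has children."""
    label: str = "span"
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def binarize(node: Node) -> Node:
    """Return a copy in which no internal node has more than two children.

    Assumption: right-branching binarization, with each newly created
    node reusing the label of the node being split.
    """
    if not node.children:
        return Node(label=node.label, text=node.text)
    kids = [binarize(child) for child in node.children]
    # Repeatedly merge the two rightmost children into one new node.
    while len(kids) > 2:
        right = kids.pop()
        left = kids.pop()
        kids.append(Node(label=node.label, children=[left, right]))
    return Node(label=node.label, children=kids)


def edu_texts(node: Node) -> List[str]:
    """Collect EDU texts from the leaves, left to right."""
    if not node.children:
        return [node.text] if node.text is not None else []
    texts: List[str] = []
    for child in node.children:
        texts.extend(edu_texts(child))
    return texts


general = Node("list", children=[Node(text=t) for t in ["a", "b", "c", "d"]])
binary = binarize(general)
```

Binarization changes only the internal structure, so the left-to-right EDU order returned by `edu_texts` is the same before and after.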

Main Classes

(For all the following functions, please refer to the code for more explanation)

  • RSTTree (in tree module):
    • build(): Build a binary RST tree from an annotated discourse file
    • generate_sample(): Generate a sequence of parsing actions and the corresponding training examples, which can be used for offline training of the parsing model
    • getedutext(): Get a sequence of EDU texts from the given RST tree
    • bracketing: Generate a bracketing sequence for evaluation
  • SRParser (in parser module):
    • init(texts): Initialize the queue status from the given text sequence. Each element in this sequence will be treated as an EDU
    • operate(action_tuple): Change the queue/stack according to the action tuple. For example, the operation (Shift, None, None) moves one element from the head of the queue to the top of the stack
    • getparsetree(): Return the entire RST tree
  • FeatureGenerator (in feature module):
    • features(): the main generator, which extracts all the necessary features from the current queue/stack. You can extend this generator by calling other sub-functions in it.
  • ParsingModel (in model module):
    • train(trnM, trnL): Train the parsing model (a multi-class classifier) offline on the given training data trnM and the corresponding labels trnL
    • predict(features): Predict a parsing action according to the given feature generator
    • sr_parse(texts): Perform shift-reduce RST parsing on the given text sequence. Each element in this sequence will be treated as an EDU
  • Data (in data module):
    • buildvocab(thresh): Build the feature vocab, removing low-frequency features. The same vocab is also used for parsing at test time.
    • buildmatrix(): Build the data matrix for offline training
    • savematrix(fname): Save data matrix and corresponding labels into fname
    • getvocab(): Get feature vocab
    • savevocab(fname): Save feature vocab and relation mapping (from relations to indices) into fname

Reference
