NLP-Discourse-SoochowU/rst_dp2019

Transition-based Bottom-up RST-style Text-level Discourse Parser

-- General Information

   1. This is an RST-style text-level discourse parser that builds discourse 
      rhetorical trees bottom-up.
   2. This study explores representation learning for non-leaf tree nodes. 
      -- As shown in Figure 3, our model relies more on state transition 
      information and the principal component of text spans as the number 
      of EDUs in a text span grows (i.e., as the tree gets deeper). 
      -- As shown in Figure 4, with better representations (automatic 
      incorporation of information flow), the proposed parser performs 
      better on upper-layer nodes.
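
The bottom-up, transition-based construction described above can be illustrated with a minimal shift-reduce sketch. This is not the model in this repository: the `Node` class, the `parse` function, and the action names "shift" and "reduce" are hypothetical, and the action sequence is supplied by hand where a trained parser would predict each action from the current state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A tree node: leaves hold one EDU, non-leaf nodes hold two children."""
    text: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def parse(edus: List[str], actions: List[str]) -> Node:
    """Build a binary tree bottom-up from a given action sequence.

    `actions` stands in for the decisions of a trained classifier
    (an assumption made for this illustration).
    """
    stack: List[Node] = []
    queue = list(edus)
    for action in actions:
        if action == "shift":
            # Move the next EDU from the queue onto the stack as a leaf.
            stack.append(Node(queue.pop(0)))
        elif action == "reduce":
            # Merge the top two subtrees into a new non-leaf node.
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(f"({left.text} {right.text})", left, right))
        else:
            raise ValueError(f"unknown action: {action}")
    if queue or len(stack) != 1:
        raise ValueError("action sequence does not yield a single tree")
    return stack[0]


tree = parse(["e1", "e2", "e3"], ["shift", "shift", "reduce", "shift", "reduce"])
print(tree.text)  # prints ((e1 e2) e3)
```

Each "reduce" creates one non-leaf node, which is where a learned representation of the merged text span would be computed in a real parser.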

-- Required Packages

   torch==0.4.0 
   numpy==1.14.1
   nltk==3.3
   stanfordcorenlp==3.9.1.1
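
As a quick sanity check of the environment, the pinned versions above can be compared against what is importable. The sketch below is only illustrative: `module_version` and `find_mismatches` are hypothetical helpers, and it assumes each package exposes a `__version__` attribute, which may not hold for every package (such cases are reported as unknown).

```python
import importlib

# Versions pinned in the list above.
REQUIRED = {
    "torch": "0.4.0",
    "numpy": "1.14.1",
    "nltk": "3.3",
    "stanfordcorenlp": "3.9.1.1",
}


def module_version(name):
    """Return the module's __version__, or None if it is missing or not importable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(required, lookup=module_version):
    """Map each package whose found version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: need {wanted}, found {found or 'unknown'}")
```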

-- Train Your Own RST Parser

    Run main.py

-- RST Parsing with Raw Documents

   1. Put your raw documents in data/raw_txt, named *.out.
   2. Run Stanford CoreNLP with the provided bash script corpus_rst.sh, 
      using the command "./corpus_rst.sh". You can skip this step if you 
      use another model for EDU segmentation.
   3. Run parser.py to parse the raw documents into objects of the rst_tree 
      class. This involves:
      - segmentation (or you can use your own EDU segmenter)
      - wrapping the results into trees, saved in "data/trees_parsed/trees_list.pkl"
   4. Run drawer.py to draw the trees with NLTK.
   Note: The parser code is not provided here; it can be implemented by referring to our previous project:

rst_dp2018
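
Once step 3 has produced "data/trees_parsed/trees_list.pkl", the saved trees can be read back with the standard pickle module. This is a minimal sketch: `load_trees` is a hypothetical helper, and it assumes the file holds a Python list of rst_tree objects, as the file name suggests. Unpickling needs the rst_tree class to be importable, and pickle files should only be loaded from trusted sources.

```python
import pickle
from pathlib import Path

# Output path named in step 3.
TREES_PATH = Path("data/trees_parsed/trees_list.pkl")


def load_trees(path=TREES_PATH):
    """Return the pickled object stored at `path` (assumed to be a list of trees)."""
    # Run from the repository root so the class of the stored objects can be imported.
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    if TREES_PATH.exists():
        trees = load_trees()
        print(f"loaded {len(trees)} parsed trees")
    else:
        print(f"{TREES_PATH} not found; run parser.py first")
```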

-- Reference

Please read the following paper for more technical details:

Longyin Zhang, Xin Tan, Fang Kong and Guodong Zhou, A Recursive Information Flow Gated Model for RST-Style Text-Level Discourse Parsing.

-- Developer

  Longyin Zhang
  Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, China
  Email: zzlynx@outlook.com, lyzhang9@stu.suda.edu.cn

-- License

   Copyright (c) 2019, Soochow University NLP research group. All rights reserved.
   Redistribution and use in source and binary forms, with or without modification, are permitted provided that
   the following conditions are met:
   1. Redistributions of source code must retain the above copyright notice, this list of conditions and the
      following disclaimer.
   2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the
      following disclaimer in the documentation and/or other materials provided with the distribution.

About

A recent study on English RST-style Text-level Discourse Parsing
