-- General Information
1. This is an RST-style text-level discourse parser that produces discourse
rhetorical trees in a bottom-up mode.
2. This study explores the representation learning of non-leaf tree nodes.
   - As shown in Figure 3, our model relies more on state transition
     information and the principal component of text spans as the number
     of EDUs in a text span grows (i.e., as the tree gets deeper).
   - As shown in Figure 4, with better representations (automatic
     incorporation of information flow), the proposed parser performs
     better on upper-layer nodes.
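The bottom-up construction mentioned in item 1 can be illustrated with a
generic shift-reduce loop. This is only a sketch of the general technique,
not the model in this repository: the action names "shift" and "reduce", the
function build_bottom_up, and the tuple-based tree are all assumptions made
for the example, and relation labels and nuclearity are left out.

```python
from typing import List, Tuple, Union

# A tree is either an EDU string (leaf) or a (left, right) pair (internal node).
Tree = Union[str, Tuple["Tree", "Tree"]]


def build_bottom_up(edus: List[str], actions: List[str]) -> Tree:
    """Apply a sequence of 'shift' / 'reduce' actions to a list of EDUs."""
    stack: List[Tree] = []
    queue = list(edus)  # EDUs still to be read, left to right
    for action in actions:
        if action == "shift":
            if not queue:
                raise ValueError("shift with an empty queue")
            # Move the next EDU onto the stack as a leaf.
            stack.append(queue.pop(0))
        elif action == "reduce":
            if len(stack) < 2:
                raise ValueError("reduce needs two subtrees on the stack")
            # Merge the two topmost subtrees into one internal node.
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))
        else:
            raise ValueError("unknown action: %s" % action)
    if queue or len(stack) != 1:
        raise ValueError("actions do not yield a single tree")
    return stack[0]


tree = build_bottom_up(
    ["e1", "e2", "e3"],
    ["shift", "shift", "reduce", "shift", "reduce"],
)
print(tree)  # (('e1', 'e2'), 'e3')
```

Each reduce creates a non-leaf node covering a longer span, which is the kind
of node whose representation this study is concerned with.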
-- Required Packages
torch==0.4.0
numpy==1.14.1
nltk==3.3
stanfordcorenlp==3.9.1.1
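Because the versions above are pinned exactly, it can help to compare them
with what is installed before training. The helper below is a hypothetical
convenience script, not part of this repository; the names REQUIRED,
installed_version and check_pins are made up for the example, and it assumes
the distribution names match the names in the list.

```python
# Pins copied from the list above.
REQUIRED = {
    "torch": "0.4.0",
    "numpy": "1.14.1",
    "nltk": "3.3",
    "stanfordcorenlp": "3.9.1.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        from importlib.metadata import version  # Python 3.8+
    except ImportError:
        # Older interpreters: fall back to setuptools' pkg_resources.
        try:
            import pkg_resources
        except ImportError:
            return None
        try:
            return pkg_resources.get_distribution(name).version
        except Exception:
            return None
    try:
        return version(name)
    except Exception:
        return None


def check_pins(required, lookup=installed_version):
    """Return one message per package that is missing or has another version."""
    problems = []
    for name, wanted in sorted(required.items()):
        found = lookup(name)
        if found is None:
            problems.append("%s: not installed (need %s)" % (name, wanted))
        elif found != wanted:
            problems.append("%s: found %s (need %s)" % (name, found, wanted))
    return problems


for line in check_pins(REQUIRED):
    print(line)
```

The script prints nothing when every pinned version matches.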
-- Train Your Own RST Parser
Run main.py
-- RST Parsing with Raw Documents
1. Put your raw documents in data/raw_txt as *.out files.
2. Run Stanford CoreNLP with the provided bash script corpus_rst.sh
using the command "./corpus_rst.sh". If you use another model for
EDU segmentation, you can skip this step.
3. Run parser.py to parse the raw documents into objects of the rst_tree
class:
- segment each document into EDUs (or use your own EDU segmenter)
- wrap the EDUs into trees, saved in "data/trees_parsed/trees_list.pkl"
4. Run drawer.py to draw the trees with NLTK.
Note: The parser code is not included here; it can be implemented easily by referring to our previous project.
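As a rough illustration of the drawing step, a binary discourse tree can be
turned into a bracketed string, which is a format NLTK can read. The Node
class and to_bracketed function below are hypothetical stand-ins written for
this example; they are not the rst_tree class, whose fields are not
described here.

```python
class Node:
    """Hypothetical tree node: a leaf holds EDU text, an internal node two children."""

    def __init__(self, label, left=None, right=None, text=None):
        self.label = label
        self.left = left
        self.right = right
        self.text = text


def to_bracketed(node):
    """Serialise a tree as a bracketed string such as '(Rel (EDU a) (EDU b))'."""
    if node.left is None and node.right is None:
        # Escape parentheses so they are not read as tree brackets.
        text = (node.text or "").replace("(", "-LRB-").replace(")", "-RRB-")
        return "(%s %s)" % (node.label, text)
    return "(%s %s %s)" % (
        node.label,
        to_bracketed(node.left),
        to_bracketed(node.right),
    )


root = Node(
    "Elaboration",
    left=Node("EDU", text="He left"),
    right=Node("EDU", text="because it rained"),
)
bracketed = to_bracketed(root)
print(bracketed)  # (Elaboration (EDU He left) (EDU because it rained))

# With NLTK installed, the string can be rendered, for example:
#   from nltk import Tree
#   Tree.fromstring(bracketed).pretty_print()
```

Note that unpickling "data/trees_parsed/trees_list.pkl" with pickle.load
requires the class of the stored objects to be importable, so the file should
be loaded from within the project.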
-- Reference
Please read the following paper for more technical details
-- Developer
Longyin Zhang
Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, China
mail to: zzlynx@outlook.com, lyzhang9@stu.suda.edu.cn
-- License
Copyright (c) 2019, Soochow University NLP research group. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that
the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the
following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the
following disclaimer in the documentation and/or other materials provided with the distribution.