Generates text summaries from given inputs using several methods. Supports Korean only for now.
Currently implemented methods:
- Pointer-Generator Network (most of the code is from this project, with some modifications)
- TextRank (via the textrankr library)
- Transformer
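For intuition about the extractive method in the list above, here is a minimal TextRank sketch in pure Python. It is not the textrankr implementation and does not call that library. The punctuation-based sentence splitter, the regex word tokenizer, and the names split_sentences, similarity, and textrank_summarize are all assumptions made for illustration; real Korean text would need a morphological tokenizer.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def similarity(a, b):
    """Overlap of distinct words, normalized by the log of each word count."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # log(1) == 0 would make the denominator degenerate
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summarize(text, k=2, damping=0.85, iterations=50):
    """Return the k highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    # Weighted, undirected sentence graph without self-loops.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    totals = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):  # fixed number of PageRank-style updates
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / totals[j] * scores[j]
                for j in range(n) if totals[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the others receives only the base score (1 - damping), so it is ranked last and dropped from short summaries.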
This is a sub-project of skku-coop-project backed by SK Planet.
Requirements:
- hanja
- textrankr
- django
- tensorflow >= 2.0.0
- koalanlp
NOTE: All requirements are now included; there is no need to download them separately.
- An automated bash script is included, so you only need to run run_demo.sh.
- I recommend using a Python virtual environment (venv) to isolate the development environment.
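To confirm that the active environment (for example, a fresh venv) can see the required packages, a small check like the following may help. It is only a sketch: it assumes each package's import name matches the name in the requirements list, the helper missing_modules is hypothetical, and it does not verify the tensorflow >= 2.0.0 version constraint.

```python
import importlib.util

# Assumption: import names match the package names in the requirements list.
REQUIRED_MODULES = ["hanja", "textrankr", "django", "tensorflow", "koalanlp"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

find_spec only locates a module without importing it, so the check stays fast even for heavy packages such as tensorflow.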
Before running the demo page server, add your working directory to PYTHONPATH. Try this:
export PYTHONPATH=$PYTHONPATH:path/to/project
This project also uses the Stanford CoreNLP library to regularize input sentences. We assume you have already installed a Java runtime and downloaded the CoreNLP library.
As with PYTHONPATH, to run the server you need to add the CoreNLP jar file to CLASSPATH. Try this:
export CLASSPATH=$CLASSPATH:path/to/corenlp/stanford-corenlp-(version number).jar
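If the server fails to start, it can be useful to verify both variables from Python first. The sketch below is an illustration, not part of the project: check_environment and path_entries are hypothetical helpers, and the jar is assumed to follow the stanford-corenlp-(version number).jar naming used in the export line above.

```python
import os
from pathlib import Path


def path_entries(value):
    """Split a PATH-style variable into its non-empty entries."""
    return [entry for entry in value.split(os.pathsep) if entry]


def check_environment(env, project_dir):
    """Return a list of problems; an empty list means the setup looks fine."""
    problems = []
    project = Path(project_dir).resolve()
    python_paths = [Path(p).resolve() for p in path_entries(env.get("PYTHONPATH", ""))]
    if project not in python_paths:
        problems.append(f"PYTHONPATH does not contain {project}")
    jar_names = [Path(p).name for p in path_entries(env.get("CLASSPATH", ""))]
    # Assumption: the jar file is named stanford-corenlp-<version>.jar.
    if not any(n.startswith("stanford-corenlp-") and n.endswith(".jar") for n in jar_names):
        problems.append("CLASSPATH does not contain a stanford-corenlp-*.jar entry")
    return problems


# Check the current shell environment against the current working directory.
for problem in check_environment(os.environ, os.getcwd()):
    print("warning:", problem)
```

The check only inspects names in the variables; it does not confirm that the jar file exists on disk or that Java can load it.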
Since the datasets are too large to upload to GitHub, the files are hosted on Google Drive. Download them using the links below:
- Korean Dataset (collected from BigKinds news data using news-crawler)
- English Dataset (taken from the CNN/Dailymail dataset using cnn-dailymail)
- Korean Pretrained Model
- English Pretrained Model (taken from here)
Dataset location
original: project-root/data/
preprocessed: project-root/src/sum/pgn/data/
Model location
project-root/src/sum/pgn/model/
Note that you must extract only the contents of each archive into these locations. Do not create a subdirectory under them.
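The layout described above can be sanity-checked with a short script. This is a sketch under stated assumptions: check_layout is a hypothetical helper, and treating a directory that holds nothing but a single subdirectory as a mistaken extraction is a heuristic, not a rule taken from the project.

```python
from pathlib import Path

# Locations from the text above, relative to the project root.
EXPECTED_DIRS = ["data", "src/sum/pgn/data", "src/sum/pgn/model"]


def check_layout(project_root):
    """Report missing, empty, or suspiciously nested dataset/model directories."""
    problems = []
    for rel in EXPECTED_DIRS:
        target = Path(project_root) / rel
        if not target.is_dir():
            problems.append(f"{rel}: directory is missing")
            continue
        entries = list(target.iterdir())
        if not entries:
            problems.append(f"{rel}: directory is empty")
        elif len(entries) == 1 and entries[0].is_dir():
            # Heuristic (assumption): a lone subdirectory usually means the
            # archive was extracted together with its top-level folder.
            problems.append(f"{rel}: contains only the subdirectory {entries[0].name}")
    return problems


# Run against the current directory, assumed here to be the project root.
for problem in check_layout("."):
    print("warning:", problem)
```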
The demo page is built with the Django framework. To run the demo, try this:
python src/demo/manage.py runserver
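The environment setup and the runserver command can also be combined in one Python launcher. The following is a minimal sketch, not the contents of run_demo.sh: build_demo_invocation is a hypothetical helper, and the project root and CoreNLP jar path are values you supply yourself.

```python
import os
import sys


def build_demo_invocation(project_root, corenlp_jar, base_env=None):
    """Return (command, env) for starting the Django demo server."""
    env = dict(os.environ if base_env is None else base_env)
    # Append to existing values instead of overwriting them.
    for key, value in (("PYTHONPATH", project_root), ("CLASSPATH", corenlp_jar)):
        existing = env.get(key, "")
        env[key] = existing + os.pathsep + value if existing else value
    manage_py = os.path.join(project_root, "src", "demo", "manage.py")
    command = [sys.executable, manage_py, "runserver"]
    return command, env
```

To start the server, pass the result to subprocess.run(command, env=env, check=True). Using sys.executable keeps the server on the same interpreter (and virtual environment) as the launcher.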