AbstrEncap

A naïve implementation of abstractive summarization (of texts) using LSTMs.

Introduction

We train our RNN on «BigPatent» (cf. https://www.aclweb.org/anthology/P19-1212.pdf), which is already divided into training, development and test sets lying in directories named a-h and y. The dataset can be obtained from https://evasharma.github.io/bigpatent/ by following the Google Drive link given there. Roughly speaking, the dataset contains about 1.3 million patent descriptions together with their abstracts.
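As a rough illustration of reading one split, the sketch below walks the a-h, y subdirectories and yields (description, abstract) pairs. It assumes that each subdirectory holds gzip-compressed JSON-lines files and that each record has "description" and "abstract" fields; the function name iter_patents is hypothetical and is not taken from parse_dbs.py or jsonl.py.

```python
import gzip
import json
import os


def iter_patents(split_dir):
    """Yield (description, abstract) pairs from one split directory.

    Assumed layout: split_dir/<a-h or y>/<file>.gz, one JSON record per line.
    """
    for cpc in sorted(os.listdir(split_dir)):
        cpc_dir = os.path.join(split_dir, cpc)
        if not os.path.isdir(cpc_dir):
            continue
        for name in sorted(os.listdir(cpc_dir)):
            if not name.endswith(".gz"):
                continue
            path = os.path.join(cpc_dir, name)
            with gzip.open(path, "rt", encoding="utf-8") as handle:
                for line in handle:
                    if not line.strip():
                        continue  # skip blank lines
                    record = json.loads(line)
                    # Field names are an assumption about the dataset format.
                    yield record["description"], record["abstract"]
```

Because the function is a generator, records are read lazily, which matters for a dataset of this size. Pointing it at the directory of the training split would stream the training pairs one at a time.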

Because the dataset contains long texts whose summaries vary in length, we perform a preliminary step before training: we split each text into paragraphs, each beginning with a key sentence. To obtain the key sentences of a text we:

0. Create a list of the sentences in the text.
1. Compute a representative vector for each sentence. This vector is obtained by multiplying the word2vec vector of each word by the tf-idf score of that word inside the text.
2. Apply a forward "PageRanking" method (cf. https://www.aclweb.org/anthology/P04-3020.pdf), with cosine similarity as the similarity measure between sentences.
3. Suppose the "summary" (abstract) of the text contains N sentences; we say a sentence in the original text is a key sentence if its score is one of the N highest.
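The key-sentence selection described above can be sketched in plain Python as follows. This is a minimal illustration, not the code in utils.py or parse_dbs.py. It makes several assumptions: the weighted word vectors are summed to form the sentence vector, the forward graph links each sentence only to the sentences that follow it, negative similarities are clipped to zero, and the damping factor is 0.85. All function names, word vectors and tf-idf scores are toy values invented for the example.

```python
import math


def sentence_vector(tokens, word_vectors, tfidf, dim):
    """Sum the word vectors of a sentence, each scaled by its tf-idf score."""
    vec = [0.0] * dim
    for tok in tokens:
        if tok in word_vectors:
            weight = tfidf.get(tok, 0.0)
            for k, x in enumerate(word_vectors[tok]):
                vec[k] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def forward_pagerank(vectors, damping=0.85, iterations=50):
    """Weighted PageRank on a graph with edges j -> i only when j < i."""
    n = len(vectors)
    weight = [
        [max(cosine(vectors[j], vectors[i]), 0.0) if j < i else 0.0
         for i in range(n)]
        for j in range(n)
    ]
    out_sum = [sum(row) for row in weight]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) + damping * sum(
                weight[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0.0
            )
            for i in range(n)
        ]
    return scores


def key_sentences(scores, n_summary):
    """Indices of the n_summary best-scoring sentences, in text order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_summary])


# Toy data: 2-dimensional "word2vec" vectors and tf-idf scores (made up).
word_vectors = {"patent": [1.0, 0.0], "claim": [0.8, 0.6], "device": [0.0, 1.0]}
tfidf = {"patent": 0.5, "claim": 0.3, "device": 0.2}
sentences = [["patent", "claim"], ["device"], ["claim", "device"], ["patent"]]

vectors = [sentence_vector(s, word_vectors, tfidf, dim=2) for s in sentences]
scores = forward_pagerank(vectors)
keys = key_sentences(scores, n_summary=2)  # N = sentences in the abstract
```

Under these assumptions the first sentence has no incoming edges, so its score stays at the base value 1 - damping. Whether that is desirable depends on how the forward graph is oriented in the actual implementation.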

Then we train an RNN with 3 LSTMs, using an implementation of Bahdanau's attention (https://arxiv.org/pdf/1409.0473.pdf). In the future we may test whether Luong's attention or local attention can improve the network.
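For readers unfamiliar with Bahdanau's (additive) attention, here is a minimal sketch of a single attention step in plain Python. It is not the code in b_attention.py: the function names, the matrices W and U, the vector v and all numeric values are toy assumptions chosen only to make the computation concrete. Each encoder state h_j gets the score v · tanh(W s + U h_j), the scores are normalised with a softmax, and the context vector is the weighted sum of the encoder states.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def additive_attention(dec_state, enc_states, W, U, v):
    """Return (context, weights) for one decoder step."""
    ws = matvec(W, dec_state)
    energies = []
    for h in enc_states:
        uh = matvec(U, h)
        energies.append(
            sum(vk * math.tanh(a + b) for vk, a, b in zip(v, ws, uh))
        )
    # Numerically stable softmax over the encoder positions.
    top = max(energies)
    exps = [math.exp(e - top) for e in energies]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(enc_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, enc_states)) for k in range(dim)
    ]
    return context, weights


# Toy values: 2-dimensional states, three encoder positions (made up).
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.0], [-0.3, 0.6]]
v = [1.0, -1.0]
dec_state = [0.2, 0.7]
enc_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

context, weights = additive_attention(dec_state, enc_states, W, U, v)
```

In a trained network W, U and v are learned parameters, and the context vector is fed to the decoder together with the previous output token.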

Requirements

All requirements are specified in requirements.txt.

Usage

An example of usage is given in abs_sum.py, which reads and parses the BigPatent dataset and trains the network on it.

