
The ability of GPT-2 to create abstractive summaries with fine-tuning using only keywords, without pre-training.

Contributors Forks Stargazers MIT License LinkedIn


Logo

Abstractive Text Summarization

...
Explore the docs »

Table of Contents
  1. About The Project
  2. File Descriptions
  3. Usage
  4. Roadmap
  5. License
  6. Contact
  7. Acknowledgements

About The Project

![Product Name Screen Shot][product-screenshot]

Text summarization is the task of extracting the important information from an original text document; the extracted information is then produced as a report and presented to the user as a short summary. Understanding and interpreting the content of many different kinds of texts is difficult for people, and the language structure and subject matter are the most important factors in that difficulty.

In this project, we build our model on a dataset created from news articles, using state-of-the-art abstractive text summarization methods. The project also surveys the various methodologies, challenges, and problems of abstractive summarization. The importance of abstractive summarization lies in the time it saves: in many industries, working through an abundance of documents to separate the necessary from the unnecessary is a huge waste of time.


Built With

This project is built mainly with:

  • GPT-2 / DistilGPT2
  • PyTorch
  • NLTK


File Descriptions


Usage

First, download the required dataset and preprocess it, then save the preprocessed examples as PyTorch (torch) files for GPT-2. Load the saved PyTorch files in the model training script, save the trained models, and then generate your summaries with the metrics you want.
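
A minimal sketch of this flow is shown below; the file name, example text, and use of the Hugging Face DistilGPT2 tokenizer are assumptions for illustration, not the repository's exact scripts.

```python
# Sketch of the preprocess -> save -> reload flow (paths and field names are
# illustrative assumptions, not the repository's exact layout).
import torch
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")

article = "The central bank raised interest rates for the third time this year."
keywords = "central bank, interest rates"

# Encode one "keywords + article" example and save it as a PyTorch file.
input_ids = tokenizer.encode(keywords + " " + article, return_tensors="pt")
torch.save({"input_ids": input_ids}, "train_example.pt")

# The training script later reads the saved tensors back.
batch = torch.load("train_example.pt")
print(batch["input_ids"].shape)
```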


Roadmap

Methodology

[methodology Road]

  • Overview of summarization: we first discuss the types and methods of summarization.
  • Summarization as an NLP task: everything we do to create a summary is framed in terms of NLP.
  • Generative Pre-trained Transformer: summaries are generated using the generative capability of GPT-2.
  • Pre-training with transfer learning: we build on the GPT-2 architecture (up to 1.5 billion parameters) so that the summaries we produce are truly abstractive.
  • Fine-tuning: a detailed and demanding process needed to improve the model's performance.
  • Analysis: comparison of the generated summaries with the reference summaries.

Summary Types

  • Extractive Summarization:
    Extractive summarization weights words by how frequently they appear in the text, then takes the sentences and fragments associated with the highest-weighted words and presents them as the summary. It is a relatively simple method and tends to have poor semantic integrity; a minimal sketch of the idea follows this list.

  • Abstractive Summarization:
    An abstractive model tries to understand the full content of the text and to write a summary based on that understanding, which makes the result read much more like a human-written summary. Training the model well is critical here, and producing such abstractive summaries is far more difficult and complex.
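
To make the extractive idea above concrete, here is a tiny generic sketch of frequency-based extractive summarization; it is an illustration only, not the project's implementation.

```python
# Generic frequency-based extractive summarization (illustration only).
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Score each sentence by the summed frequency of the words it contains.
    scores = {s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
              for s in sentences}
    top = sorted(sentences, key=scores.get, reverse=True)[:num_sentences]
    # Present the highest-scoring sentences in their original order.
    return " ".join(s for s in sentences if s in top)

print(extractive_summary(
    "The storm hit the coast early on Monday. Thousands lost power. "
    "Officials said crews would work through the night. "
    "Schools in the area stayed closed."
))
```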


Model Architecture

[Model Arch]

Our dataset consists of approximately 4,500 news articles. We chose news for text summarization because it covers almost every topic, which helps the model stay consistent in its post-training text generation task.

We train our model on the news articles together with the keywords we generate for them in our dataset.

The usual approach here is to train models on pairs of training texts and summaries, but we worked with keywords instead, which let us focus on the core topics of each news article using GPT-2 and its fine-tuning ability.

Starting from the pre-trained GPT-2 model, we fine-tune it on our own dataset. In doing so, we rely on the Transformer architecture, first introduced by Google.
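
As a hedged sketch (the plain "keywords followed by text" input format is an assumption), fine-tuning the pre-trained model on one example reduces to the standard causal language-modeling loss:

```python
# Hedged sketch of the fine-tuning objective: condition (Distil)GPT-2 on an
# article's keywords followed by its text and minimize the causal-LM loss.
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")

keywords = "wildfire, evacuation"
article = "Thousands were evacuated as a fast-moving wildfire spread overnight."

input_ids = tokenizer.encode(keywords + " " + article, return_tensors="pt")
loss = model(input_ids, labels=input_ids).loss  # standard causal-LM loss
loss.backward()                                 # gradients for one fine-tuning step
```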


Pre-Training With Transfer Learning

  • One of the biggest challenges of supervised learning in NLP is finding appropriate data for training and masking the classifier. By masking parts of the training text at certain intervals, we prevent the algorithm from simply producing copy-paste style output. We trained our GPT-2 model on two main tasks: 1. language modeling (LM) and 2. multiple choice (MC) prediction.

  • The LM task projects the hidden state onto the word-embedding output layer and applies a cross-entropy loss over the keyword and text sequence. For the MC task, we support the simple classification objective by adding "beginning of text" and "end of text" tokens.

  • Like BERT, GPT-2 can generate text on its own, but it does so by relating each new word to the words that come before it: the n-th output token is generated from the n-1 tokens to its left. The multiple-choice label is a tensor specifying which i-th element is the correct keyword/text pair (a hedged sketch of this two-task setup follows this list).
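
One way to realize this two-task setup is Hugging Face's GPT2DoubleHeadsModel; the sketch below is a hedged reconstruction with assumed special tokens and toy candidate pairs, not the project's exact code.

```python
# Hedged sketch of the joint LM + multiple-choice (MC) setup with
# GPT2DoubleHeadsModel; special tokens and toy examples are assumptions.
import torch
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2DoubleHeadsModel.from_pretrained("distilgpt2")

# "Beginning of text" / "end of text" markers plus a classification token.
tokenizer.add_special_tokens({"bos_token": "<bot>", "eos_token": "<eot>",
                              "cls_token": "<cls>", "pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))

# Two candidate (keywords, text) pairings; index 0 is the correct pair.
choices = [
    "<bot> economy, inflation. Prices rose sharply last month. <eot> <cls>",
    "<bot> football, transfer. Prices rose sharply last month. <eot> <cls>",
]
encoded = [tokenizer.encode(c) for c in choices]
max_len = max(len(e) for e in encoded)
input_ids = torch.tensor(
    [[e + [tokenizer.pad_token_id] * (max_len - len(e)) for e in encoded]]
)                                                 # (batch, n_choices, seq_len)
mc_token_ids = torch.tensor([[len(e) - 1 for e in encoded]])  # position of <cls>

outputs = model(input_ids,
                mc_token_ids=mc_token_ids,
                labels=input_ids,                  # language-modeling target
                mc_labels=torch.tensor([0]))       # index of the correct pair
total_loss = outputs.loss + outputs.mc_loss        # joint LM + MC objective
```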


Model Training

The full version of GPT-2 has more than 1.5 billion parameters, and the personal computers we are currently using are not sufficient for a model of that size, so we used the DistilGPT2 version and set the number of training epochs to 5. For the training set we took 3,200 news articles from the main dataset and split them into 4 random parts; we applied the same split to our validation set.
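
A schematic training loop under these settings (DistilGPT2, 5 epochs) might look like the sketch below; the optimizer, learning rate, and toy stand-in data are assumptions.

```python
# Schematic training loop matching the settings above (DistilGPT2, 5 epochs).
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # assumed settings

# Toy stand-ins for the ~3,200 preprocessed "keywords + article" examples.
texts = ["economy, prices. Inflation slowed noticeably in March.",
         "storm, power. Thousands lost electricity after the storm."]
batches = [tokenizer.encode(t, return_tensors="pt") for t in texts]

model.train()
for epoch in range(5):                      # epoch count used in the project
    for input_ids in batches:
        loss = model(input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch + 1}: loss {loss.item():.3f}")
```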


Model Training Sequence Generation

The output of our GPT-2 model is a tensor whose dimensions correspond to sequence length and vocabulary size. To turn it into a meaningful probability distribution we apply softmax, and we apply the scaling step known as temperature (t) so that reshaping does not distort the distribution.

Temperature scaling can still let low-probability words slip into the distribution, so to avoid this we additionally filter each summary's token distribution with the method known as top-p (nucleus) sampling.
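
The sketch below shows both steps on a single step's logits; the temperature and p values are illustrative, not the project's exact settings.

```python
# Temperature scaling followed by top-p (nucleus) sampling on one step's logits.
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.8, top_p=0.9):
    # Temperature rescales the logits before softmax: <1 sharpens, >1 flattens.
    probs = F.softmax(logits / temperature, dim=-1)

    # Keep only the most probable tokens whose cumulative probability stays
    # within p, always retaining at least the single most likely token.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative <= top_p
    keep[..., 0] = True
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum()

    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice]

logits = torch.randn(50257)   # one step of GPT-2-sized vocabulary logits
print(sample_next_token(logits).item())
```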


Analysis

[Analysis]

The ROUGE score is a measure that broadly indicates the quality of a generated summary: the closer the score is to 1, the higher the quality of the summary. ROUGE variants compare word combinations such as n-grams. Looking at our results, we saw that with enough fine-tuning we could get really good results even with a small dataset.
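
For illustration, ROUGE can be computed with the rouge-score package as in the sketch below; the project's own scoring setup may differ, and the texts here are toy data.

```python
# Hedged example of ROUGE scoring with the `rouge-score` package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "Thousands were evacuated after a wildfire spread across the coast."
generated = "A wildfire forced thousands of residents to evacuate the coast."

scores = scorer.score(reference, generated)
for name, score in scores.items():
    print(name, round(score.fmeasure, 3))   # closer to 1 means closer overlap
```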

In Short

This first version of the project was trained for 5 epochs with DistilGPT2. A simple NLTK method was used to generate the keywords. In the next step we will try to improve the consistency of the project by using BERT to generate the keywords.

After solving the keyword problem, we plan to train our model with larger GPT-2 variants to produce more successful, higher-scoring summaries.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

LinkedIn

E-Mail

Project Link: https://github.com/ogulcanertunc/Abstractive-Text-Summarization

Acknowledgements
