PyTorch implementation of GoEmotions, Conditional BERT contextual augmentation, and BERT with Self-Supervised Attention, built on Hugging Face Transformers.
The GoEmotions dataset labels 58,000 Reddit comments with 28 emotion categories (27 emotions plus neutral):

admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral

The dataset is provided in three different taxonomies, placed in the `data` directory.
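Each taxonomy directory pairs a `labels.txt` (one label name per line) with TSV splits. A minimal sketch of loading them, assuming the GoEmotions TSV layout of text, comma-separated label ids, and a comment id per row; the helper names are illustrative, not from the repo:

```python
import csv

def load_labels(path):
    """Read one label name per line, e.g. data/original/labels.txt."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def load_examples(path, labels):
    """Parse a GoEmotions-style TSV row: text <TAB> comma-separated
    label ids <TAB> comment id. Returns (text, [label, ...]) pairs."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            text, ids = row[0], row[1]
            examples.append((text, [labels[int(i)] for i in ids.split(",")]))
    return examples
```

Because examples can carry several label ids, the parsed labels are a list, which matches the multi-label nature of GoEmotions.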
- Clone this repo
- Python 3.6
- Install PyTorch==1.4.0
- Install the rest of the requirements:

```shell
pip install -r requirements.txt
```
```shell
python analyze_dataset.py [--aug]
```

Hyperparameters can be changed in the JSON files in the `config` directory. By default, the script runs dataset analysis on `data/original/train.tsv`, with labels defined in `data/original/labels.txt`. If run with the `--aug` flag, the analysis is instead performed on the augmented training dataset, stored at `data/original/train_augmented_*.tsv` by default, without reading the labels file (the augmented training dataset is generated using CBERT with a label-distribution threshold of the user's choice).
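The core of such an analysis is a per-label frequency count. A small illustrative helper (not from the repo) that computes counts and relative frequencies for multi-label examples parsed as `(text, [label, ...])` pairs:

```python
from collections import Counter

def label_distribution(examples):
    """Count how often each label appears across multi-label examples.

    `examples` is a list of (text, [label, ...]) pairs, the shape a
    GoEmotions-style TSV parses into (an assumed interface). Returns
    {label: (count, fraction_of_all_label_occurrences)}, most common first.
    """
    counts = Counter(label for _, labels in examples for label in labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}
```

A distribution like this is also what a CBERT label-distribution threshold would be compared against when deciding which labels need augmentation.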
```shell
python run_goemotions.py --taxonomy original
```
First, fine-tune the conditional BERT model on the original training dataset:

```shell
python cbert_finetune.py
```
Second, use the model saved in the previous step to generate new examples. The original examples, their masked versions, and the predicted versions are stored in separate files under `data/original` by default:

```shell
python cbert_augdata.py
```
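Conceptually, this step masks tokens in an original example and asks the label-conditioned model to fill them in. A toy sketch of just the masking half, with illustrative names (the real script performs the label-conditioned prediction with BERT):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=None):
    """Randomly replace a fraction of tokens with [MASK].

    Returns the masked sequence and the positions that were hidden;
    a conditional BERT model would then predict label-compatible
    replacements at those positions.
    """
    rng = random.Random(seed)
    masked, positions = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            masked[i] = mask_token
            positions.append(i)
    if not positions and tokens:
        # Guarantee at least one mask so every example yields a prediction.
        i = rng.randrange(len(tokens))
        masked[i] = mask_token
        positions.append(i)
    return masked, positions
```

Keeping the masked positions alongside the masked sequence is what lets the original, masked, and predicted versions be written out as parallel files.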
Third, remove duplicates, sanitize the generated examples, and merge them into the original training corpus:

```shell
python cbert_merge.py
```
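A minimal sketch of the dedup-and-merge idea, assuming examples are handled as `(text, labels)` pairs; the helper name and the exact sanitization rules here are illustrative, not taken from `cbert_merge.py`:

```python
def merge_augmented(original, augmented):
    """Merge augmented (text, labels) pairs into the original corpus,
    dropping exact-text duplicates (case/whitespace-insensitive) and
    whitespace-only generations."""
    seen = {text.strip().lower() for text, _ in original}
    merged = list(original)
    for text, labels in augmented:
        key = text.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append((text.strip(), labels))
    return merged
```

Deduplicating against the original corpus matters because a masked-and-refilled sentence can reproduce its source verbatim, which would otherwise skew the label distribution.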
```shell
cd ssa_BERT
python run_ssa.py
```
To run BERT with Self-Supervised Attention on the augmented GoEmotions dataset, simply change the default value of `train_data_file` defined in `ssa_BERT/run_ssa.py`.