elangovana/bert-classification

BERT text classification SST2 using PyTorch

We train a BERT text classifier on the Stanford Sentiment Treebank - 2 (SST2) dataset.

Dataset

We use the Stanford Sentiment Treebank - 2 (SST2) dataset.

Setting up locally

  1. Install Python 3.7.4

  2. Install the requirements.

    pip install -r tests/requirements.txt

  3. Verify the setup by running the tests.

    export PYTHONPATH=./src
    pytest

SST2

  1. Preprocess the data: split it into train, test and val sample files and save them to the processdata directory.

    export PYTHONPATH=src
    datadir=tmp
    
    python src/utils/sst2_split_utils.py --sentencefile $datadir/datasetSentences.txt  --sentiment $datadir/sentiment_labels.txt  --dictionary $datadir/dictionary.txt --split $datadir/datasetSplit.txt --outdir processdata
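
The kind of split this step performs can be sketched as follows. This is a minimal illustration, not the code in `src/utils/sst2_split_utils.py`. It assumes the standard Stanford Sentiment Treebank file layouts (tab-separated `datasetSentences.txt`, comma-separated `datasetSplit.txt` with 1 = train, 2 = test, 3 = dev, and pipe-separated `dictionary.txt` and `sentiment_labels.txt`). It also assumes the commonly used binarization, where scores up to 0.4 are negative, scores above 0.6 are positive, and neutral sentences are dropped. The function names `to_binary_label` and `split_sst2` are hypothetical.

```python
# Split labels as used in the Stanford Sentiment Treebank datasetSplit.txt
# (assumption: 1 = train, 2 = test, 3 = dev, called "val" here).
SPLIT_NAMES = {"1": "train", "2": "test", "3": "val"}


def to_binary_label(score):
    """Map a sentiment score in [0, 1] to 0 or 1; return None for neutral."""
    if score <= 0.4:
        return 0
    if score > 0.6:
        return 1
    return None


def split_sst2(sentence_lines, split_lines, dictionary_lines, sentiment_lines):
    """Group (sentence, label) pairs by split name.

    Each argument is a list of text lines read from the matching file.
    All files except the dictionary are assumed to start with a header row.
    """
    # dictionary.txt: "phrase|phrase_id" (the phrase itself may contain "|")
    phrase_ids = {}
    for line in dictionary_lines:
        phrase, phrase_id = line.rstrip("\n").rsplit("|", 1)
        phrase_ids[phrase] = phrase_id

    # sentiment_labels.txt: "phrase_id|score"
    scores = {}
    for line in sentiment_lines[1:]:
        phrase_id, score = line.strip().split("|")
        scores[phrase_id] = float(score)

    # datasetSplit.txt: "sentence_index,splitset_label"
    splits = {}
    for line in split_lines[1:]:
        index, label = line.strip().split(",")
        splits[index] = SPLIT_NAMES[label]

    result = {"train": [], "test": [], "val": []}
    # datasetSentences.txt: "sentence_index<TAB>sentence"
    for line in sentence_lines[1:]:
        index, sentence = line.rstrip("\n").split("\t", 1)
        phrase_id = phrase_ids.get(sentence)
        if phrase_id is None:
            continue  # sentence text has no exact match in the dictionary
        label = to_binary_label(scores[phrase_id])
        if label is None:
            continue  # neutral sentences are not part of the binary task
        result[splits[index]].append((sentence, label))
    return result


# Tiny made-up example, not real SST data.
example = split_sst2(
    ["sentence_index\tsentence", "1\tgreat movie", "2\tawful plot",
     "3\tit exists", "4\tlovely cast"],
    ["sentence_index,splitset_label", "1,1", "2,2", "3,1", "4,3"],
    ["great movie|10", "awful plot|11", "it exists|12", "lovely cast|13"],
    ["phrase ids|sentiment values", "10|0.9", "11|0.1", "12|0.5", "13|0.8"],
)
```

In this toy input the sentence scored 0.5 is dropped as neutral and the remaining three land in train, test and val respectively. With the real files, some sentences may not match a dictionary entry exactly because of tokenization or encoding differences; the sketch simply skips those.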
