Perform sentiment analysis using BERT on the IMDB Movie Ratings Dataset.
The IMDB dataset contains 50K movie reviews for natural language processing and text analytics. It is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. The goal is to classify each review as positive or negative, and so to count the positive and negative reviews, using either classical classification or deep learning algorithms.
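As a rough illustration of the binary labelling described above, the sketch below reads reviews from a CSV file, maps each sentiment string to 0 or 1, and counts the two classes. The column names `review` and `sentiment`, the label strings `positive` and `negative`, and the helper names `load_reviews` and `count_labels` are assumptions made for this example and may not match the code in this repository. A tiny in-memory sample stands in for the real dataset file.

```python
import csv
import io
from collections import Counter

# Assumed mapping from sentiment strings to binary class ids.
LABEL_TO_ID = {"negative": 0, "positive": 1}


def load_reviews(csv_file):
    """Read (review_text, label_id) pairs from an open CSV file object.

    Assumes a header row with 'review' and 'sentiment' columns.
    """
    reader = csv.DictReader(csv_file)
    return [
        (row["review"], LABEL_TO_ID[row["sentiment"].strip().lower()])
        for row in reader
    ]


def count_labels(examples):
    """Count how many examples fall into each sentiment class."""
    counts = Counter(label for _, label in examples)
    return {"negative": counts[0], "positive": counts[1]}


# Tiny in-memory sample standing in for the real dataset file.
sample = io.StringIO(
    "review,sentiment\n"
    '"A moving, beautifully acted film.",positive\n'
    '"Dull plot and wooden dialogue.",negative\n'
    '"I would happily watch it again.",positive\n'
)

examples = load_reviews(sample)
label_counts = count_labels(examples)
print(label_counts)
```

With the real data, the same functions could be applied to a file opened with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.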
BERT (Bidirectional Encoder Representations from Transformers) is a language representation model introduced in a paper by researchers at Google AI Language. It caused a stir in the machine learning community by achieving state-of-the-art results on a wide variety of NLP tasks, including Question Answering (SQuAD v1.1), Natural Language Inference (MNLI), and others.
- First, create a virtual environment and install the dependencies listed in requirements.txt.
- For model training, download the BERT_BASE_UNCASED model from here.
- You can also refer to the official Hugging Face documentation for BERT from here.
- After downloading, extract the zip file into the ..src/input folder.
- Open ..src/training/config.py to check the configuration, including the dataset and model paths, and change them as needed.
- After completing the steps above, navigate to ..src/training/ and run app.py to begin training.
- Once training is complete, the model weights are saved under the name and location given in the config file.
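To make the configuration step more concrete, here is a minimal sketch of what a config module of this kind might look like. Every name, path, and value below (`BERT_PATH`, `TRAINING_FILE`, `MODEL_PATH`, the batch sizes, the dataset file name, and the `missing_inputs` helper) is hypothetical and chosen only for illustration. The actual config.py in this repository may use different names and defaults, so check it before training.

```python
from pathlib import Path

# Hypothetical layout: all paths are assumptions, adjust to your checkout.
SRC_DIR = Path("src")
INPUT_DIR = SRC_DIR / "input"

# Folder holding the extracted BERT base uncased files (assumed name).
BERT_PATH = INPUT_DIR / "bert_base_uncased"

# CSV file with the IMDB reviews (assumed file name).
TRAINING_FILE = INPUT_DIR / "imdb_dataset.csv"

# Where the trained weights would be written (assumed name and location).
MODEL_PATH = SRC_DIR / "model.bin"

# Illustrative hyperparameters, not tuned values.
MAX_LEN = 512          # BERT base accepts at most 512 tokens per input
TRAIN_BATCH_SIZE = 8
VALID_BATCH_SIZE = 4
EPOCHS = 3


def missing_inputs():
    """Return the configured input paths that do not exist yet."""
    return [str(p) for p in (BERT_PATH, TRAINING_FILE) if not p.exists()]


if __name__ == "__main__":
    for path in missing_inputs():
        print(f"Missing input: {path}")
```

Running a check like `missing_inputs()` before starting training is one way to catch a wrong dataset or model path early, rather than partway through a run.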