Classify the year in which a text was written, using multiple methods
To recreate the datasets from scratch (extracting the year and surface text from the .xml.bz
files downloaded from Språkbanken, removing unwanted data, shuffling, and sampling), see the
programs under the data directory. Each program is runnable on its own and accepts the --help
argument.
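For orientation, the extraction, shuffling, and sampling steps could look roughly like the sketch below. It is not the code found under the data directory. The element and attribute names (text, year, w) and both function names are assumptions made for illustration, so adjust them to the layout of the corpus files you actually downloaded.

```python
import bz2
import random
import xml.etree.ElementTree as ET


def iter_year_text(path, text_tag="text", year_attr="year", token_tag="w"):
    """Yield (year, surface_text) pairs from a bz2-compressed XML file.

    The tag and attribute names are hypothetical defaults, not a
    documented corpus schema.
    """
    with bz2.open(path, "rb") as handle:
        # iterparse streams the file, so large corpora need not fit in memory.
        for _, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag != text_tag:
                continue
            year = elem.get(year_attr)
            words = [w.text for w in elem.iter(token_tag) if w.text]
            # Drop unwanted data: records without a numeric year or without text.
            if year and year.isdigit() and words:
                yield int(year), " ".join(words)
            elem.clear()  # release the children of the element just processed


def shuffle_and_sample(records, k, seed=0):
    """Shuffle the records reproducibly and return at most k of them."""
    records = list(records)
    random.Random(seed).shuffle(records)
    return records[:k]
```

A call such as shuffle_and_sample(iter_year_text(path), 10000) would then give a shuffled subset of (year, text) pairs. The fixed seed is there only to make runs repeatable.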
pip3 install -r requirements.txt
To run the default Bayes model on samples of 1000 characters each, run:
python3 sk_learn <dataset_directory_path> -t bayes -s 1000
To see additional options, run:
python3 sk_learn --help
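To give a feel for what classifying fixed-size samples with a Bayes model involves, here is a minimal pure-Python sketch. It is not the sk_learn program itself. The character-bigram features, the add-one smoothing, and the names split_into_samples, char_ngrams, and TinyNaiveBayes are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def split_into_samples(text, size):
    """Cut text into consecutive chunks of exactly `size` characters.

    A trailing remainder shorter than `size` is dropped.
    """
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class TinyNaiveBayes:
    """Multinomial naive Bayes over character bigrams (illustrative only)."""

    def fit(self, samples, labels):
        self.counts = defaultdict(Counter)    # label -> n-gram counts
        self.label_counts = Counter(labels)   # label -> number of samples
        for sample, label in zip(samples, labels):
            self.counts[label].update(char_ngrams(sample))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, sample):
        total = sum(self.label_counts.values())

        def score(label):
            counts = self.counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus add-one smoothed log likelihoods.
            s = math.log(self.label_counts[label] / total)
            for gram in char_ngrams(sample):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    s += math.log((counts[gram] + 1) / denom)
            return s

        return max(self.label_counts, key=score)
```

In this sketch the labels would be years and the samples the chunks returned by split_into_samples(text, 1000), which corresponds to the -s 1000 setting above.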
TensorFlow and Keras have to be installed. See the TensorFlow installation page and the Keras installation page.
python3 runeberg/classifier.py [conv|multiconv|lstm]