An experiment with a perceptron in Python, using NLP and supervised learning to classify strings of text from previous examples, even where there is no obvious way to classify them.
python3
nltk
spacy
textblob
vadersentiment
The character inputs are concerned with how many times each character is used.
The word inputs are concerned with features such as the length of a string and the types of words used; a more metadata-style analysis.
The meaning inputs are concerned with trying to derive subtler meanings from the words.
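As a rough sketch of what the first two input groups could look like (the function names and feature choices here are illustrative assumptions, not the project's actual code), character features might be per-letter counts and word features simple metadata:

```python
from collections import Counter
import string

def character_features(sentence):
    # Count how often each lowercase letter appears in the sentence,
    # returning one count per letter of the alphabet.
    counts = Counter(c for c in sentence.lower() if c in string.ascii_lowercase)
    return [counts.get(c, 0) for c in string.ascii_lowercase]

def word_features(sentence):
    # Metadata-style features: string length, word count, average word length.
    words = sentence.split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return [len(sentence), len(words), avg_len]

# Concatenate the groups into a single input vector for the perceptron.
features = character_features("To be or not to be") + word_features("To be or not to be")
```

The meaning inputs would sit on top of the listed NLP libraries (e.g. sentiment scores) and are harder to sketch without the source.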
The classifier has been able to distinguish sentences from the following texts:
95% Success Rate comparing Thomas Hobbes - Leviathan && Rene Descartes - Meditations
80% Success Rate comparing John Stuart Mill - On Liberty && Shakespeare - The Tempest
77% Success Rate comparing John Stuart Mill - On Liberty && John Stuart Mill - Utilitarianism
There are two ways of running the classifier.
The first runs the classifier for 100 epochs on the two input text files:
python3 nlc.py -it file_one.txt file_two.txt -e 100
The second runs the classifier with the same input files, plus an optional bias value of 7; if left blank, the bias defaults to 0.
Setting the threshold and print_at_epoch values means that at every 10th epoch the classifier will run the tests, print the output, and, if the success rate is greater than or equal to the threshold, store the current set of weights. At the end, if any weights were saved, they are averaged and the final tests are run with those averaged weights.
python3 nlc.py -it file_one.txt file_two.txt -e 1000 -t 0.8 -p 10 -b 7
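The epoch loop with threshold-based weight snapshotting and averaging described above could be sketched roughly like this (a minimal perceptron, with names and details assumed rather than taken from the actual source):

```python
import random

def predict(weights, features, bias=0.0):
    # Basic perceptron decision: weighted sum plus bias, thresholded at zero.
    total = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if total > 0 else 0

def evaluate(weights, tests, bias=0.0):
    # Fraction of held-out test examples classified correctly.
    correct = sum(1 for f, label in tests if predict(weights, f, bias) == label)
    return correct / len(tests)

def train(examples, tests, epochs, threshold, print_at_epoch, bias=0.0):
    weights = [0.0] * len(examples[0][0])
    saved = []
    for epoch in range(1, epochs + 1):
        random.shuffle(examples)              # re-order training data each epoch
        for features, label in examples:
            error = label - predict(weights, features, bias)
            if error:
                weights = [w + error * f for w, f in zip(weights, features)]
        if epoch % print_at_epoch == 0:
            rate = evaluate(weights, tests, bias)
            print(f"epoch {epoch}: success rate {rate:.2f}")
            if rate >= threshold:
                saved.append(list(weights))   # snapshot weights above threshold
    if saved:
        # Element-wise average of every saved snapshot for the final test run.
        weights = [sum(col) / len(saved) for col in zip(*saved)]
    return weights
```

Averaging only the above-threshold snapshots smooths out fluctuations between epochs rather than trusting whichever weights the final epoch happened to land on.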
Files containing training and test data need to be written in the following format:
#1T
This is the first training example; it will be tested against and used to alter the weights.
#2T
This is the second. Every time an epoch finishes, the ordering of the training examples is randomized to try to maximize success.
#3
After the training examples come the tests that will be used to measure success; these are not included in the training data used to train the perceptron.
When testing against two different files, both files need to contain the same amount of training data.
See the example directory for sample data and files to run the classifier with.
The classifier currently allows a maximum of 250 words per sentence; this can be altered in the source files in the words, characters, and meanings directories.
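Since the word limit implies fixed-size input vectors, a sentence would presumably be truncated or padded to that length before feature extraction. A hypothetical sketch (the constant and helper below are assumptions, not taken from the source):

```python
MAX_WORDS = 250  # mirrors the limit mentioned above; adjustable in the source

def pad_words(sentence, max_words=MAX_WORDS):
    # Truncate to the limit, then pad with empty strings to a fixed length,
    # so every sentence yields the same number of word slots.
    words = sentence.split()[:max_words]
    return words + [""] * (max_words - len(words))
```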