# Lyrics Predictor
This is the final project for an Artificial Intelligence class, in which we implemented a word predictor for a corpus of rock and pop song lyrics. Our algorithm is a weighted combination of two models: a word N-gram model with discounting and back-off, and an N-gram model over tags.
Our results and analysis can be found in the project presentation or the report.
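
As a rough illustration of the weighted combination described above, here is a minimal sketch using toy bigram models. It is not the project's code: the function names, the toy lyric, the hand-written tags, and the weight of 0.7 are all assumptions made for the example, and smoothing is left out to keep it short.

```python
from collections import Counter


def bigram_probs(sequence):
    """Maximum-likelihood estimates P(next | prev) from one sequence."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    prev_counts = Counter(sequence[:-1])
    return {(prev, nxt): count / prev_counts[prev]
            for (prev, nxt), count in pair_counts.items()}


def predict_next(prev_word, word_model, tag_model, tag_of, weight=0.7):
    """Score every known word by a linear combination of both models.

    score = weight * P(word | prev_word) + (1 - weight) * P(tag | prev_tag)
    """
    prev_tag = tag_of[prev_word]
    scores = {}
    for candidate, candidate_tag in tag_of.items():
        p_word = word_model.get((prev_word, candidate), 0.0)
        p_tag = tag_model.get((prev_tag, candidate_tag), 0.0)
        scores[candidate] = weight * p_word + (1.0 - weight) * p_tag
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): one lyric line and hand-assigned tags.
words = "i love you i love rock i need you".split()
tags = "PRP VBP PRP PRP VBP NN PRP VBP PRP".split()
tag_of = dict(zip(words, tags))

word_model = bigram_probs(words)
tag_model = bigram_probs(tags)

best, scores = predict_next("love", word_model, tag_model, tag_of)
print(best)  # prints: you
```

In this toy corpus the word model alone cannot separate "you" from "rock" after "love" (both have probability 0.5), so the tag model breaks the tie. The combined value is a ranking score rather than a normalized probability over words.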
To run the project, use Python 3.4.
The following dependencies are needed:
- nltk 3
- pandas
- bokeh
in corpus/raw/ and corpus/lyric_corpus/
- run cl_client.py using a songlist file to crawl lyrics
- run corpus_builder.py to clean the raw data
- manually remove empty files, non-English lyrics, etc.
- run category.py to generate a category file
in analysis/
- run basic_statistics.py for basic measurements on the corpus
- run collocation.py for bigram and trigram collocations in POP and ROCK
in analysis/
- run linearCombination.py
- run perplexity.py
- run predictWord.py
- run testSimpleNgram.py
- run testSmoothing.py
- run tryAlpha.py
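
For readers unfamiliar with the metric behind perplexity.py, the sketch below shows the standard definition of perplexity over a list of per-word probabilities. It is an assumption-level illustration, not the script's actual contents; the function name and the example inputs are made up for the example.

```python
import math


def perplexity(probabilities):
    """Perplexity = exp(-(1/N) * sum(log p_i)) for N per-word probabilities."""
    if not probabilities:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(-log_sum / len(probabilities))


# A model that assigns 0.25 to each of four test words is as uncertain
# as a uniform choice among four options, so its perplexity is 4.
uniform_score = perplexity([0.25, 0.25, 0.25, 0.25])
print(round(uniform_score, 6))  # prints: 4.0
```

A single zero probability makes `math.log` fail, which is one reason an unsmoothed N-gram model cannot be evaluated this way on unseen text.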
in nGram/ the following models and taggers can be found:
- nGramModel.py
- NgramTagModel.py
- trainTagger.py
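
The files above are not reproduced here. To give an idea of what "discounting and back-off" means for a bigram model, the following is a minimal sketch that assumes absolute discounting with back-off to unigrams. The project may use a different scheme; the function names, the discount of 0.5, and the toy lyric are hypothetical.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Collect unigram counts, bigram counts, and the seen followers of each word."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    followers = defaultdict(set)
    for prev, nxt in bigrams:
        followers[prev].add(nxt)
    return unigrams, bigrams, followers


def backoff_prob(prev, word, unigrams, bigrams, followers, discount=0.5):
    """P(word | prev) with absolute discounting and back-off to unigrams."""
    total = sum(unigrams.values())
    seen = followers.get(prev, set())
    context_count = sum(bigrams[(prev, w)] for w in seen)
    if context_count == 0:
        # Unknown context: fall back to the plain unigram estimate.
        return unigrams[word] / total
    if word in seen:
        # Seen bigram: subtract a fixed discount from its count.
        return (bigrams[(prev, word)] - discount) / context_count
    # Unseen bigram: share the reserved mass in proportion to unigram
    # probability, renormalized over the words not seen after `prev`.
    reserved = discount * len(seen) / context_count
    unseen_mass = sum(c for w, c in unigrams.items() if w not in seen) / total
    if unseen_mass == 0:
        return 0.0
    return reserved * (unigrams[word] / total) / unseen_mass


tokens = "i love you i love rock i need you".split()
unigrams, bigrams, followers = train(tokens)

p_seen = backoff_prob("love", "you", unigrams, bigrams, followers)
p_unseen = backoff_prob("love", "i", unigrams, bigrams, followers)
p_total = sum(backoff_prob("love", w, unigrams, bigrams, followers)
              for w in unigrams)
```

The discount removes a little probability from every seen bigram and hands it to unseen ones, so the probabilities for a known context still sum to 1 over the vocabulary.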