Testing various algorithms for text retrieval
#TF-IDF
Coupling the lecture information with this Wiki entry on TF-IDF, http://en.wikipedia.org/wiki/Tf%E2%80%93idf#cite_note-understanding-5, I implemented a simple TF-IDF application in Python 2.7.8. I used Visual Studio 2013 to create the project because I wanted to take Python development in VS 2013 for a test drive. All I can say is "meh": not great, but not bad for a project this simple.
If you are not using VS 2013, and I wouldn't blame you, you can still take the three source files and run them from the terminal. The file app.py has the main() function.
These are the files and their uses:

* app.py - Main file, and the one you should call from the terminal, e.g. cd into the working directory and execute python app.py
* corpus.py - Just a file that contains some short TextBlobs simulating 3 documents.
* tfidf.py - A module containing the TFIDF class, which has several implementations of both TF and IDF.
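To give a feel for what a TF and IDF calculation over three short documents looks like, here is a minimal sketch. It is not the code in tfidf.py: the function names, the sample sentences, and the choice to score unseen terms as zero are all assumptions made for illustration. It uses plain string splitting instead of TextBlob so it runs with no extra installs, and it follows the basic definitions from the Wikipedia entry (raw count divided by document length for TF, log of N over document frequency for IDF).

```python
from __future__ import division, print_function

import math
from collections import Counter

# Hypothetical stand-in for corpus.py: three tiny "documents".
DOCS = [
    "python is a popular programming language",
    "a python is also a large snake",
    "java is a programming language too",
]


def tokenize(text):
    """Lowercase and split on whitespace (a crude substitute for TextBlob words)."""
    return text.lower().split()


def tf(term, tokens):
    """Term frequency: occurrences of term divided by document length."""
    return Counter(tokens)[term] / len(tokens)


def idf(term, tokenized_docs):
    """Inverse document frequency: log(N / number of documents containing term)."""
    containing = sum(1 for tokens in tokenized_docs if term in tokens)
    if containing == 0:
        # Assumption: a term that appears nowhere scores zero
        # instead of raising a division-by-zero error.
        return 0.0
    return math.log(len(tokenized_docs) / containing)


def tfidf(term, tokens, tokenized_docs):
    """TF-IDF score of term for one document within the corpus."""
    return tf(term, tokens) * idf(term, tokenized_docs)


TOKENIZED = [tokenize(doc) for doc in DOCS]

if __name__ == "__main__":
    # Print every distinct word of each document, highest score first.
    for number, tokens in enumerate(TOKENIZED, start=1):
        scores = {word: tfidf(word, tokens, TOKENIZED) for word in set(tokens)}
        ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
        print("Document", number)
        for word, score in ranked:
            print("  %-12s %.4f" % (word, score))
```

Words that appear in every document, such as "a" and "is", get an IDF of zero and drop to the bottom of each ranking, which is the behaviour TF-IDF is meant to produce.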
YOU WILL HAVE TO install TextBlob, e.g. pip install textblob, AND READ THE INSTRUCTIONS: http://textblob.readthedocs.org/en/latest/api_reference.html
Email me if you have questions: chuck@chucksailer.com. And by all means, please fork and enhance, critique, offer advice, leave abusive comments, or whatever.