Machine learning and data-mining projects written in Python. This is an archive of the main repo on Bitbucket.
This repo consists of several projects:
- class: spam classifiers (Naive Bayes and Fisher's method), as suggested in http://www.gigamonkeys.com/book/practical-a-spam-filter.html. A validation phase uses the Enron datasets.
- rec: a recommendation engine based on collective preferences. The test phase will use the MovieLens dataset.
- search_engine: a simple search engine based on multiple metrics, including the original PageRank. Also includes a Scrapy crawler for retrieving text from The New Yorker website. (DISCLAIMER: I am not responsible for any illegal use or any kind of abuse. Use at your own risk.)
- twitter-analysis: several methods for analyzing sentiment in live tweets with geolocation information. The tweets are retrieved using a dummy account's OAuth credentials (so I don't find it necessary to remove that information).
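
For readers unfamiliar with the PageRank metric mentioned under search_engine, here is a minimal sketch of the idea using power iteration. It is not taken from this repo: the function name `pagerank`, the `links` dictionary format, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration, and it uses the commonly seen normalized form in which the ranks sum to 1.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative PageRank by power iteration (hypothetical helper).

    links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages so that no rank is lost.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    # Tiny made-up link graph: a and b link to each other, c links to a.
    example = {"a": ["b"], "b": ["a"], "c": ["a"]}
    for page, rank in sorted(pagerank(example).items()):
        print(page, round(rank, 4))
```

In the made-up graph above, page a is linked from both b and c, so it should end up with the highest rank, while c, which nothing links to, keeps only the random-jump share.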