WikiSentiment

automatic categorization of user interactions in Wikipedia

Homepage

http://github.com/whym/wikisentiment

Contact

http://whym.org

Overview

Preprocessing:

For each entry:

Extract the raw features and store them in MongoDB as a document of the following form:

{
  "entry": {
    "rev_id":   2894772,
    "title": "Yosri",
    "content": {
      "added": [ "Hi This is ...." ],
      "removed": []
    },
    "comment": "Hi This is ....",
    "timestamp": "...",
    "sender": {},
    "receiver": {}
  },
  "labels": {
     "debate":  false,
     "other":   false,
     "template": true,
     "welcome": true,
     "suggest":  true,
     "invite":  false,
     "minor":   false,
     "vandal":  false
  },
  "features": {
    "ngram":   {"type": "assoc", "values": {...}},
    "SentiWN": {"type": "assoc", "values": {...}},
    ...
  },
  "vector": {
    "1": true,
    "2": true,
    "101": true,
    ...
  },
  ...
}
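
As an illustration of the "ngram" part of such a document, here is a minimal sketch of how raw features might be extracted from a message. The function names `extract_ngrams` and `build_entry`, the whitespace tokenization, and the choice of bigrams are assumptions made for this example, not the project's actual code (see extract_features.py and fextract.py for that). The sketch targets Python 3.

```python
from collections import Counter


def extract_ngrams(text, n=2):
    """Count word n-grams in a message (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return dict(Counter(grams))


def build_entry(rev_id, title, added, labels):
    """Assemble a document shaped like the example above (subset of fields).

    Hypothetical helper: only the fields needed for the illustration are set.
    """
    text = " ".join(added)
    return {
        "entry": {
            "rev_id": rev_id,
            "title": title,
            "content": {"added": added, "removed": []},
        },
        "labels": labels,
        "features": {
            "ngram": {"type": "assoc", "values": extract_ngrams(text)},
        },
    }
```

With pymongo, a dict built this way could then be stored with a call such as `collection.insert_one(doc)`, where `collection` is whatever collection the project is configured to use.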

Convert the raw features into vectors and update all entries in MongoDB. (A different selection of features and/or hash kernels may be used at this step.)
For each entry, add it to the training set.
Train a classifier with the training set.
Output the resulting model.
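
The conversion step can be sketched with the hashing trick: each raw feature name is hashed to a bucket index, and the set of active buckets becomes the sparse "vector" field. This is only a sketch under stated assumptions. It uses `hashlib.md5` from the standard library as a stand-in for the murmur hash listed under Requirements, and the names `hash_feature`, `vectorize`, and `num_buckets` are hypothetical. Indices start at 1 because liblinear feature indices are 1-based, and keys are strings because MongoDB document keys must be strings.

```python
import hashlib


def hash_feature(name, num_buckets=2 ** 20):
    """Map a feature name to a bucket index in [1, num_buckets]."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_buckets + 1


def vectorize(features, num_buckets=2 ** 20):
    """Turn {"ngram": {"type": "assoc", "values": {...}}, ...} into a
    sparse binary vector of the form {"<index>": True, ...}."""
    vector = {}
    for group, spec in features.items():
        for key in spec["values"]:
            # Prefix with the feature group so that identical keys from
            # different extractors are hashed as distinct features.
            index = hash_feature(group + ":" + key, num_buckets)
            vector[str(index)] = True
    return vector
```

Changing `num_buckets` or the set of feature groups passed in is one way to experiment with different feature selections without touching the stored raw features.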

Testing:

Load the model and construct a classifier.
For each entry, output it and the label predicted by the classifier.
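
The output loop of the testing step might look like the sketch below. The classifier is represented by a plain callable `predict`, which stands in for whatever model the project loads; the function name `write_predictions`, the CSV layout, and the per-label evaluation are assumptions for illustration, not the behavior of test.py.

```python
import csv
import sys


def write_predictions(entries, label, predict, out=sys.stdout):
    """Write one CSV row per entry: revision ID, gold label, predicted label.

    `entries` is an iterable of documents shaped like the example above,
    `label` is one of the keys under "labels" (e.g. "welcome"), and
    `predict` is a hypothetical callable mapping a sparse vector to a bool.
    """
    writer = csv.writer(out)
    for doc in entries:
        gold = doc["labels"][label]
        predicted = predict(doc["vector"])
        writer.writerow([doc["entry"]["rev_id"], gold, predicted])
```

Writing the gold label next to the prediction keeps the output directly usable for error analysis or for computing performance figures afterwards.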

Usage

Obtain a list of revision IDs or a list of actual messages as CSV.

Requirements

The following Python modules are required:

urllib2
pymongo
nltk (wordnet)
murmur
liblinear, liblinearutil

Todo

Support exporting and importing models
Pipeline the Wikipedia API calls, feature extraction, and database inserts efficiently in a producer-consumer style
Add a visualization script for error analysis
Support other languages

