Python wrapper for the Java machine learning workbench Weka using the javabridge library.
Requirements:
- Python 2.7 (does not work with Python 3)
- javabridge (>= 1.0.11)
- matplotlib (optional)
- pygraphviz (optional)
- PIL (optional)
- Oracle JDK 1.6+
Uses:
- Weka (3.7.12)
Detailed instructions and links to videos on installing the library are located here.
Please note that you need a build environment to compile some of the libraries from source.
You can post questions, patches or enhancement requests in the following Google Group:
https://groups.google.com/forum/#!forum/python-weka-wrapper
See the python-weka-wrapper-examples repository for example code on the various APIs. Also, check out the Sphinx
documentation in the doc directory. You can generate the HTML documentation
by running the make html command in the doc directory.
Available online documentation:
- Full documentation
- Shortcuts
- Command-line
- API
- Examples
Below are some examples of command-line use of the library. You can also find these on PyPI.
Artificial data can be generated using one of Weka's data generators, e.g., the Agrawal
classification generator:
python weka/datagenerators.py \
    weka.datagenerators.classifiers.classification.Agrawal \
    -o /tmp/out.arff
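The same invocation can be driven from a Python script by passing the arguments as a list: each whitespace-separated token of the shell command becomes one list element, and the trailing backslashes (shell line continuations) are dropped. A minimal sketch, assuming it is run from the repository root like the shell example and that `python` on the PATH is the interpreter with the library installed; `datagenerator_command` and `generate` are hypothetical helpers, not part of the library. It sticks to calls available in both Python 2.7 and Python 3:

```python
import os
import subprocess

# Fully qualified Weka class name of the generator, as in the shell example.
AGRAWAL = "weka.datagenerators.classifiers.classification.Agrawal"


def datagenerator_command(output_path, generator=AGRAWAL):
    """Return the argument list for weka/datagenerators.py (hypothetical helper)."""
    # One list element per whitespace-separated token of the shell command;
    # the backslashes in the shell version are only line continuations.
    return ["python", "weka/datagenerators.py", generator, "-o", output_path]


def generate(output_path):
    """Run the generator and make sure it wrote something (hypothetical helper)."""
    # check_call raises CalledProcessError on a non-zero exit status.
    subprocess.check_call(datagenerator_command(output_path))
    if os.path.getsize(output_path) == 0:
        raise RuntimeError("generator wrote an empty file: " + output_path)
    return output_path
```

For example, `generate("/tmp/out.arff")` should leave the generated Agrawal data in /tmp/out.arff, just like the shell command.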
Filtering a single ARFF dataset by removing the last attribute with the Remove
filter:
python weka/filters.py \
    -i /my/datasets/iris.arff \
    -o /tmp/out.arff \
    -c last \
    weka.filters.unsupervised.attribute.Remove \
    -R last
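To apply the same filter to several datasets, one command per input file can be built in a loop and run from Python. A sketch under the same assumptions as the shell example (run from the repository root, `python` on the PATH); `filter_commands` and `run_all` are hypothetical helpers, and the `-filtered.arff` naming scheme for the output files is an arbitrary choice:

```python
import os
import subprocess


def filter_commands(datasets, out_dir):
    """Build one Remove-filter command per ARFF file (hypothetical helper)."""
    commands = []
    for path in datasets:
        # /my/datasets/iris.arff -> <out_dir>/iris-filtered.arff
        name = os.path.splitext(os.path.basename(path))[0]
        out_path = os.path.join(out_dir, name + "-filtered.arff")
        commands.append([
            "python", "weka/filters.py",
            "-i", path,
            "-o", out_path,
            "-c", "last",
            "weka.filters.unsupervised.attribute.Remove",
            "-R", "last",
        ])
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)
```

For example, `run_all(filter_commands(["/my/datasets/iris.arff"], "/tmp"))` would write /tmp/iris-filtered.arff.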
Example of how to cross-validate a J48
classifier (with confidence factor 0.3) on the UCI iris dataset:
python weka/classifiers.py \
    -t /my/datasets/iris.arff \
    -c last \
    weka.classifiers.trees.J48 -C 0.3
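Comparing several confidence factors amounts to varying the `-C` argument of the same command. A hedged sketch, assuming `weka/classifiers.py` prints its evaluation results to standard output and is run from the repository root; `j48_cv_command` and `sweep` are hypothetical helpers:

```python
import subprocess


def j48_cv_command(train, confidence):
    """Argument list for evaluating J48 with a given -C value (hypothetical helper)."""
    return [
        "python", "weka/classifiers.py",
        "-t", train,
        "-c", "last",
        "weka.classifiers.trees.J48", "-C", str(confidence),
    ]


def sweep(train, factors):
    """Run one evaluation per confidence factor and collect the printed output.

    check_output returns str on Python 2.7 and bytes on Python 3.
    """
    results = {}
    for confidence in factors:
        results[confidence] = subprocess.check_output(j48_cv_command(train, confidence))
    return results
```

For example, `sweep("/my/datasets/iris.arff", [0.1, 0.2, 0.3])` returns a dictionary mapping each factor to the corresponding evaluation output.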
Example of how to perform classes-to-clusters evaluation for SimpleKMeans
(with 3 clusters) on the UCI iris dataset:
python weka/clusterers.py \
    -t /my/datasets/iris.arff \
    -c last \
    weka.clusterers.SimpleKMeans -N 3
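Classes-to-clusters evaluation compares the clusters against the class attribute (here chosen with `-c last`). The idea can be illustrated in plain Python: each cluster is matched to a distinct class so that as many instances as possible agree, and the remaining instances count as incorrectly clustered. This is a sketch of the concept, not Weka's own implementation; it assumes there are no more clusters than classes, and `classes_to_clusters` is a hypothetical helper that brute-forces all assignments, which is only practical for a handful of clusters:

```python
from collections import Counter
from itertools import permutations


def classes_to_clusters(cluster_ids, class_labels):
    """Return (mapping, incorrect) for the best one-to-one cluster -> class mapping.

    Hypothetical helper; assumes #clusters <= #classes.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("need exactly one class label per instance")
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    # How many instances fall into each (cluster, class) pair.
    counts = Counter(zip(cluster_ids, class_labels))
    best_mapping, best_correct = None, -1
    # Try every assignment of distinct classes to the clusters.
    for assignment in permutations(classes, len(clusters)):
        correct = sum(counts[(c, k)] for c, k in zip(clusters, assignment))
        if correct > best_correct:
            best_mapping, best_correct = dict(zip(clusters, assignment)), correct
    return best_mapping, len(cluster_ids) - best_correct
```

For instance, with cluster ids `[0, 0, 0, 1, 1, 1]` and labels `["a", "a", "b", "b", "b", "b"]`, the best mapping is cluster 0 to "a" and cluster 1 to "b", leaving one incorrectly clustered instance.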
You can perform attribute selection with BestFirst
as the search algorithm and CfsSubsetEval
as the evaluator as follows:
python weka/attribute_selection.py \
    -i /my/datasets/iris.arff \
    -x 5 \
    -n 42 \
    -s "weka.attributeSelection.BestFirst -D 1 -N 5" \
    weka.attributeSelection.CfsSubsetEval \
    -P 1 \
    -E 1
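The search method and its options are passed to `-s` as a single quoted argument, so when the script is called from Python with an argument list, that string must stay one list element. The sketch below builds the list with the hypothetical helper `attsel_command` and checks it against what `shlex.split` produces for the one-line form of the shell command. The parameter names `folds` and `seed` assume that `-x` and `-n` mean the number of cross-validation folds and the random seed, as they do on Weka's own attribute-selection command line:

```python
import shlex

# The search method plus its options, passed to -s as ONE argument.
BEST_FIRST = "weka.attributeSelection.BestFirst -D 1 -N 5"


def attsel_command(dataset, folds=5, seed=42, search=BEST_FIRST):
    """Argument list for weka/attribute_selection.py (hypothetical helper)."""
    return [
        "python", "weka/attribute_selection.py",
        "-i", dataset,
        "-x", str(folds),
        "-n", str(seed),
        "-s", search,  # stays a single element, spaces included
        "weka.attributeSelection.CfsSubsetEval",
        "-P", "1",
        "-E", "1",
    ]


# The shell example on one line; the shell (and shlex) strips the quotes.
SHELL_LINE = (
    'python weka/attribute_selection.py -i /my/datasets/iris.arff '
    '-x 5 -n 42 -s "weka.attributeSelection.BestFirst -D 1 -N 5" '
    'weka.attributeSelection.CfsSubsetEval -P 1 -E 1'
)
```

Splitting the search string into separate list elements (e.g. `"-s", "weka.attributeSelection.BestFirst", "-D", "1", ...`) would instead hand `-D 1 -N 5` to the script as top-level options.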
Associators, like Apriori, can be run like this:
python weka/associators.py \
    -t /my/datasets/lung-cancer.arff \
    weka.associations.Apriori -N 9 -I
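Per Weka's Apriori option descriptions, `-N` sets the number of rules to find and `-I` additionally outputs the itemsets that were found. A sketch that exposes these two options as parameters and saves the printed rules to a text file, assuming the script writes its results to standard output and is run from the repository root; `apriori_command` and `run_to_file` are hypothetical helpers:

```python
import subprocess


def apriori_command(dataset, num_rules=9, output_itemsets=True):
    """Argument list for running Apriori via weka/associators.py (hypothetical helper)."""
    cmd = [
        "python", "weka/associators.py",
        "-t", dataset,
        "weka.associations.Apriori", "-N", str(num_rules),
    ]
    if output_itemsets:
        cmd.append("-I")  # also output the itemsets found
    return cmd


def run_to_file(cmd, report_path):
    """Run the command and redirect its standard output into report_path."""
    with open(report_path, "w") as fh:
        subprocess.check_call(cmd, stdout=fh)
```

For example, `run_to_file(apriori_command("/my/datasets/lung-cancer.arff"), "/tmp/rules.txt")` would store the output of the shell example above in /tmp/rules.txt.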