python-weka-wrapper

Python wrapper for the Java machine learning workbench Weka using the javabridge library.

Requirements:

Python 2.7 (does not work with Python 3)
javabridge (>= 1.0.11)
matplotlib (optional)
pygraphviz (optional)
PIL (optional)
Oracle JDK 1.6+

Uses:

Weka (3.7.12)

Installation

Detailed instructions and links to videos on installing the library are located here.

Please note that you need a build environment to compile some of the libraries from source.

Forum

You can post questions, patches or enhancement requests in the following Google Group:

https://groups.google.com/forum/#!forum/python-weka-wrapper

Code examples

See the python-weka-wrapper-examples repository for example code covering the various APIs. Also check out the Sphinx documentation in the doc directory; you can generate HTML documentation by running the make html command there.

Available online documentation:

Command-line examples

Below are some examples of command-line use of the library. You can also find these on PyPI.

Data generators

Artificial data can be generated using one of Weka's data generators, e.g., the Agrawal classification generator:

python weka/datagenerators.py \
    weka.datagenerators.classifiers.classification.Agrawal \
    -o /tmp/out.arff
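
To check what a generator wrote, it can help to look at the header of the resulting ARFF file. The following is a minimal sketch in plain Python that does not use the library itself; the function name read_arff_header is hypothetical, and the parser assumes attribute names without spaces or quotes.

```python
def read_arff_header(lines):
    """Return (relation, [(attribute_name, attribute_type), ...]).

    Simplified sketch: assumes attribute names contain no spaces or quotes.
    """
    relation = None
    attributes = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = stripped.lower()
        if lower.startswith("@relation"):
            relation = stripped.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            # "@attribute <name> <type>", the type may itself contain spaces
            _, name, attr_type = stripped.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            break  # the header ends where the data section starts
    return relation, attributes
```

Since the function accepts any iterable of lines, an open file object for /tmp/out.arff can be passed in directly.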

Filters

Filtering a single ARFF dataset, removing the last attribute using the Remove filter:

python weka/filters.py \
    -i /my/datasets/iris.arff \
    -o /tmp/out.arff \
    -c last \
    weka.filters.unsupervised.attribute.Remove \
    -R last

Classifiers

Example on how to cross-validate a J48 classifier (with confidence factor 0.3) on the iris UCI dataset:

python weka/classifiers.py \
    -t /my/datasets/iris.arff \
    -c last \
    weka.classifiers.trees.J48 \
    -C 0.3
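
To compare several confidence factors, the call above can be scripted. This is a minimal sketch, assuming it is run from the directory that contains weka/classifiers.py and that the dataset path exists; the helper names j48_command and run_sweep are illustrative and not part of the library.

```python
import subprocess


def j48_command(dataset, confidence):
    """Build the argument list for cross-validating J48 on `dataset`.

    Hypothetical helper; it only mirrors the command-line call shown above.
    """
    return [
        "python", "weka/classifiers.py",
        "-t", dataset,
        "-c", "last",
        "weka.classifiers.trees.J48",
        "-C", str(confidence),
    ]


def run_sweep(dataset, confidences):
    """Run one evaluation per confidence factor and collect the exit codes."""
    exit_codes = {}
    for confidence in confidences:
        # Passing a list (not a string) avoids any shell quoting issues.
        exit_codes[confidence] = subprocess.call(j48_command(dataset, confidence))
    return exit_codes
```

Calling run_sweep("/my/datasets/iris.arff", [0.1, 0.3, 0.5]) would then print one evaluation per confidence factor to the console and return the exit code of each run.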

Clusterers

Example on how to perform classes-to-clusters evaluation for SimpleKMeans (with 3 clusters) using the iris UCI dataset:

python weka/clusterers.py \
    -t /my/datasets/iris.arff \
    -c last \
    weka.clusterers.SimpleKMeans \
    -N 3

Attribute selection

You can perform attribute selection using BestFirst as the search algorithm and CfsSubsetEval as the evaluator as follows:

python weka/attribute_selection.py \
    -i /my/datasets/iris.arff \
    -x 5 \
    -n 42 \
    -s "weka.attributeSelection.BestFirst -D 1 -N 5" \
    weka.attributeSelection.CfsSubsetEval \
    -P 1 \
    -E 1
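
The quotes around the -s value matter: the shell hands the search specification to the script as a single argument, whereas the evaluator class and its options arrive as separate arguments. The sketch below uses the standard library's shlex module to show how a POSIX shell would split the command; the variable names are illustrative only.

```python
import shlex

# The same command as above, written on one logical line.
command = (
    'python weka/attribute_selection.py '
    '-i /my/datasets/iris.arff -x 5 -n 42 '
    '-s "weka.attributeSelection.BestFirst -D 1 -N 5" '
    'weka.attributeSelection.CfsSubsetEval -P 1 -E 1'
)

tokens = shlex.split(command)

# The quoted search specification stays together as one token ...
search_spec = tokens[tokens.index("-s") + 1]
# ... while the evaluator class and its options are separate tokens.
evaluator_args = tokens[tokens.index("-s") + 2:]

print(search_spec)
print(evaluator_args)
```

Without the quotes, -D 1 -N 5 would be split off as separate arguments instead of staying part of the search specification.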

Associators

Associators, like Apriori, can be run like this:

python weka/associators.py \
    -t /my/datasets/lung-cancer.arff \
    weka.associations.Apriori -N 9 -I

