python-weka-wrapper

Python wrapper for the Java machine learning workbench Weka using the javabridge library.

Requirements:

  • Python 2.7 (does not work with Python 3)
  • javabridge (>= 1.0.11)
  • matplotlib (optional)
  • pygraphviz (optional)
  • PIL (optional)
  • Oracle JDK 1.6+

Uses:

  • Weka (3.7.12)

Installation

Detailed instructions and links to videos on installing the library are located here.

Note that you need a build environment, because some libraries are compiled from source.

Forum

You can post questions, patches or enhancement requests in the following Google Group:

https://groups.google.com/forum/#!forum/python-weka-wrapper

Code examples

See the python-weka-wrapper-examples repository for example code covering the various APIs. Also check out the Sphinx documentation in the doc directory; you can generate HTML documentation by running the make html command there.

Command-line examples

Below are some examples of using the library from the command line. They are also available on PyPI.

Data generators

Artificial data can be generated using one of Weka's data generators, e.g., the Agrawal classification generator:

python weka/datagenerators.py \
    weka.datagenerators.classifiers.classification.Agrawal \
    -o /tmp/out.arff

Filters

Filtering a single ARFF dataset, removing the last attribute using the Remove filter:

python weka/filters.py \
    -i /my/datasets/iris.arff \
    -o /tmp/out.arff \
    -c last \
    weka.filters.unsupervised.attribute.Remove \
    -R last
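
One way to check the result is to compare the number of attribute declarations before and after filtering. The following is a minimal sketch, assuming the standard ARFF layout in which each attribute is declared on an @attribute line ahead of the @data section. The helper count_attributes is hypothetical and not part of the library, and the inline sample is only an abbreviated stand-in for an iris-like file.

```python
def count_attributes(arff_text):
    """Count @attribute declarations in the header of an ARFF document."""
    count = 0
    for line in arff_text.splitlines():
        keyword = line.strip().lower()
        if keyword.startswith("@data"):
            # The header ends where the data section begins
            break
        if keyword.startswith("@attribute"):
            count += 1
    return count


# Illustrative sample only (assumption): a shortened iris-style ARFF file
sample = """@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
"""

print(count_attributes(sample))
```

Applied to the input file and to /tmp/out.arff, the second count should be one lower than the first if the Remove filter dropped the last attribute.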

Classifiers

The following cross-validates a J48 classifier (with confidence factor 0.3) on the UCI iris dataset:

python weka/classifiers.py \
    -t /my/datasets/iris.arff \
    -c last \
    weka.classifiers.trees.J48 \
    -C 0.3
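
When calling the script from another Python program, the same invocation can be assembled as an argument list, which sidesteps shell line-continuation issues altogether. This is a minimal sketch, assuming it is run from the repository root so that weka/classifiers.py resolves and that python on the PATH is a Python 2.7 interpreter. The helpers build_classifier_command and run are hypothetical and not part of the library.

```python
import subprocess


def build_classifier_command(dataset, classifier, options, class_index="last"):
    """Return the argument list for the weka/classifiers.py script."""
    return [
        "python", "weka/classifiers.py",
        "-t", dataset,       # training set
        "-c", class_index,   # class attribute index
        classifier,          # classifier class name
    ] + list(options)        # classifier-specific options


def run(command):
    """Run the command and return its exit code (needs the library installed)."""
    return subprocess.call(command)


cmd = build_classifier_command(
    "/my/datasets/iris.arff",
    "weka.classifiers.trees.J48",
    ["-C", "0.3"],
)
print(" ".join(cmd))
```

Calling run(cmd) would execute the command; the sketch itself only builds and prints it.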

Clusterers

The following performs classes-to-clusters evaluation for SimpleKMeans (with 3 clusters) on the UCI iris dataset:

python weka/clusterers.py \
    -t /my/datasets/iris.arff \
    -c last \
    weka.clusterers.SimpleKMeans \
    -N 3

Attribute selection

You can perform attribute selection with BestFirst as the search algorithm and CfsSubsetEval as the evaluator as follows:

python weka/attribute_selection.py \
    -i /my/datasets/iris.arff \
    -x 5 \
    -n 42 \
    -s "weka.attributeSelection.BestFirst -D 1 -N 5" \
    weka.attributeSelection.CfsSubsetEval \
    -P 1 \
    -E 1
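
The quotes around the -s value matter: the search algorithm and its own options are passed to the script as a single argument, while the evaluator and its options follow as separate arguments. The sketch below uses Python's standard shlex module to show how a POSIX shell would tokenize the command. It only illustrates the quoting and does not call the library.

```python
import shlex

command = (
    'python weka/attribute_selection.py '
    '-i /my/datasets/iris.arff -x 5 -n 42 '
    '-s "weka.attributeSelection.BestFirst -D 1 -N 5" '
    'weka.attributeSelection.CfsSubsetEval -P 1 -E 1'
)

# Split the command the way a POSIX shell would
args = shlex.split(command)

# The quoted search specification stays together as one argument after -s
search_spec = args[args.index("-s") + 1]
print(search_spec)

# The evaluator class name and its options arrive as separate arguments
evaluator_args = args[args.index("-s") + 2:]
print(evaluator_args)
```

Without the quotes, -D 1 -N 5 would be split off and handed to the script as top-level options instead of belonging to BestFirst.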

Associators

Associators, like Apriori, can be run like this:

python weka/associators.py \
    -t /my/datasets/lung-cancer.arff \
    weka.associations.Apriori -N 9 -I
