User Guide

PennAI is a platform that helps researchers apply supervised machine learning to their data without needing an extensive data science background. It can also assist more experienced users with tasks such as choosing appropriate models for a dataset. Users interact with PennAI through a web interface where they can run machine learning experiments and explore the generated models. An AI recommendation engine automatically chooses appropriate models and parameters. Dataset profiles are generated and added to a knowledgebase as experiments are run, and the recommendation engine learns from this knowledgebase to give more informed recommendations the more it is used. This allows the AI recommender to become tailored to specific data environments. PennAI comes with an initial knowledgebase generated from the PMLB benchmark suite.

Installation

PennAI is a multi-container Docker project that uses Docker-Compose.

Requirements

Docker
- Most recent stable release; the minimum version is 17.06.0
  - Official Docker Website Getting Started
  - Official Docker Installation for Windows
- Runtime Memory: We recommend configuring Docker with at least 6GB of runtime memory (Mac configuration, Windows configuration). By default, Docker starts with 2GB of runtime memory.
Docker-Compose (Version 1.22.0 or greater, Linux only) - A separate installation is only needed on Linux; Docker-Compose is bundled with the Windows and Mac Docker installations
- Linux Docker-Compose Installation
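
The version minimums listed above can be checked from a script. The following is a minimal sketch and not part of PennAI; the helper names `parse_version`, `meets_minimum`, and `installed_version` are hypothetical, and it assumes that `docker --version` and `docker-compose --version` print a dotted version number somewhere in their output.

```python
import re
import subprocess

MIN_DOCKER = (17, 6, 0)    # minimum Docker version from the requirements above
MIN_COMPOSE = (1, 22, 0)   # minimum Docker-Compose version from the requirements above


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output, minimum):
    """Compare the parsed version against a (major, minor, patch) minimum."""
    return parse_version(version_output) >= minimum


def installed_version(command):
    """Run a command such as ['docker', '--version'] and return what it prints."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `meets_minimum(installed_version(["docker", "--version"]), MIN_DOCKER)` would report whether the local Docker installation satisfies the stated minimum. This checks versions only; the runtime memory setting still has to be configured in Docker itself.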

Installation

Download the production zip from the latest release
Unzip the archive
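From the command line, navigate to the pennai directory and load the images into Docker with the following commands (the paths below use Windows-style backslashes; on Linux and macOS, use forward slashes instead, for example ./images/pennai_lab.tar):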

docker load --input .\images\pennai_lab.tar
docker load --input .\images\pennai_machine.tar
docker load --input .\images\pennai_dbmongo.tar
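
The three load commands above can also be scripted so that the path separator is correct on any operating system. This is a rough sketch under that assumption, not an official installer; `build_load_commands` and `load_images` are hypothetical helper names, and the image file names are taken from the commands above.

```python
import subprocess
from pathlib import Path

# Image archives named in the commands above.
IMAGE_FILES = ["pennai_lab.tar", "pennai_machine.tar", "pennai_dbmongo.tar"]


def build_load_commands(pennai_dir):
    """Return one `docker load` argument list per image archive."""
    images_dir = Path(pennai_dir) / "images"
    return [["docker", "load", "--input", str(images_dir / name)] for name in IMAGE_FILES]


def load_images(pennai_dir):
    """Run each command in turn; check=True stops at the first failure."""
    for command in build_load_commands(pennai_dir):
        subprocess.run(command, check=True)
```

Calling `load_images(".")` from the pennai directory would run the same three commands in order.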

Using PennAI

Starting and Stopping

To start PennAI, run the command docker-compose up from the PennAI directory. To stop PennAI, press Ctrl+C to interrupt the process and wait for the server to shut down.

To reset the datasets and experiments in the server, start PennAI with the command docker-compose up --force-recreate or run the command docker-compose down after the server has stopped.

User Interface

Once the webserver is up, connect to http://localhost:5080/ to access the website. You should see the Datasets page with the datasets in the data/datasets/user directory.
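
When PennAI is started from a script, it can be useful to wait until the website responds before continuing. The sketch below polls the address given above; it is an illustration only, the helper names `is_up` and `wait_for_server` are hypothetical, and the timeout values are arbitrary assumptions.

```python
import http.client
import time
import urllib.request

PENNAI_URL = "http://localhost:5080/"  # address given above


def is_up(url):
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


def wait_for_server(url=PENNAI_URL, timeout=300.0, interval=2.0, probe=is_up):
    """Poll until probe(url) succeeds or roughly `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The `probe` parameter exists so the polling loop can be exercised without a running server.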

Adding Datasets

New datasets can be added through a form on the website or by placing them manually in the data directory. Datasets must meet the following restrictions:

Datasets must have the extension .csv or .tsv.
Datasets cannot have any null or empty values.
Dataset features must be either numeric, categorical, or ordinal.
Only the label column and categorical or ordinal features can contain string values.
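
The restrictions above can be checked before a dataset is handed to PennAI. The following sketch is not PennAI's own validation code; `find_dataset_problems` is a hypothetical helper, and it only flags empty cells, since which tokens PennAI treats as null (for example "NA") is not stated here.

```python
import csv
from pathlib import Path

DELIMITERS = {".csv": ",", ".tsv": "\t"}


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def find_dataset_problems(path, target="class", categorical=(), ordinal=()):
    """Return a list of readable problems; an empty list means none were found."""
    path = Path(path)
    delimiter = DELIMITERS.get(path.suffix.lower())
    if delimiter is None:
        return [f"{path.name}: extension must be .csv or .tsv"]

    # Only these columns are allowed to hold string values.
    may_hold_strings = {target, *categorical, *ordinal}
    problems = []
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for row_number, row in enumerate(reader, start=2):  # row 1 is the header
            for column, value in row.items():
                if column is None:
                    problems.append(f"row {row_number}: more values than header columns")
                elif value is None or value.strip() == "":
                    problems.append(f"row {row_number}: empty value in {column!r}")
                elif column not in may_hold_strings and not _is_number(value):
                    problems.append(f"row {row_number}: non-numeric value {value!r} in {column!r}")
    return problems
```

For a file with a `class` label column and one categorical column named `color`, the call would be `find_dataset_problems("my_data.csv", categorical=["color"])`.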

Uploading Using the Website

To upload a new dataset from the website, click the "Add new Datasets" button on the Datasets page to navigate to the upload form. Select a file using the form's file browser, then enter the corresponding information about the dataset:
- The name of the dependent column
- A JSON object mapping each ordinal feature to a list of its values, for example {"ord" : ["first", "second", "third"]}
- A comma-separated list of categorical column names without quotes, such as cat1, cat2

Once uploaded, the dataset should be available to use within the system.
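
To make the two input formats above concrete, the sketch below turns form-style strings into Python structures. It only illustrates the formats; how the website itself parses these fields is not documented here, and `parse_categorical` and `parse_ordinal` are hypothetical names.

```python
import json


def parse_categorical(text):
    """Split 'cat1, cat2' into ['cat1', 'cat2'], ignoring surrounding spaces."""
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_ordinal(text):
    """Parse the ordinal JSON and check it maps column names to lists of values."""
    if not text.strip():
        return {}
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("ordinal features must be a JSON object")
    for column, levels in parsed.items():
        if not isinstance(levels, list):
            raise ValueError(f"values for {column!r} must be a JSON list")
    return parsed
```

Checking the strings this way before pasting them into the form can catch a stray quote or a missing bracket early.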

Adding Initial Datasets to the Data Directory

Labeled datasets can also be loaded when PennAI starts by adding them to the data/datasets/user directory. PennAI must be restarted if new datasets are added while it is running. If errors are encountered when validating a dataset, they will appear in a log file in target/logs/loadInitialDatasets.log and that dataset will not be uploaded. Data can be placed in subfolders in this directory.

By default, the column with the label should be named 'class'. If the label column has a different name, or if the dataset has categorical or ordinal features, this can be specified in a JSON configuration file. The corresponding configuration file must be in the same directory as the dataset. If the dataset file is named myDatafile.*sv, the configuration file must be named myDatafile_metadata.json.

Example configuration file:

{
	"target_column":"my_custom_target_column_name",
	"categorical_features" : ["cat1", "cat2"],
	"ordinal_features" : {"ord" : ["first", "second", "third"]}
}
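
The naming rule and the example file above can be combined in a small helper that writes the configuration file next to a dataset. This is a sketch with hypothetical function names; it always writes all three keys shown in the example, and whether PennAI accepts empty values for unused keys is an assumption that has not been verified.

```python
import json
from pathlib import Path


def metadata_path_for(dataset_path):
    """myDatafile.csv -> myDatafile_metadata.json, in the same directory."""
    dataset_path = Path(dataset_path)
    return dataset_path.with_name(dataset_path.stem + "_metadata.json")


def write_metadata(dataset_path, target_column, categorical_features=(), ordinal_features=None):
    """Write a configuration file with the three keys from the example above."""
    metadata = {
        "target_column": target_column,
        "categorical_features": list(categorical_features),
        "ordinal_features": dict(ordinal_features or {}),
    }
    destination = metadata_path_for(dataset_path)
    destination.write_text(json.dumps(metadata, indent=2))
    return destination
```

For a dataset at data/datasets/user/myDatafile.csv, `write_metadata` would create data/datasets/user/myDatafile_metadata.json.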

Analyzing Data

To run a classification machine learning experiment, click 'Build New Experiment' on the Datasets page, choose the desired algorithm and experiment parameters, and click 'Launch Experiment'. To start the AI, click the AI toggle on the Datasets page. The AI will start issuing experiments according to the parameters in config/ai.config.

From the Datasets page, click 'completed experiments' to navigate to the Experiments page for that dataset, filtered to show only the completed experiments. If an experiment completed successfully, use the 'Actions' dropdown to download the fitted model for that experiment and a Python script that can be used to run the model on other datasets. Click elsewhere on the row to navigate to the experiment Results page.

Downloading and Using Models

A pickled version of the fitted model and an example script for using that model can be downloaded for any completed experiment from the Experiments page.

Please see the Jupyter notebook script demo for instructions on using the scripts and model exported from PennAI to reproduce the findings on the Results page and classify new datasets.
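
As a rough illustration of what working with a pickled model involves, the sketch below loads a pickle file and calls the model on new rows. It is not the exported script: the function names are hypothetical, and it assumes the fitted model exposes a scikit-learn-style `predict` method, which this guide does not confirm. The exported script and the notebook demo remain the authoritative references.

```python
import pickle


def load_model(model_path):
    """Unpickle a fitted model.

    Only unpickle files from a source you trust: loading a pickle can run
    arbitrary code. The libraries used to fit the model must be installed.
    """
    with open(model_path, "rb") as handle:
        return pickle.load(handle)


def classify(model, feature_rows):
    """Return predictions for rows of feature values.

    Assumption: the model has a scikit-learn-style predict method.
    """
    return list(model.predict(feature_rows))
```

The feature rows passed to `classify` would need the same columns, in the same order, as the dataset the model was trained on.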

Developer Docs

The developer guide is available here

