Pipesom is a web-app with the purpose of giving basic data-processing services to people with little or no data-science and programming skills.

V-for-Vaggelis/Pipesom

Pipesom

Pipesom is a web app designed to make data preprocessing easier. The target audience is scientists from various fields. The user uploads a CSV file containing data for several variables of interest. Once the file is submitted, the app displays a few graphs that give a basic understanding of the dataset.

Code dependencies

Python:

How to run

At the moment, the app is only available in developer mode, so a few steps are required to use it:

  1. Install Python on your system. If you're just getting started, Anaconda is recommended.
  2. Open a terminal that can run Python and install the dependencies by running pip install "dependency-name" for each of them.
  3. Clone this repository with git clone https://github.com/V-for-Vaggelis/Pipesom.git, or download and unzip it if you're not familiar with git.
  4. In your terminal, navigate to the project's directory and run python app.py, then wait until a "running" status showing the host and port appears in the terminal.
  5. Open your browser and navigate to that address, for example localhost:5000. The app should appear right away.

How to use

Upload a csv file and submit it. After a few minutes you should get some plots back.

Warning: The app is very sensitive to the input file's format. Follow the examples in the "input-examples" directory to create a valid file. The file should follow the rules below:

  1. The first row should contain the names of the variables, separated by commas.
  2. Every other row should contain numeric values of those variables, separated by commas.
  3. All variables must have the same number of entries, meaning that every column must be filled in every row.
  4. All missing values should be filled with naN.
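
The rules above can be checked before uploading. The following is a minimal sketch using only the Python standard library. The function name `validate_csv` and its messages are hypothetical and are not part of Pipesom. It assumes missing values are written as naN, which Python's `float()` parses as NaN regardless of letter case.

```python
import csv
import io


def validate_csv(text):
    """Return a list of problems found in CSV text; an empty list means it passed."""
    # Skip completely blank lines so a trailing newline is not reported as an error.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 2:
        return ["need a header row and at least one data row"]

    header = [name.strip() for name in rows[0]]
    problems = []
    if any(name == "" for name in header):
        problems.append("header contains an empty variable name")

    for n, row in enumerate(rows[1:], start=1):
        # Rule 3: every row must have one value per variable.
        if len(row) != len(header):
            problems.append(f"data row {n}: expected {len(header)} values, got {len(row)}")
            continue
        for name, cell in zip(header, row):
            # Rules 2 and 4: cells must be numeric; float() also accepts "naN".
            try:
                float(cell.strip())
            except ValueError:
                problems.append(f"data row {n}, column {name!r}: {cell!r} is not numeric")
    return problems
```

Calling `validate_csv(open("my-data.csv").read())` on a file (the file name here is only an illustration) returns an empty list when none of the checked rules are violated. Passing this check does not guarantee that the app will accept the file.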

Interpret the results

Two plots are displayed:

  1. A correlation matrix, which shows the linear relationships between the variables.
  2. The feature planes of each variable, after a self-organizing map (SOM) has been trained and fitted to the data, with values normalized around the mean. Variables that exhibit similar behavior here (similar colors in the same regions of the grids) may have strong non-linear relationships. A variable with more than 70% missing values is excluded from the analysis.
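
To make the first plot and the 70% rule more concrete, here is a rough sketch in plain Python. It is not Pipesom's actual code: the function names are hypothetical, and it assumes the correlation is Pearson's coefficient computed only over rows where both variables are present.

```python
import math


def drop_sparse_columns(columns, max_missing=0.7):
    """Keep only columns whose share of NaN values is at most max_missing."""
    kept = {}
    for name, values in columns.items():
        missing = sum(1 for v in values if math.isnan(v))
        if missing / len(values) <= max_missing:
            kept[name] = values
    return kept


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between all columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 only rules out a linear one. That is why the feature planes are worth comparing as well.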

Ideas for improvement

  1. Make the app more user-friendly by improving the GUI.
  2. Give feature selection capability to the user via the trained SOM network.
  3. Give the user the ability to tune the SOM network's parameters (with proper guidance) to achieve better results.
