Pipesom is a web app designed to make data preprocessing easier. Its target audience is scientists from a range of fields. The user uploads a CSV file containing data for several variables of interest. Once the file is submitted, a few graphs are displayed to give a basic understanding of the dataset.
Python:
At the moment, the app is only available in developer mode, so a few steps are required to run it:
- Install Python on your system. If you're just getting started, Anaconda is recommended.
- Open a terminal that can run Python and install the dependencies by running:
pip install "dependency-name"
once for each of them.
- Clone this repository:
git clone https://github.com/V-for-Vaggelis/Pipesom.git
or download and unzip it if you're not familiar with git.
- In your terminal, navigate to the project's directory and run:
python app.py
then wait until a "running" status showing the host's port appears in the terminal.
- Open your favorite browser and navigate to that address, for example:
localhost:5000
The app should appear right away.
Upload a CSV file and submit it. After a few minutes you should get some plots back.
Warning: The app is very sensitive to the input file's format. Follow the examples in the "input-examples" directory to create a valid file. The file should follow the rules below:
- The first row should contain the names of the variables, separated by commas
- Every other row should contain numeric values of those variables, separated by commas
- All variables must have the same number of values, meaning that every column must be filled in every row
- All missing values should be filled with NaN
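The rules above can be checked before uploading. The following is a minimal sketch, not part of Pipesom itself: the function name `validate_csv_text` is hypothetical, and it assumes that any value Python's `float()` can parse (including the missing-value marker) counts as numeric.

```python
import csv
import io


def validate_csv_text(text):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    # Rule 1: the first row holds the variable names.
    header = [name.strip() for name in rows[0]]
    problems = []
    if any(name == "" for name in header):
        problems.append("header contains an empty variable name")

    for line_no, row in enumerate(rows[1:], start=2):
        # Rule 3: every row must have one value per variable.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} values, found {len(row)}"
            )
            continue
        for name, cell in zip(header, row):
            cell = cell.strip()
            # Rule 4: missing values must be marked explicitly, not left blank.
            if cell == "":
                problems.append(
                    f"line {line_no}: '{name}' is empty, use the missing-value marker"
                )
                continue
            # Rule 2: values must be numeric; float() parses NaN in any letter case.
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {line_no}: '{name}' is not numeric: {cell!r}")
    return problems
```

For example, calling `validate_csv_text(open("my-data.csv").read())` on a hypothetical file returns an empty list when no problems are found. Passing this check does not guarantee that the app accepts the file, so the examples in "input-examples" remain the reference.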
There are two plots displayed:
- A correlation matrix, which shows the linear relationships between the variables.
- The feature planes of each variable, obtained after a self-organizing map (SOM) has been trained and fitted to the data, with the values normalized around the mean. Variables that exhibit similar behavior here (similar colors in the same regions of the grids) may have strong non-linear relationships. A variable with more than 70% missing values is excluded from the analysis.
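To make the first plot and the 70% rule more concrete, here is a rough sketch of how a correlation matrix could be computed after dropping variables with too many missing values. It is an illustration only and not Pipesom's actual code: the function names, the use of Pearson correlation, the choice to skip rows where either value is missing, and applying the 70% cutoff before the correlation step are all assumptions.

```python
import math


def missing_fraction(values):
    """Fraction of entries that are NaN."""
    return sum(math.isnan(v) for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns, max_missing=0.7):
    """columns maps each variable name to a list of floats (NaN = missing)."""
    # Keep only variables whose share of missing values does not exceed the cutoff.
    kept = {k: v for k, v in columns.items() if missing_fraction(v) <= max_missing}
    names = list(kept)
    matrix = [[pearson(kept[a], kept[b]) for b in names] for a in names]
    return names, matrix
```

Entries close to 1 or -1 indicate a strong linear relationship, while entries near 0 indicate none. A non-linear relationship can still produce a value near 0, which is the gap the SOM feature planes are meant to cover.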
Planned improvements:
- Make the app more user-friendly by improving the GUI.
- Give feature selection capability to the user via the trained SOM network.
- Give the user the ability to tune the SOM network's parameters (with proper guidance) to achieve better results.