Skip to content

Generates test files for the SOMToolbox consisting of clusters with different properties

License

Notifications You must be signed in to change notification settings

rieder91/som-cluster-generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SOMToolbox Dataset Generator - Clusters

Built With

Purpose

This script generates datasets to be used with the SOMToolbox. The focus is on clusters. The following settings can be configured:

  • name of the output files
  • number of dimensions
  • number of clusters
  • for each cluster:
  • distribution type
  • cardinality
  • density

Both the cardinality and density range from 1 (low) to 10 (high). The following distribution types are supported:

  • Binomial
  • Exponential
  • Gaussian
  • Pareto
  • Poisson

If the program is run with <= 2 dimensions it will also display a plot of the data.

The resulting vector and template vector files can be used by the SOMToolbox to train a SOM. The goal is to analyze the resulting maps and visualize how different cluster configuration can be detected.

Running the program

First you'll want to install the dependencies. Afterwards you can run the program itself:

pip install -r requirements.txt
python generator.py <path-to-input-file>

In case the installation of the dependencies doesn't work out-of-the-box, they can also be installed manually:

The input file is a JSON file. The format should be self-explanatory given an example:

{
  "dimensions": 2,
  "export_name": "example1",
  "clusters": [
    {
      "type": "binomial",
      "cardinality": "4",
      "density": "8"
    },
    {
      "type": "exponential",
      "cardinality": "5",
      "density": "7"
    },
    {
      "type": "pareto",
      "cardinality": "9",
      "density": "6"
    },
    {
      "type": "poisson",
      "cardinality": "9",
      "density": "2"
    },
    {
      "type": "gaussian",
      "cardinality": "3",
      "density": "10"
    }
  ]
}

The results are written to the output folder with the name configured in the input file.

About

Generates test files for the SOMToolbox consisting of clusters with different properties

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published