GitHub - UIUCSinhaLab/GEMSTAT_refinement_sampler: A tool to allow meaningful comparison of different refinement methods for GEMSTAT models.

Installation

After cloning the git repository, it is necessary to run the following commands:

git submodule init git submodule update

This loads parts of this project that come from other git repositories.

Experiments

An experiment is defined by a numerically named .bash file in REFINEMENT_SETTINGS, for example REFINEMENT_SETTINGS/1.bash

Refinement Methods

A refinement method is defined by an executable file in the METHODS subdirectory.

See METHODS/EXAMPLE for a bash script that parses the command-line parameters that are expected from a refinement method, you can put anything in METHODS and as long as it uses the correct command-line parameters and environment variables, producing the right output files, it should work.

The intention is that you will wrap calls to GEMSTAT with a bash script here.

Scoring Methods

A scoring method is defined by an executable in SCORING whose STDOUT will be just one floating-point score with no newline.

See SCORING/SSE for an example, there are several command-line parameters that a scoring method should be sensitive to. I should explain them

--data [DIRECTORY] A data directory containing the training data as it was input when the method was trained
--parfile [FILENAME] The parfile whence refinement began (to be symmetric to the refinement method command-line parameters)
--parout [FILENAME] The parfile after refinement. (Could be used if you want to score something about the properties of the par files, such as distance from known true parameters...)
--out [FILENAME] The output from cross-validation. Currently that output might contain the ground-truth, but ideally, you should get the ground-truth from the --data option, since in the future we might give no data at cross-validation time, to prevent methods from somehow cheating.

Datasets

TODO: Mention the subdirectory structure for storing a dataset

A dataset directory must contain the following:

template.par (optional, but used if you want to randomly generate starting points on-the-fly.)
base/ (required by the system)
- seqs.fa (assumed by gemstat)
- whatever_else
ORTHO/
- whatever_your_training_ortholog_name_was
  - whatever files you want to have overwrite the base dataset
- another_ortholog
  - whatever files are specific to this ortholog

Defining Ensembles

Ensembles of starting-points are stored in the ENSEMBLES directory.

An ensemble definition make take one of three forms:

A randomly generated ensemble from a template.par file in the dataset. This will be expected if your REFINEMMENT_SETTING file does not specify an "ENSEMBLE_NAME" environment variable.
A fixed ensemble from an ASCII format table of values, substituted into a template.par file. If the file ENSEMBLES/${ENSEMBLE_NAME} is a regular file, that file is assumed to be a table of values which will be used this way.
A directory of .par files. If the file ENSEMBLES/${ENSEMBLE_NAME} is a directory, it is assumed to contain files named 1.par 2.par ... etc.

Name		Name	Last commit message	Last commit date
Latest commit History 69 Commits
ENSEMBLES		ENSEMBLES
ENSEMBLE_REFINE		ENSEMBLE_REFINE
GEM		GEM
METHODS		METHODS
REFINEMENT_SETTINGS		REFINEMENT_SETTINGS
SCHED		SCHED
SCORING		SCORING
bin		bin
data_for_experiments		data_for_experiments
lib		lib
.gitignore		.gitignore
.gitmodules		.gitmodules
INSTALL.md		INSTALL.md
README.md		README.md
requirements.txt		requirements.txt

UIUCSinhaLab/GEMSTAT_refinement_sampler

Folders and files

Latest commit

History

Repository files navigation

Installation

Experiments

Refinement Methods

Scoring Methods

Datasets

Defining Ensembles

About

Resources

Stars

Watchers

Forks

Languages