GitHub - gatagat/stage-propagation: Label propagation approach to Drosophila embryo stage annotation

Label propagation pipeline

The pipeline consists of two blocks: a classifier that labels each sample independently, and a label propagation step that uses the similarities between samples to denoise the labels assigned by the classifier. In other words, the classifier provides the unary terms, whereas the similarities provide the pairwise terms.
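To make the unary/pairwise split concrete, here is a minimal sketch of an energy of this kind. It assumes unary costs of -log p from the classifier posteriors and a Potts pairwise term that charges the edge weight w_ij whenever two connected samples disagree. The function name `labeling_energy` and the Potts form are illustrative assumptions, not necessarily the model the repository optimizes.

```python
import math


def labeling_energy(labels, probs, edges, unary_weight=1.0):
    """Energy of a labeling: weighted unary costs plus Potts pairwise costs.

    labels: list of label indices, one per sample
    probs: list of per-sample posterior lists (hypothetical classifier output)
    edges: dict mapping (i, j) sample-index pairs to similarity weights w_ij
    """
    # Unary term: negative log posterior of the chosen label for each sample.
    unary = sum(-math.log(p[y]) for p, y in zip(probs, labels))
    # Pairwise (Potts) term: pay w_ij when connected samples get different labels.
    pairwise = sum(w for (i, j), w in edges.items() if labels[i] != labels[j])
    return unary_weight * unary + pairwise
```

With posteriors `[[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]` and edges `{(0, 1): 2.0, (1, 2): 2.0}`, the labeling `[0, 0, 0]` has lower energy than the classifier's own `[0, 1, 0]`, which is how the pairwise terms can overrule a noisy unary decision.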

Requirements

Install the Python packages listed in python.requirements.
Install COIN-OR LP (Clp) from http://www.coin-or.org/projects/Clp.xml

Data files

List file and truth file

Input data is specified in list files: CSV tables with at least one column called ID. A minimal file looks like this:

ID
001
002
003
...

Metadata can be stored in a header of lines commented out with # signs, with values encoded in YAML. How the individual CSV columns are interpreted is entirely up to the individual scripts, so a file with one image per sample can look like this:

#path_prefix=/an/optional/path/prefix
ID	path
001	somedir/file001.png
002	dir2/file002.png
003	anotherdir/file003.png
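A list file of this shape can be read with the standard library. This is a minimal sketch assuming tab-separated columns and `#key=value` header lines; it uses `ast.literal_eval` as a stand-in for a real YAML parser, so only Python-literal values (numbers, dicts, lists, quoted strings) are decoded and everything else is kept as a plain string. The name `read_listfile` is hypothetical.

```python
import ast
import csv


def read_listfile(lines):
    """Read a list file given as an iterable of lines (e.g. an open file).

    Returns (meta, rows): meta is a dict built from '#key=value' header lines,
    rows is a list of dicts keyed by column name, with all cells as strings so
    that IDs such as '001' keep their leading zeros.
    """
    meta = {}
    body = []
    for line in lines:
        if line.startswith('#'):
            key, _, value = line[1:].strip().partition('=')
            value = value.strip()
            try:
                # Stand-in for YAML: decode Python-literal values only.
                meta[key.strip()] = ast.literal_eval(value)
            except (ValueError, SyntaxError, TypeError):
                meta[key.strip()] = value  # keep anything else as a string
        elif line.strip():
            body.append(line)  # skip blank lines, keep table lines
    rows = list(csv.DictReader(body, delimiter='\t'))
    return meta, rows
```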

The file can contain any number of other columns. A truth file is a list file with an additional integer column, whose name is given by the truth item in the metadata. The label names are stored in the metadata as well:

#basedir=/an/optional/path/prefix
#truth_labels={ 1: 'classA', 2: 'classB' }
#truth=class
ID	path	class
001	somedir/file001.png	1
002	dir2/file002.png	2
003	anotherdir/file003.png	1
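Once the header is parsed, resolving each sample's label name takes two lookups: the truth item names the column, and truth_labels maps its integers to names. A minimal sketch, assuming the metadata has already been decoded into a dict (with truth_labels as an int-keyed dict) and each row is a dict of strings; `truth_labels` as a function name is hypothetical.

```python
def truth_labels(meta, rows):
    """Map each sample ID to its label name using the truth file's metadata.

    meta['truth'] names the integer truth column and meta['truth_labels'] maps
    those integers to label names, as in the example header.
    """
    column = meta['truth']
    names = meta['truth_labels']
    return {row['ID']: names[int(row[column])] for row in rows}
```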

Feature file and weights file

A feature file is a CSV table of the following form:

#methodname=compute_features_method
#argsfile=argsfile
#additional_optional_data1=... (e.g. a dictionary filename)
ID	f1	f2	f3	...
001	2.	.4	-1	...
002	3.	.5	.1	...
003	2.	.7	-2	...

The ID column must match the IDs in the corresponding list file; all other columns are treated as features.
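Because features are matched to samples by ID rather than by row position, a consumer typically reorders the feature rows to follow the list file. A minimal sketch, assuming the feature file has been read into a list of string-valued row dicts; `align_features` is a hypothetical helper.

```python
def align_features(list_ids, feature_rows):
    """Return (feature_names, matrix) with matrix rows in the order of list_ids.

    feature_rows: list of dicts from a feature file (all values are strings);
    every column other than 'ID' is treated as a feature.
    Raises KeyError if a listed ID has no feature row.
    """
    by_id = {row['ID']: row for row in feature_rows}
    names = [c for c in feature_rows[0] if c != 'ID']
    # float() accepts the short forms used in the example, such as '2.' and '.4'.
    matrix = [[float(by_id[i][c]) for c in names] for i in list_ids]
    return names, matrix
```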

A weights file is a feature file whose features form a square matrix of weights, with both rows and columns indexed by sample IDs:

#methodname=...
ID	001	002	003	...
001	2.	.4	-1	...
002	3.	.5	.1	...
003	2.	.7	-2	...
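Since the column headers are themselves sample IDs, a reader can check squareness and reorder the columns to follow the rows. A minimal sketch, assuming string-valued row dicts as produced by a tab-separated reader; `weights_matrix` is a hypothetical name.

```python
def weights_matrix(rows):
    """Convert weights-file rows into (ids, square matrix of floats).

    Each row dict has an 'ID' key plus one column per sample ID. Raises
    ValueError if the column IDs do not match the row IDs.
    """
    ids = [row['ID'] for row in rows]
    columns = [c for c in rows[0] if c != 'ID']
    if sorted(columns) != sorted(ids):
        raise ValueError('weights matrix is not square over the same IDs')
    # Reorder columns to follow row order, so matrix[i][j] = w(ids[i], ids[j]).
    matrix = [[float(row[j]) for j in ids] for row in rows]
    return ids, matrix
```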

Classifier file

A pickled (serialized) bzip2-compressed Python dictionary with the following entries:

features = { 'methodname': ..., 'argsfile': ... }
meta = ...
truth = ...
classifier = classifier_object
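Both the classifier file and the propagator file are bzip2-compressed pickles of a dictionary, so a standard-library round trip is enough to write or inspect one. This is a minimal sketch assuming a single pickled dict per file; the helper names `save_model_file` and `load_model_file` are hypothetical. As with any pickle, only load files from trusted sources, since unpickling can execute code.

```python
import bz2
import pickle


def save_model_file(path, content):
    """Pickle a dictionary and write it bzip2-compressed to path."""
    with bz2.open(path, 'wb') as f:
        pickle.dump(content, f)


def load_model_file(path):
    """Read back a bzip2-compressed pickled dictionary from path."""
    with bz2.open(path, 'rb') as f:
        return pickle.load(f)
```

For a classifier file, `content` would hold the features, meta, truth, and classifier entries listed above.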

Prediction file

A prediction file is a truth file with additional columns:

pred, containing the predicted class, and
probN, containing the posterior probability of class N.
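The probN columns are what connects the classifier to the propagation step: a common way to turn posteriors into unary costs is -log(probN). A minimal sketch, assuming N is the same integer class code used in the truth file and that a small epsilon is acceptable for clipping zero posteriors; `unary_costs` and `eps` are hypothetical.

```python
import math


def unary_costs(row, classes, eps=1e-12):
    """Turn a prediction-file row into per-class unary costs -log(probN).

    row: dict of strings containing a 'prob<N>' column for each N in classes.
    eps clips zero posteriors so that the cost stays finite.
    """
    probs = {n: float(row['prob%d' % n]) for n in classes}
    return {n: -math.log(max(p, eps)) for n, p in probs.items()}
```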

Propagator file

A pickled (serialized) bzip2-compressed Python dictionary with the following entries:

propagator = { propagator_params... }
meta = ...

Classification

features.py - computes features from input data
classifier_train.py - trains a classifier
classifier_predict.py - predicts labels using an existing classifier file

features.py

Computes features for all the input data. It defines a function compute_features(method_name, method_args, data) that, based on method_name, calls the corresponding method on each sample in data:

features = []
for sample in data:
	features += method_table[method_name](sample, cache=cache, **method_args)
return features, feature_names
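The excerpt above leaves cache, method_table, and feature_names undefined. A self-contained sketch of the same dispatch pattern follows, under the assumption (not stated in the original) that each feature method returns a pair of (values, names) for one sample; `mean_intensity` and the contents of `method_table` are hypothetical. Using `append` keeps one feature vector per sample.

```python
def mean_intensity(sample, cache=None, scale=1.0):
    """Hypothetical feature method: scaled mean and max of a sample's pixels."""
    pixels = sample['pixels']
    values = [scale * sum(pixels) / len(pixels), scale * max(pixels)]
    return values, ['mean', 'max']


# Hypothetical registry mapping method names to per-sample feature methods.
method_table = {'mean_intensity': mean_intensity}


def compute_features(method_name, method_args, data):
    """Apply the chosen feature method to every sample in data.

    Returns one feature vector per sample plus the feature names.
    """
    cache = {}  # shared between calls so methods can reuse expensive results
    features, feature_names = [], []
    for sample in data:
        values, feature_names = method_table[method_name](
            sample, cache=cache, **method_args)
        features.append(values)
    return features, feature_names
```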

Command-line interface:

features.py -m method_name -a argsfile -l listfile

This runs feature extraction on the samples specified in the list file and saves the resulting features into features.csv.

classifier_train.py

Runs feature extraction, selects optimal hyper-parameters by cross-validation, learns the final model from all the data, and saves the model. It defines a function train_classifier(features, truth) and prints an evaluation of the training.

Command-line interface:

classifier_train.py -m svm -t truthfile -f featurefile

where the truth file is a list file with columns ID, path, and truth.

It creates a file classifier.dat.

classifier_predict.py

Loads an existing classifier file, applies it to the data in a list file, and outputs the predictions.

Command-line interface:

classifier_predict.py -l listfile -m classifierfile

It creates a prediction file pred.csv.

Label propagation

dissimilarities.py - computes dissimilarities of sample pairs
propagate_train.py - finds optimal hyper-parameters of label propagation
propagate_predict.py - propagates labels from a classifier using weights
evaluate.py - evaluates propagated labels using ground truth

dissimilarities.py

Computes the mutual dissimilarities between all sample pairs. It defines a function compute_dissimilarities(method_name, method_args, data).

Command-line interface:

dissimilarities.py -m method_name -a argsfile -l listfile

It creates a weights file dissim.csv.

A threshold item in the arguments file defines how the similarities are thresholded. The similarities set the edge weights for the label propagation, so thresholding them makes the resulting graph sparse.
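The sketch below shows one way dissimilarities can become sparse edge weights. It assumes a Gaussian kernel w = exp(-d^2 / (2 * bandwidth^2)), a symmetric dissimilarity matrix, and a threshold that drops edges whose weight falls below it. The kernel choice and the name `sparse_weights` are assumptions; the repository may convert and threshold differently.

```python
import math


def sparse_weights(ids, dissim, bandwidth, threshold):
    """Turn a dissimilarity matrix into a sparse edge dict of similarity weights.

    Assumes a symmetric matrix dissim[i][j] and a Gaussian kernel
    w = exp(-d**2 / (2 * bandwidth**2)); edges with w < threshold are dropped.
    """
    edges = {}
    n = len(ids)
    for i in range(n):
        for j in range(i + 1, n):  # undirected graph: visit each pair once
            w = math.exp(-dissim[i][j] ** 2 / (2.0 * bandwidth ** 2))
            if w >= threshold:
                edges[(ids[i], ids[j])] = w
    return edges
```

A larger bandwidth or a lower threshold keeps more edges, making the graph denser.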

propagate_train.py

Searches for the optimal hyper-parameters defined in an arguments file. This requires at least partial truth for a set of list files, predictions from an already trained classifier on the complete list files, and dissimilarities for all the list files. It defines a function train(method_name, method_args, data).

Command-line interface:

propagate_train.py -m method_name -p predfile1 predfile2 ... -d dissimfile1 dissimfile2 ... -t truthfile1 truthfile2 ...

An arguments file can look like this:

hyper_params: [ unary_weight, bandwidth ]
scoring: accuracy
unary_weight: [ 0.25, 0.5, 1., 2., 4., 8. ]
bandwidth: [ .5, 1., 5., 10., 50. ]
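With this file, the search covers every combination of the listed values: 6 unary weights times 5 bandwidths gives 30 candidate settings. A minimal sketch of the grid expansion and selection, assuming the arguments file has already been loaded into a dict (for example by a YAML parser); `hyper_param_grid`, `best_params`, and the `score` callable (which would stand in for cross-validated accuracy) are hypothetical.

```python
import itertools


def hyper_param_grid(args):
    """Expand an arguments dict into all hyper-parameter combinations.

    args['hyper_params'] lists the names to search; each name maps to a list
    of candidate values. Returns one dict per combination.
    """
    names = args['hyper_params']
    values = [args[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*values)]


def best_params(args, score):
    """Return the combination with the highest score (e.g. accuracy)."""
    return max(hyper_param_grid(args), key=score)
```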

propagate_predict.py

Propagates labels using the classifier predictions for the individual samples and the edge weights computed by dissimilarities.py. It defines a function propagate(method_name, method_args, predictions, dissim).
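As an illustration of how unaries and edge weights interact, here is one simple propagation scheme: quadratic smoothing of the posteriors, similar in spirit to label spreading. It minimizes unary_weight * sum_i ||f_i - p_i||^2 + sum_ij w_ij ||f_i - f_j||^2 by Jacobi iterations, whose fixed point is f_i = (unary_weight * p_i + sum_j w_ij f_j) / (unary_weight + sum_j w_ij). This is a swapped-in technique for illustration and not necessarily the method in propagate_predict.py; the function signature and `iterations` are hypothetical, and only unary_weight mirrors the arguments-file example.

```python
def propagate(probs, edges, unary_weight=1.0, iterations=100):
    """Smooth per-sample class posteriors over a weighted, undirected graph.

    probs: list of posterior lists; edges: dict (i, j) -> w_ij over sample
    indices. Returns the propagated label (argmax) of each sample.
    """
    n, k = len(probs), len(probs[0])
    neighbors = [[] for _ in range(n)]
    for (i, j), w in edges.items():  # store each undirected edge at both ends
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    f = [list(p) for p in probs]
    for _ in range(iterations):
        new_f = []
        for i in range(n):
            # Weighted average of the sample's own posterior and its neighbors.
            num = [unary_weight * probs[i][c] for c in range(k)]
            den = unary_weight
            for j, w in neighbors[i]:
                for c in range(k):
                    num[c] += w * f[j][c]
                den += w
            new_f.append([x / den for x in num])
        f = new_f
    return [max(range(k), key=row.__getitem__) for row in f]
```

With the posteriors `[[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]` chained by edges of weight 2, a unary weight of 1 relabels the middle sample to class 0, while a unary weight of 100 leaves the classifier's labels unchanged; this trade-off is what the unary_weight search in propagate_train.py tunes.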

Command-line interface:

propagate_predict.py -m model -p predictionfile -d dissimfile

It creates a prediction file prop.csv.

evaluate.py

Evaluates the provided labels against the truth file.

Command-line interface:

evaluate.py -t truthfile -p predictionfile

It creates various output SVG and HTML files.
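The core of such an evaluation is accuracy plus a confusion count over the samples that actually have ground truth, which matters because truth may be partial. A minimal sketch, assuming both truth and predictions have been reduced to dicts from sample ID to label; the function name and return shape are hypothetical, and the SVG/HTML reporting is not reproduced.

```python
from collections import Counter


def evaluate(truth, pred):
    """Compare predicted labels with ground truth for IDs present in both.

    truth, pred: dicts mapping sample ID -> label. Returns (accuracy,
    confusion), where confusion counts (true label, predicted label) pairs.
    """
    ids = [i for i in truth if i in pred]  # partial truth: score labelled IDs only
    if not ids:
        raise ValueError('no overlapping IDs to evaluate')
    confusion = Counter((truth[i], pred[i]) for i in ids)
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(ids), confusion
```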
