Evaluation of Semantic Similarity Measures

This repository contains the Python and Groovy scripts used in a project evaluating how sensitive semantic similarity measures are to annotation size and difference.

Scripts for data generation

annotations.py - This script reformats the database annotations. It reads gene_association files in GAF version 2.0 format and creates an annotations file in which each line represents a gene and its annotations, separated by tabs.
gen_annotations.py - This script generates random annotations of the same size as those in the files generated by annotations.py.
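
The two data generation steps above can be sketched as follows. This is a minimal illustration, not the code of annotations.py or gen_annotations.py: it assumes the gene identifier is taken from GAF column 2 (DB Object ID) and the term from column 5 (GO ID), applies no qualifier or evidence filtering, and draws random terms from the set of all terms seen in the input. The function names `read_gaf`, `format_annotations`, and `random_annotations` are hypothetical.

```python
import random
from collections import defaultdict


def read_gaf(lines):
    """Collect the GO term IDs annotated to each gene from GAF 2.0 lines."""
    annotations = defaultdict(set)
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:  # skip malformed or empty lines
            continue
        # Assumption: column 2 is used as the gene ID, column 5 is the GO ID.
        gene_id, go_id = cols[1], cols[4]
        annotations[gene_id].add(go_id)
    return dict(annotations)


def format_annotations(annotations):
    """One line per gene: the gene ID followed by its terms, tab-separated."""
    return ["\t".join([gene] + sorted(terms))
            for gene, terms in sorted(annotations.items())]


def random_annotations(annotations, vocabulary, seed=0):
    """Give every gene a random term set of the same size as its real one.

    Assumption: terms are sampled without replacement from `vocabulary`,
    which must contain at least as many terms as the largest annotation set.
    """
    rng = random.Random(seed)
    vocab = sorted(vocabulary)
    return {gene: set(rng.sample(vocab, len(terms)))
            for gene, terms in annotations.items()}
```

Passing an open gene_association file to `read_gaf` and writing the lines returned by `format_annotations` gives a file in the layout described above.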

Scripts for computing similarity measures

Sim.groovy, SimPairwise.groovy - These Groovy scripts compute groupwise and pairwise similarity measures for a given annotations file. They require the Gene Ontology file in OBO format and output a file with the similarity values of each entry with all the other entries.
SimHP.groovy, SimHPPairwise.groovy - These Groovy scripts compute groupwise and pairwise similarity measures for a given annotations file. They require the Human Phenotype Ontology file in OBO format and output a file with the similarity values of each entry with all the other entries.
SimGDPairwise.groovy - This Groovy script computes pairwise similarity measures between gene and disease annotations. It requires the Human Phenotype Ontology file in OBO format and outputs a file with the similarity values of each gene with all the diseases.
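
To illustrate the difference between groupwise and pairwise measures, here is a small Python sketch over a toy ontology. It does not reproduce the Groovy scripts: the measures they compute are not named here, so the sketch assumes a Jaccard index over ancestor-closed term sets as the groupwise measure and a best-match average of a term-to-term Jaccard score as the pairwise measure. The hierarchy, term names, and function names are all hypothetical.

```python
# Toy is-a hierarchy (hypothetical terms, not real GO or HPO identifiers):
# each term maps to the set of its direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(terms):
    """Union of the ancestor sets of all terms in an annotation set."""
    result = set()
    for term in terms:
        result |= ancestors(term)
    return result


def groupwise_jaccard(terms_x, terms_y):
    """Groupwise: compare the two annotation sets as wholes."""
    cx, cy = closure(terms_x), closure(terms_y)
    return len(cx & cy) / len(cx | cy)


def term_sim(t1, t2):
    """Term-to-term score used by the pairwise measure (assumed Jaccard)."""
    a1, a2 = ancestors(t1), ancestors(t2)
    return len(a1 & a2) / len(a1 | a2)


def pairwise_bma(terms_x, terms_y):
    """Pairwise: score term pairs, then combine with a best-match average.

    Both annotation sets are assumed to be non-empty.
    """
    best_x = [max(term_sim(x, y) for y in terms_y) for x in terms_x]
    best_y = [max(term_sim(x, y) for x in terms_x) for y in terms_y]
    return (sum(best_x) / len(best_x) + sum(best_y) / len(best_y)) / 2


def all_vs_all(annotations, measure):
    """Similarity of every entry with every entry, keyed by ID pair."""
    ids = sorted(annotations)
    return {(i, j): measure(annotations[i], annotations[j])
            for i in ids for j in ids}
```

A groupwise measure looks at the merged ancestor sets only, while a pairwise measure first scores individual term pairs, so the two can rank the same entries differently.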

Scripts for evaluating the similarities

correlation.py - This script computes Spearman and Pearson correlations between similarity values and annotation size. It requires the annotations file and the file with similarity values.
interactions.py - This script computes ROC AUC for protein-protein interaction predictions. Similarity values are used as prediction scores and BioGRID interaction data as test data. BioGRID Tab 2.0 formatted files are used.
gene_disease.py - This script evaluates similarity measures on gene-disease association predictions.
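
The statistics named above can be sketched in plain Python. This is an illustration under assumptions, not the code of correlation.py or interactions.py: it assumes one similarity value and one annotation size per entry for the correlations, and binary labels (1 for a known interaction, 0 otherwise) for ROC AUC, which is computed here with the rank-sum (Mann-Whitney U) formulation. All function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def roc_auc(scores, labels):
    """ROC AUC from the rank sum of the positive examples.

    Assumes labels are 0/1 and that both classes are present.
    """
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, label in zip(ranks, labels) if label)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

With annotation sizes as `x` and similarity values as `y`, a correlation close to zero would indicate a measure that is insensitive to annotation size.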

Scripts for generating plots

plot_figures.py, plot_figures_pairwise.py - These scripts generate plots from similarity measure values. They require the annotations file and the similarity values.

The repository also includes shell scripts for running some of the scripts on multiple files.

Generated similarity values and plots can be found here: http://www.cbrc.kaust.edu.sa/onto/sim-eval/

