iSOM_GSN

An Integrative Approach for Transforming Multi-omic Data into Gene Similarity Networks via Self-organizing Maps

To replicate the algorithm use the following steps:

Step 1 (1_PreProcessing.py)- Process the data using pre-processing template provided. To use this template;create three different files of each type of omic data; for a set of samples(Patient id) and a set of common genes. Replace the patient id's with patient_list and gene_list. As a result of the preprocessing script the result will be a merge file which has all three omic data for a set of patients/samples

Step 2 (2_CreateTemplate.py)- Once the data is pre-processed proceed with the creation of template file; we conside only gene expression data for this step. replace the path of the file at "pathGE" variable. As a result of this script we get coordinates or mapping of genes. i.e., (x,y) coordinates for each gene; make a note of this to use in next step.

Step 3 (3_CreateRGBImages.py)- replace the mappings or cordinates obtained in previous step with "pos" variable in this script. replace the path variable "pathGE" with the file generated at the first step. At the end of this script you have an image for each patient.

Step 4 (4_MoveFilesRandomly.py)- This file is optional; this is used to randomly divide the images into test and training set

Step 5 (5_Apply_cnn-metrics.py)- Use this file to run CNN algorithm on the images generated in the previous step; replace the path of reflect the actual paths.

Input Files: This algorithm considers only 3 types of Omic data, Initially the data from cbioPortal is process and formatted into 3 files. These files are gene_expression.csv, cna_norm.csv and dna_methylation.csv. After applying Step 1 a merged file with all three omics of the shortlisted genes is generated. This sample file for 20 genes is attached which is called Mergefile_top20.csv. This file is used as input for step 2 and so on.

Note: To include more than three omic data the code needs modification. The pre-processing file is created to process the data of formats mentioned in gene_expression.csv, cna_norm.csv and dna_methylation.csv. If the format is changed then the pre-processing file need to be adjusted accordingly.

SAMPLE RUN INSTRUCTIONS

To run code for sample data follow below steps :

1- Create a new directory in you local system for example : Test

2- Place Mergefile_top20.csv and MoCSOM.py files in that folder

3- Make sure python3 is installed and directory classpaths are set accordingly

4- Run MoCSOM.py from that folder

On successfull run the script should create 2 folders under your test directory named Training and Test and the console should show the epochs progressed. Capture.png shows output of the ideal run.

PS: This code works only on Python3.6 as Keras library is not compatible with 3.7+

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1_PreProcessing.py

1_PreProcessing.py

2_CreateTemplate.py

2_CreateTemplate.py

3_CreateRGBImages.py

3_CreateRGBImages.py

4_MoveFilesRandomly.py

4_MoveFilesRandomly.py

5_Apply_cnn-metrics.py

5_Apply_cnn-metrics.py

Capture.PNG

Capture.PNG

Mergefile_top20.csv

Mergefile_top20.csv

MoCSOM.py

MoCSOM.py

README.md

README.md

Repository files navigation

iSOM_GSN

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
1_PreProcessing.py		1_PreProcessing.py
2_CreateTemplate.py		2_CreateTemplate.py
3_CreateRGBImages.py		3_CreateRGBImages.py
4_MoveFilesRandomly.py		4_MoveFilesRandomly.py
5_Apply_cnn-metrics.py		5_Apply_cnn-metrics.py
Capture.PNG		Capture.PNG
Mergefile_top20.csv		Mergefile_top20.csv
MoCSOM.py		MoCSOM.py
README.md		README.md

NaziaFatima/iSOM_GSN

Folders and files

Latest commit

History

Repository files navigation

iSOM_GSN

About

Resources

Stars

Watchers

Forks

Languages