network_analysis

network analysis of correlated mutations in proteins

code status

in active development

Performs network analysis of correlated mutations in proteins, using the maximal information coefficient (MIC) to score the association between pairs of residue positions.
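
Computing MIC proper involves a search over grid partitions of the data (graph_analysis_2p.py and graph_analysis_1p.py compute the actual MICs). As a rough illustration of the information-theoretic quantity MIC builds on, the sketch below scores two aligned residue columns by their mutual information, divided by the smaller of the two column entropies. This normalized mutual information is a simpler, related measure and not MIC itself. The function names are hypothetical and not taken from this repository.

```python
import math
from collections import Counter


def entropy(column):
    """Shannon entropy (bits) of a sequence of residues."""
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two equal-length residue columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))
    count_a, count_b = Counter(col_a), Counter(col_b)
    # Sum p(a,b) * log2(p(a,b) / (p(a) p(b))) over observed residue pairs.
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


def normalized_mi(col_a, col_b):
    """MI divided by the smaller column entropy; 0.0 if either column is invariant."""
    denom = min(entropy(col_a), entropy(col_b))
    if denom == 0:
        return 0.0
    return mutual_information(col_a, col_b) / denom
```

For aligned sequences `seqs`, columns i and j would be scored with `normalized_mi([s[i] for s in seqs], [s[j] for s in seqs])`. Perfectly co-varying columns such as `"KRKR"` and `"VIVI"` score 1.0, and independent ones score 0.0.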

Download strains from fludb.org.
Run split_fasta.py to split the strain file into individual per-protein FASTA files.
Align each protein FASTA file using MUSCLE or MAFFT.
Run the deduplication script, duplicate_remover.py.
Run graph_analysis_2p.py or graph_analysis_1p.py to compute MICs and create CSV files with node and edge data for a given dataset.
Run load_graph_db_from_csv.py to load the CSV files into Neo4j.
Run create_networkx_graph_from_csv.py to create a GraphML file from the CSV files.
Import the GraphML file into Gephi to create visualizations and perform analysis.
Run create_protein_graph.py to create macro/protein-level graphs and visualizations from the GraphML file.
Run create_degree_clustering_plots.py to create degree and clustering plots from the GraphML file.
Run create_plots_from_neo4j.py to create node and edge count plots from the data loaded into Neo4j.
Run cooccurence_counts.py to perform residue co-occurrence analysis.
Run create_mic_histogram.py to create a histogram of MICs.
Run in_out_comparision.py to compare entropy and solvent accessibility between in-network and out-of-network residues.
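
The splitting step can be sketched with the standard library alone. Purely for illustration, the sketch assumes that each FASTA header carries the protein name in its last `|`-separated field (e.g. `>A/example/2009|HA`). fludb.org headers may be laid out differently, so `protein_of` is a hypothetical placeholder to adapt, not the logic of split_fasta.py.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_of(header):
    """Hypothetical: take the protein name from the last '|'-separated field."""
    return header.rsplit("|", 1)[-1].strip()


def split_by_protein(lines):
    """Group records into {protein: [(header, sequence), ...]}."""
    groups = defaultdict(list)
    for header, seq in read_fasta(lines):
        groups[protein_of(header)].append((header, seq))
    return dict(groups)


def write_fasta(path, records, width=60):
    """Write (header, sequence) records to path, wrapping sequences at width."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
```

In a typical call, you would open the downloaded file, pass the handle to `split_by_protein`, and call `write_fasta(f"{protein}.fasta", records)` for each group.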
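
The following is a minimal deduplication sketch. It assumes that "duplicate" means an identical aligned sequence (compared case-insensitively) and that the first header seen is kept. Whether duplicate_remover.py applies the same rule is an assumption.

```python
def remove_duplicate_sequences(records):
    """Keep the first (header, sequence) record for each distinct sequence.

    Comparison is case-insensitive; sequences that differ only in gap
    placement are treated as distinct.
    """
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # an identical sequence was already kept
        seen.add(key)
        unique.append((header, seq))
    return unique
```

Keeping the first occurrence makes the result depend on input order, which is usually acceptable when only one representative per sequence is needed.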
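
Once pairwise scores exist, turning them into node and edge tables is mostly bookkeeping. In this sketch the scores are a dict keyed by pairs of position labels. The threshold and the CSV column names (`id`, `source`, `target`, `mic`) are hypothetical, and positions without a retained edge are left out of the node table. The graph_analysis scripts may make different choices.

```python
import csv


def build_graph_rows(scores, threshold):
    """Turn {(pos_a, pos_b): score} into node and edge rows at or above threshold.

    Position labels such as "HA_156" are illustrative; the threshold is an assumption.
    """
    edges = [
        (a, b, score)
        for (a, b), score in sorted(scores.items())
        if score >= threshold
    ]
    # Only positions that take part in at least one kept edge become nodes.
    nodes = sorted({p for a, b, _ in edges for p in (a, b)})
    return nodes, edges


def write_graph_csv(nodes, edges, node_handle, edge_handle):
    """Write node and edge tables to already-open text handles."""
    node_writer = csv.writer(node_handle)
    node_writer.writerow(["id"])
    node_writer.writerows([n] for n in nodes)
    edge_writer = csv.writer(edge_handle)
    edge_writer.writerow(["source", "target", "mic"])
    edge_writer.writerows(edges)
```

Open the output files with `newline=""`, as the `csv` module recommends, e.g. `open("edges.csv", "w", newline="")`.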
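
Judging by its name, create_networkx_graph_from_csv.py probably builds the graph with NetworkX, whose `networkx.write_graphml` writes the kind of file Gephi imports. To stay dependency-free, this sketch instead emits a minimal undirected GraphML document by hand. It stores each edge's score in a `mic` data key, which is a hypothetical attribute name.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Return a minimal undirected GraphML document for (source, target, mic) rows."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare one edge attribute holding the association score.
    ET.SubElement(root, "key", {"id": "mic", "for": "edge",
                                "attr.name": "mic", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "undirected"})
    # Every endpoint becomes a node, written once in sorted order.
    for node_id in sorted({n for source, target, _ in edges for n in (source, target)}):
        ET.SubElement(graph, "node", {"id": node_id})
    for source, target, mic in edges:
        edge = ET.SubElement(graph, "edge", {"source": source, "target": target})
        ET.SubElement(edge, "data", {"key": "mic"}).text = repr(float(mic))
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written to a `.graphml` file and opened in Gephi, which should show `mic` as an edge attribute.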
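
Degree and clustering plots are built from quantities that are easy to compute from an edge list. This sketch builds undirected adjacency sets, then computes the degree distribution and the local clustering coefficient C_v = 2 L_v / (k_v (k_v - 1)). Here k_v is the degree of node v and L_v is the number of links among its neighbors. The sketch follows those standard definitions and is not the code of create_degree_clustering_plots.py, which reads GraphML.

```python
from collections import Counter, defaultdict


def adjacency(edges):
    """Undirected adjacency sets from (source, target, ...) rows; self-loops ignored."""
    adj = defaultdict(set)
    for source, target, *_ in edges:
        if source != target:
            adj[source].add(target)
            adj[target].add(source)
    return adj


def degree_distribution(adj):
    """Map degree k -> number of nodes with degree k."""
    return Counter(len(neighbors) for neighbors in adj.values())


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are themselves linked."""
    neighbors = list(adj.get(node, ()))
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))
```

Plotting the degree distribution and the clustering coefficients against degree then only needs a charting library of choice.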
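
One plausible reading of residue co-occurrence analysis is counting which residue pairs appear together at two alignment positions across all sequences. The sketch below does exactly that, using 0-based positions and skipping gap characters (`-`) by default. Both choices, and the function name, are assumptions rather than the behavior of cooccurence_counts.py.

```python
from collections import Counter


def cooccurrence_counts(sequences, pos_a, pos_b, skip_gaps=True):
    """Count (residue at pos_a, residue at pos_b) pairs across aligned sequences.

    Positions are 0-based alignment columns; pairs containing '-' are skipped
    unless skip_gaps is False.
    """
    counts = Counter()
    for seq in sequences:
        pair = (seq[pos_a], seq[pos_b])
        if skip_gaps and "-" in pair:
            continue
        counts[pair] += 1
    return counts
```

`counts.most_common()` then lists the dominant residue combinations for a pair of positions first.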
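
For the entropy half of the in-network versus out-of-network comparison, a common per-position measure is the Shannon entropy of the residues in an alignment column. This sketch computes it in bits, ignoring gaps, and averages it over in-network and out-of-network positions. The solvent accessibility half needs structural data and is not shown. The averaging scheme and function names are assumptions, not necessarily what in_out_comparision.py does.

```python
import math
from collections import Counter
from statistics import mean


def column_entropy(sequences, pos, gap="-"):
    """Shannon entropy (bits) of residues at one alignment column, ignoring gaps."""
    residues = [seq[pos] for seq in sequences if seq[pos] != gap]
    n = len(residues)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def compare_in_out(sequences, in_network_positions):
    """Return (mean entropy of in-network columns, mean entropy of all other columns)."""
    length = len(sequences[0])
    inside = set(in_network_positions)
    in_vals = [column_entropy(sequences, p) for p in range(length) if p in inside]
    out_vals = [column_entropy(sequences, p) for p in range(length) if p not in inside]
    return (mean(in_vals) if in_vals else float("nan"),
            mean(out_vals) if out_vals else float("nan"))
```

Comparing the two means (or the full distributions, e.g. with a rank test) indicates whether in-network residues sit at more variable positions.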
