GitHub - abulovic/SuperExonRetriver2000: Super complicated bioinformatic file juggling and management system


# Installation Instructions

Downloading the software is unfortunately not enough. You must also install several other applications, and then configure this one. Let's try to minimize the effort, shall we?

## The required software
In order for this application to work, you need to have the following software installed:
 - a standard Python distribution and the BioPython module (the version known to work is 1.59; see the note on BioPython versions in the directory tree section)
 - blastall tools
 - SW# tool for Smith-Waterman alignment on graphic cards (https://github.com/mkorpar/swSharp)
 - mafft alignment tool
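
Before configuring anything, it can help to confirm that the tools are actually reachable. The following is a minimal sketch, not part of the application: the executable names `blastall` and `mafft` come from the list above, `genewise` is assumed from the `[wise]` section of the example configuration, and the helper names `find_missing_tools` and `biopython_version` are made up for illustration. SW# is left out because it is configured by its full path.

```python
import shutil

# Executable names are assumptions based on this README; adjust them if your
# installation uses different names.
REQUIRED_TOOLS = ["blastall", "mafft", "genewise"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def biopython_version():
    """Return the installed BioPython version string, or None if not importable."""
    try:
        import Bio
    except ImportError:
        return None
    return getattr(Bio, "__version__", "unknown")


if __name__ == "__main__":
    missing = find_missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("BioPython version:", biopython_version())
```

An empty "missing" list only means the executables are on your `PATH`; it does not check their versions.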

You also need a local Ensembl mirror, which you can download from the Ensembl FTP website (http://www.ensembl.org/info/data/ftp/index.html).


## The required configuration files
The configuration files are located in the `ExoLocator/cfg` directory.

The required files are:
 - command_line_tools.cfg
 - directory_tree.cfg
 - logging.cfg
 - referenced_species_mapping.txt
 - status_file_keys.txt

You can leave the last two files as they are.


### command line tools configuration file
Here is an example of the command line tools configuration file:

    [blast]
    expectation = 1.e-2
    blastp = blastall -p blastp -e %s -m 7
    blastn = blastall -p blastn -e %s -m 7
    tblastn = blastall -p tblastn -e %s -m 7
    
    [wise]
    wise = genewise
    flags = -genes -silent
    
    [sw#]
    sw# = /home/john_doe/.../swSharp/sw#
    
    [mafft]
    mafft = mafft --localpair --maxiterate 1000
    
    [local_ensembl]
    ensembldb = /home/john_doe/mnt/release-67/fasta/	
    expansion = 150000
    masked = 0
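
To check that your file parses, you can load it with Python's standard `configparser`. This is a sketch under assumptions, not the application's own loader: it assumes the `%s` in the blast commands is a placeholder for the `expectation` value, and the helper names `load_tools_config` and `blast_command` are hypothetical. The embedded text is a shortened copy of the example above.

```python
import configparser
import shlex

# Shortened copy of the example configuration, for illustration only.
EXAMPLE_CFG = """
[blast]
expectation = 1.e-2
blastp = blastall -p blastp -e %s -m 7

[mafft]
mafft = mafft --localpair --maxiterate 1000

[local_ensembl]
expansion = 150000
masked = 0
"""


def load_tools_config(text):
    # interpolation=None keeps the literal "%s" placeholder; configparser's
    # default interpolation would reject it as invalid syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def blast_command(parser, program):
    """Build an argument list, ASSUMING %s stands for the expectation value."""
    template = parser.get("blast", program)
    return shlex.split(template % parser.get("blast", "expectation"))


cfg = load_tools_config(EXAMPLE_CFG)
print(blast_command(cfg, "blastp"))
print(cfg.getint("local_ensembl", "expansion"))
```

For a file on disk, `parser.read(path)` can be used in place of `read_string`.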

### directory tree configuration file

Here is an example of what the directory_tree.cfg file should look like.

    [root]
    project_dir = /home/john_doe/SuperExonRetriever2000/ExoLocator
    session_dir = /home/john_doe/results/
    
    [input]
    protein_list = /home/john_doe/proteins.txt
    failed_proteins = /home/john_doe/failed_proteins.txt
    protein_description = /home/john_doe/protein_descr.txt
    
    [sequence]
    root = sequence
    gene = gene
    exp_gene = expanded_gene
    protein = protein
    exon_ens = exon/ensembl
    exon_wise = exon/genewise
    assembled_protein = assembled_protein
    
    [statistics]
    statistics = statistics
    
    [alignment]
    root = alignment
    blastn = blastn
    tblastn = tblastn
    SW_gene = SW/gene
    SW_exon = SW/exon
    mafft = mafft
   
    [annotation]
    root = annotation
    wise = genewise
    
    [log]
    root = log
    mutual_best = mutual_best_log
    status_file = .status
    
    [database]
    db = exon_database
 
    [machine]
    computer = donkey
    
    [data_retrieval]
    biomart_perl_script = /home/john_doe/SuperExonRetriever2000/ExoLocator/pipeline/data_retrieval/BioMartRemoteAccess.pl

In the directory tree configuration file you set:
- the root directory of the application (`root / project_dir`)
- the directory for your results (`root / session_dir`)
- the list of proteins (`input / protein_list`; there is an example list in the application)
- the directory structure.

There is really no need to change the directory structure, so the only three things you do need to change are:
 - the directory that will contain your results, 
 - the protein list file path and 
 - the path to the BioMart script.
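
The three settings listed above can be checked before a run with a short script. This is a sketch only: the section and option names are taken from the example file, while the function name `check_directory_tree` is hypothetical and the assumption that `session_dir` must already exist may not match what the application requires.

```python
import configparser
import os

# (section, option, kind) for the three settings the README says to change.
REQUIRED_PATHS = [
    ("root", "session_dir", "dir"),
    ("input", "protein_list", "file"),
    ("data_retrieval", "biomart_perl_script", "file"),
]


def check_directory_tree(cfg_path):
    """Return a list of human-readable problems found in directory_tree.cfg."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        return ["cannot read %s" % cfg_path]
    problems = []
    for section, option, kind in REQUIRED_PATHS:
        if not parser.has_option(section, option):
            problems.append("missing option [%s] %s" % (section, option))
            continue
        path = parser.get(section, option)
        # Assumption: the results directory is expected to exist already.
        exists = os.path.isdir(path) if kind == "dir" else os.path.isfile(path)
        if not exists:
            problems.append("[%s] %s: no such %s: %s" % (section, option, kind, path))
    return problems
```

Calling `check_directory_tree("ExoLocator/cfg/directory_tree.cfg")` returns an empty list when all three paths are present.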

Regarding the version of BioPython you have installed: the problem that arose was in reading and writing the FASTA files. This is (very clumsily) configured by changing `machine / computer` from donkey to anab. I do apologize for how unintuitive this option is.
If it doesn't work with newer versions even after you toggle this option, the place to look is the `utilities / FileUtilities.py` script and its methods for reading and writing the FASTA files (`load_fasta_single_record`, `write_seq_records_to_file`, `read_seq_records_from_file`).

### logging.cfg
There is an example of this file in the cfg directory.
You only need to change the paths to the output logging files.