RahmanTeam/OpEx

1 MINIMUM SYSTEM REQUIREMENTS
==============================

OpEx runs on Linux. It requires Python 2.7.3 or later (but not Python 3) with NumPy version 1.11.0 installed, as well as Java 1.6. At least 3 GB of memory is required, and 8 GB is recommended. To make use of the optional multithreading feature, OpEx requires a multicore CPU.
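
Before installing, these requirements can be checked from Python. The following is a minimal sketch and is not part of OpEx. The function names are hypothetical, and the Java check only confirms that a `java` executable runs; it does not check which Java version is installed.

```python
import os
import subprocess
import sys

# Bounds taken from the requirements above: >= 2.7.3 and < 3.
MIN_PYTHON = (2, 7, 3)
FIRST_UNSUPPORTED_PYTHON = (3, 0, 0)
REQUIRED_NUMPY = "1.11.0"


def python_version_ok(version_info):
    """True if a (major, minor, micro, ...) version is >= 2.7.3 and < 3."""
    version = tuple(version_info[:3])
    return MIN_PYTHON <= version < FIRST_UNSUPPORTED_PYTHON


def numpy_version():
    """Installed NumPy version string, or None if NumPy cannot be imported."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.__version__


def java_available():
    """True if `java -version` runs; the Java version itself is not checked."""
    with open(os.devnull, "w") as devnull:
        try:
            subprocess.check_call(["java", "-version"],
                                  stdout=devnull, stderr=devnull)
        except (OSError, subprocess.CalledProcessError):
            return False
    return True


if __name__ == "__main__":
    print("Python %s supported: %s"
          % (sys.version.split()[0], python_version_ok(sys.version_info)))
    print("NumPy version: %s (expected %s)" % (numpy_version(), REQUIRED_NUMPY))
    print("java on PATH: %s" % java_available())
```

Run it with the same interpreter that will run `install.py`, because the Python and NumPy checks describe the interpreter executing the script.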


2 INSTALLATION GUIDE
=====================

OpEx can be downloaded from https://github.com/RahmanTeam/OpEx/releases

To install OpEx, unpack the tgz file and run the installation script (install.py) in the opex-v1.0.0 directory (see details below).
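
The unpacking step can also be done from Python with the standard `tarfile` module. This is a sketch only. The archive name in the usage note is hypothetical, so use the name of the file downloaded from the releases page.

```python
import os
import tarfile


def unpack_release(tgz_path, dest_dir):
    """Extract a .tgz release archive into dest_dir and list dest_dir's entries."""
    with tarfile.open(tgz_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe members.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))
```

For example, `unpack_release("opex-v1.0.0.tgz", ".")` should leave an `opex-v1.0.0` directory in the current folder. `install.py` is then run from inside that directory.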


2.1 What will installation do?
-------------------------------

The installation script will perform the following steps:
- Download all required components of the pipeline (i.e. BWA, Stampy, Picard, Platypus, and CAVA)
- Build all required components
- Index the reference genome (if given) by BWA and Stampy
- Generate the necessary default configuration files
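
After installation, you can check whether the indexing step produced its files. By default, `bwa index ref.fasta` writes `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files next to the FASTA file. The Stampy file suffixes (`.stidx`, `.sthash`) and the prefix used below are assumptions. OpEx may name or place its Stampy index differently, so treat this as a hypothetical helper.

```python
import os

# Files written by `bwa index ref.fasta` (default prefix: the FASTA path).
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")
# Assumption: Stampy genome (.stidx) and hash (.sthash) files share one prefix.
# Where OpEx stores them, and with which prefix, is not stated in this README.
STAMPY_SUFFIXES = (".stidx", ".sthash")


def missing_index_files(fasta_path, stampy_prefix=None):
    """List the expected BWA/Stampy index files that do not exist."""
    if stampy_prefix is None:
        # Hypothetical default: the FASTA path without its extension.
        stampy_prefix = os.path.splitext(fasta_path)[0]
    expected = [fasta_path + s for s in BWA_SUFFIXES]
    expected += [stampy_prefix + s for s in STAMPY_SUFFIXES]
    return [path for path in expected if not os.path.isfile(path)]
```

An empty list from `missing_index_files("/path/to/reference/human_g1k_v37.fasta")` suggests that both indexes are in place, under the assumptions above.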


2.2 Full Installation
----------------------

To set up the pipeline correctly, we recommend running Full Installation, in which the GRCh37 reference genome file must be provided to the installation script. The reference genome file is automatically indexed by BWA and Stampy during installation, so Full Installation can take a while (approx. 2-3 hours). A Quick Installation option is also available (Section 2.3).

Go into the opex-v1.0.0 folder and run:
./install.py -r /path/to/reference/human_g1k_v37.fasta

where human_g1k_v37.fasta is the GRCh37 reference genome sequence file. This file, together with the corresponding .fai index file, can be downloaded from the 1000 Genomes FTP site:
- ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
- ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.fai
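
The download and unzip step can be scripted with the Python standard library. This sketch assumes that the two URLs above are still served. `fetch_reference` is a hypothetical helper, not part of OpEx. It names the .fai file after the unzipped FASTA, because that is where the index is expected.

```python
import gzip
import shutil

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

BASE = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/"


def gunzip(gz_path, out_path):
    """Stream-decompress gz_path into out_path without loading it into memory."""
    with gzip.open(gz_path, "rb") as src:
        with open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path


def fetch_reference(fasta_path="human_g1k_v37.fasta"):
    """Download the gzipped GRCh37 FASTA and its .fai, then unzip the FASTA."""
    urlretrieve(BASE + "human_g1k_v37.fasta.gz", fasta_path + ".gz")
    urlretrieve(BASE + "human_g1k_v37.fasta.fai", fasta_path + ".fai")
    return gunzip(fasta_path + ".gz", fasta_path)
```

The returned path is the one to pass to `./install.py -r`.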

Note that OpEx expects the unzipped .fasta file, and the .fai file must be in the same folder as the .fasta file. Once the installation script has finished, OpEx is ready for use.
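
The two conditions above can be checked before starting the 2-3 hour indexing step. This is a hypothetical pre-flight helper, not part of OpEx. It detects compression from the gzip magic bytes (`1f 8b`) rather than from the file extension. It also assumes the index is named after the FASTA with `.fai` appended, as the file names above suggest.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"


def check_reference(fasta_path):
    """Return a list of problems with a reference FASTA; empty means it looks usable."""
    if not os.path.isfile(fasta_path):
        return ["reference not found: %s" % fasta_path]
    problems = []
    with open(fasta_path, "rb") as handle:
        head = handle.read(2)
    if head == GZIP_MAGIC:
        problems.append("reference is gzip-compressed; unzip it first")
    elif not head.startswith(b">"):
        problems.append("reference does not start with a FASTA header ('>')")
    if not os.path.isfile(fasta_path + ".fai"):
        problems.append("index not found next to the FASTA: %s.fai" % fasta_path)
    return problems
```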


2.3 Quick Installation
-----------------------

In Quick Installation, the reference genome does not need to be provided. Instead, a path pointing to an existing genome installation can be set manually, or the reference can be supplied upon the first run.

Go into the opex-v1.0.0 folder and run:
./install.py

Once the installation script has finished, OpEx is ready for use. However, the GRCh37 reference genome must be set manually or supplied upon the first run.


3 TESTING INSTALLATION
=======================

A test dataset is included with the package to confirm OpEx is installed correctly.

The test dataset consists of:

- Input test files: Two gzipped FASTQ files (test_R1.fastq.gz, test_R2.fastq.gz) containing 372 read pairs mapping to three exons of BRCA2, and a BED file (test.bed) containing the coding exons of BRCA2 in hg19 genomic coordinates.

- Expected test output files: Eleven output files generated by a correct installation of OpEx. Four files (the bash script file, the log file, the Picard metrics file, and the Platypus log file) are not included, as they depend on the date, time, and system and are therefore not informative as a test of successful installation.
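
The read-pair count of the input files can be confirmed independently of OpEx. This minimal sketch assumes the usual four-line FASTQ layout (header, sequence, `+`, qualities), with no wrapped sequence lines. The function names are hypothetical.

```python
import gzip


def count_fastq_records(path):
    """Count records in a plain or gzipped FASTQ file, assuming 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError("%s: %d lines is not a multiple of 4" % (path, n_lines))
    return n_lines // 4


def count_read_pairs(r1_path, r2_path):
    """Number of read pairs; R1 and R2 must contain the same number of reads."""
    n_r1 = count_fastq_records(r1_path)
    n_r2 = count_fastq_records(r2_path)
    if n_r1 != n_r2:
        raise ValueError("R1 has %d reads but R2 has %d" % (n_r1, n_r2))
    return n_r1
```

Based on the description above, `count_read_pairs("test/test_R1.fastq.gz", "test/test_R2.fastq.gz")` should report 372 pairs when run from the opex-v1.0.0 folder.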

The test dataset and the expected outputs are found in the test/ and test/output/ directories, respectively.
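
To inspect a failed test by hand, the files in a results folder can be compared against test/output/. The sketch below compares file contents byte for byte with the standard `filecmp` module. It is not necessarily how test_installation.py performs its comparison, and `compare_outputs` is a hypothetical name.

```python
import filecmp
import os


def compare_outputs(expected_dir, actual_dir):
    """Return (mismatched, missing) file names, comparing contents byte for byte."""
    mismatched, missing = [], []
    for name in sorted(os.listdir(expected_dir)):
        expected = os.path.join(expected_dir, name)
        if not os.path.isfile(expected):
            continue  # only regular files are compared
        actual = os.path.join(actual_dir, name)
        if not os.path.isfile(actual):
            missing.append(name)
        elif not filecmp.cmp(expected, actual, shallow=False):
            mismatched.append(name)
    return mismatched, missing
```

Files that exist only in the results folder are ignored. This matches the exclusion, described above, of date- and system-dependent files from the expected outputs.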

To test the installation, go into the opex-v1.0.0 folder and run:
./test_installation.py

Due to the small size of the test dataset, the test script typically finishes in less than a minute. 

If the test script reports "OpEx is correctly installed", the pipeline ran successfully from beginning to end on the test dataset, the outputs agreed with the expected outputs, and the resulting outputs have been removed. If the test is not successful, the script reports "OpEx is not installed correctly" and places the resulting outputs in a folder called _testinstall for manual inspection.
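
The test can also be run from another script, for example as part of a provisioning job, by interpreting the two messages quoted above. This is a hypothetical wrapper. It assumes the message is printed to standard output or standard error, so it merges both streams.

```python
import subprocess

SUCCESS_MESSAGE = "OpEx is correctly installed"
FAILURE_MESSAGE = "OpEx is not installed correctly"


def interpret_test_output(text):
    """Map the test script's output to True (pass), False (fail) or None (unknown)."""
    if SUCCESS_MESSAGE in text:
        return True
    if FAILURE_MESSAGE in text:
        return False
    return None


def run_installation_test(opex_dir):
    """Run ./test_installation.py inside opex_dir and interpret its output."""
    proc = subprocess.Popen(["./test_installation.py"], cwd=opex_dir,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return interpret_test_output(out.decode("utf-8", "replace"))
```

A result of `None` means neither message was found, for example because the script crashed before reporting. In that case the raw output should be checked by hand.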