Required Python version:
- Python >= 3.6
rapvis depends on several external programs and Python packages:
- trimmomatic
- STAR
- hisat2
- stringtie
- bwa
- samtools
- featureCounts
- pandas >= 1.1.2
- numpy
- matplotlib
- seaborn
- GSEApy
- rpy2
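Before installing, it can help to check which of the command-line tools are already available. The following is a minimal sketch, not part of rapvis; it assumes that each tool is installed under an executable with the same name as in the list above, which may not hold for every installation (trimmomatic, for example, is sometimes shipped only as a .jar file). The function name `find_missing_tools` is hypothetical.

```python
import shutil

# Assumption: the executable names match the tool names listed above.
REQUIRED_TOOLS = [
    "trimmomatic", "STAR", "hisat2", "stringtie",
    "bwa", "samtools", "featureCounts",
]

def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]

missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All external tools were found on PATH.")
```

The Python packages in the list are installed later through requirements.txt, so only the external programs are checked here.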
# Clone remote repository
$ git clone https://github.com/liuwell/rapvis.git
# Install the required Python packages
$ cd rapvis
$ pip install -r requirements.txt
# Add the rapvis scripts to the executable search path
# $(pwd) expands to the current directory, i.e. the cloned repository
$ echo "export PATH=\$PATH:$(pwd)/rapvis" >> ~/.bashrc
$ source ~/.bashrc
# Then run with the -h option to check whether the installation succeeded.
# If the output looks like the following, the installation was successful.
$ rapvis_run.py -h
usage: rapvis_run.py [-h] -i INPUT [-o OUTPUT] [-p THREADS] [-lib path]
                     [-m {STAR,hisat2}] [-a ADAPTER] [-minlen N] [-trim5 N]
                     [--counts] [--rRNA] [-v]

A tool for RNAseq processing and visualization

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        the input data
  -o OUTPUT, --output OUTPUT
                        output directory (default: processed_data)
  -p THREADS, --threads THREADS
                        number of threads (CPUs) to use (default: 5)
  -lib path, --libraryPath path
                        choose reference species for mapping and annotaion
  -m {STAR,hisat2}, --mapper {STAR,hisat2}
                        choose the mapping program (default: STAR)
  -a ADAPTER, --adapter ADAPTER
                        choose illumina adaptor (default: universal), choices
                        {universal, nextera, pAAAAA}
  -minlen N             discard reads shorter than N (default: 35)
  -trim5 N              remove N bases from the begining of each read
                        (default:0)
  --counts              Get gene counts
  --rRNA                whether mapping to rRNA(Human)
  -v, --version         show program's version number and exit
You can download the genome sequence and the annotation GTF file from GENCODE: https://www.gencodegenes.org/. This source is strongly recommended for mouse and human (use the files marked with PRI).
Files for other species, such as zebrafish, can be downloaded from Ensembl:
Genome sequence: ftp://ftp.ensembl.org/pub/release-101/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
GTF file: ftp://ftp.ensembl.org/pub/release-101/gtf/danio_rerio/Danio_rerio.GRCz11.101.gtf.gz
rapvis supports STAR and hisat2 for mapping.
$ rapvis_build.py -mapper STAR -genome GRCh38.primary_assembly.genome.fa.gz -gtf gencode.v35.primary_assembly.annotation.gtf.gz
$ rapvis_build.py -mapper hisat2 -genome GRCh38.primary_assembly.genome.fa.gz -gtf gencode.v35.primary_assembly.annotation.gtf.gz
$ rapvis_run.py -i tests/data1/ -o TestsResult -p 5 -lib STAR_index -m STAR
$ rapvis_submit.py -i tests/data1/ -o TestsResult -lib STAR_index -m STAR -p 5 -t 2
rapvis can calculate differentially expressed genes, based on the R package limma:
$ rapvis_DE.py -i input_TPM.txt -wt 0:3 -ko 3:6 -p output
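The -wt and -ko values look like column ranges over the sample columns of the input matrix. The sketch below shows one plausible reading, assuming zero-based, half-open `start:end` ranges in the style of Python slices, so that 0:3 selects the first three samples and 3:6 the next three. This interpretation is an assumption and is not confirmed by the rapvis source; the helper `parse_range` and the sample names are hypothetical.

```python
def parse_range(spec):
    """Turn a 'start:end' string into a slice (assumed zero-based, end excluded)."""
    start, end = (int(part) for part in spec.split(":"))
    return slice(start, end)

# Hypothetical sample columns of an expression matrix, in file order.
samples = ["wt_1", "wt_2", "wt_3", "ko_1", "ko_2", "ko_3"]

wt_samples = samples[parse_range("0:3")]
ko_samples = samples[parse_range("3:6")]

print("wild type:", wt_samples)
print("knockout: ", ko_samples)
```

Under this reading the two groups do not overlap, which matches the way the example command splits six samples into two groups of three.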
Gene ontology enrichment analysis can be performed with the -go option; the -s option is also required to specify the species:
$ rapvis_DE.py -i input_TPM.txt -wt 0:3 -ko 3:6 -p output -go -s Human
If the input gene matrix has not been normalized, use the -norm option to normalize it, based on limma voom:
$ rapvis_DE.py -i input_counts.txt -wt 0:3 -ko 3:6 -p output -norm
We can get a heatmap of the correlation coefficients of gene expression between samples:
$ rapvis_corr.py -i input_gene_TPM.txt
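To illustrate what such a heatmap is built from, here is a minimal sketch of a sample-by-sample correlation matrix. It assumes the Pearson coefficient; the README does not say which coefficient rapvis_corr.py uses, and the function names and expression values below are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(expression):
    """`expression` maps sample name -> per-gene values in the same gene order."""
    names = list(expression)
    return {a: {b: pearson(expression[a], expression[b]) for b in names}
            for a in names}

# Made-up expression values for three hypothetical samples.
expression = {
    "sample1": [1.0, 2.0, 3.0, 4.0],
    "sample2": [2.0, 4.0, 6.0, 8.0],
    "sample3": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(expression)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in matrix])
```

Each cell of the heatmap corresponds to one entry of this matrix; the diagonal is always 1 because every sample correlates perfectly with itself.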
The output directory includes several files:
- merge_gene_TPM.txt
  the gene expression profiles for all samples, normalized by TPM
- merge_qc_percent.pdf
  a barplot of quality control details by trimmomatic
- merge_mapping_percent.pdf
  a barplot of the mapping details in each sample
- merge_gene_TPM_species_type.pdf
  a summary of detected gene species in each sample, grouped by gene type
- merge_gene_TPM_species_EI.pdf
  a summary of detected gene species in each sample, grouped by expression interval
- merge_gene_TPM_density.pdf
  a density plot of the gene expression distribution
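For readers unfamiliar with TPM (transcripts per million), the sketch below shows the standard definition: read counts are divided by gene length in kilobases, and the resulting rates are rescaled so that they sum to one million per sample. It only illustrates the unit; it is not necessarily how rapvis derives the values in merge_gene_TPM.txt, and the function name and numbers are made up.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-gene read counts to TPM for one sample.

    counts     -- read counts per gene
    lengths_bp -- gene lengths in base pairs, in the same gene order
    """
    # Reads per kilobase of gene length.
    rates = [count / (length / 1000.0) for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    # Rescale so that the values of one sample sum to one million.
    return [rate / total * 1e6 for rate in rates]

# Made-up counts and lengths for three hypothetical genes.
tpm = counts_to_tpm(counts=[100, 200, 300], lengths_bp=[1000, 2000, 1000])
print([round(value) for value in tpm])
```

Because every sample sums to the same total, TPM values can be compared between samples more directly than raw counts.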