rapvis: a tool for RNAseq processing and visualization

Dependency

Required python version:

  • python >= 3.6

rapvis depends on several external programs:

  • trimmomatic
  • STAR
  • hisat2
  • stringtie
  • bwa
  • samtools
  • featureCounts

Mandatory Python packages

  • pandas >= 1.1.2
  • numpy
  • matplotlib
  • seaborn
  • GSEApy
  • rpy2

Installation

Installing from github

# Clone remote repository
$ git clone https://github.com/liuwell/rapvis.git
  
# Install required python packages
$ cd rapvis
$ pip install -r requirements.txt
  
# Add the execution path
# Replace current_dir with the path of the current directory, which the shell command "pwd" prints
$ echo "export PATH=$PATH:current_dir/rapvis" >> ~/.bashrc
$ source ~/.bashrc
# Then run with the -h option to check whether the installation succeeded.
# If the output looks like the following, the installation was successful
$ rapvis_run.py -h
usage: rapvis_run.py [-h] -i INPUT [-o OUTPUT] [-p THREADS] [-lib path]
                     [-m {STAR,hisat2}] [-a ADAPTER] [-minlen N] [-trim5 N]
                     [--counts] [--rRNA] [-v]

A tool for RNAseq processing and visualization

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        the input data
  -o OUTPUT, --output OUTPUT
                        output directory (default: processed_data)
  -p THREADS, --threads THREADS
                        number of threads (CPUs) to use (default: 5)
  -lib path, --libraryPath path
                        choose reference species for mapping and annotaion
  -m {STAR,hisat2}, --mapper {STAR,hisat2}
                        choose the mapping program (default: STAR)
  -a ADAPTER, --adapter ADAPTER
                        choose illumina adaptor (default: universal), choices
                        {universal, nextera, pAAAAA}
  -minlen N             discard reads shorter than N (default: 35)
  -trim5 N              remove N bases from the begining of each read
                        (default:0)
  --counts              Get gene counts
  --rRNA                whether mapping to rRNA(Human)
  -v, --version         show program's version number and exit

Build genome index

You can download the genome sequence and the annotation GTF file from GENCODE, which is strongly recommended for mouse and human (use the files marked with PRI): https://www.gencodegenes.org/.

For other species, such as zebrafish, the files can be downloaded from Ensembl:
genome sequences: ftp://ftp.ensembl.org/pub/release-101/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
GTF file: ftp://ftp.ensembl.org/pub/release-101/gtf/danio_rerio/Danio_rerio.GRCz11.101.gtf.gz

rapvis supports STAR and hisat2 for mapping.

1. Build STAR index

$ rapvis_build.py -mapper STAR -genome GRCh38.primary_assembly.genome.fa.gz -gtf gencode.v35.primary_assembly.annotation.gtf.gz

2. Build hisat2 index

$ rapvis_build.py -mapper hisat2 -genome GRCh38.primary_assembly.genome.fa.gz -gtf gencode.v35.primary_assembly.annotation.gtf.gz

Usage

1. Run locally

$ rapvis_run.py -i tests/data1/ -o TestsResult -p 5 -lib STAR_index -m STAR
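When several input directories need the same settings, the command above can be assembled in Python. This is a minimal sketch under stated assumptions: `build_rapvis_run_command` is a hypothetical helper, not part of rapvis, and it only uses flags that appear in the help output above.

```python
import shlex

def build_rapvis_run_command(input_dir, output_dir, library_path,
                             mapper="STAR", threads=5):
    """Build the argument list for rapvis_run.py from documented flags."""
    # The help output lists STAR and hisat2 as the only mapper choices.
    if mapper not in ("STAR", "hisat2"):
        raise ValueError(f"unsupported mapper: {mapper}")
    return [
        "rapvis_run.py",
        "-i", input_dir,
        "-o", output_dir,
        "-p", str(threads),
        "-lib", library_path,
        "-m", mapper,
    ]

command = build_rapvis_run_command("tests/data1/", "TestsResult", "STAR_index")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once rapvis is on the PATH.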

2. Submit the tasks to a cluster

$ rapvis_submit.py -i tests/data1/ -o TestsResult -lib STAR_index -m STAR -p 5 -t 2

3. Calculate differentially expressed genes

rapvis can calculate differentially expressed genes, based on the R package limma:

$ rapvis_DE.py -i input_TPM.txt -wt 0:3 -ko 3:6 -p output
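The values `0:3` and `3:6` look like column ranges for the two groups. The sketch below assumes they behave like zero-based, half-open Python slices over the sample columns; the source does not confirm this, and both `parse_column_range` and the sample names are hypothetical.

```python
def parse_column_range(spec):
    """Turn a 'start:stop' string such as '0:3' into a Python slice."""
    start, stop = spec.split(":")
    return slice(int(start), int(stop))

# Hypothetical sample columns of an expression matrix (gene ID column excluded).
samples = ["wt_1", "wt_2", "wt_3", "ko_1", "ko_2", "ko_3"]

# Assumption: the ranges are zero-based and exclude the stop index.
wt_samples = samples[parse_column_range("0:3")]
ko_samples = samples[parse_column_range("3:6")]

print("wild type:", wt_samples)
print("knockout: ", ko_samples)
```

Under that assumption, the first three sample columns form the wild-type group and the next three form the knockout group.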

Gene ontology enrichment analysis can be performed with the -go option; the -s option is also needed to specify the species:

$ rapvis_DE.py -i input_TPM.txt -wt 0:3 -ko 3:6 -p output -go -s Human

If the input gene matrix has not been normalized, use the -norm option to normalize it, based on limma voom:

$ rapvis_DE.py -i input_counts.txt -wt 0:3 -ko 3:6 -p output -norm

4. Correlation coefficient between samples

rapvis can draw a heatmap of the correlation coefficients of gene expression between samples:

$ rapvis_corr.py -i input_gene_TPM.txt
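To illustrate what such a heatmap encodes, here is a minimal sketch that computes a sample-by-sample Pearson correlation matrix. It is not the rapvis implementation: the log2(TPM + 1) transform, the function names, and the expression values are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(expression):
    """expression maps sample name -> per-gene values in the same gene order."""
    # Assumption: correlate log2(TPM + 1) rather than raw TPM values.
    logged = {s: [math.log2(v + 1) for v in vals] for s, vals in expression.items()}
    names = list(logged)
    return {a: {b: pearson(logged[a], logged[b]) for b in names} for a in names}

# Made-up TPM values for three samples and four genes.
expression = {
    "sample_A": [10.0, 200.0, 0.0, 55.0],
    "sample_B": [12.0, 180.0, 1.0, 60.0],
    "sample_C": [300.0, 5.0, 40.0, 2.0],
}

matrix = correlation_matrix(expression)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in matrix])
```

Each cell of the heatmap corresponds to one entry of this matrix, so samples with similar expression profiles show values close to 1.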

Output

The output directory includes several files:

  • merge_gene_TPM.txt
    the gene expression profiles for all samples, normalized by TPM
  • merge_qc_percent.pdf
    a barplot of quality control details from trimmomatic
  • merge_mapping_percent.pdf
    a barplot of the mapping details in each sample
  • merge_gene_TPM_species_type.pdf
    statistics of detected gene species in each sample, grouped by gene type
  • merge_gene_TPM_species_EI.pdf
    statistics of detected gene species in each sample, grouped by expression interval
  • merge_gene_TPM_density.pdf
    a density plot for gene expression distribution
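For readers unfamiliar with TPM (transcripts per million), the values in merge_gene_TPM.txt can be read against the standard definition sketched below. The source does not state how rapvis derives its TPM values, so this is only the textbook formula, with a hypothetical function name and made-up counts and gene lengths.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM."""
    # Step 1: normalize each gene's count by its length in kilobases.
    reads_per_kb = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: scale so that the values of the sample sum to one million.
    scale = sum(reads_per_kb) / 1_000_000
    return [r / scale for r in reads_per_kb]

# Made-up counts and gene lengths (in base pairs) for three genes.
counts = [100, 300, 600]
lengths_bp = [1000, 3000, 2000]

tpm = counts_to_tpm(counts, lengths_bp)
print([round(value) for value in tpm])
```

Because every sample sums to one million, TPM values are comparable across samples in a way that raw counts are not.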
