jianzuoyi/orfam

OR gene family identification

This is a pipeline for identifying the olfactory receptor gene family.

ORFAM

A pipeline for identifying the olfactory receptor (OR) gene family

Table of Contents

  1. Quick start
  2. Installation
  3. Usage
  4. Example workflows

Quick start

  1. Install

    git clone git@github.com:jianzuoyi/orfam.git
    cd orfam
    make
    
  2. Run the example script

    cd example
    ./run_orfam
    

Installation

Prerequisites

Configuration

System paths to orfam's component software are specified in the orfam.config file (bin/orfam.config), which should reside in the same directory as the orfam executable; use the -K flag to point to a different location. During installation, orfam attempts to generate this file automatically, but manual editing may be necessary.

Install components

If any components already exist on the system, their paths should be manually specified by editing orfam.config.

Usage

orfam is a modular framework with three components:

  • orfam align - Searches the target genome using known OR protein sequences as queries and produces an alignment result file that can be processed by the other orfam modules.
  • orfam func - Identifies intact OR genes.
  • orfam pseudo - Identifies truncated OR genes and OR pseudogenes.

orfam align

orfam align searches the target genome using known OR protein sequences as queries and produces an alignment result file that can be processed by the other orfam modules.

Internally, orfam align runs the following steps to produce an output file (BLAST format 6):

  1. Discard query sequences whose length is less than 250
  2. Align the remaining query sequences against the genome with TBLASTN
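
The two internal steps can be approximated by hand, which may help when checking an unexpected alignment file. The sketch below is not orfam's own code: the function names filter_fasta_by_length and run_tblastn are hypothetical, the cutoff is assumed to count residues of the protein queries, and the BLAST+ programs makeblastdb and tblastn are assumed to be on the PATH.

```shell
# Hypothetical helper (not part of orfam): print only the FASTA records
# whose sequence is at least MIN_LEN characters long. Wrapped sequence
# lines are joined, so each kept record is written on two lines.
# Usage: filter_fasta_by_length MIN_LEN proteins.fa > filtered.fa
filter_fasta_by_length() {
    awk -v min="$1" '
        function emit() {
            if (header != "" && length(seq) >= min + 0) print header "\n" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }
             { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END  { emit() }
    ' "$2"
}

# Hypothetical helper: build a nucleotide BLAST database from the genome,
# then search it with the filtered proteins and write tabular output
# (BLAST format 6).
# Usage: run_tblastn filtered.fa genome.fa out_prefix EVALUE THREADS
run_tblastn() {
    makeblastdb -in "$2" -dbtype nucl -out "$3.db" &&
    tblastn -query "$1" -db "$3.db" -evalue "$4" \
            -outfmt 6 -num_threads "$5" -out "$3.tblastn"
}
```

With the file names from the example workflow, the helpers could be called like this:

    filter_fasta_by_length 250 data/ORs/ORs.fa > ORs.filtered.fa
    run_tblastn ORs.filtered.fa data/mm10/mm10.fa mm10 1e-10 20
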

    usage:  orfam align [options]

    Alignment options
        -q FILE   olfactory receptor proteins (FASTA)
        -s FILE   subject genome (FASTA)
        -o STR    output file [.align]
        -T DIR    temp directory [./tmpXXXXXXXX]

    TBLASTN options
        -e FLOAT  evalue for hits
        -t INT    threads [1]

    Global options
        -K FILE   path to orfam.config file (default: same directory as orfam)
        -v        verbose
        -h        show this message

Output

orfam align produces a single output file (BLAST format 6):

  • outprefix.tblastn
    • The alignment result file. This file serves as input for orfam func.
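
BLAST format 6 is a tab-separated table whose twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. In TBLASTN output the subject coordinates are nucleotide positions, and hits on the reverse strand are reported with sstart greater than send. As a rough illustration of how the file can be inspected outside orfam, the sketch below converts each hit to a BED-like interval. The function name tblastn_to_bed is hypothetical, and this is not a description of how orfam func processes the file.

```shell
# Hypothetical helper (not part of orfam): convert default BLAST format 6
# hits to BED-like lines: subject, 0-based start, end, query, bitscore, strand.
# Usage: tblastn_to_bed hits.tblastn > hits.bed
tblastn_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         NF >= 12 {
             # sstart ($9) > send ($10) marks a reverse-strand hit
             if ($9 + 0 <= $10 + 0) { lo = $9 - 1;  hi = $10 + 0; strand = "+" }
             else                   { lo = $10 - 1; hi = $9 + 0;  strand = "-" }
             print $2, lo, hi, $1, $12, strand
         }' "$1"
}
```

For the example workflow this would be run as `tblastn_to_bed mm10.tblastn > mm10.hits.bed`; the output file name is arbitrary.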

orfam func

orfam func identifies intact OR genes from the target genome.

    usage:  orfam func [options]

    Options
        -R FILE   reference file (fasta) (required)
        -r FILE   reference olfactory receptor (fasta) (required)
        -B FILE   BED file representing the regions of the reference olfactory receptor (required)
        -A FILE   tblastn output (tabular) (required)
        -O FILE   olfactory receptor for outgroup (fasta) (required)
        -S FILE   MAO file with the settings used to construct the phylogenetic tree (generated by megaproto) (required)
        -o STR    output prefix (required)
        -t INT    threads [1]
        -T DIR    temp directory [./tmpXXXXXXXX]
        -k        keep temporary files
        -K FILE   path to orfam.config file (default: same directory as orfam)
        -v        verbose
        -h        show this message

Output

orfam func produces two output files:

  • outprefix_best_hit.gff
    • This GFF file contains all OR candidate sequences, which can be classified into three types: intact OR genes, truncated OR genes and OR pseudogenes.
  • outprefix_intact.fa
    • This FASTA file contains all intact OR gene sequences.

orfam pseudo

orfam pseudo identifies truncated OR genes and OR pseudogenes.

    usage:  orfam pseudo [options]

    Options
        -s FILE   subject genome (fasta) (required)
        -q FILE   query olfactory receptor proteins (fasta) (required)
        -b FILE   best hits (gff) (required)
        -i FILE   intact olfactory receptor (fasta) (required)
        -o STR    output prefix
        -T DIR    temp directory [./tmpXXXXXXXX]
        -k        keep temporary files
        -K FILE   path to orfam.config file (default: same directory as orfam)
        -v        verbose
        -h        show this message

Output

orfam pseudo produces five output files:

  • outprefix_truncated.gff
    • This GFF file contains truncated OR genes.
  • outprefix_pseudo.gff
    • This GFF file contains OR pseudogenes.
  • outprefix_pseudo_nonsense.fa
    • This FASTA file contains olfactory receptors with nonsense mutations.
  • outprefix_pseudo_frameshift.fa
    • This FASTA file contains olfactory receptors with frameshift mutations.
  • outprefix_pseudo_others.fa
    • This FASTA file contains olfactory receptors with other mutations.
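
A quick way to sanity-check these five files is to count the records in each. The sketch below assumes only the file names listed above and standard GFF and FASTA conventions (GFF comment lines start with #, FASTA headers start with >). The function names are hypothetical and not part of orfam.

```shell
# Hypothetical helpers (not part of orfam).

# Count FASTA records by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count GFF feature lines, skipping comments and blank lines.
count_gff_features() {
    grep -vcE '^(#|[[:space:]]*$)' "$1"
}

# Print "file<TAB>count" for every orfam pseudo output of a given prefix.
# Usage: summarize_pseudo_outputs out_prefix
summarize_pseudo_outputs() {
    for suffix in truncated.gff pseudo.gff; do
        printf '%s\t%s\n' "${1}_${suffix}" "$(count_gff_features "${1}_${suffix}")"
    done
    for suffix in pseudo_nonsense.fa pseudo_frameshift.fa pseudo_others.fa; do
        printf '%s\t%s\n' "${1}_${suffix}" "$(count_fasta_records "${1}_${suffix}")"
    done
}
```

For the example workflow, `summarize_pseudo_outputs mm10` would print one line per output file.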

Example workflows

Identification of OR genes from a target genome

  1. Use orfam align to produce an alignment result file.

    orfam align \
    	-q data/ORs/ORs.fa \
    	-s data/mm10/mm10.fa \
    	-o mm10 \
    	-e 1e-10 \
    	-t 20 \
    	-T temp \
    	-v \
    	-k
    
  2. Use orfam func to identify intact OR genes.

    orfam func \
    	-R data/mm10/mm10.fa \
    	-r data/ORs/O43749.fasta \
    	-B data/ORs/O43749.bed \
    	-O data/ORs/outgroup.fa \
    	-S bin/infer_NJ_protein.mao \
    	-A mm10.tblastn \
    	-o mm10 \
    	-t 20 \
    	-T temp \
    	-k \
    	-v
    
  3. Use orfam pseudo to identify truncated OR genes and OR pseudogenes.

    orfam pseudo \
    	-s data/mm10/mm10.fa \
    	-q intact/mm10_intact.fa \
    	-b mm10_best_hit.gff \
    	-i mm10_intact.fa \
    	-o mm10 \
    	-T temp \
    	-k \
    	-v
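
Because each step consumes files written by the previous one, it can be useful to confirm that those files exist and are non-empty before moving on. The helper below is a minimal sketch for that purpose; the name require_file is hypothetical, and the file names in the usage note are the ones from the workflow above.

```shell
# Hypothetical helper (not part of orfam): succeed only if the file
# exists and is non-empty; otherwise report it on stderr and fail.
# Usage: require_file path
require_file() {
    if [ -s "$1" ]; then
        return 0
    fi
    echo "missing or empty: $1" >&2
    return 1
}
```

For example, `require_file mm10.tblastn` could be run between steps 1 and 2, and `require_file mm10_best_hit.gff && require_file mm10_intact.fa` between steps 2 and 3.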