brwnj/umitools

UMI Tools

Tools to handle reads sequenced with unique molecular identifiers (UMIs).

Trim the UMI

Incorporate the UMI into the read name so that it can be identified later when processing mapped reads.

umitools trim --end 5 unprocessed_fastq NNNNNV > out.fq

To save reads with invalid UMI sequences (UMIs that do not match the IUPAC sequence given, e.g. NNNNNV) to a separate file, specify --invalid.

umitools trim --end 5 --invalid bad_umi.fq unprocessed_fastq NNNNNV > out.fq
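The trimming step can be sketched in Python as follows. This is a minimal illustration, not umitools' own code. It assumes a UMI at the 5' end (as with --end 5), well-formed four-line FASTQ records, and a hypothetical `<read id>_<UMI>` naming scheme. The names `iupac_regex`, `read_fastq` and `trim` are illustrative. Each IUPAC code in the pattern becomes a character class (V allows A, C or G), and reads whose leading bases fail to match are set aside, roughly what --invalid writes to its own file.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_regex(pattern):
    """Compile an IUPAC pattern such as 'NNNNNV' into a regular expression."""
    return re.compile("".join("[%s]" % IUPAC[c] for c in pattern.upper()))


def read_fastq(lines):
    """Yield (read_id, seq, qual) from four-line FASTQ records.

    Keeps only the first whitespace-separated token of the header
    (a simplification); a trailing incomplete record is ignored.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip()[1:].split()[0], seq.rstrip(), qual.rstrip()


def trim(lines, pattern):
    """Split reads into (trimmed, invalid) lists of (read_id, seq, qual).

    Valid reads have the 5' UMI removed from sequence and quality and
    appended to the read ID; reads whose leading bases do not match the
    pattern (including reads that are too short) are returned unchanged.
    """
    regex, k = iupac_regex(pattern), len(pattern)
    trimmed, invalid = [], []
    for read_id, seq, qual in read_fastq(lines):
        umi = seq[:k]
        if regex.fullmatch(umi):
            # Hypothetical name format '<read id>_<UMI>'; umitools may differ.
            trimmed.append(("%s_%s" % (read_id, umi), seq[k:], qual[k:]))
        else:
            invalid.append((read_id, seq, qual))
    return trimmed, invalid
```

For example, with the pattern NNNNNV a read starting ACGTAG is kept with `_ACGTAG` appended to its ID, while a read starting ACGTAT is set aside because V does not allow T.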

Remove Duplicates

For each start site, keep only one read per UMI. A BED3+ file with the before and after read counts for each start site is written to stdout.

umitools rmdup unprocessed.bam out.bam > before_after.bed
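Deduplication per start site can be sketched as below. It is a hypothetical illustration rather than the rmdup implementation: it takes (chrom, start, UMI, read ID) tuples, keeps the first read seen for each start/UMI pair, and emits BED3+ lines whose two extra columns (before and after counts) are an assumed layout. With pysam, the fields could come from each read's `reference_name`, `reference_start` and `query_name`; this sketch ignores strand, so for reverse-strand reads `reference_start` is not the 5' end.

```python
def dedup_by_start(reads):
    """Keep the first read seen for each (chrom, start, UMI).

    `reads` is an iterable of (chrom, start, umi, read_id) tuples with a
    0-based start. Returns (kept_read_ids, bed_lines), where each BED3+
    line is 'chrom, start, start + 1, before, after' (an assumed layout).
    """
    seen = set()
    before, after = {}, {}  # dicts keep insertion order (Python 3.7+)
    kept = []
    for chrom, start, umi, read_id in reads:
        site = (chrom, start)
        before[site] = before.get(site, 0) + 1
        if (site, umi) in seen:
            continue  # same start site and UMI: treated as a duplicate
        seen.add((site, umi))
        after[site] = after.get(site, 0) + 1
        kept.append(read_id)
    bed = ["%s\t%d\t%d\t%d\t%d" % (c, s, s + 1, n, after[(c, s)])
           for (c, s), n in before.items()]
    return kept, bed
```

Keeping the first read per UMI is an arbitrary choice here; any single representative satisfies "one read per UMI".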

Specifying --mismatches merges, for a given start site, all UMIs within that edit distance into a single unique hit. For example, with one allowed mismatch, a new UMI that is within a single mismatch of any UMI already observed at that start position is merged and considered a duplicate. The mismatch can occur at any position, regardless of the IUPAC sequence you are using.
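Merging UMIs within an edit distance at one start site can be sketched with a plain Levenshtein distance. umitools relies on the editdist package for the distance; the pure-Python `edit_distance` below is a stand-in, and `unique_umis` is a hypothetical helper that follows the description above by comparing each new UMI against every UMI already observed at the same start site.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def unique_umis(umis, mismatches):
    """Return the UMIs kept at a single start site, in input order.

    A UMI within `mismatches` edits of any UMI already observed at this
    site is treated as a duplicate and dropped.
    """
    observed, kept = [], []
    for umi in umis:
        if not any(edit_distance(umi, o) <= mismatches for o in observed):
            kept.append(umi)
        observed.append(umi)
    return kept
```

With `mismatches=1`, `AAAAAG` is merged into `AAAAAC`, while `mismatches=0` only merges identical UMIs. Because each UMI is compared against all earlier ones, the result can depend on the order in which reads are seen.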

Installation

umitools has two requirements: pysam and editdist. Use pip to install pysam.

pip install pysam

editdist has to be downloaded and installed from source:

wget https://py-editdist.googlecode.com/files/py-editdist-0.3.tar.gz
tar xzf py-editdist-0.3.tar.gz
cd py-editdist-0.3/
python setup.py install
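After installing, one way to confirm that both requirements are importable is sketched below. It assumes the packages import as `pysam` and `editdist`, and it uses only the standard library's `importlib.util.find_spec`, which locates top-level modules without importing them.

```python
import importlib.util


def missing_requirements(modules=("pysam", "editdist")):
    """Return the names in `modules` that cannot be found on this Python."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    print("missing: " + ", ".join(missing) if missing else "all requirements found")
```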

Finally, download and install umitools from source.

wget -O umitools-master.zip https://github.com/brwnj/umitools/archive/master.zip
unzip umitools-master.zip
cd umitools-master
python setup.py install
