Skip to content

jdblischak/UMI-tools

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tools for dealing with Unique Molecular Identifiers

This repository contains a number of tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs). Currently there are two tools:

extract_umi.py: Flexible removal of UMI sequences from fastq reads.

UMIs are removed and appended to the read name. Any other barcode, for example a library barcode, is left on the read.

dedup_umi.py: Implements a number of different UMI deduplication schemes.

The recommended methods are directional_adjecency and adjecency. In general directional_adjecency seems to be less sensitive to starting conditions, but there are situations where adjecency might out perform.

See simulation results at the CGAT blog.

Genome Science 2015 poster.

Installation

Both tools are just python scripts. Type

` python dedup_umi.py --help`

or

` python extract_umi.py --help`

for help. dedup_umi.py is dependent on numpy, pandas and both are dependent, at the moment, on CGAT.

About

Tools for handling Unique Molecular Identifiers in NGS data sets

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%