forked from YosefLab/UMIpipe

A pipeline for processing UMI RNAseq data

shulp2211/UMIpipe

UMIpipe.py

  • This Python script converts FASTQ files of Drop-seq sequencing data into an expression matrix in which each column corresponds to a cell and each row corresponds to a gene.
  • The dependencies of this pipeline include
    • picard-tools
    • dropseq-tools (included in the repository)
    • STAR
    • samtools
    • reference files: a FASTA file, a Picard .dict file in the same directory as the FASTA file, a STAR index, and a GTF file. The paths to these files are pre-specified, so the user only needs to select the species with the --ref option. Currently mm10 and hg38 are supported.
  • The script takes a number of arguments (run it with -h to list them), most of which have default values adapted to running on the yosef2 queue. An example command is in runUMIpipe.sh. The pipeline has 5 basic parts:
    • Convert fastq to sam: requires 3 arguments
      • --fq1: fastq read 1
      • --fq2: fastq read 2
      • --samplename: output name
    • Tag barcode: this step tags the reads in the BAM file with their barcodes. The cell barcode is attached to the non-barcode read as the optional SAM field XC, and the molecular barcode is attached as the field XM. By default, read 1 is assumed to contain the barcode sequence, with the cell barcode at bases 1-12 and the molecular barcode at bases 13-20. At the end of the tagging, the first read is discarded. The default
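
The default barcode layout described in the tagging step can be illustrated with a small Python sketch. This is not the pipeline's implementation (the tagging itself is done by dropseq-tools); the function names `split_barcodes` and `sam_tags` and the example read are hypothetical, and the only things taken from the description above are the base ranges 1-12 and 13-20 and the tag names XC and XM.

```python
# Illustrative sketch only: the real tagging is performed by dropseq-tools.
# Function names and the example read below are hypothetical.

def split_barcodes(read1_seq, cell_range=(1, 12), umi_range=(13, 20)):
    """Split a read 1 sequence into (cell_barcode, molecular_barcode).

    Ranges are 1-based and inclusive, matching the defaults described
    in the README (cell barcode 1-12, molecular barcode 13-20).
    """
    last_base = max(cell_range[1], umi_range[1])
    if len(read1_seq) < last_base:
        raise ValueError(
            "read 1 has %d bases, need at least %d" % (len(read1_seq), last_base)
        )
    # Convert 1-based inclusive ranges to Python's 0-based half-open slices.
    cell = read1_seq[cell_range[0] - 1:cell_range[1]]
    umi = read1_seq[umi_range[0] - 1:umi_range[1]]
    return cell, umi


def sam_tags(cell, umi):
    """Format the barcodes as SAM optional fields (TAG:TYPE:VALUE, Z = string)."""
    return ["XC:Z:" + cell, "XM:Z:" + umi]


if __name__ == "__main__":
    read1 = "ACGTACGTACGT" + "TTGGCCAA"  # made-up 20-base read 1
    cell, umi = split_barcodes(read1)
    print(sam_tags(cell, umi))  # ['XC:Z:ACGTACGTACGT', 'XM:Z:TTGGCCAA']
```

The two tags would be carried on the record of read 2 (the non-barcode read), since read 1 is discarded after tagging.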

Auxiliary scripts

  • compare_counts.R
  • count_gene_exon.py
