#repertoire

Profiling model T-cell and B-cell metagenomes with short reads.

##Receptor Assembly

My reads for this T-cell analysis came from sheared fragments, so I had to assemble them using iSSAKE.

Gzipped (.gz) files are supported throughout the pipeline.

Quality trim your fastq

seqtk trimfq in.fq > trimmed.fq

Convert fastq to fasta

bioawk -c fastx '{print ">"$name"\n"$seq}' trimmed.fq > trimmed.fa
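If bioawk isn't available, the same conversion can be sketched in plain Python. This is only an illustration, not part of the repo: `open_maybe_gzip` and `fastq_to_fasta` are hypothetical names, and it assumes strict 4-line FASTQ records.

```python
import gzip


def open_maybe_gzip(path):
    """Open plain or gzipped text transparently, based on the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write one FASTA record per 4-line FASTQ record; return the record count."""
    lines = iter(fastq_handle)
    count = 0
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(lines).strip()
        next(lines)  # '+' separator line
        next(lines)  # quality line, not needed for FASTA
        # Drop the leading '@' and any comment after the first whitespace,
        # which matches what bioawk exposes as $name.
        name = header[1:].split()[0]
        fasta_handle.write(">%s\n%s\n" % (name, seq))
        count += 1
    return count
```

Usage would look like `fastq_to_fasta(open_maybe_gzip("trimmed.fq"), open("trimmed.fa", "w"))`.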

Download V region sequences from IMGT (TRAV or TRBV, depending on the chain)

Create tags from IMGT regions

python create_tags.py -v -l 35 trav.fa > trav.tags.fa
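To give a rough idea of what tag creation involves, here is a minimal sketch. It is not the code in `create_tags.py`: it assumes `-l 35` means a tag length of 35 bases and that a tag is every window of that length along each region. The function names and the `name_start` naming scheme are made up for illustration.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def make_tags(name, seq, length=35):
    """Yield (tag_name, tag) for every window of `length` bases in seq.

    Assumption: tags are plain sliding windows. Regions shorter than
    `length` produce no tags.
    """
    # Assumption: the input may be IMGT gapped FASTA, which pads with dots.
    seq = seq.upper().replace(".", "")
    for start in range(len(seq) - length + 1):
        yield "%s_%d" % (name, start), seq[start:start + length]
```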

Find seeds among your reads

python find_seeds.py -v trav.tags.fa in.fq > seeds.fa
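Conceptually, a seed is a read that contains one of the tags. The sketch below shows that idea under stated assumptions: exact matching only, both strands checked, and hypothetical function names. The real `find_seeds.py` may match differently.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_seeds(reads, tags):
    """Yield (name, seq) for reads containing any tag on either strand.

    `reads` is an iterable of (name, sequence) pairs and `tags` an
    iterable of tag sequences. Matching is exact (an assumption).
    """
    tags = list(tags)
    probes = set(tags) | {reverse_complement(t) for t in tags}
    lengths = {len(p) for p in probes}
    for name, seq in reads:
        seq = seq.upper()
        # Check every window of each tag length against the probe set.
        if any(seq[i:i + k] in probes
               for k in lengths
               for i in range(len(seq) - k + 1)):
            yield name, seq
```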

Run iSSAKE

iSSAKE -f trimmed.fa -s seeds.fa -b sampleid

##Contig Assessment

Download the J regions for your chain (TRAJ or TRBJ) from IMGT.

Rename fasta names

python renameIMGT.py --gene TRAJ imgt_traj.fa > traj.fa

Locally align J regions to assembled contigs

exonerate -q sampleid.contigs \
    -t traj.fa \
    --bestn 1 \
    --ryo ">%qi|%ti\n%qs" \
    --showalignment FALSE \
    --showvulgar FALSE \
    > sampleid.exonerate_out.fa

This step not only filters out likely bad contigs that lack an identifiable J region, but also appends the J region name to the read name. You'll have to filter out some unwanted lines that exonerate adds to its output.

grep -v "Command line:\|Hostname:\|-- completed" sampleid.exonerate_out.fa > sampleid.fa
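The `\|` alternation in that grep pattern is a GNU extension, so on other systems a small Python filter may be more portable. A minimal sketch with the same three markers (the function name is hypothetical):

```python
import sys

# Substrings that mark exonerate's own log lines, as in the grep pattern.
NOISE = ("Command line:", "Hostname:", "-- completed")


def strip_exonerate_noise(lines):
    """Yield only the lines that contain none of the NOISE markers."""
    for line in lines:
        if not any(marker in line for marker in NOISE):
            yield line


if __name__ == "__main__":
    # Reads from stdin and writes to stdout, like the grep filter.
    sys.stdout.writelines(strip_exonerate_noise(sys.stdin))
```

Saved as, say, `strip_noise.py` (a made-up filename), it would run as `python strip_noise.py < sampleid.exonerate_out.fa > sampleid.fa`.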

Parse read names into data table

python reads2meta.py sampleid.fa > sampleid.metadata
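Given the `--ryo ">%qi|%ti\n%qs"` format used above, each header is the contig name followed by `|` and the J region name. The sketch below shows one way to turn those headers into rows. The columns chosen here (contig, J region, sequence length) are an assumption and may not match what `reads2meta.py` actually emits.

```python
def _row(header, length):
    """Split 'contig|J' on the last '|', since contig names may contain '|'."""
    if "|" not in header:
        return header, "", length
    contig_id, _, j_region = header.rpartition("|")
    return contig_id, j_region, length


def headers_to_rows(lines):
    """Yield (contig_id, j_region, sequence_length) per FASTA record."""
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield _row(header, length)
            header, length = line[1:], 0
        elif line:
            length += len(line)
    if header is not None:
        yield _row(header, length)


def write_table(rows, out):
    """Write rows as tab-separated text with a header line."""
    out.write("contig\tj_region\tlength\n")
    for contig_id, j_region, length in rows:
        out.write("%s\t%s\t%d\n" % (contig_id, j_region, length))
```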

##Links

Bioawk: https://github.com/lh3/bioawk

Python dependency: pip install toolshed
