Building similarity graphs from large-scale biological sequence collections using Spark


raj347/BiGPY


BiGPy - Biological Similarity Graphs with PySpark.

bigpy-prepare prepares input data for later use by bigpy-sketch. It takes an input file or directory, processes the FASTA files it contains, and writes several output files:

.brm - Lists the headers of all sequences from the input file(s) that were filtered out during processing.

.btxt - Lists all sequences, without headers, kept after filtering. Each line stores one sequence as text together with its ID.

.bmap - Lists the headers of all sequences kept after filtering, each with its sequence ID. The ID is the line number at which the sequence is listed in the .bseq and .btxt files.
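
To make the relationship between the three files concrete, here is a minimal plain-Python sketch of the kind of split described above. It is not the bigpy-prepare implementation. The function names (`read_fasta`, `prepare`), the `min_length` filter, the tab-separated `ID<TAB>value` line layout, and the 0-based IDs are all assumptions made for illustration; the actual filter criteria and file layouts may differ, and the .bseq file is not covered.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def prepare(fasta_path, out_prefix, min_length=5):
    """Write out_prefix.brm, .btxt and .bmap; return (kept, removed) counts.

    Hypothetical filter: sequences shorter than min_length are removed.
    """
    kept = removed = 0
    with open(out_prefix + ".brm", "w") as brm, \
         open(out_prefix + ".btxt", "w") as btxt, \
         open(out_prefix + ".bmap", "w") as bmap:
        for header, seq in read_fasta(fasta_path):
            if len(seq) < min_length:
                # Filtered out: only the header is recorded.
                brm.write(header + "\n")
                removed += 1
                continue
            # Assumed convention: the ID is the 0-based line index,
            # shared by the .btxt and .bmap entries of a sequence.
            seq_id = kept
            btxt.write(f"{seq_id}\t{seq}\n")
            bmap.write(f"{seq_id}\t{header}\n")
            kept += 1
    return kept, removed
```

Calling `prepare("input.fasta", "out")` would write `out.brm`, `out.btxt`, and `out.bmap` next to each other. Because a sequence and its header share one ID, a header can be recovered for any line of the .btxt file by looking up the same ID in the .bmap file.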
