GiniQC

Dependencies

  • Python 3
  • Cooler (pip install cooler)
  • numpy (pip install numpy)
  • pandas (pip install pandas)
  • tqdm (pip install tqdm)
  • for file conversion: pairix

Usage

Usage: bash GiniQC.sh [-h] -f FILE(s) -o OUTFILE -b BEDFILE [-c CISTHRESHOLD] [-g GINITHRESHOLD] [-r READSTHRESHOLD] [-a MAXABERRATION] [-x]
	-h|--help				prints this message
	-f FILE(s)			path to cooler matrix file (must end in .cool) or path to a list of files (any other extension)
	-o OUTFILE			desired name for output files
	-b BEDFILE			path to a BED file listing the chromosomes of the reference genome (e.g. helper_files/mm10.chroms.bed)
	-c CISTHRESHOLD		minimum percent cis value per cell (default: 80)
	-g GINITHRESHOLD	minimum GiniQC value (if not specified, our tool will suggest a threshold)
	-r READSTHRESHOLD 	minimum number of reads per cell (default: 10,000 reads)
	-a MAXABERRATION 	maximum fold-change in chromosomal sequencing coverage (default: 2-fold)
	-x					when used, GiniQC threshold is determined only on cells passing cis threshold (see -c above)
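
As a rough illustration of how the thresholds above might combine, the sketch below applies the documented defaults to per-cell statistics. The function name, its argument names, and the choice of inclusive comparisons are assumptions made for illustration; this is not GiniQC's actual implementation, and the README does not say how boundary values are treated.

```python
def passes_qc(percent_cis, gini, n_reads, coverage_fold_change,
              cis_threshold=80.0, gini_threshold=None,
              reads_threshold=10_000, max_aberration=2.0):
    """Hypothetical helper: True if a cell clears every threshold.

    Defaults mirror the documented defaults for -c, -r and -a.
    gini_threshold=None skips the GiniQC check, since the tool
    suggests its own threshold when -g is not given.
    """
    if percent_cis < cis_threshold:            # -c: minimum percent cis
        return False
    if n_reads < reads_threshold:              # -r: minimum reads per cell
        return False
    if coverage_fold_change > max_aberration:  # -a: maximum coverage fold-change
        return False
    if gini_threshold is not None and gini < gini_threshold:  # -g: minimum GiniQC
        return False
    return True
```

A cell with 85% cis contacts, 20,000 reads and a 1.2-fold coverage change would pass under these assumptions, while the same cell with 75% cis contacts would not.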

To run GiniQC on several files, pass a list file to -f. The list file must either sit in the same directory as the files it names or give the full path of each file.
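
One way to avoid path problems is to generate the list with full paths. The sketch below is a minimal example using only the Python standard library; the helper name is hypothetical, and the one-path-per-line layout is an assumption, since the list format is not spelled out here.

```python
from pathlib import Path


def write_file_list(cool_dir, list_path):
    """Write the absolute path of every .cool file in cool_dir to list_path.

    Assumption: GiniQC reads one path per line. The list file itself
    must not end in .cool, because -f treats .cool as a single matrix.
    """
    paths = sorted(p.resolve() for p in Path(cool_dir).glob("*.cool"))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, `write_file_list("test", "test/file_list.txt")` would collect every .cool file under test/ into a list that can then be passed to -f.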

Sample run

To test the code using a single file, the user can try the following command:

bash GiniQC.sh -f test/cell1.chrom.cool -o cell1_stats.txt -b helper_files/mm10.chroms.bed

To test the code using a list of files, the user can try the following command:

bash GiniQC.sh -f test/file_list.txt -o test_output.txt -b helper_files/mm10.chroms.bed
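
When scripting many runs, it can help to assemble the command line programmatically. The following sketch builds the argument list from the flags documented under Usage; the function and parameter names are hypothetical, and values are passed through as plain strings without validation.

```python
def giniqc_command(files, outfile, bedfile, cis=None, gini=None,
                   reads=None, max_aberration=None, cis_only=False):
    """Build the argument list for a GiniQC.sh call (hypothetical helper).

    Optional thresholds are added only when given, so GiniQC's own
    defaults apply otherwise.
    """
    cmd = ["bash", "GiniQC.sh",
           "-f", str(files), "-o", str(outfile), "-b", str(bedfile)]
    optional = (("-c", cis), ("-g", gini), ("-r", reads), ("-a", max_aberration))
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if cis_only:
        cmd.append("-x")  # GiniQC threshold determined only on cells passing -c
    return cmd


if __name__ == "__main__":
    # Reproduces the list-of-files sample command shown above.
    print(" ".join(giniqc_command("test/file_list.txt", "test_output.txt",
                                  "helper_files/mm10.chroms.bed")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root to execute the run.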

File format

GiniQC takes cool files as input. For the user's convenience, we provide two scripts to convert bedpe and ncc files to the cool format for compatibility with GiniQC. These can be found under the utilities directory in this repository. In order to use these scripts, the user will need to download pairix.

bedpe2cool usage for a mouse dataset:

bash bedpe2cool.sh input.bedpe mm10

More on the bedpe format.

ncc2cool usage for a mouse dataset:

bash ncc2cool.sh input.ncc mm10

More on the ncc format.

For conversion from pairs format, see cooler. For conversion from hic format, see hic2cool.

References

The raw data used to create the contact matrices provided in this repository as a test dataset were generated by Stevens et al. (2017):

Stevens, T.J., Lando, D., Basu, S., Atkinson, L.P., Cao, Y., Lee, S.F., Leeb, M., Wohlfahrt, K.J., Boucher, W., O’Shaughnessy-Kirwan, A., et al. (2017). 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature 544, 59–64.

About

Tool for single-cell Hi-C quality control
