- Python 3
- Cooler (pip install cooler)
- numpy (pip install numpy)
- pandas (pip install pandas)
- tqdm (pip install tqdm)
- for file conversion: pairix
Usage: bash GiniQC.sh [-h] -f FILE(s) -o OUTFILE -b BEDFILE [-c CISTHRESHOLD] [-g GINITHRESHOLD] [-r READSTHRESHOLD] [-a MAXABERRATION] [-x]
-h|--help prints this message
-f FILE(s) path to cooler matrix file (must end in .cool) or path to a list of files (any other extension)
-o OUTFILE desired name for output files
-b BEDFILE path to a BED file of chromosomes for the genome assembly (e.g. helper_files/mm10.chroms.bed)
-c CISTHRESHOLD minimum percent cis value per cell (default: 80)
-g GINITHRESHOLD minimum GiniQC value (if not specified, our tool will suggest a threshold)
-r READSTHRESHOLD minimum number of reads per cell (default: 10,000 reads)
-a MAXABERRATION maximum fold-change in chromosomal sequencing coverage (default: 2-fold)
-x when used, GiniQC threshold is determined only on cells passing cis threshold (see -c above)
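As a rough illustration of how the -c, -r and -g thresholds combine, the sketch below applies them to the statistics of a single cell. This is not GiniQC's own code: the function name passes_qc and its arguments are hypothetical, and it assumes a cell passes only if it meets every threshold that is set. The -a coverage check and the -x behaviour are left out.

```python
def passes_qc(percent_cis, n_reads, gini,
              cis_threshold=80.0, reads_threshold=10_000, gini_threshold=None):
    """Return True if a cell meets every minimum threshold (hypothetical helper).

    Defaults mirror the documented defaults for -c and -r. gini_threshold
    stays None unless the user supplies one, in which case it is enforced.
    """
    if percent_cis < cis_threshold:      # -c: minimum percent cis
        return False
    if n_reads < reads_threshold:        # -r: minimum number of reads
        return False
    if gini_threshold is not None and gini < gini_threshold:  # -g: minimum GiniQC
        return False
    return True


if __name__ == "__main__":
    # Illustrative values only, not real GiniQC output.
    print(passes_qc(percent_cis=85.0, n_reads=20_000, gini=0.5))
    print(passes_qc(percent_cis=70.0, n_reads=20_000, gini=0.5))
```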
To run GiniQC on several files, pass a list of files with -f. The list must either sit in the same directory as the files it names or give the full path of each file.
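One way to read the -f rules is sketched below: a path ending in .cool is a single matrix, anything else is treated as a list whose relative entries are looked up next to the list itself. The helper name resolve_inputs is hypothetical, and the one-path-per-line layout of the list is an assumption; GiniQC's own handling may differ.

```python
from pathlib import Path


def resolve_inputs(file_arg):
    """Expand a -f style argument into a list of .cool paths (hypothetical helper)."""
    path = Path(file_arg)
    if path.suffix == ".cool":
        # A single cooler matrix file.
        return [path]

    resolved = []
    # Assumption: the list holds one path per line; blank lines are skipped.
    for line in path.read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue
        candidate = Path(entry)
        if not candidate.is_absolute():
            # Relative entries are taken to live in the list's own directory.
            candidate = path.parent / candidate
        resolved.append(candidate)
    return resolved
```

Checking that every returned path exists before starting a long run can catch a misplaced list early.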
To test the code using a single file, the user can try the following command:
bash GiniQC.sh -f test/cell1.chrom.cool -o cell1_stats.txt -b helper_files/mm10.chroms.bed
To test the code using a list of files, the user can try the following command:
bash GiniQC.sh -f test/file_list.txt -o test_output.txt -b helper_files/mm10.chroms.bed
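A list of files can be built by hand, or with a few lines of Python such as the sketch below. It writes the full path of every .cool file in a directory, so the list can be stored anywhere. The function name write_file_list is hypothetical, and the one-path-per-line layout is an assumption about what GiniQC expects. The list itself must not end in .cool, since that extension marks a single matrix.

```python
from pathlib import Path


def write_file_list(cool_dir, list_path):
    """Write the absolute path of each .cool file in cool_dir to list_path."""
    # resolve() makes the directory absolute, so every globbed path is absolute too.
    cool_files = sorted(Path(cool_dir).resolve().glob("*.cool"))
    # Assumption: one path per line.
    Path(list_path).write_text("".join(f"{p}\n" for p in cool_files))
    return cool_files


if __name__ == "__main__":
    # Hypothetical output name; test/ is the directory used in the examples above.
    written = write_file_list("test", "my_file_list.txt")
    print(f"listed {len(written)} cool files")
```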
GiniQC takes cool files as input. For the user's convenience, we provide two scripts to convert bedpe and ncc files to the cool format for compatibility with GiniQC. These can be found under the utilities directory in this repository. In order to use these scripts, the user will need to download pairix.
bedpe2cool usage for a mouse dataset:
bash bedpe2cool.sh input.bedpe mm10
ncc2cool usage for a mouse dataset:
bash ncc2cool.sh input.ncc mm10
For conversion from pairs format, see cooler. For conversion from hic format, see hic2cool.
The raw data used to create the contact matrices provided in this repository as a test dataset were generated by Stevens et al. (2017):
Stevens, T.J., Lando, D., Basu, S., Atkinson, L.P., Cao, Y., Lee, S.F., Leeb, M., Wohlfahrt, K.J., Boucher, W., O’Shaughnessy-Kirwan, A., et al. (2017). 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature 544, 59–64.