Requirements

Input File

The input file should be a resulting PSM (peptide-spectrum match) output file from a database search such as Mascot or Andromeda in tab delimited format. The minimum required columns are :

Sequence: The amino acid sequence of a PSM
Modifications: A column that describes the modifications of the sequence.

(Note for MaxQuant this information is already provided in Modified Sequence)

PrecursorArea: (or Intensity) the AUC (or SPCs) for each PSM
IonScore: Mascot (or search engine equivalent) score for each PSM
q_value: Percolator (or search engine equivalent) score for each PSM Note that PEP can be used in place of q_value with the rough observational approximation that PEP / 10 = q_value

Charge

Renaming Columns

A base configuration file for renaming column headers is provided with gpgrouper To generate it, simply run gpgrouper getconfig and a config file will be generated. The configuration file can be specified by the --configfile flag in gpgrouper run, but if not specified and the default gpgrouper_config.ini file exists, it will be used.

This file can be edited with addition of new column aliases as needed, though it should be set up to work with ProteomeDiscoverer+Mascot and MaxQuant already.

Additional info for MaxQuant

For MaxQuant output, the evidence.txt PSMs file should be used as input. gpgrouper is designed to be run separately for each experiment. As of writing, it is not configured to work with multiple label-free experiments searched together under say MaxQuant. Therefore, separate experiments need to be separated. In other words, if MaxQuant is used to search multiple experiments at once (potentially with match between runs) the evidence.txt file needs to be separated into separate “experiment” files. A simple script is available for doing just this.

FASTA Database File

The database file should be a pre-constructed tab delimited file for matching PSMs to their respective GeneIDs. The required columns are TaxonID, HomologeneID, GeneID, ProteinGI, FASTA. Note that PyGrouper uses GeneID to group PSMs, so if a GeneID is lacking for a desired grouping another identifier can be substituted in such as ProteinGI. HomologeneID can be an empty column if this information is not available or desired.

Name		Name	Last commit message	Last commit date
Latest commit History 195 Commits
gpgrouper		gpgrouper
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.org		README.org
__init__.py		__init__.py
database_config.py		database_config.py
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gpgrouper

gpgrouper

scripts

scripts

.gitignore

.gitignore

LICENSE

LICENSE

README.org

README.org

init.py

init.py

database_config.py

database_config.py

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Requirements

Input File

Renaming Columns

Additional info for MaxQuant

FASTA Database File

About

Releases

Packages

Languages

License

malovannaya-lab/gpgrouper

Folders and files

Latest commit

History

Repository files navigation

Requirements

Input File

Renaming Columns

Additional info for MaxQuant

FASTA Database File

About

Resources

License

Stars

Watchers

Forks

Languages