CHarm is a tool for harmonizing the codon usage of a DNA sequence prior to heterologous expression. It does this by matching the codon usage in the target host organism to the codon usage in the native origin organism.
The underlying algorithm has been described by Angov et al. (2008). The current implementation differs from the described algorithm in two respects, though:
- By default, it enforces a lower threshold for the codon usage in the target host. The codon usage in the host only falls below this threshold if the codon usage in the origin organism is also below it. The reference behavior can be restored by explicitly choosing a threshold of zero.
- In its current state, CHarm does not search for putative link/end segments that require slow translational progress, as described by Thanaraj & Argos (1996). If no structural data for the translated protein are available, this might not have a big impact, since those predicted link/end segments might not represent the real protein structure.
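The threshold rule described above can be illustrated with a minimal sketch. This is not CHarm's actual code: the function names, the reduced toy usage tables, and the made-up frequencies are all assumptions chosen for illustration. For each codon, the sketch picks the synonymous host codon whose relative usage is closest to the usage of the original codon in the origin organism, and only allows host codons below the threshold when the origin usage is itself below the threshold.

```python
# Toy relative codon usage (fraction per amino acid). The values are made up
# and the tables are deliberately incomplete; real tables cover all 64 codons.
ORIGIN_USAGE = {
    "L": {"CTG": 0.04, "CTC": 0.16, "TTA": 0.45, "TTG": 0.35},
    "K": {"AAA": 0.80, "AAG": 0.20},
}
HOST_USAGE = {
    "L": {"CTG": 0.60, "CTC": 0.25, "TTA": 0.05, "TTG": 0.10},
    "K": {"AAA": 0.70, "AAG": 0.30},
}


def harmonize_codon(codon, origin_usage, host_usage, threshold=0.1):
    """Pick the host codon whose usage best matches the origin usage."""
    # Find the amino acid this codon encodes in the (toy) origin table.
    amino_acid = next(aa for aa, codons in origin_usage.items() if codon in codons)
    origin_freq = origin_usage[amino_acid][codon]
    candidates = host_usage[amino_acid]

    # Assumed threshold rule: rarely used host codons are excluded unless
    # the codon is also rarely used in the origin organism.
    if origin_freq >= threshold:
        allowed = {c: f for c, f in candidates.items() if f >= threshold}
        if allowed:
            candidates = allowed

    # Closest usage wins; sorting makes ties deterministic.
    return min(sorted(candidates), key=lambda c: abs(candidates[c] - origin_freq))


def harmonize(sequence, origin_usage, host_usage, threshold=0.1):
    """Harmonize a coding sequence codon by codon."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    return "".join(
        harmonize_codon(c, origin_usage, host_usage, threshold) for c in codons
    )


if __name__ == "__main__":
    print(harmonize("TTAAAA", ORIGIN_USAGE, HOST_USAGE))
```

In this sketch, passing `threshold=0` disables the filter, which corresponds to the reference behavior mentioned above.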
- Easy to use: Self-explanatory command line frontend. CHarm obtains the codon usage tables directly from http://www.kazusa.or.jp/codon. You only have to enter the species ids (e.g. 83333 for E. coli K12) and the path to your input sequence.
- Cross platform: CHarm is written in Python and uses widespread modules like matplotlib, NumPy and Biopython for data processing and visualization. It will run on any platform that is supported by Python.
- Open source: CHarm is licensed under the MIT License. You can freely alter its codebase to fit your needs.
Make sure you have all the necessary dependencies in place.
GNU/Linux users: Use your distribution's package manager to download and install Python and the dependency modules.
Mac OS X users: Download and install installer packages for Python and the dependencies from the respective download sites (see below).
Windows users: Installers for Python and the necessary modules are available at the respective download sites (see below). Alternatively, if you are a member of a degree-granting educational institution (e.g. a university), you can apply for a free license of Enthought Canopy, which will provide you with a full-featured Python environment including all necessary additional packages.
Dependencies
- Python 3 (tested with Python 3.6.3)
- NumPy (tested with NumPy 1.14.1)
- Biopython (tested with Biopython 1.70)
- matplotlib (tested with matplotlib 2.1.2)
- html5lib (tested with html5lib 1.0.1)
- BeautifulSoup4 (tested with BeautifulSoup 4.6.0)
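Whether the listed modules are installed can be checked from Python itself. The following is a small sketch and not part of CHarm; the function name `missing_dependencies` is hypothetical. It relies on the usual import names of the packages (`Bio` for Biopython, `bs4` for BeautifulSoup4), which differ from the package names.

```python
import importlib.util
import sys

# Display name -> import name. Note that the import name can differ
# from the name of the package.
REQUIRED_MODULES = {
    "NumPy": "numpy",
    "Biopython": "Bio",
    "matplotlib": "matplotlib",
    "html5lib": "html5lib",
    "BeautifulSoup4": "bs4",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of all modules that cannot be found."""
    return [
        name
        for name, import_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

The check only tests whether each module can be located; it does not compare the installed versions against the tested versions listed above.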
- Open a terminal. You can check whether the python executable is available by executing
python --version
This should print the installed version, e.g. 'Python 3.4.0'.
- Open http://www.kazusa.or.jp/codon in a web browser of your choice and navigate to the codon usage tables both of the organism of origin and the expression host. Note down the id numbers as they appear in the navigation bar ("[..]?species=XXXX"). For E. coli (K12) the id is '83333'.
- Save the sequence to be harmonized (DNA/RNA) in FASTA format.
- Run the harmonization with default options:
python ./charm-cli.py <id origin> <id host> <path to sequence file>
To see all available options, run
python ./charm-cli.py --help
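If the input sequence is produced by a script, the steps above can be prepared from Python as well. The sketch below is an illustration only: the helper names `write_fasta` and `build_command` are hypothetical, and the origin id 'XXXX' is a placeholder that has to be replaced with the real id. It writes a single-record FASTA file and assembles the command line shown above without running it.

```python
import sys
from pathlib import Path


def write_fasta(path, header, sequence, width=60):
    """Write one sequence as a FASTA record and return the file path."""
    # Remove whitespace and line breaks, normalize to upper case.
    cleaned = "".join(sequence.split()).upper()
    # Wrap the sequence into lines of at most `width` characters.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    target = Path(path)
    target.write_text(">" + header + "\n" + "\n".join(lines) + "\n")
    return target


def build_command(origin_id, host_id, fasta_path, script="./charm-cli.py"):
    """Assemble the argument list for a run with default options."""
    return [sys.executable, script, str(origin_id), str(host_id), str(fasta_path)]


if __name__ == "__main__":
    fasta = write_fasta("my_gene.fasta", "my_gene", "atgaaatta")
    # 'XXXX' is a placeholder for the id of the organism of origin.
    print(" ".join(build_command("XXXX", 83333, fasta)))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the harmonization from within a script.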
- Angov, E., Hillier, C. J., Kincaid, R. L., & Lyon, J. A. (2008). Heterologous protein expression is enhanced by harmonizing the codon usage frequencies of the target gene with those of the expression host. PLoS ONE, 3(5), e2189. doi:10.1371/journal.pone.0002189
- Thanaraj, T. A., & Argos, P. (1996). Protein secondary structural types are differentially coded on messenger RNA. Protein Science, 5(10), 1973–83. doi:10.1002/pro.5560051003