The Classical Language Toolkit (CLTK) offers natural language processing support for Classical languages. In some areas, it extends the NLTK. The goals of the CLTK are to:
- compile analysis-friendly corpora in a variety of Classical languages (currently available for Chinese, Coptic, Greek, Latin, Pali, and Tibetan);
- gather, improve, and generate linguistic data required for NLP (Greek and Latin are in progress, with more in the pipeline);
- develop a free and open platform for generating reproducible, scientific research that advances the study of the languages and literatures of the ancient world.
See installation instructions available in the docs.
The docs are at docs.cltk.org.
The CLTK can download corpora, training sets, models, and other data, which are kept in repositories under the CLTK's GitHub organization. See the docs on importing these corpora.
Each major release of the CLTK is given a DOI, a type of unique identifier for digital documents. Include this DOI in your citation, as it will allow your readers to reproduce your scholarship should the CLTK's API or codebase change. To find the CLTK's current DOI, see the blue DOI button on the repository's home page on GitHub. At the end of your bibliographic entry, append "DOI" followed by the current identifier.
Please cite core software as:
Kyle P. Johnson et al. (2014-2016). CLTK: The Classical Language Toolkit. DOI 10.5281/zenodo.<current_release_id>
A style-neutral BibTeX entry would look like this:
@Misc{johnson2014,
author = {Kyle P. Johnson et al.},
title = {CLTK: The Classical Language Toolkit},
howpublished = {\url{https://github.com/cltk/cltk}},
note = {{DOI} 10.5281/zenodo.<current_release_id>},
year = {2014--2016},
}
You may also add the version/release number, shown in the PyPI button on the project's GitHub repository home page.
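For anyone who generates bibliographies by script, the citation pattern described above can be sketched in Python. This is only an illustration: the helper names `format_cltk_citation` and `format_cltk_bibtex` are hypothetical and not part of the CLTK, and the release ID `0000000` used below is a placeholder, not a real Zenodo identifier. Substitute the identifier shown on the DOI button.

```python
def cltk_doi(release_id: str) -> str:
    """Build the DOI string from a Zenodo release ID (hypothetical helper)."""
    return "10.5281/zenodo." + release_id


def format_cltk_citation(release_id: str, years: str = "2014-2016") -> str:
    """Return the plain-text citation with the DOI appended at the end."""
    return (
        "Kyle P. Johnson et al. (" + years + "). "
        "CLTK: The Classical Language Toolkit. "
        "DOI " + cltk_doi(release_id)
    )


def format_cltk_bibtex(release_id: str, years: str = "2014--2016") -> str:
    """Return a BibTeX @Misc entry mirroring the style-neutral entry in this README."""
    lines = [
        "@Misc{johnson2014,",
        "author = {Kyle P. Johnson et al.},",
        "title = {CLTK: The Classical Language Toolkit},",
        "howpublished = {\\url{https://github.com/cltk/cltk}},",
        "note = {{DOI} " + cltk_doi(release_id) + "},",
        "year = {" + years + "},",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "0000000" is a placeholder; use the current release ID from the DOI button.
    print(format_cltk_citation("0000000"))
    print(format_cltk_bibtex("0000000"))
```

Keeping the release ID as the only required argument means the same script can be rerun unchanged whenever a new major release receives a new DOI.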
The CLTK is Copyright (c) 2016 Kyle P. Johnson, under the MIT License. See 'LICENSE' for details.