PyNLPl, pronounced "pineapple", is a Python library for Natural Language Processing. It is a collection of independent or loosely interdependent modules useful for common, and less common, NLP tasks. PyNLPl can be used, for example, for the computation of n-grams, frequency lists and distributions, and language models. It also provides more complex data types, such as priority queues, and search algorithms, such as beam search.
The library is divided into several packages and modules. It works on Python 2.7 as well as Python 3.
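To make the basic tasks mentioned above concrete, here is a minimal sketch of n-gram extraction and a frequency list with its relative-frequency distribution. It uses only the Python standard library and does not call PyNLPl; the names `ngrams` and `frequency_list` are hypothetical helpers chosen for this illustration, not the library's API.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous n-gram in `tokens` as a tuple (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(items):
    """Count how often each item occurs (hypothetical helper, a thin Counter wrapper)."""
    return Counter(items)


tokens = "to be or not to be".split()

# All bigrams, in order of appearance.
bigrams = list(ngrams(tokens, 2))

# Absolute counts per bigram.
freq = frequency_list(bigrams)

# Relative frequencies: each count divided by the total number of bigrams.
total = sum(freq.values())
distribution = {gram: count / total for gram, count in freq.items()}
```

Representing n-grams as tuples keeps them hashable, so they can serve directly as dictionary keys in the frequency list.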
The following modules are available:
pynlpl.datatypes
- Extra datatypes (priority queues, patterns, tries)
pynlpl.evaluation
- Evaluation and experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/AUC), sampler, confusion matrix, multithreaded experiment pool)
pynlpl.formats.cgn
- Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
pynlpl.formats.folia
- Extensive library for reading and manipulating documents in FoLiA format (Format for Linguistic Annotation)
pynlpl.formats.fql
- Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.
pynlpl.formats.cql
- Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a converter to FQL.
pynlpl.formats.giza
- Module for reading GIZA++ word alignment data
pynlpl.formats.moses
- Module for reading Moses phrase-translation tables
pynlpl.formats.sonar
- Largely obsolete module for pre-releases of the SoNaR corpus; use pynlpl.formats.folia instead
pynlpl.formats.timbl
- Module for reading Timbl output (consider using python-timbl instead, though)
pynlpl.lm.lm
- Module for a simple language model, plus a reader for ARPA language model data (as used by SRILM)
pynlpl.search
- Various search algorithms (breadth-first, depth-first, beam search, hill climbing, A*, and variants of each)
pynlpl.statistics
- Frequency lists, Levenshtein, common statistics and information theory functions
pynlpl.textprocessors
- Simple tokeniser, n-gram extraction
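As an illustration of the kind of function the statistics module covers, the following is a minimal standalone sketch of Levenshtein (edit) distance using the standard dynamic-programming recurrence. It is not PyNLPl's implementation; `edit_distance` is a hypothetical name used only here, and the sketch assumes unit cost for insertion, deletion and substitution.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists).

    Hypothetical helper: unit cost for insertion, deletion and substitution.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Distance from the empty prefix of `a` to each prefix of `b`.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        # Distance from a[:i] to the empty prefix of `b` is i deletions.
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


word_distance = edit_distance("kitten", "sitting")
token_distance = edit_distance("the cat sat".split(), "the dog sat".split())
```

Because the function only indexes and compares elements, it works on lists of tokens as well as on character strings.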
API Documentation can be found here.