Bamanankan

README

To run the programs, the following software has to be installed:

Python 3;
NLTK (http://www.nltk.org/install.html);
CRFSuite, via python-crfsuite (https://pypi.python.org/pypi/python-crfsuite).

Running the programs creates the folder "nltk_data" in the C:/User directory; the corpus files in the folder "Corpus" are copied there automatically.

Containing Folders

Note: the corpus reader htmlreaderALL.py expects files in Daba HTML format (Bambara Reference Corpus).

Corpus

folder that has to contain the corpus files

Models

empty folder; trained models are stored here when a CRFTagger is trained

Results

empty folder; the results of analyzeContingency.py are saved here

Containing Files

analyzeContingency.py

calculates the percentage of words tagged i that are in reality j; saves the result to a file in the Results folder
looks for the words that are responsible for these errors and saves each error to a file in the Results folder
saves the confusion matrix to the Results folder
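
The percentage calculation described above can be sketched with the standard library alone. This is a minimal illustration, not the code in analyzeContingency.py; the function name `switch_percentages` and the toy tag sequences are hypothetical.

```python
from collections import Counter


def switch_percentages(gold_tags, predicted_tags):
    """For each predicted tag i, return the percentage of its tokens whose gold tag is j.

    The result maps (predicted, gold) pairs to percentages.
    """
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    pair_counts = Counter(zip(predicted_tags, gold_tags))
    predicted_totals = Counter(predicted_tags)
    return {
        (pred, gold): 100.0 * count / predicted_totals[pred]
        for (pred, gold), count in pair_counts.items()
    }


# Toy data (hypothetical): the tagger predicted "n" four times,
# and one of those tokens is really a "v".
gold = ["n", "n", "v", "n", "v"]
predicted = ["n", "n", "n", "n", "v"]
percentages = switch_percentages(gold, predicted)
print(percentages[("n", "v")])  # 25.0
```

Pairs where the predicted and gold tags differ are the errors; pairs where they agree give the precision of the tagger for that tag.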

backoffCombi.py

combines taggers using the backoff chaining provided by NLTK
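
In NLTK, a backoff chain is built by passing one tagger as the `backoff` argument of another, so that a word the first tagger cannot tag is handed to the next one. The sketch below illustrates that idea in plain Python so it runs without NLTK. The classes `LookupTagger` and `FallbackTagger` and the placeholder tokens are hypothetical stand-ins, not the classes used in backoffCombi.py.

```python
class LookupTagger:
    """Tags a word from a fixed table; asks the backoff tagger when the word is unknown."""

    def __init__(self, table, backoff=None):
        self.table = table
        self.backoff = backoff

    def tag_word(self, word):
        if word in self.table:
            return self.table[word]
        if self.backoff is not None:
            return self.backoff.tag_word(word)
        return None  # end of the chain reached without a decision

    def tag(self, words):
        return [(word, self.tag_word(word)) for word in words]


class FallbackTagger(LookupTagger):
    """Last link of the chain: assigns the same tag to every word."""

    def __init__(self, default_tag):
        super().__init__(table={})
        self.default_tag = default_tag

    def tag_word(self, word):
        return self.default_tag


# Placeholder tokens and tags (hypothetical): "w3" is unknown to both lookup taggers.
chain = LookupTagger(
    {"w1": "n"},
    backoff=LookupTagger({"w2": "v"}, backoff=FallbackTagger("x")),
)
tagged = chain.tag(["w1", "w2", "w3"])
print(tagged)  # [('w1', 'n'), ('w2', 'v'), ('w3', 'x')]
```

The order of the chain matters: the most specific tagger should come first and the most general one last.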

bambara_tagging_htmlreaderALL.py

loads the corpus files and creates the reader needed to work with the sentences and words in the corpus
used by create_reader.py

bamadaba.txt

the Bamadaba dictionary (http://cormand.huma-num.fr/bamadaba.html)

bamadaba_non_tonal.txt

the Bamadaba dictionary (http://cormand.huma-num.fr/bamadaba.html), but without tones

confusionmatrix.py

slightly modified version of NLTK's confusionmatrix.py
a function was added so that switches (the tagger assigned tag A to a word instead of the correct tag B) can be analyzed

create_reader.py

uses bambara_tagging_htmlreaderALL to create a reader (with htmlreaderALL.py) needed to work with the corpus files

crf.py

modified version of NLTK's CRF tagger (features added)

CrossValidation.py

implementation of 9-fold cross-validation
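
A 9-fold split can be sketched as follows. This is a generic illustration, not the code in CrossValidation.py: the function name `k_fold_splits` is hypothetical, and assigning items to folds in an interleaved fashion is an assumption (the script may split the corpus differently).

```python
def k_fold_splits(items, k=9):
    """Yield (train, test) pairs so that every item is in exactly one test fold."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    # Interleaved assignment: item i goes to fold i % k.
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, test


# Toy data (hypothetical): 18 sentence ids instead of real tagged sentences.
sentences = list(range(18))
splits = list(k_fold_splits(sentences, k=9))
for train, test in splits:
    print(len(train), len(test))  # 16 2 on every line
```

Each tagger would be trained on `train` and evaluated on `test`, and the nine scores averaged.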

disambiguation.py

removes sentences containing words tagged ambiguously (e.g. n/v)
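
The filtering step can be sketched like this. It assumes that sentences are lists of (word, tag) pairs and that an ambiguous tag is written with a slash, as in the example n/v; the function name `drop_ambiguous_sentences` and the placeholder tokens are hypothetical and not taken from disambiguation.py.

```python
def drop_ambiguous_sentences(tagged_sentences, separator="/"):
    """Keep only sentences in which no tag contains the ambiguity separator."""
    return [
        sentence
        for sentence in tagged_sentences
        if not any(separator in tag for _, tag in sentence)
    ]


# Toy data (hypothetical): the second sentence contains an ambiguous tag.
corpus = [
    [("w1", "n"), ("w2", "v")],
    [("w3", "n/v"), ("w4", "pp")],
    [("w5", "adj")],
]
clean = drop_ambiguous_sentences(corpus)
print(len(clean))  # 2
```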

ensemblecombinationBrillWu_Html.py

calculates the complementarity and disagreement of the taggers CRF, TnT, HMM and either Unigram or a backoff tagger (Bigram+Affix+Dictionary+Regexp+DefaultTagger) according to Brill & Wu (1998)
saves the results to files in the Results folder
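
Brill & Wu (1998) describe the complementarity of tagger B to tagger A as the percentage of A's errors on which B is correct. The sketch below computes that measure for two toy tag sequences. It is an illustration under that reading of the definition, not the code of this script; the function name `complementarity` and the data are hypothetical.

```python
def complementarity(gold, tags_a, tags_b):
    """Percentage of tagger A's errors on which tagger B is correct.

    Computed as (1 - common_errors / errors_of_A) * 100.
    """
    errors_a = {i for i, (g, a) in enumerate(zip(gold, tags_a)) if g != a}
    errors_b = {i for i, (g, b) in enumerate(zip(gold, tags_b)) if g != b}
    if not errors_a:
        # Undefined when A makes no errors; returning 0.0 is a choice of this sketch.
        return 0.0
    common = errors_a & errors_b
    return 100.0 * (1 - len(common) / len(errors_a))


# Toy data (hypothetical): A is wrong at positions 1 and 2, B only at position 1.
gold = ["n", "v", "n", "v", "n"]
tagger_a = ["n", "n", "v", "v", "n"]
tagger_b = ["n", "n", "n", "v", "n"]
comp_ab = complementarity(gold, tagger_a, tagger_b)
comp_ba = complementarity(gold, tagger_b, tagger_a)
print(comp_ab, comp_ba)  # 50.0 0.0
```

The measure is not symmetric: here B corrects half of A's errors, while A corrects none of B's.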

ensemblecombinationBrillWu_HtmlREGEX.py

calculates the complementarity and disagreement of the taggers CRF, TnT, HMM, Unigram and Regexp
saves the results to files in the Results folder

htmlreaderALL.py

reader for Daba HTML files
a modified version of NLTK's XML reader (nltk.corpus.reader.xmldocs); uses parts of Kirill Maslinsky's HTMLReader

indivTaggers.py

helper for training individual taggers more easily
also used by other programs

patterns[_non]_tonal[_SA].py

patterns for the RegexpTagger, see http://cormand.huma-num.fr/gloses.html

regextagger_[non_]tonal[_SA].py

RegexpTagger for each form of training (tonal or non-tonal, with or without affixes)

toolboxreaderRun.py

contains a function to get the alternative words for an entry in the dictionary
used to create the DictionaryTagger

Voting.py

implementation of several voting strategies for the ensemble combination
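
Simple majority voting is one common strategy for combining taggers. The sketch below shows it for three taggers; whether Voting.py implements it in exactly this form, and the tie-breaking rule used here (the earliest tagger in the list wins), are assumptions. The function names and toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(tags):
    """Return the most frequent tag; on a tie, the tag of the earliest tagger wins."""
    counts = Counter(tags)
    best = max(counts.values())
    for tag in tags:
        if counts[tag] == best:
            return tag


def vote_sequence(predictions):
    """Combine per-tagger tag sequences (all of the same length) token by token."""
    return [majority_vote(column) for column in zip(*predictions)]


# Toy predictions (hypothetical) of three taggers for a three-token sentence.
crf_tags = ["n", "v", "n"]
tnt_tags = ["n", "n", "adj"]
hmm_tags = ["v", "v", "pp"]
combined = vote_sequence([crf_tags, tnt_tags, hmm_tags])
print(combined)  # ['n', 'v', 'n']
```

On the third token all three taggers disagree, so the tie is resolved in favour of the first tagger in the list; ordering the taggers by their individual accuracy makes this rule a reasonable default.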

(for further information on a file, see the header of that file)
