This is a set of tools for training a model that classifies text into the three main dialect groups of the Irish language: Ulster, Connacht, and Munster. The classifier uses a model trained on a large corpus of Irish and can detect dialect at either the document or the sentence level.
- Python 2.7.x
- scikit-learn (sklearn) >= 0.17
If you want to train your own model, you'll need Nua-Chorpas na hÉireann (The New Corpus for Ireland), which is available upon request here. It is a large corpus of more than 30 million Irish words drawn from a variety of Irish-language texts. Note: I am not affiliated with the creators of the corpus, so I cannot grant access to it.
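For orientation, here is a minimal sketch of what training a dialect classifier and applying it at document or sentence level could look like with scikit-learn. It is not this project's actual code: the function names (`train`, `classify_document`, `classify_sentences`), the choice of character n-gram TF-IDF features with naive Bayes, the naive sentence splitter, and the six toy greetings standing in for the corpus are all assumptions made for illustration.

```python
# -*- coding: utf-8 -*-
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy placeholder data (NOT from the corpus): a few well-known regional
# ways of asking "How are you?". A real model needs far more text.
TRAIN = [
    (u"Cad é mar atá tú?", "Ulster"),
    (u"Goidé mar atá tú?", "Ulster"),
    (u"Cén chaoi a bhfuil tú?", "Connacht"),
    (u"Cén chaoi a bhfuil sibh?", "Connacht"),
    (u"Conas atá tú?", "Munster"),
    (u"Conas taoi?", "Munster"),
]


def train(samples):
    """Fit a character n-gram TF-IDF + naive Bayes pipeline (assumed model)."""
    texts = [text for text, _ in samples]
    labels = [label for _, label in samples]
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
        MultinomialNB(),
    )
    model.fit(texts, labels)
    return model


def classify_document(model, text):
    """Return one dialect label for the whole text."""
    return model.predict([text])[0]


def classify_sentences(model, text):
    """Split on sentence-final punctuation and label each sentence."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    if not sentences:
        return []
    return list(zip(sentences, model.predict(sentences)))
```

Calling `classify_document(train(TRAIN), text)` gives a single label for the text, while `classify_sentences` returns a list of `(sentence, label)` pairs, which is useful when a document mixes dialects.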
Coming soon
Licensed under the GPL v3. If you use, modify, or distribute this code, please make sure the source code is freely available.