frankier/lextract

lextract - Dictionary-based lexical item extractor

Overview

lextract.aho_corasick

Find multiwords in text using an Aho-Corasick automaton. Works for Mandarin and Finnish.
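
To illustrate the general technique, here is a minimal, self-contained sketch of Aho-Corasick matching over characters. It is not lextract's API: the names `build_automaton` and `find_all` and the example dictionary are hypothetical, and the real module may well match over tokens or use an external library instead.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links (hypothetical helper, not lextract's API)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pat)

    # Phase 2: breadth-first pass to compute failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every dictionary entry found in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until some state accepts this character.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


# Illustrative dictionary only; not taken from lextract's data.
automaton = build_automaton(["olla", "olla olemassa", "massa"])
matches = find_all("olla olemassa", automaton)
```

The text is scanned once, and overlapping entries are all reported, which is why this kind of automaton suits dictionary lookup in unsegmented text such as Mandarin.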

lextract.keyed_db

Find multiwords in text using the rarest lemma as a key. Can find contiguous multiwords in tokenized text or discontinuous ones from a dependency tree.
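
The rarest-lemma idea can be sketched roughly as follows, for the contiguous case only. This assumes each multiword is a tuple of lemmas and that a lemma frequency table is available; `build_index`, `find_contiguous`, the frequencies, and the example entries are all hypothetical and not lextract's actual API or data.

```python
from collections import defaultdict


def build_index(multiwords, lemma_freq):
    """Index each multiword under its rarest lemma (hypothetical helper)."""
    index = defaultdict(list)
    for mwe in multiwords:
        # Position of the lemma with the lowest frequency; unseen lemmas count as 0.
        key_pos = min(range(len(mwe)), key=lambda i: lemma_freq.get(mwe[i], 0))
        index[mwe[key_pos]].append((key_pos, mwe))
    return index


def find_contiguous(lemmas, index):
    """Return (start_index, multiword) for contiguous matches in a lemma sequence."""
    matches = []
    for pos, lemma in enumerate(lemmas):
        # Only multiwords keyed on this lemma are candidates at this position.
        for key_pos, mwe in index.get(lemma, ()):
            start = pos - key_pos
            if start >= 0 and tuple(lemmas[start:start + len(mwe)]) == mwe:
                matches.append((start, mwe))
    return matches


# Made-up frequencies and entries, for illustration only.
lemma_freq = {"olla": 1000, "olemassa": 20, "pitää": 300, "huoli": 40}
index = build_index([("pitää", "huoli"), ("olla", "olemassa")], lemma_freq)
found = find_contiguous(["hän", "pitää", "huoli", "lapsi"], index)
```

Keying on the rarest lemma keeps the candidate list short: a frequent lemma such as "olla" triggers no lookups of its own, and a multiword is only checked when its least common member appears.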

lextract.mweproc

Processing pipeline for FinnMWE.

Documentation

There are only tests and a few docstrings for now.
