import logging
import os

# Assumed to be provided elsewhere in this module: the TnT tagger class,
# the raw_tokenize() and paragraphs() tokenizer helpers, and _XLT, a dict
# mapping token texts to the forms the tagger expects.

# Singleton TnT tagger instance, loaded lazily on first use
_TAGGER = None


def ifd_tag(text):
    """ Tokenize the given text and use a global singleton TnT tagger to tag it """
    global _TAGGER
    if _TAGGER is None:
        # Load the tagger from a pickle the first time it's used
        model_path = "config" + os.sep + "TnT-model.pickle"
        logging.info("Loading TnT model from {0}".format(model_path))
        _TAGGER = TnT.load(model_path)
        if _TAGGER is None:
            return []  # No tagger model - unable to tag

    token_stream = raw_tokenize(text)
    result = []

    def xlt(txt):
        """ Translate the token text as required before tagging it """
        if txt[0] == '[' and txt[-1] == ']':
            # Abbreviation enclosed in square brackets: remove 'em
            return txt[1:-1]
        return _XLT.get(txt, txt)

    for pg in paragraphs(token_stream):
        for _, sent in pg:
            # Drop empty tokens and translate the rest before tagging
            toklist = [xlt(t.txt) for t in sent if t.txt]
            tagged = _TAGGER.tag(toklist)
            result.append(tagged)

    # Return a flat list of tagged sentences, each as produced by _TAGGER.tag()
    return result
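
# Usage sketch (illustrative, not part of the original module): the example
# text and the __main__ guard below are assumptions for demonstration only.
# ifd_tag() yields one entry per sentence, each as produced by _TAGGER.tag(),
# typically a list of (token, IFD-tag) pairs.
if __name__ == "__main__":
    for tagged_sentence in ifd_tag("Hér er stutt setning. Og önnur til."):
        print(tagged_sentence)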