Python stem Examples

Programming language: Python

Namespace/package name: magpie.misc.stemmer

Method/function: stem

Examples at hotexamples.com: 4

Python stem - 4 examples found. These are the top-rated real-world Python examples of magpie.misc.stemmer.stem, extracted from open source projects.

Example #1

File: global_index.py Project: mediacloud/nytlabels-annotator-train

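from magpie.misc.stemmer import stem

def add_document(self, doc):
    """
    Add the contents of a document to the index.
    :param doc: Document object
    """
    for w in doc.get_meaningful_words():
        # Index each word under its stem so inflected forms share one entry
        self.index[stem(w)].add(doc.doc_id)
    self.total_docs += 1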
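
The method above is an excerpt, so the surrounding class is not shown. A minimal self-contained sketch of the same idea (an inverted index keyed by stems) might look like the following. The `stem` function here is a toy suffix stripper standing in for `magpie.misc.stemmer.stem`, and the `Document` and `GlobalIndex` classes are hypothetical reconstructions based only on the attributes the excerpt uses.

```python
from collections import defaultdict


def stem(word):
    """Toy stand-in for magpie.misc.stemmer.stem (assumption, not the real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Strip a suffix only if at least three characters remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class Document:
    """Hypothetical document exposing the two members add_document relies on."""

    def __init__(self, doc_id, text):
        self.doc_id = doc_id
        self.text = text

    def get_meaningful_words(self):
        # Assumption: "meaningful" is reduced to "purely alphabetic" here
        return [w for w in self.text.split() if w.isalpha()]


class GlobalIndex:
    """Hypothetical index: maps each stem to the set of document ids containing it."""

    def __init__(self):
        self.index = defaultdict(set)
        self.total_docs = 0

    def add_document(self, doc):
        for w in doc.get_meaningful_words():
            self.index[stem(w)].add(doc.doc_id)
        self.total_docs += 1


idx = GlobalIndex()
idx.add_document(Document(1, "Markets jumped"))
idx.add_document(Document(2, "market jumping"))
print(sorted(idx.index["market"]))
```

Using a `defaultdict(set)` keeps `add_document` free of membership checks and avoids duplicate document ids when a stem occurs several times in one document.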

Example #2

from magpie.misc.stemmer import stem

def tokenize_keyword(kw_parsed):
    """
    Preprocess a keyword for feature computing. Split a parsed label into words
    and stem each one.
    :param kw_parsed: parsed form of a KeywordToken object

    :return: list of strings/unicodes
    """
    # split() with no arguments splits on any run of whitespace
    return [stem(w) for w in kw_parsed.split()]

Example #3

File: utils.py Project: mediacloud/nytlabels-annotator-train

from magpie.misc.stemmer import stem

def get_anchors(words, ontology):
    """
    Match single words in the document against the ontology to find `anchors`,
    i.e. matches that can later be used for ngram generation or
    subgraph extraction.

    :param words: an iterable of all the words you want to get anchors from
    :param ontology: Ontology object

    :return: a list of KeywordTokens with anchors
    """
    trie = ontology.get_trie()
    anchors = dict()

    for position, word in enumerate(words):
        # Try both the surface form and its stem against the ontology trie
        for form in [word, stem(word)]:
            if form in trie:
                uri = ontology.get_uri_from_label(form)
                add_token(uri, anchors, position, ontology, form=form)

    # Wrap in list() so the result matches the docstring on Python 3 as well
    return list(anchors.values())

Example #4

from magpie.misc.stemmer import stem

def _build_index(self, words):
    # Record every position at which each stemmed word occurs
    for position, word in enumerate(words):
        stemmed_word = stem(word)
        self.add_occurrence(stemmed_word, position)
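
The excerpt above does not show the enclosing class or `add_occurrence`. As a rough sketch of how such a positional index could work, assuming `add_occurrence` simply appends the position to a per-stem list: the `PositionalIndex` class and its `occurrences` attribute are hypothetical names, and `stem` is again a toy suffix stripper rather than the real `magpie.misc.stemmer.stem`.

```python
from collections import defaultdict


def stem(word):
    """Toy stand-in for magpie.misc.stemmer.stem (assumption, not the real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Strip a suffix only if at least three characters remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class PositionalIndex:
    """Hypothetical container: maps each stem to the list of positions where it occurs."""

    def __init__(self, words):
        self.occurrences = defaultdict(list)
        self._build_index(words)

    def add_occurrence(self, stemmed_word, position):
        # Assumed behaviour: remember every position, in document order
        self.occurrences[stemmed_word].append(position)

    def _build_index(self, words):
        for position, word in enumerate(words):
            self.add_occurrence(stem(word), position)


index = PositionalIndex(["cats", "chase", "cat"])
print(index.occurrences["cat"])
```

Storing positions rather than just counts lets later steps check whether two stems occur next to each other, which is what ngram matching needs.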