def processBOW(self, text):
    """Return the bag-of-words token set for a list of strings.

    The strings are joined, lowercased, and have basic punctuation
    characters replaced by spaces before tokenization; punctuation
    tokens and basic stopwords are then discarded.

    :param text: list of strings to process
    :returns: set of unique, filtered tokens
    """
    joined = " ".join(text).lower()
    cleaned = re.sub(r"[\.,;\-\"]", " ", joined)
    kept = set()
    for tok in tokenizeText(cleaned):
        if tok in punctuation or tok in basic_stopwords:
            continue
        kept.add(tok)
    return kept
def getOutlinkContextAtharAnnotated(context):
    """Return the annotated context as a flat list of tokens.

    Only lines whose sentiment annotation contains one of the labels
    "p", "n", "o" or "c" contribute tokens. URLs, the citation marker
    and ACL-style citations are stripped from each contributing line
    before tokenizing, and punctuation tokens are filtered out.

    :param context: dict with a "lines" list; each entry has "line"
        (the text) and "sentiment" (the annotation string, may be falsy)
    :returns: list of tokens
    """
    collected = []
    for entry in context["lines"]:
        label = entry["sentiment"]
        if not label:
            continue
        if not any(flag in label for flag in ("p", "n", "o", "c")):
            continue
        cleaned = removeURLs(entry["line"]).replace(CIT_MARKER, "")
        cleaned = removeACLCitations(cleaned)
        collected.extend(tokenizeText(cleaned))
    return [tok for tok in collected if tok not in punctuation]
def getOutlinkContextAtharWindowOfWords(context, left, right):
    """Return a window-of-words context around the citation marker.

    The context lines are concatenated, cleaned of URLs (normally
    footnotes and conversion errors) and ACL-style citations, and
    tokenized. The first occurrence of CIT_MARKER anchors a window of
    up to ``left`` tokens before it and ``right`` tokens after it.

    :param context: dict with a "lines" list; each entry has a "line" string
    :param left: max number of tokens to take before the marker
    :param right: max number of tokens to take after the marker
    :returns: list of tokens around the marker, or None if the marker
        is not found in the tokenized text
    """
    # NOTE(review): lines are joined with no separator, so a token may
    # merge across a line boundary — preserved from original; confirm
    # whether a " " separator was intended.
    context_text = "".join(line["line"] for line in context["lines"])
    context_text = removeURLs(context_text)
    context_text = removeACLCitations(context_text)
    tokens = [token for token in tokenizeText(context_text)
              if token not in punctuation]
    for index, token in enumerate(tokens):
        if token == CIT_MARKER:
            # Clamp the left edge at 0: a negative slice start would wrap
            # around and wrongly pull tokens from the END of the list when
            # the marker appears within the first `left` tokens.
            res = tokens[max(index - left, 0):index]
            res.extend(tokens[index + 1:index + right + 1])
            return res
    return None