def test_cluster():
    """Clustering the sample docs yields two groups, each with one similar doc."""
    corpus = Corpus(similarity=0.1)
    for document in docs:
        corpus.add(document)
    clusters = corpus.cluster()
    eq_(len(clusters), 2)
    eq_(len(clusters[0].similars), 1)
    eq_(len(clusters[1].similars), 1)
def cluster_queryset(qs):
    """Cluster the objects in *qs* by their ``description`` text.

    Duplicate and very short (< 15 chars) descriptions are skipped.
    Returns whatever ``Corpus.cluster()`` produces.
    """
    corpus = Corpus(similarity=SIM_THRESHOLD, stopwords=STOPWORDS)
    seen = {}
    for op in qs:
        # Skip exact-duplicate descriptions and ones too short to cluster usefully.
        if op.description in seen or len(op.description) < 15:
            continue
        seen[op.description] = 1
        corpus.add(op, str=op.description, key=op.id)
    return corpus.cluster()
def process(inStream, outStream, fields=None, limits=None):
    """Read tab-prefixed JSON documents from *inStream*, cluster them by a
    text field, and write the top clusters as JSON to *outStream*.

    Each input line is expected to be ``<prefix>\\t<json>``; only the part
    after the first tab is parsed.

    :param inStream: iterable of lines (bytes-like; the JSON payload is
        decoded as UTF-8 — NOTE(review): assumes a byte stream, confirm caller).
    :param outStream: writable file-like object receiving the JSON result.
    :param fields: mapping with ``"id"`` and ``"text"`` keys naming the
        document key field and the text field (default: ``{"id": "id",
        "text": "text"}``).
    :param limits: mapping with ``"clusters"`` (max clusters emitted) and
        ``"top_documents"`` (max docs per cluster), both defaulting to 10.
    """
    # Fix: the original used mutable dict literals as default arguments,
    # which are shared across calls; use the None-sentinel idiom instead.
    if fields is None:
        fields = {"id": "id", "text": "text"}
    if limits is None:
        limits = {"clusters": 10, "top_documents": 10}

    docs_by_key = {}  # renamed from `all`, which shadowed the builtin
    text_field = fields["text"]
    key_field = fields["id"]
    max_clusters = limits["clusters"]
    max_top_docs = limits["top_documents"]

    corpus = Corpus()
    for line in inStream:
        # Drop everything up to and including the first tab.
        data = line.split('\t', 1)[1]
        doc = json.loads(data.decode("utf8"))
        key = doc[key_field]
        docs_by_key[key] = doc
        # Fix: the original bound the return value to an unused `text` local.
        corpus.add((key, doc[text_field]), key=key)

    results = []
    # Fix: the original reused `c` as the loop variable, shadowing the
    # Corpus instance it was iterating the clusters of.
    for cluster in corpus.cluster()[:max_clusters]:
        # The cluster's primary doc plus its closest similars, capped at
        # max_top_docs total.
        tophits = [cluster.primary]
        tophits += [hit["object"] for hit in cluster.similars[:max_top_docs - 1]]
        topdocs = [docs_by_key[key] for (key, _text) in tophits]
        results.append({"top_documents": topdocs})

    json.dump({"clusters": results}, outStream)