Python HierarchicalClustering.HierarchicalClustering Examples

Programming language: Python

Namespace/package name: htmresearch.algorithms.hierarchical_clustering

Method/function: HierarchicalClustering

Number of code examples on hotexamples.com: 1

Python HierarchicalClustering.HierarchicalClustering: 1 code example found. It is a real-world Python example of htmresearch.algorithms.hierarchical_clustering.HierarchicalClustering.HierarchicalClustering, extracted from open source projects and selected as the top rated. Rating the code examples helps surface higher-quality examples.

Frequently used methods

_computeOverlaps(2)

_extractVectorsFromKNN(2)

HierarchicalClustering(1)

_condensedIndex(1)

_getPrototypes(1)

cluster(1)

getClusterPrototypes(1)

Code example #1

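# Python 2 code (print statements, xrange). Helpers such as SAVE_PATH,
# readDataAndReshuffle, getNetworkConfig, createModel, trainModel, knnTest,
# createBucketClusterPlot and create2DSVDProjection are defined elsewhere in
# the original project and are not shown here.
import os

import numpy

from htmresearch.algorithms.hierarchical_clustering import HierarchicalClustering


def runExperiment(args):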
    if not os.path.exists(SAVE_PATH):
        os.makedirs(SAVE_PATH)

    (trainingDataDup, labelRefs, documentCategoryMap,
     documentTextMap) = readDataAndReshuffle(args)

    # remove duplicates from training data
    includedDocIds = set()
    trainingData = []
    for record in trainingDataDup:
        if record[2] not in includedDocIds:
            includedDocIds.add(record[2])
            trainingData.append(record)

    args.networkConfig = getNetworkConfig(args.networkConfigPath)
    model = createModel(numLabels=1, **vars(args))
    model = trainModel(args, model, trainingData, labelRefs)

    numDocs = model.getClassifier()._numPatterns

    print "Model trained with %d documents" % (numDocs, )

    knn = model.getClassifier()
    hc = HierarchicalClustering(knn)

    hc.cluster("complete")
    protos, clusterSizes = hc.getClusterPrototypes(args.numClusters, numDocs)

    # Run test to ensure consistency with KNN
    if args.knnTest:
        knnTest(protos, knn)
        return

    # Summary statistics
    # bucketCounts[i, j] is the number of occurrences of bucket j in cluster i
    bucketCounts = numpy.zeros((args.numClusters, len(labelRefs)))

    for clusterId in xrange(len(clusterSizes)):
        print
        print "Cluster %d with %d documents" % (clusterId,
                                                clusterSizes[clusterId])
        print "==============="

        prototypeNum = 0
        for index in protos[clusterId]:
            if index != -1:
                docId = trainingData[index][2]
                prototypeNum += 1
                display = prototypeNum <= args.numPrototypes

                if display:
                    print "(%d) %s" % (docId, trainingData[index][0])
                    print "Buckets:"

                # The docId keys in documentCategoryMap are strings rather than ints
                if docId in documentCategoryMap:
                    for bucketId in documentCategoryMap[docId]:
                        bucketCounts[clusterId, bucketId] += 1
                        if display:
                            print "    ", labelRefs[bucketId]
                elif display:
                    print "    <None>"
                if display:
                    print "\n\n"

    createBucketClusterPlot(args, bucketCounts)
    create2DSVDProjection(args, protos, trainingData, documentCategoryMap, knn)
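
The call hc.cluster("complete") in the example presumably selects complete linkage, where the distance between two clusters is the largest pairwise distance between their members. The listing does not show how HierarchicalClustering implements this, so the following is only a minimal pure-Python sketch of generic complete-linkage agglomerative clustering, not htmresearch code. The function name complete_linkage, the distance matrix layout, and the sample points are all hypothetical, and num_clusters is assumed to be at least 1. The sketch avoids print and runs under both Python 2 and Python 3.

```python
def complete_linkage(distances, num_clusters):
    """Merge clusters until num_clusters remain, using complete linkage.

    distances is a symmetric matrix (list of lists) of pairwise distances.
    Returns a sorted list of clusters, each a sorted list of point indices.
    Assumes num_clusters >= 1.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(distances))]

    def linkage(a, b):
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        return max(distances[i][j] for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical 1-D points; distance is the absolute difference.
points = [0.0, 1.0, 5.0, 6.0, 20.0]
dist = [[abs(p - q) for q in points] for p in points]
result = complete_linkage(dist, 3)
```

This brute-force version rescans every pair of clusters at each merge, so it is only suitable for small inputs. It is meant to make the merge rule concrete, not to match the performance or output format of the library class.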