Python MinHash.count Examples

Programming language: Python

Namespace/package name: datasketch

Class/type: MinHash

Method/function: count

Examples on hotexamples.com: 3

Python MinHash.count - 3 examples found. These are the top-rated real-world Python examples of datasketch.MinHash.count, extracted from open source projects.

Frequently used methods

MinHash(30)

jaccard(30)

update(30)

digest(20)

update_batch(5)

count(3)

hashvalues(3)

merge(3)

clear(2)

update_with_intval(1)

Example #1

File: sarni_minhash.py Project: mattsarn/MinHash

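from datasketch import MinHash

estimate = []  # module-level result list; defined elsewhere in the original file (assumed here)


def estimateDistinctElementParallel(listOfItems, num_perm):
    """Same as above, except that a nested for loop iterates through the
       lists in the list. This function also appends the estimation result
       to a list for use in the following accuracy function."""
    h = MinHash(num_perm)
    for item in listOfItems:
        for i in item:  # nested for loop to iterate over lists within a list
            # Older datasketch releases used h.digest(sha1(...)); current ones take raw bytes via update()
            h.update(i.encode('utf8'))
    count = h.count()  # estimated number of distinct elements
    estimate.append(count)
    print("Estimated number of elements: ", count)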
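The nested loop above folds every sub-list into a single MinHash. Because a MinHash signature keeps only a minimum per hash slot, the same signature can also be built per sub-list and combined afterwards, which is the property a truly parallel version would rely on. The sketch below illustrates this with the standard library only; `signature` and `merge` are hypothetical helpers written for this illustration and are not datasketch's implementation.

```python
import hashlib

MAX_HASH = (1 << 32) - 1  # largest 32-bit hash value


def signature(items, num_perm=64):
    """Return the per-slot minimum of num_perm salted 32-bit hashes."""
    mins = [MAX_HASH] * num_perm
    for item in items:
        data = item.encode("utf8")
        for k in range(num_perm):
            # Salting SHA-1 with the slot index stands in for k hash functions.
            digest = hashlib.sha1(k.to_bytes(4, "little") + data).digest()
            mins[k] = min(mins[k], int.from_bytes(digest[:4], "little"))
    return mins


def merge(sig_a, sig_b):
    """Combine two signatures into the signature of the union of their inputs."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


list_of_lists = [["a", "b", "c"], ["c", "d"], ["e"]]

# One pass over all items, as in the nested loop above.
flat = signature([i for sub in list_of_lists for i in sub])

# One signature per sub-list, merged afterwards.
merged = [MAX_HASH] * 64
for sub in list_of_lists:
    merged = merge(merged, signature(sub))

print(flat == merged)
```

Each sub-list signature could be computed by a separate worker, since the merge step is just an element-wise minimum.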

Example #2

File: sarni_minhash.py Project: mattsarn/MinHash

from datasketch import MinHash


def estimateDistinctElements(items, num_perm):
    """Estimate the number of distinct elements in a list.
       The default number of hash permutations is num_perm (128), but
       I adjusted it after further research:
       http://blog.cluster-text.com/tag/minhash/"""
    h = MinHash(num_perm)  # MinHash object; the parameter is the number of hash permutations
    for item in items:
        # Older datasketch releases used h.digest(sha1(...)); current ones take raw bytes via update()
        h.update(item.encode('utf8'))  # fold the item into the MinHash signature
    print("Estimated number of elements: ", h.count())
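To make it clearer what a count-style estimate is doing, here is a minimal standard-library sketch of one common MinHash cardinality estimator. It assumes salted SHA-1 as a stand-in for the hash permutations, and `estimate_distinct` is a hypothetical name; this illustrates the idea and should not be read as datasketch's actual code. The reasoning: if n distinct items hash uniformly into [0, 1], the expected minimum is 1 / (n + 1), so averaging the minima over many hash functions and inverting gives an estimate of n.

```python
import hashlib

MAX_HASH = (1 << 32) - 1  # largest 32-bit hash value


def _hash32(data: bytes, salt: int) -> int:
    """32-bit hash of data; the salt selects one of several hash functions."""
    digest = hashlib.sha1(salt.to_bytes(4, "little") + data).digest()
    return int.from_bytes(digest[:4], "little")


def estimate_distinct(items, num_perm=128):
    """Estimate the number of distinct strings in items."""
    mins = [MAX_HASH] * num_perm
    for item in items:
        data = item.encode("utf8")
        for k in range(num_perm):
            h = _hash32(data, k)
            if h < mins[k]:
                mins[k] = h
    # Mean normalised minimum is roughly 1 / (n + 1); invert it to recover n.
    return num_perm / sum(m / MAX_HASH for m in mins) - 1.0


words = [f"word-{i}" for i in range(500)]
print("Exact:", len(set(words)))
print("Estimated:", estimate_distinct(words))
```

Duplicates do not change the result, because a repeated item can never lower a minimum that it already set. Raising `num_perm` reduces the variance of the estimate at the cost of more hashing.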

Example #3

from datasketch import MinHash


def minHash_bml(SX, SY):
    print()
    print("MinHash BML")

    l = 32
    m = 8
    num_perm = pow(2, m)
    error = pow(10, -5)

    print("Number of permutations is ", num_perm)

    m1 = MinHash(num_perm)
    m2 = MinHash(num_perm)

    for d in SX:
        m1.update(d.encode('utf8'))
    for d in SY:
        m2.update(d.encode('utf8'))

    nx = m1.count()
    ny = m2.count()
    print("Estimated nx is ", nx)
    print("Estimated ny is ", ny)

    Vx = m1.digest()
    Vy = m2.digest()

    z = 0
    for i in range(0, num_perm):
        if Vx[i] >= Vy[i]:
            z = z + 1
    P = z / num_perm

    print("P is: ", P)
    # lookup() is a helper from the original project that is not shown in this excerpt
    print("Inclusion Coefficient: ",
          lookup(P, 0, min(nx, ny), nx, ny, error, m, num_perm, l, 0, 0))
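One way to read the value P computed above: for a single random hash function, the smallest hash over the union of SX and SY is equally likely to come from any element of that union, and the SX minimum is greater than or equal to the SY minimum exactly when that smallest element belongs to SY. So P should land near |SY| / |SX ∪ SY|. This is a general property of min-wise hashing, not something the original project states, and the sketch below checks it with a hypothetical stdlib-only `signature` helper rather than datasketch.

```python
import hashlib

MAX_HASH = (1 << 32) - 1  # largest 32-bit hash value


def signature(items, num_perm=256):
    """Return the per-slot minimum of num_perm salted 32-bit hashes."""
    mins = [MAX_HASH] * num_perm
    for item in items:
        data = item.encode("utf8")
        for k in range(num_perm):
            digest = hashlib.sha1(k.to_bytes(4, "little") + data).digest()
            mins[k] = min(mins[k], int.from_bytes(digest[:4], "little"))
    return mins


# Illustrative data: 600 and 700 elements with an overlap of 300.
SX = [str(i) for i in range(0, 600)]
SY = [str(i) for i in range(300, 1000)]

vx, vy = signature(SX), signature(SY)
num_perm = len(vx)

# Same counting rule as the loop above: slots where the SX minimum is >= the SY minimum.
p = sum(1 for a, b in zip(vx, vy) if a >= b) / num_perm

# Exact ratio the estimate is expected to approximate: 700 / 1000.
expected = len(set(SY)) / len(set(SX) | set(SY))

print("P:", p)
print("|SY| / |SX ∪ SY|:", expected)
```

The gap between the two printed values shrinks as `num_perm` grows, since P is an average over independent hash slots.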