### Tokenize, remove stopwords, save the result ###
# Previous pipeline stage, kept for reference: tokenize the raw corpus and
# persist the result so later runs can load the pickle directly (below).
# preprocesser = Preprocesser()
# preprocesser.tokenize(corpus, remove_stopwords=False)
# corpus_tokenized = preprocesser.corpus_tokenized
# pickle.dump(corpus_tokenized, open('resources/corpus_300k_filtered_tokenized_with_stopwords_cs.c', 'wb'))
# save_file(corpus_tokenized, "corpus_300k_filtered_tokenized_with_stopwords_cs")
# save_file(corpus_tokenized, "corpus_10k_test")

# Load the pre-tokenized corpus (stopwords included) produced by the stage
# above. `with` guarantees the file handle is closed even if unpickling fails.
# NOTE(review): pickle.load on a file you don't control is unsafe; this path
# is a local artifact of our own pipeline, so it is trusted here.
with open(
        "/home/nsaef/projects/CollectionExplorer/web/CollectionExplorer/static/CollectionExplorer/corpora/12/12_tokens_stopwords-included_cs.corpus",
        "rb") as corpus_file:
    corpus_tokenized = pickle.load(corpus_file)

##### Versioning and Duplicates #####
# Hash every tokenized document, then compare hashes/contents to find
# candidate duplicate or near-duplicate versions of the same document.
version_handler = VersionHandler()
version_handler.calc_hashes(corpus_tokenized)
candidates = version_handler.calculate_similarities()

##### Topic Modelling #####
# Disabled stage, kept for reference: vectorize with raw term frequencies
# and fit an LDA topic model over the corpus.
# ### Vectorize the corpus using raw frequencies for lda ###
# processer_rf = Preprocesser()
# corpus_rf = processer_rf.vectorize_frequencies(corpus)
# feature_names = processer_rf.feature_names_raw

# ### Create topic models using LDA ###
# lda = TopicModeller(n_topics=30)
# lda.create_topic_models(corpus_rf, feature_names)
# topics = lda.documents_per_topic(corpus_rf, corpus)
# lda.print_top_words(feature_names, n_top_words=20, collection=topics)