# Full preprocessing pipeline for the German corpus: tokenize -> POS tag ->
# lemmatize -> lowercase -> clean, then persist state, build a DTM and fit LDA.
preproc = TMPreproc(corpus.docs, language=u'german')

print('tokenizing...')
preproc.tokenize()

print('POS tagging...')
preproc.pos_tag()

print('lemmatization...')
preproc.lemmatize()

print('lowercase transform...')
preproc.tokens_to_lowercase()

print('cleaning...')
preproc.clean_tokens()

# report elapsed wall-clock time since `start_time` (set earlier in the script)
proc_time = time.time() - start_time
print('-- processing took %f sec. so far' % proc_time)

# persist the preprocessing state so it can be restored without re-running the pipeline
preproc.save_state('data/read_preproc_lda_de_state.pickle')

# show a random sample of 10 (token, POS) pairs per document as a sanity check
print('token samples:')
for doc_label, doc_tokens in preproc.tokens_with_pos_tags.items():
    print("> %s:" % doc_label)
    print(">>", sample(doc_tokens, 10))

print('generating DTM...')
doc_labels, vocab, dtm = preproc.get_dtm()

print("saving DTM data to pickle file '%s'..." % DTM_PICKLE)
save_dtm_to_pickle(dtm, vocab, doc_labels, DTM_PICKLE)

# fit a 30-topic LDA model on the document-term matrix
print("running LDA...")
model = lda.LDA(n_topics=30, n_iter=500)
model.fit(dtm)
# Benchmark individual TMPreproc operations, recording a checkpoint after each
# via add_timing().
preproc.expand_compound_tokens()
add_timing('expand_compound_tokens')

preproc.pos_tag()
add_timing('pos_tag')

preproc.lemmatize()
add_timing('lemmatize')

# time a deep copy of the preprocessor (workers shut down immediately; the
# copy itself is what is being measured)
preproc_copy = preproc.copy()
preproc_copy.shutdown_workers()
del preproc_copy
add_timing('copy')

# time serializing the state to a temporary pickle file ...
fd, statepickle = mkstemp('.pickle')
preproc.save_state(statepickle)
add_timing('save_state')

# ... and restoring a fresh instance from it
preproc_copy = TMPreproc.from_state(statepickle)
preproc_copy.shutdown_workers()
del preproc_copy
add_timing('from_state')

# time reconstruction from the tokens-with-metadata representation
preproc_copy = TMPreproc.from_tokens(preproc.tokens_with_metadata, language='en')
preproc_copy.shutdown_workers()
del preproc_copy
add_timing('from_tokens')

# time reconstruction from the datatable representation
# NOTE(review): chunk appears cut off here — the matching shutdown/add_timing
# for this instance presumably follows in unseen code
preproc_copy = TMPreproc.from_tokens_datatable(preproc.tokens_datatable, language='en')