# Preprocessing pipeline: build GloVe word-vector lookups (IMDB-specific and
# global), persist them as pickles, then convert the IMDB train/test reviews
# into fixed-shape sentence-vector representations with each vocabulary.
log('Building word vectors from {}'.format(IMDB_WV_FILE))
gb = GloVeBox(IMDB_WV_FILE)
gb.build(zero_token=True, normalize_variance=False, normalize_norm=True)

log('Building global word vectors from {}'.format(GLOBAL_WV_FILE))
global_gb = GloVeBox(GLOBAL_WV_FILE)
global_gb.build(zero_token=True, normalize_variance=False, normalize_norm=True)

# Persist both GloVeBox objects. Use `with` so the file handles are closed
# deterministically even if pickling raises — the original passed bare
# open(...) handles to pickle.dump and leaked them.
log('writing GloVeBox pickle...')
with open(IMDB_WV_FILE.replace('.txt', '-glovebox.pkl'), 'wb') as pkl_f:
    pickle.dump(gb, pkl_f, pickle.HIGHEST_PROTOCOL)
with open(GLOBAL_WV_FILE.replace('.txt', '-glovebox.pkl'), 'wb') as pkl_f:
    pickle.dump(global_gb, pkl_f, pickle.HIGHEST_PROTOCOL)

log('Load data from original source')
imdb = ImdbDataHandler(source=IMDB_DATA)
# NOTE(review): `type=` is the handler's own keyword (shadows the builtin in
# its signature) — kept as-is since the API is defined elsewhere.
(train_reviews, train_labels) = imdb.get_data(type=ImdbDataHandler.DATA_TRAIN)
(test_reviews, test_labels) = imdb.get_data(type=ImdbDataHandler.DATA_TEST)

log('Converting to sentences: global word vectors')
train_global_wvs_reviews = imdb.to_sentence_vectors(
    train_reviews, SENTENCES_PER_PARAGRAPH, WORDS_PER_SENTENCE, global_gb)
test_global_wvs_reviews = imdb.to_sentence_vectors(
    test_reviews, SENTENCES_PER_PARAGRAPH, WORDS_PER_SENTENCE, global_gb)

log('Converting to sentences: only imdb word vectors')
train_imdb_wvs_reviews = imdb.to_sentence_vectors(
    train_reviews, SENTENCES_PER_PARAGRAPH, WORDS_PER_SENTENCE, gb)
test_imdb_wvs_reviews = imdb.to_sentence_vectors(
    test_reviews, SENTENCES_PER_PARAGRAPH, WORDS_PER_SENTENCE, gb)

# -- training data save