Python DataReader.process_data usage examples

Programming language: Python

Namespace/Package: datareader

Class/Type: DataReader

Method/Function: process_data

Examples on hotexamples.com: 2

Python DataReader.process_data: 2 examples found. These are the top Python code examples for datareader.DataReader.process_data, taken from open source projects.

Example #1

File: word2vec.py Project: wmarinho-uff/Word2vec-pt

from datareader import DataReader  # assumed import path, based on the package name listed above


def process_text_data(file_path, vocab_size):
    """
    This function is responsible for preprocessing the text data we will use to
    train our model. It will perform the following steps:

    * Create a word array from the file we have received. For example, if our
      text is:

        'I want to learn wordvec to do cool stuff'

    It will produce the following array:

        ['I', 'want', 'to', 'learn', 'wordvec', 'to', 'do', 'cool', 'stuff']

    * Create the frequency count for every word in our array:

       [('I', 1), ('want', 1), ('to', 2), ('learn', 1), ('wordvec', 1),
        ('do', 1), ('cool', 1), ('stuff', 1)]

    * With the count array, we choose as our vocabulary the words with the
      highest count. The number of words will be decided by the variable
      vocab_size.

    * After that we will create a dictionary to map a word to an index and an
      index to a word:

      index2word: {0: 'I', 1: 'want', 2: 'to', 3: 'learn', 4: 'wordvec',
                   5: 'do', 6: 'cool', 7: 'stuff'}
      word2index: {'I': 0, 'want': 1, 'to': 2, 'learn': 3, 'wordvec': 4,
                   'do': 5, 'cool': 6, 'stuff': 7}

      Both of these dictionaries are based on the words provided by the count
      array.

    * Finally, we will transform the words array to a number array, using the
      word2index dictionary.

      Therefore, our words array:

      ['I', 'want', 'to', 'learn', 'wordvec', 'to', 'do', 'cool', 'stuff']

      Will be translated to:

      [0, 1, 2, 3, 4, 2, 5, 6, 7]

      If a word is not present in the word2index dictionary, it will be
      considered an unknown word. Every unknown word will be mapped to the
      same index.
    """
    my_data = DataReader(file_path)
    my_data.process_data(vocab_size)
    return my_data
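
The preprocessing steps described in the docstring above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual DataReader implementation: the helper names `build_vocab` and `encode`, the `UNK` token, and the choice to reserve index 0 for unknown words are all hypothetical.

```python
from collections import Counter

UNK = "UNK"  # hypothetical token name for out-of-vocabulary words


def build_vocab(words, vocab_size):
    """Build word2index and index2word for the most frequent words.

    Index 0 is reserved for UNK (an assumption of this sketch), so only
    vocab_size - 1 real words are kept.
    """
    most_common = Counter(words).most_common(vocab_size - 1)
    vocab = [UNK] + [word for word, _ in most_common]
    word2index = {word: index for index, word in enumerate(vocab)}
    index2word = dict(enumerate(vocab))
    return word2index, index2word


def encode(words, word2index):
    """Translate a word array to a number array; unknown words share one index."""
    unk_index = word2index[UNK]
    return [word2index.get(word, unk_index) for word in words]


# Word array for the sentence used in the docstring.
words = "I want to learn wordvec to do cool stuff".split()
word2index, index2word = build_vocab(words, vocab_size=4)
encoded = encode(words, word2index)
```

With `vocab_size=4`, only the three most frequent words get their own index and every other word collapses to the shared unknown index. This is the behaviour the last bullet of the docstring describes.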

Example #2

File: word2vec_test.py Project: wmarinho-uff/Word2vec-pt

    def test_run_training(self):
        """
        Check that a short training run on the basic corpus finishes
        within the expected duration and loss bounds.
        """
        my_data = DataReader(get_path_basic_corpus())
        my_vocab_size = 500
        my_data.process_data(my_vocab_size)
        my_config = wv.Config(num_steps=200,
                              vocab_size=my_vocab_size,
                              show_step=2)

        my_model = wv.SkipGramModel(my_config)
        duration, loss = wv.run_training(my_model,
                                         my_data,
                                         verbose=False,
                                         visualization=False,
                                         debug=True)
        self.assertLessEqual(duration, 1.7)
        self.assertLess(loss, 7)