from keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import LabelBinarizer

def tokenizing_and_vocabulary(train_posts, test_posts, train_tags, test_tags):
    # 20 Newsgroups: 20 classes
    num_labels = 20
    vocab_size = 15000
    batch_size = 100
    # define Tokenizer with vocab size
    tokenizer = Tokenizer(num_words=vocab_size)
    tokenizer.fit_on_texts(train_posts)
    x_train = tokenizer.texts_to_matrix(train_posts, mode='tfidf')
    x_test = tokenizer.texts_to_matrix(test_posts, mode='tfidf')
    # one-hot encode the string labels
    encoder = LabelBinarizer()
    encoder.fit(train_tags)
    y_train = encoder.transform(train_tags)
    y_test = encoder.transform(test_tags)
    return x_train, x_test, y_train, y_test
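For context, here is a minimal sketch of how this function might be driven; the original post does not show the data-loading step, so the use of scikit-learn's fetch_20newsgroups here is my assumption:

from sklearn.datasets import fetch_20newsgroups

# assumption: load 20 Newsgroups via scikit-learn (not shown in the original)
train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')
train_tags = [train.target_names[i] for i in train.target]
test_tags = [test.target_names[i] for i in test.target]

x_train, x_test, y_train, y_test = tokenizing_and_vocabulary(
    train.data, test.data, train_tags, test_tags)
print(x_train.shape)  # one tf-idf row of width vocab_size per training post, roughly (11314, 15000)
print(y_train.shape)  # one one-hot row of width 20 per training post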
from keras.preprocessing.text import Tokenizer

samples = ['I study at CityU', 'I study at CityU at Seattle']
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(samples)
sequences = tokenizer.texts_to_sequences(samples)
one_hot_results = tokenizer.texts_to_matrix(samples, mode='binary')
word_index = tokenizer.word_index
print('Found %s unique tokens: ' % len(word_index))
print('Sequences: ', sequences, '\n')
print('word_index: ', tokenizer.word_index)
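For reference, running this snippet should print something close to the following. Keras lowercases tokens by default and orders the index by frequency, breaking ties by order of first appearance, so 'at', which occurs three times, gets index 1 (exact formatting may vary by version):

Found 5 unique tokens:
Sequences:  [[2, 3, 1, 4], [2, 3, 1, 4, 1, 5]]
word_index:  {'at': 1, 'i': 2, 'study': 3, 'cityu': 4, 'seattle': 5}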
import keras.preprocessing.text as T
from keras.preprocessing.text import Tokenizer

text1 = 'some thing to eat'
text2 = 'some thing to drink'
texts = [text1, text2]

print(T.text_to_word_sequence(text1))  # ['some', 'thing', 'to', 'eat']
print(T.one_hot(text1, 10))  # e.g. [7, 9, 3, 4] (indices come from hashing, so they may vary)
print(T.one_hot(text2, 10))  # e.g. [7, 9, 3, 1]

tokenizer = Tokenizer(num_words=10)
tokenizer.fit_on_texts(texts)
print(tokenizer.word_counts)  # OrderedDict([('some', 2), ('thing', 2), ('to', 2), ('eat', 1), ('drink', 1)])
print(tokenizer.word_index)   # {'some': 1, 'thing': 2, 'to': 3, 'eat': 4, 'drink': 5}
print(tokenizer.word_docs)    # {'some': 2, 'thing': 2, 'to': 2, 'drink': 1, 'eat': 1}
print(tokenizer.index_docs)   # {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}
print(tokenizer.texts_to_sequences(texts))  # [[1, 2, 3, 4], [1, 2, 3, 5]]
print(tokenizer.texts_to_matrix(texts))
# [[0. 1. 1. 1. 1. 0. 0. 0. 0. 0.]
#  [0. 1. 1. 1. 0. 1. 0. 0. 0. 0.]]

import keras.preprocessing.sequence as S
print(S.pad_sequences([[1, 2, 3]], 10, padding='post'))  # [[1 2 3 0 0 0 0 0 0 0]]

Author: vivian_ll. Source: CSDN, https://blog.csdn.net/vivian_ll/article/details/80795139 (original article by the blogger; please include a link to the original when reposting).
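pad_sequences appears only in passing above, so here is a slightly fuller sketch of its padding and truncation behaviour; the maxlen value and the example sequences are my own illustration, not from the original post:

from keras.preprocessing.sequence import pad_sequences

seqs = [[1, 2, 3], [4, 5, 6, 7, 8]]
# default: pad with zeros and truncate at the front ('pre')
print(pad_sequences(seqs, maxlen=4))
# [[0 1 2 3]
#  [5 6 7 8]]
# padding='post' pads at the end; truncating='post' drops the tail instead of the head
print(pad_sequences(seqs, maxlen=4, padding='post', truncating='post'))
# [[1 2 3 0]
#  [4 5 6 7]]

Post-padding is the usual choice when the sequences feed an Embedding layer with mask_zero or a model that expects the real tokens first.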