```python
from tensorflow.keras.preprocessing.text import Tokenizer

# define some sample text data
samples = ['This is the first sentence', 'And here is the second sentence']

# create a tokenizer object
tokenizer = Tokenizer()

# build a vocabulary index from the text data
tokenizer.fit_on_texts(samples)

# print the indexed vocabulary
print(tokenizer.word_index)
# prints: {'is': 1, 'the': 2, 'sentence': 3, 'this': 4, 'first': 5, 'and': 6, 'here': 7, 'second': 8}
```

Note that `Tokenizer` assigns indexes by descending word frequency (ties keep the order of first appearance), starting at 1; index 0 is reserved for padding.
```python
from tensorflow.keras.preprocessing.text import Tokenizer

# some example text data
samples = ['This is one sentence.', 'This is another sentence.', 'This is a third sentence.']

# create a tokenizer object
tokenizer = Tokenizer()

# fit the tokenizer on the text data
tokenizer.fit_on_texts(samples)

# encode new text data using the tokenizer's vocabulary index
new_samples = ['This is a new sentence.']
encoded_new_samples = tokenizer.texts_to_sequences(new_samples)

# print the encoded new text data
print(encoded_new_samples)
# prints: [[1, 2, 6, 3]]
```

In this example, we first call the `fit_on_texts` method to build an indexed vocabulary from the text data. We then encode new text data with the `texts_to_sequences` method, which looks each word up in the vocabulary built by `fit_on_texts`. The resulting sequence contains the index of each known word in the new sentence; note that the word 'new' is silently dropped because it never appeared in the fitted text. Overall, `fit_on_texts` is a useful function in the `tensorflow.keras.preprocessing.text` package for building a vocabulary index from text data so that new data can be encoded against that vocabulary.
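If you would rather keep a placeholder for unknown words than drop them, `Tokenizer` accepts an `oov_token` argument. Here is a minimal sketch (the token string `'<OOV>'` and the sample sentences are arbitrary choices for illustration):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

samples = ['This is one sentence.', 'This is another sentence.']

# pass oov_token so out-of-vocabulary words map to a reserved index
tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(samples)

# the OOV token is always assigned index 1
print(tokenizer.word_index['<OOV>'])  # prints: 1

# unseen words ('a', 'brand', 'new') now map to 1 instead of vanishing
print(tokenizer.texts_to_sequences(['This is a brand new sentence.']))
# prints: [[2, 3, 1, 1, 1, 4]]
```

This keeps encoded sequences the same length as the tokenized input, which matters when you later pad sequences for a model.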