from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
text = ["This is the first sentence.", "This is the second sentence."]
tokenizer.fit_on_texts(text)
encoded_text = tokenizer.texts_to_sequences(text)
print(encoded_text)
[[1, 2, 3, 5, 4], [1, 2, 3, 6, 4]]
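The numbers come from the tokenizer's `word_index`, which ranks words by frequency, with ties broken by order of first appearance. That assignment can be sketched in plain Python, independent of Keras (the helper `build_word_index` below is illustrative, not part of the library):

```python
def build_word_index(texts):
    # Count words across all texts, preserving first-appearance order.
    counts = {}
    for sentence in texts:
        for word in sentence.lower().replace(".", "").split():
            counts[word] = counts.get(word, 0) + 1
    # A stable sort by descending count keeps first-appearance order for
    # ties, mirroring how the Keras Tokenizer assigns indices from 1.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {word: i + 1 for i, (word, _) in enumerate(ranked)}

text = ["This is the first sentence.", "This is the second sentence."]
print(build_word_index(text))
# {'this': 1, 'is': 2, 'the': 3, 'sentence': 4, 'first': 5, 'second': 6}
```

With that mapping, "This is the first sentence." encodes to [1, 2, 3, 5, 4], matching the output above.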
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=100)
text = ["This is the first sentence.", "This is the second sentence."]
tokenizer.fit_on_texts(text)
encoded_text = tokenizer.texts_to_sequences(text)
print(encoded_text)
[[1, 2, 3, 5, 4], [1, 2, 3, 6, 4]]
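With only six distinct words, `num_words=100` changes nothing here; the cap matters only when the fitted vocabulary is larger, because `texts_to_sequences` drops any word whose index is `num_words` or above. A minimal sketch of that filtering (the helper `apply_num_words` is illustrative, not a Keras function):

```python
def apply_num_words(sequences, num_words):
    # texts_to_sequences drops any word whose index is >= num_words,
    # so only the (num_words - 1) most frequent words survive.
    return [[i for i in seq if i < num_words] for seq in sequences]

encoded = [[1, 2, 3, 5, 4], [1, 2, 3, 6, 4]]
print(apply_num_words(encoded, 100))  # cap above vocabulary size: unchanged
print(apply_num_words(encoded, 4))    # keeps only indices 1-3
```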
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=100)
text = ["This is the first sentence.", "This is the second sentence."]
tokenizer.fit_on_texts(text)
new_text = ["This is the third sentence."]
encoded_new_text = tokenizer.texts_to_sequences(new_text)
print(encoded_new_text)
[[1, 2, 3, 4]]

In this example, we reuse the tokenizer fitted in the previous example and convert a new sentence to a sequence of numerical values with the `texts_to_sequences` method. Because "third" did not appear in the training texts, it has no index and is silently dropped from the output. Overall, the `Tokenizer` module in TensorFlow's Keras Preprocessing library is a useful tool for preprocessing text data for use in neural networks.
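If dropping unseen words is undesirable, `Tokenizer` accepts an `oov_token` (e.g. `Tokenizer(oov_token="<OOV>")`), which reserves index 1 for unknown words and shifts the real words up by one. The lookup can be sketched as follows; the `encode` helper and the hand-written `word_index` map are illustrative assumptions, not Keras APIs:

```python
def encode(texts, word_index, oov_index=1):
    # Unknown words map to the reserved OOV index instead of disappearing.
    sequences = []
    for sentence in texts:
        words = sentence.lower().replace(".", "").split()
        sequences.append([word_index.get(w, oov_index) for w in words])
    return sequences

# Index map as Tokenizer(oov_token="<OOV>") would build it for our two
# training sentences: "<OOV>" takes index 1, the rest shift up by one.
word_index = {"<OOV>": 1, "this": 2, "is": 3, "the": 4, "sentence": 5,
              "first": 6, "second": 7}
print(encode(["This is the third sentence."], word_index))
# [[2, 3, 4, 1, 5]] -- "third" becomes the OOV index 1 rather than vanishing
```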