Example #1
"""
### Applying the hashing trick to an integer categorical feature

If you have a categorical feature that can take many different values (on the order of
10e3 or higher), where each value only appears a few times in the data,
it becomes impractical and ineffective to index and one-hot encode the feature values.
Instead, it can be a good idea to apply the "hashing trick": hash the values to a vector
of fixed size. This keeps the size of the feature space manageable, and removes the need
for explicit indexing.
"""

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing

# Sample data: 10,000 random integers with values between 0 and 100,000
data = np.random.randint(0, 100000, size=(10000, 1))

# Use the Hashing layer to hash the values into 64 bins, i.e. the range [0, 64)
hasher = preprocessing.Hashing(num_bins=64, salt=1337)

# Use the CategoryEncoding layer to one-hot encode the hashed values
encoder = preprocessing.CategoryEncoding(max_tokens=64, output_mode="binary")
encoded_data = encoder(hasher(data))
print(encoded_data.shape)
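
# The two layers can also be chained inside a small functional model so that hashing
# and encoding run end to end. This is a minimal sketch, not part of the original
# example; the input shape and dtype below are illustrative assumptions.
inputs = tf.keras.Input(shape=(1,), dtype="int64")
hashed = preprocessing.Hashing(num_bins=64, salt=1337)(inputs)
outputs = preprocessing.CategoryEncoding(max_tokens=64, output_mode="binary")(hashed)
hashing_model = tf.keras.Model(inputs, outputs)
print(hashing_model(data).shape)  # (10000, 64), same as the eager result above
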
"""
### Encoding text as a sequence of token indices

This is how you should preprocess text to be passed to an `Embedding` layer.
"""

# Define some text data to adapt the layer
data = tf.constant([
    "The Brain is wider than the Sky",
    "For put them side by side",
Example #2
    def input_layer(self):
        return preprocessing.Hashing(**self.feature_params)(self.inputs)
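
# A minimal sketch of the kind of class this method might belong to. The class name,
# the contents of feature_params, and the input spec are illustrative assumptions;
# only the input_layer body above comes from the example.
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing


class HashedFeature:
    def __init__(self, num_bins=64):
        self.feature_params = {"num_bins": num_bins}
        self.inputs = tf.keras.Input(shape=(1,), dtype="int64")

    def input_layer(self):
        # Hash the raw integer input into `num_bins` buckets.
        return preprocessing.Hashing(**self.feature_params)(self.inputs)


# The resulting symbolic tensor can feed further layers in a functional model.
feature = HashedFeature(num_bins=64)
hashed_tensor = feature.input_layer()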