Python GapEncoder.get_feature_names Examples

Programming language: Python

Namespace/package: dirty_cat

Class/type: GapEncoder

Method/function: get_feature_names

Examples on hotexamples.com: 3

Python GapEncoder.get_feature_names: 3 examples found. These are the top-rated real-world Python examples of dirty_cat.GapEncoder.get_feature_names extracted from open-source projects.

Frequently used methods

GapEncoder(16)

fit(6)

fit_transform(5)

transform(4)

get_feature_names(3)

get_feature_names_out(2)

partial_fit(2)

score(1)

Example #1

File: test_gap_encoder.py Project: LilianBoulard/dirty_cat

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from dirty_cat import GapEncoder


def test_get_feature_names(n_samples=70):
    # Build a 2-column input by stacking the same newsgroup texts twice
    X_txt = fetch_20newsgroups(subset='train')['data'][:n_samples]
    X = np.array([X_txt, X_txt]).T
    enc = GapEncoder(random_state=42)
    enc.fit(X)
    topic_labels = enc.get_feature_names()
    # Check number of labels: one label per topic for each input column
    assert len(topic_labels) == enc.n_components * X.shape[1]
    # Test different parameters for col_names
    topic_labels_2 = enc.get_feature_names(col_names='auto')
    assert topic_labels_2[0] == 'col0: ' + topic_labels[0]
    topic_labels_3 = enc.get_feature_names(col_names=['abc', 'def'])
    assert topic_labels_3[0] == 'abc: ' + topic_labels[0]
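The assertions in this test pin down a naming convention: `get_feature_names` returns one label per topic per input column (`n_components * X.shape[1]` in total), unprefixed by default, prefixed with `col0: `, `col1: `, and so on when `col_names='auto'`, and prefixed with the given names when a list is passed. The sketch below reproduces that convention in plain Python so it can be checked without fitting a model. The helper `prefix_topic_labels` is hypothetical and not part of dirty_cat, and it assumes labels are ordered column by column (all topics of column 0, then column 1), which the test only checks for the first label.

```python
from typing import List, Optional, Sequence, Union


def prefix_topic_labels(
    labels: Sequence[str],
    n_components: int,
    col_names: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Hypothetical helper mimicking the col_names behaviour asserted in the test.

    Assumes `labels` holds the n_components labels of column 0, then the
    n_components labels of column 1, and so on (column-major order).
    """
    if n_components <= 0 or len(labels) % n_components != 0:
        raise ValueError("len(labels) must be a positive multiple of n_components")
    n_columns = len(labels) // n_components
    if col_names is None:
        # Default: labels are returned unprefixed.
        return list(labels)
    if isinstance(col_names, str):
        if col_names != "auto":
            raise ValueError("col_names must be None, 'auto' or a list of names")
        names = [f"col{j}" for j in range(n_columns)]
    else:
        names = list(col_names)
        if len(names) != n_columns:
            raise ValueError("col_names needs one name per input column")
    # Label i belongs to column i // n_components.
    return [f"{names[i // n_components]}: {label}" for i, label in enumerate(labels)]


# Toy labels for 2 columns x 2 topics (made up, not real GapEncoder output).
labels = ["fire, rescue", "police, officer", "fire, rescue", "police, officer"]
print(prefix_topic_labels(labels, n_components=2, col_names="auto"))
print(prefix_topic_labels(labels, n_components=2, col_names=["abc", "def"]))
```

Example #2 fits a single input, so under the same rule the label count reduces to `n_components * 1`, which is exactly what its assertion checks.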

Example #2

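from sklearn.datasets import fetch_20newsgroups
from dirty_cat import GapEncoder


def test_get_feature_names(n_samples=70):
    X_txt = fetch_20newsgroups(subset='train')['data']
    # X is a flat list of documents, i.e. a single input
    X = X_txt[:n_samples]
    enc = GapEncoder()
    enc.fit(X)
    topic_labels = enc.get_feature_names()
    # Check number of labels: one label per topic
    assert len(topic_labels) == enc.n_components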

Example #3

File: 04_feature_interpretation_gap_encoder.py Project: patelashutosh/dirty_cat

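# `enc` (a GapEncoder instance) and `X_dirty` (the column of dirty
# categories) are defined earlier in the full script, outside this excerpt.
X_enc = enc.fit_transform(X_dirty)
print(f'Shape of encoded vectors = {X_enc.shape}')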

################################################################################
# Interpreting encoded vectors
# ----------------------------
#
# The GapEncoder can be understood as a continuous encoding on a set of latent
# topics estimated from the data. The latent topics are built by
# capturing combinations of substrings that frequently co-occur, and encoded
# vectors correspond to their activations.
# To interpret these latent topics, we select for each of them a few labels
# from the input data with the highest activations.
# In the example below we select 3 labels to summarize each topic.

topic_labels = enc.get_feature_names(n_labels=3)
for k, labels in enumerate(topic_labels):
    print(f'Topic n°{k}: {labels}')
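```python
################################################################################
# A minimal sketch of the label-selection step, under stated assumptions
# ----------------------------------------------------------------------
#
# The helper ``top_labels_per_topic`` below is hypothetical, not part of
# dirty_cat. It assumes we already have a matrix of non-negative topic
# activations for a set of candidate labels (one row per label, one column
# per topic), for instance obtained by encoding each candidate word with the
# fitted encoder. Each row is normalized to sum to 1, then the ``n_labels``
# candidates with the largest share are kept for every topic.
# dirty_cat's own implementation may differ in its details.

import numpy as np


def top_labels_per_topic(activations, candidates, n_labels=3):
    """Return one comma-separated string of top candidates per topic."""
    activations = np.abs(np.asarray(activations, dtype=float))
    # Normalize rows so a word's overall magnitude does not dominate the ranking.
    row_sums = activations.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero for all-zero rows
    shares = activations / row_sums
    candidates = np.asarray(candidates)
    summaries = []
    for k in range(shares.shape[1]):
        # Indices of the n_labels largest shares for topic k, largest first.
        best = np.argsort(-shares[:, k])[:n_labels]
        summaries.append(", ".join(candidates[best]))
    return summaries


# Toy activations (made up for illustration): 4 candidate words x 2 topics.
toy_activations = [[5.0, 0.1], [4.0, 0.2], [0.1, 3.0], [0.3, 2.0]]
toy_words = ["firefighter", "rescuer", "police", "officer"]
for k, labels in enumerate(top_labels_per_topic(toy_activations, toy_words, 2)):
    print(f'Topic n°{k}: {labels}')
```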

################################################################################
# As expected, topics capture labels that frequently co-occur. For instance,
# the labels *firefighter*, *rescuer*, *rescue* appear together in
# *Firefighter/Rescuer III* or *Fire/Rescue Lieutenant*.
#
# This enables us to understand the encoding of different samples:

import matplotlib.pyplot as plt
encoded_labels = enc.transform(X_dirty[:20])
plt.figure(figsize=(8, 10))
plt.imshow(encoded_labels)