コード例 #1
0
def test_basic_docs_usage2():
    """Fit on one vocabulary, embed a different set of (partly misspelled)
    queries, and check the resulting dataframe has one 2-d row per query."""
    vocabulary = ["pizza", "pizzas", "firehouse", "firehydrant", "cat", "dog"]
    queries = [
        "piza", "pizza", "pizzaz", "fyrehouse", "firehouse", "fyrehidrant"
    ]
    lang = CountVectorLanguage(n_components=2, ngram_range=(1, 2), analyzer="char")
    lang.fit_manual(vocabulary)
    # Indexing with a list of strings yields an EmbeddingSet.
    embset = lang[queries]
    assert embset.to_dataframe().shape == (len(queries), 2)
コード例 #2
0
def test_sklearn_feature_union_works():
    """CountVectorLanguage plugs into sklearn's FeatureUnion as a transformer:
    fit_transform must produce one row per input sentence."""
    sentences = [
        "i really like this post",
        "thanks for that comment",
        "i enjoy this friendly forum",
        "this is a bad post",
        "i dislike this article",
        "this is not well written",
    ]

    union = FeatureUnion(
        [("dense", CountVectorLanguage(n_components=2)), ("sparse", CountVectorizer())]
    )

    assert union.fit_transform(sentences).shape[0] == len(sentences)
コード例 #3
0
def test_sklearn_pipeline_works(components):
    """An embed -> classify sklearn Pipeline fits on six labelled texts and
    predicts exactly one label per input.

    ``components`` is the number of SVD components for the embedding step
    (supplied by parametrization elsewhere).
    """
    pipe = Pipeline(
        [
            ("embed", CountVectorLanguage(n_components=components)),
            ("model", LogisticRegression()),
        ]
    )

    texts = [
        "i really like this post",
        "thanks for that comment",
        "i enjoy this friendly forum",
        "this is a bad post",
        "i dislike this article",
        "this is not well written",
    ]
    labels = np.array([1, 1, 1, 0, 0, 0])

    pipe.fit(texts, labels)
    assert pipe.predict(texts).shape[0] == len(texts)
コード例 #4
0
from whatlies.language import (
    FasttextLanguage,
    CountVectorLanguage,
    SpacyLanguage,
    GensimLanguage,
    BytePairLanguage,
    TFHubLanguage,
    ConveRTLanguage,
    HFTransformersLanguage,
)


# One instance of every supported language backend; each is constructed once
# at import/collection time and fed to the parametrized test below.
# NOTE(review): construction order matters to parametrized test IDs, and some
# entries load model files from disk paths — presumably provided by the test
# fixtures/CI cache; verify those paths exist before running locally.
backends = [
    SpacyLanguage("tests/custom_test_lang/"),
    FasttextLanguage("tests/custom_fasttext_model.bin"),
    CountVectorLanguage(n_components=10),
    BytePairLanguage("en"),
    GensimLanguage("tests/cache/custom_gensim_vectors.kv"),
    ConveRTLanguage(),
    HFTransformersLanguage("sshleifer/tiny-gpt2", framework="tf"),
    TFHubLanguage("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"),
]


@pytest.mark.parametrize("lang", backends)
def test_sklearn_pipeline_works(lang):
    pipe = Pipeline([("embed", lang), ("model", LogisticRegression())])

    X = [
        "i really like this post",
        "thanks for that comment",
コード例 #5
0
def lang():
    """Build a char-ngram CountVectorLanguage (3 components, 1-2 grams) and
    return it pre-fitted on a small fixed vocabulary."""
    words = ["pizza", "pizzas", "firehouse", "firehydrant", "cat", "dog"]
    model = CountVectorLanguage(n_components=3, ngram_range=(1, 2), analyzer="char")
    return model.fit_manual(words)
コード例 #6
0
ファイル: app.py プロジェクト: vishnupriyavr/rasalit
        step=0.01,
    )
    # Non-linear 2-d projection with the user-chosen UMAP knobs.
    reduction = Umap(2, n_neighbors=n_neighbors, min_dist=min_dist)
else:
    # Default: plain PCA down to two components.
    reduction = Pca(2)

# Page intro copy.
st.markdown("# Simple Text Clustering")
st.markdown(
    "Let's say you've gotten a lot of feedback from clients on different channels. You might like to be able to distill main topics and get an overview. It might even inspire some intents that will be used in a virtual assistant!"
)
st.markdown(
    "This tool will help you discover them. This app will attempt to cluster whatever text you give it. The chart will try to clump text together and you can explore underlying patterns."
)

# Embed the input texts with whichever method was selected.
# NOTE(review): the second branch is a separate `if`, not `elif`; if `method`
# equals neither string, `embset` is never bound and the chart expression
# below raises NameError — presumably the selector widget upstream restricts
# `method` to exactly these two values; confirm.
if method == "CountVector SVD":
    lang = CountVectorLanguage(n_svd, ngram_range=(min_ngram, max_ngram))
    embset = lang[texts]
if method == "Lite Sentence Encoding":
    embset = EmbeddingSet(
        *[
            Embedding(t, v)
            for t, v in zip(texts, calculate_embeddings(texts, encodings=encodings))
        ]
    )

# Project the embeddings to 2-d and build the interactive scatter plot.
p = (
    embset.transform(reduction)
    .plot_interactive(annot=False)
    .properties(width=500, height=500, title="")
)
コード例 #7
0
def test_basic_docs_usage1():
    """Indexing a char-ngram CountVectorLanguage with four words yields a
    dataframe of four 2-d embeddings."""
    words = ["pizza", "pizzas", "firehouse", "firehydrant"]
    lang = CountVectorLanguage(n_components=2, ngram_range=(1, 2), analyzer="char")
    embset = lang[words]
    assert embset.to_dataframe().shape == (len(words), 2)