Python Preprocessing.run_pipelineの例

プログラミング言語: Python

名前空間/パッケージ名: classes.preprocessing

クラス/型: Preprocessing

メソッド/関数: run_pipeline

hotexamples.comのコード掲載数: 2

Python Preprocessing.run_pipeline - 2件のコード例が見つかりました。すべてオープンソースプロジェクトから抽出されたPythonのclasses.preprocessing.Preprocessing.run_pipelineの実例で、最も評価が高いものを厳選しています。コード例の評価を行っていただくことで、より質の高いコード例が表示されるようになります。

よく使われるメソッド

表示非表示

Preprocessing(5)

run_pipeline(2)

split_dataset(2)

binariseImg(1)

encode_dataset_column(1)

get(1)

make_dictionary(1)

pad_text(1)

prepare_dataset(1)

shuffle(1)

splitChars(1)

string_to_int(1)

write_preprocessed_dataset(1)

コード例 #1

ファイルを表示

path = "preprocessed_datasets/final_other_dataset.csv"

print("")
print("PREPROCESSING:")
print("")
''' FAKE NEWS DATASET '''

print("INPUT:")
print("(TYPE: ", type(fake), ")")
print(fake.head(10))

preprocesser_fake = Preprocessing(
    fake, date, time, analysis=analysis, news_type="fake",
    language="es")  # here you can set the configuration
data_fake = preprocesser_fake.run_pipeline()
print("")
print("FINAL OUTPUT:")

if preprocesser_fake.aggregation:
    print("(TYPE: ", type(data_fake.aggregated), ")")
    print(data_fake.aggregated)

else:
    print("(TYPE: ", type(data_fake.docvectors), ")")
    print(data_fake.docvectors)
''' REAL NEWS DATASET '''

print("INPUT:")
print("(TYPE: ", type(true), ")")
print(true.head(10))

コード例 #2

ファイルを表示

ファイル: main_use_preprocessed.py プロジェクト: iamgiolaga/fakedetector

    analysis="text",
    news_type="generated",
    duplicate_rows_removal=False,
    lowercasing=False,
    tokenization=False,
    lemmatization=False,
    noise_removal=False,
    stemming=False,
    stopword_removal=False,
    entity_recognition=False,
    data_augmentation=False,
    word2vec=True,
    doc2vec=False,
    aggregation=True)  # here you can set the configuration

gen = pp_generated.run_pipeline()
dataframe = pd.DataFrame(gen.aggregated, columns=["text"])
dataframe["membership"] = generated["membership"]
dataset = pp_generated.shuffle(dataframe).reset_index()
dataset.columns = ["old index", "text", "membership"]
dataset.index.name = "index"

cardinality = len(dataset)

outdir = "generated_datasets/" + date + "_" + time
outname = "generated_dataset_" + str(cardinality) + ".csv"

if not os.path.exists(outdir):
    os.makedirs(outdir)

fullname = os.path.join(outdir, outname)