# --- Dataset preprocessing -------------------------------------------------
# Build the word->ID dictionary used for encoding.
prep.make_dictionary()

# Keep integer IDs only for the most frequent words; everything else maps to 0.
n_top_used_words = 10000
dataset = prep.encode_dataset_column(df=dataset, field="review", use_top_words=n_top_used_words)

# Map the sentiment labels onto a binary target.
dataset = prep.string_to_int(df=dataset, params={"sentiment": {'positive': 1, 'negative': 0}})

# Pad every review to a fixed length, drop empty reviews,
# and truncate any review longer than review_len words.
review_len = 500
dataset = prep.pad_text(df=dataset, column="review_encoded", min_words=1, max_words=review_len)

# Partition into training / test / validation subsets (50% / 30% / 20%).
train_s, test_s, valid_s = prep.split_dataset(training_r=0.5, test_r=0.3, validation_r=0.2, dataset=dataset)

# Materialize each dataframe column as a numpy array for model consumption.
X_train = np.array(train_s['review_encoded'].tolist())
Y = np.array(train_s['sentiment'].tolist())
X_eval = np.array(valid_s['review_encoded'].tolist())
Yv = np.array(valid_s['sentiment'].tolist())
X_test = np.array(test_s['review_encoded'].tolist())
Yt = np.array(test_s['sentiment'].tolist())

# ************************************************** #
#               THE SIMPLE RNN MODEL                 #
# ************************************************** #
# NOTE(review): this chunk repeats the tail of the preprocessing pipeline above —
# presumably an overlapping extraction artifact; confirm whether it should be deduplicated.

# Map the sentiment labels onto a binary target.
dataset = prep.string_to_int(df=dataset, params={"sentiment": {'positive': 1, 'negative': 0}})

# Pad every review to a fixed length, drop empty reviews,
# and truncate any review longer than review_len words.
review_len = 500
dataset = prep.pad_text(df=dataset, column="review_encoded", min_words=1, max_words=review_len)

# Partition into training / test / validation subsets (50% / 30% / 20%).
train_s, test_s, valid_s = prep.split_dataset(training_r=0.5, test_r=0.3, validation_r=0.2, dataset=dataset)

# Materialize each dataframe column as a numpy array for model consumption.
X_train = np.array(train_s['review_encoded'].tolist())
Y = np.array(train_s['sentiment'].tolist())
X_eval = np.array(valid_s['review_encoded'].tolist())
Yv = np.array(valid_s['sentiment'].tolist())
X_test = np.array(test_s['review_encoded'].tolist())
Yt = np.array(test_s['sentiment'].tolist())

# ************************************************** #
#              MODELS COMMON SETTINGS                #
# ************************************************** #