def predict(m, fn):
    """ returns a list of 0s and 1s, corresponding to the lines in the specified file.

    :param m: the trained model
    :type m: BaseClassifier
    :param fn: the full path to a file in the same format as the test set
    :type fn: str
    :return: a list of 0s and 1s, corresponding to the lines in the specified file
    :rtype: list
    """
    dm_test = DataManager(fn, is_train=False, algorithm_name=m.clf_name)
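    # same two-stage preprocessing flow as the one used for the training data (see train_best_model below):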
    dm_test.run_first_preprocessing_flow()
    X_test, _ = dm_test.complete_preprocessing_flow()
    return m.predict(X_test)


def train_best_model():
    """ training a XGBoost classifier from scratch with it's best hyper-parameters.

    :return: a trained XGBoost classifier built with the best-performing hyper-parameters.
    :rtype: XGBoostClassifier
    """
    clf = XGBoostClassifier()
    clf.set_best_hyper_parameters()  # sets the best hyper-parameters that were found in the optimization stage.
    dm_train = DataManager('trump_train.tsv', is_train=True, algorithm_name=clf.clf_name)
    # the preprocessing flow is split into two stages because the text features used by the NN algorithms need padding:
    dm_train.run_first_preprocessing_flow()
    X_train, y_train = dm_train.complete_preprocessing_flow()
    # fit the classifier using all of the training data:
    clf.fit(X_train, y_train)
    return clf
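

# Setup assumed by the hyper-parameter optimization loop below. This is a sketch:
# the two dictionaries are required by the loop body, but which classifiers to
# compare (beyond XGBoostClassifier, e.g. the RNN classifier referenced below) is
# an assumption and depends on the project's BaseClassifier implementations.
classifiers = [XGBoostClassifier()]
best_hyper_parameters_solver = {}  # clf_name -> best hyper-parameters found
best_score_solver = {}             # clf_name -> best cross-validated F1 score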
for clf in classifiers:
    print(f'Initialized classifier {clf.clf_name}')
    evaluator = Evaluator(clf)
    print('Initialized evaluator')
    dm_train = DataManager('trump_train.tsv',
                           is_train=True,
                           algorithm_name=clf.clf_name)
    dm_test = DataManager('trump_test.tsv',
                          is_train=False,
                          algorithm_name=clf.clf_name)

    print('Initialized Data manager')
    dm_train.run_first_preprocessing_flow()
    dm_test.run_first_preprocessing_flow()
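    # align the maximum padded text length between the train and test DataManagers: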
    fix_max_length(dm_train, dm_test)
    X_train, y_train = dm_train.complete_preprocessing_flow()
    X_test, _ = dm_test.complete_preprocessing_flow()
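    # the RNN classifier needs the padded sequence length from the training data: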
    if clf.clf_name == 'RNN':
        clf.sequence_length = dm_train.max_length
    print('Cleaned X, y')
    best_score = evaluator.optimize_hyper_parameters(X_train,
                                                     y_train,
                                                     cv=3,
                                                     scoring='f1')
    print('Done')
    print(
        f'The classifier {clf.clf_name} gave us a best 3-fold score of: {best_score}'
    )
    print(clf.hyper_parameters)
    best_hyper_parameters_solver[clf.clf_name] = clf.hyper_parameters
    best_score_solver[clf.clf_name] = best_score
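

# A minimal usage sketch (an assumption, not part of the original flow): train the
# best model and predict labels for the held-out test file used above.
if __name__ == '__main__':
    best_model = train_best_model()
    test_predictions = predict(best_model, 'trump_test.tsv')
    print(f'Predicted {len(test_predictions)} labels; first ten: {test_predictions[:10]}')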