Python FunctionLib.log_transform Examples

Programming language: Python

Namespace/package name: Model

Class/type: FunctionLib

Method/function: log_transform

Examples at hotexamples.com: 2

Python FunctionLib.log_transform - 2 code examples found. These are real-world Python examples of Model.FunctionLib.log_transform, all extracted from open source projects and selected as the top rated.

Frequently used methods

get_params(8)

distinct_feats(7)

change_type(7)

get_missing_value_feats(6)

ScoreDataFrame(3)

get_aggregate_features_num(3)

get_model_performance(3)

TurkyOutliers(2)

impute_knn_classifier(2)

GetScaledModel(2)

get_rowcnt_most_missing_val(2)

GetBasedModel(2)

cv_score(2)

corr_feats(2)

GetScaledModelwithfactorizedCW(2)

plot_bar(2)

missing_val_perc(2)

impute_values(2)

log_transform(2)

PlotBoxR(2)

match_strings(1)

hist_perc(1)

hist_compare(1)

get_unique_val_list(1)

plot_stats(1)

min_len_col(1)

AdaBoostClassifier(1)

get_corr(1)

feature_stats(1)

default_ratio(1)

cv_metrics(1)

concat_model_score(1)

RandomSearch(1)

RandomForestClassifier(1)

LogisticRegression(1)

KNeighborsClassifier(1)

GridSearch(1)

GradientBoostingClassifier(1)

GetScaledModelwithbestparams(1)

train_test_split(1)

Example #1

File: Preprocessing.py Project: rkparyani/KAGGLE---Home-Credit-Default-Risk

    def outlier_treatment(self, normalized_feats):
        # Find the num and cat feats for imp_df

        num_feats_imp_df, cat_feats_imp_df = self.seperate_cat_num_var(
            self.ds1_df)
        other_feats = [
            x for x in num_feats_imp_df if x not in normalized_feats
        ]

        # Anomalies and data correction.
        # DAYS_EMPLOYED contains the abnormal value 365243; set it to NaN so it
        # can be imputed at a later stage. A single .loc call avoids chained
        # assignment, which may silently write to a temporary copy.
        feature = 'DAYS_EMPLOYED'
        self.ds1_df.loc[self.ds1_df[feature] == 365243, feature] = np.nan

        # ORGANIZATION_TYPE contains 'XNA' values; replace them with NaN so
        # they can be imputed.
        self.ds1_df['ORGANIZATION_TYPE'] = self.ds1_df[
            'ORGANIZATION_TYPE'].replace("XNA", np.nan)

        # Log-transform every numerical feature that is not already normalized
        # (highly skewed values) to reduce the influence of outliers.
        for feature in other_feats:
            print('log_transform', feature)
            self.ds1_df = f.log_transform(self.ds1_df, feature)
            # Drop the original column; log_transform is expected to have
            # added the transformed one.
            self.ds1_df.drop(columns=[feature], inplace=True)

        #normalized_num_feats_imp_df = [x for x in normalized_feats if x in num_feats_imp_df]
        num_feats_imp_df, cat_feats_imp_df = self.seperate_cat_num_var(
            self.ds1_df)

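        for i in num_feats_imp_df:
            print(i)
            # min_val/max_val avoid shadowing the built-ins min() and max().
            out_l, out_r, min_val, max_val = f.TurkyOutliers(self.ds1_df,
                                                             i,
                                                             drop=False)
            # Cap flagged rows when either side has any outliers.
            if len(out_l) > 0 or len(out_r) > 0:
                self.ds1_df.loc[out_l, i] = round(min_val, 3)
                self.ds1_df.loc[out_r, i] = round(max_val, 3)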
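
The page does not show the body of FunctionLib.log_transform. Judging from the call site, it takes a DataFrame and a column name and returns a DataFrame, after which the caller drops the original column. The following is a minimal sketch of a helper with that shape. The function name log_transform_sketch, the "_log" column suffix, the sign-preserving log1p formula, and the demo column "amount" are all assumptions made for illustration, not the project's actual implementation.

```python
import numpy as np
import pandas as pd


def log_transform_sketch(df: pd.DataFrame, feature: str) -> pd.DataFrame:
    """Return a copy of df with an added '<feature>_log' column.

    Hypothetical stand-in for FunctionLib.log_transform, whose source
    is not shown on this page.
    """
    out = df.copy()
    values = out[feature].astype(float)
    # Sign-preserving log1p (an assumption): zeros and negative values do
    # not produce -inf or NaN, and NaN inputs stay NaN.
    out[feature + "_log"] = np.sign(values) * np.log1p(np.abs(values))
    return out


# Made-up demo data; "amount" is not a column from the original project.
demo = pd.DataFrame({"amount": [0.0, 9.0, 99.0, -9.0, np.nan]})
demo = log_transform_sketch(demo, "amount")
# Mirror the caller in the example above, which drops the raw column.
demo = demo.drop(columns=["amount"])
print(demo)
```

A plain np.log would fail on zeros and negative values, which is why this sketch uses a signed log1p; whether the original helper handles such values the same way is unknown.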

Example #2

File: Preprocessing_app_train.py Project: rkparyani/KAGGLE---Home-Credit-Default-Risk

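import numpy as np

# Excerpt: imp_df, num_feats_imp_df, normalized_feats and the FunctionLib
# alias f are defined earlier in the original script (not shown here).
other_feats = [x for x in num_feats_imp_df if x not in normalized_feats]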

# Anomalies and data correction.
# DAYS_EMPLOYED contains the abnormal value 365243; set it to NaN so it can
# be imputed at a later stage. A single .loc call avoids chained assignment,
# which may silently write to a temporary copy.
feature = 'DAYS_EMPLOYED'
imp_df.loc[imp_df[feature] == 365243, feature] = np.nan

# ORGANIZATION_TYPE contains 'XNA' values; replace them with NaN so they can
# be imputed.
imp_df['ORGANIZATION_TYPE'] = imp_df['ORGANIZATION_TYPE'].replace("XNA", np.nan)

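# Log-transform every numerical feature that is not already normalized
# (highly skewed values) to reduce the influence of outliers.
for feature in other_feats:
    print(feature)
    imp_df = f.log_transform(imp_df, feature)
    # Drop the original column; log_transform is expected to have added the
    # transformed one.
    imp_df.drop(columns=[feature], inplace=True)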

#normalized_num_feats_imp_df = [x for x in normalized_feats if x in num_feats_imp_df]
num_feats_imp_df, cat_feats_imp_df = f.distinct_feats(imp_df)
num_feats_imp_df.remove('TARGET')
num_feats_imp_df.remove('SK_ID_CURR')
print(len(num_feats_imp_df), len(cat_feats_imp_df))

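for i in num_feats_imp_df:
    print(i)
    #i = 'AMT_REQ_CREDIT_BUREAU_YEAR_log'
    # min_val/max_val avoid shadowing the built-ins min() and max().
    out_l, out_r, min_val, max_val = f.TurkyOutliers(imp_df, i, drop=False)
    # Cap flagged rows when either side has any outliers.
    if len(out_l) > 0 or len(out_r) > 0:
        imp_df.loc[out_l, i] = round(min_val, 3)
        imp_df.loc[out_r, i] = round(max_val, 3)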
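
The outlier loop above caps the rows that f.TurkyOutliers flags, but the source of that helper is not on this page. Assuming it applies Tukey's fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR), as its name suggests, the same capping step can be sketched with pandas alone. The function name tukey_fences, the parameter k, and the demo data are hypothetical, and unlike the original this sketch does not round the caps to three decimals.

```python
import numpy as np
import pandas as pd


def tukey_fences(series: pd.Series, k: float = 1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper; not the project's TurkyOutliers function.
    """
    q1 = series.quantile(0.25)  # quantile() ignores NaN values
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up demo data with one obvious outlier and one missing value.
demo = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0, np.nan]})
lower, upper = tukey_fences(demo["value"])
# clip() caps values outside the fences and leaves NaN untouched.
demo["value"] = demo["value"].clip(lower=lower, upper=upper)
print(lower, upper)
print(demo)
```

Capping with clip() on the whole column avoids the per-row index lists (out_l, out_r) used in the original loop, at the cost of not reporting which rows were changed.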