Python OneSidedSelection.OneSidedSelection 예제들

프로그래밍 언어: Python

네임스페이스/패키지 이름: unbalanced_dataset

클래스/타입: OneSidedSelection

메소드/함수: OneSidedSelection

hotexamples.com에서의 예제들: 2

Python OneSidedSelection.OneSidedSelection - 2개의 예제가 발견되었습니다. 이것들은 오픈소스 프로젝트에서 추출된 Python의 unbalanced_dataset.OneSidedSelection.OneSidedSelection에 대한 실세계 최고 등급의 예제들입니다. 예제들을 평가하여 예제의 품질 향상에 도움을 줄 수 있습니다.

자주 사용되는 메소드들

보기 숨기기

OneSidedSelection(2)

fit_transform(1)

자주 사용되는 메소드들

OneSidedSelection (2)

fit_transform (1)

예제 #1

파일 보기

def apply_sampling(X_data, Y_data, sampling, n_states, maxlen):
    ratio = float(np.count_nonzero(Y_data == 1)) / \
        float(np.count_nonzero(Y_data == 0))
    X_data = np.reshape(X_data, (len(X_data), n_states * maxlen))
    # 'Random over-sampling'
    if sampling == 'OverSampler':
        OS = OverSampler(ratio=ratio, verbose=True)
    # 'Random under-sampling'
    elif sampling == 'UnderSampler':
        OS = UnderSampler(verbose=True)
    # 'Tomek under-sampling'
    elif sampling == 'TomekLinks':
        OS = TomekLinks(verbose=True)
    # Oversampling
    elif sampling == 'SMOTE':
        OS = SMOTE(ratio=1, verbose=True, kind='regular')
    # Oversampling - Undersampling
    elif sampling == 'SMOTETomek':
        OS = SMOTETomek(ratio=ratio, verbose=True)
    # Undersampling
    elif sampling == 'OneSidedSelection':
        OS = OneSidedSelection(verbose=True)
    # Undersampling
    elif sampling == 'CondensedNearestNeighbour':
        OS = CondensedNearestNeighbour(verbose=True)
    # Undersampling
    elif sampling == 'NearMiss':
        OS = NearMiss(version=1, verbose=True)
    # Undersampling
    elif sampling == 'NeighbourhoodCleaningRule':
        OS = NeighbourhoodCleaningRule(verbose=True)
    # ERROR: WRONG SAMPLER, TERMINATE
    else:
        print('Wrong sampling variable you have set... Exiting...')
        sys.exit()
    # print('shape ' + str(X.shape))
    X_data, Y_data = OS.fit_transform(X_data, Y_data)
    return X_data, Y_data

예제 #2

파일 보기

colnames = ['old_index','job_id', 'task_idx','sched_cls', 'priority', 'cpu_requested',
            'mem_requested', 'disk', 'violation'] 

tain_path = r'/home/askrey/Dropbox/Project_step_by_step/3_create_database/csvs/frull_db_2.csv'

X = pd.read_csv(tain_path, header = None, index_col = False ,names = colnames, skiprows = [0],  usecols = [3,4,5,6,7])
y = pd.read_csv(tain_path, header = None, index_col = False ,names = colnames, skiprows = [0],  usecols = [8])
y = y['violation'].values
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.333, random_state=0)
main_x = X.values
main_y = y

verbose = False

# 'One-Sided Selection'
OSS = OneSidedSelection(size_ngh=51, n_seeds_S=51, verbose=verbose)
x, y = OSS.fit_transform(main_x, main_y)

ratio = float(np.count_nonzero(y==1)) / float(np.count_nonzero(y==0))
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=.333, random_state=0)

from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import cross_val_score

clf = RandomForestClassifier(n_estimators=10)
scores = cross_val_score(clf, X_test, y_test)

y_pred = clf.fit(X_train, y_train).predict(X_test)
y_score = clf.fit(X_train, y_train).predict_proba(X_test)[:,1]