def checkIdenticals():
    """Find 2011 papers whose WOS ID appears in both the labelled ("old")
    and unlabelled ("new") data sets, then report how many unlabelled
    records remain once those duplicates are removed.

    Reads data via the project module ``ptd``; prints diagnostics and
    returns nothing. The unlabelled data itself is not modified.
    """
    old = ptd.getDataWithMeta()
    # NOTE(review): old data is filtered with int 2011 but new data with the
    # string "2011" — presumably the two sources store the year with
    # different dtypes; confirm against the loaders before unifying.
    old_2011 = old[old.Publication_year == 2011]
    old_2011_wos = old_2011.WOS.tolist()

    new = ptd.getUnlabelledData()
    print("len of new data: {}".format(len(new)))
    new_2011 = new[new.Publication_year == "2011"]
    new_2011_wos = new_2011.WOS.tolist()

    print("old length 2011: {}".format(len(old_2011_wos)))
    print("new length 2011: {}".format(len(new_2011_wos)))
    # Fixed: these were Python 2 `print expr` statements, a syntax error
    # under Python 3 and inconsistent with the print() calls above.
    print(old_2011_wos[:5])
    print(new_2011_wos[:5])

    # Set membership replaces the original O(n*m) nested loop; behavior is
    # the same assuming WOS IDs are unique within the old 2011 slice.
    old_2011_set = set(old_2011_wos)
    identical = []
    for wos in new_2011_wos:
        if wos in old_2011_set:
            print("{}\n{}\n".format(wos, wos))
            identical.append(wos)
    print("Number of identical papers = {}".format(len(identical)))

    new_data = ptd.getUnlabelledDataAsList()
    # NOTE(review): message says "old" but this is the unlabelled ("new")
    # data — kept verbatim to preserve runtime output.
    print("len of old before: {}".format(len(new_data)))
    identical_set = set(identical)
    new_data_after = [dic for dic in new_data if dic["WOS"] not in identical_set]
    print("len of old after: {}".format(len(new_data_after)))
# ---------------------------------------------------------------------------
# Script section: parameters, data loading, and classifier setup.
#   * store_to_file — 0/1 flag controlling whether results are persisted
#     (consumed later in the file; usage not visible in this chunk).
#   * train_data — concatenation of the project's training, validation and
#     test splits (all three are used for final training here).
#   * unlabelled_data — the data the trained classifier will label.
#   * best_classifier — LinearSVC with C=1.178, presumably the value found
#     by an earlier hyper-parameter search; confirm against tuning code.
#   * pipeline — a CountVectorizer (word uni+bigrams, no stop-word removal,
#     unlimited vocabulary) feeding further steps; the Pipeline([...]) call
#     continues past the end of this chunk, so it is left untouched here.
# ---------------------------------------------------------------------------
################# # Parameters # ################# store_to_file = 0 ################ # Load Data # ################ print("Loading data...") train_data = pd.concat([ptd.getTrainingData(), ptd.getValidationData(), ptd.getTestData()]) unlabelled_data = ptd.getUnlabelledData() ######################### # Train classifier # ######################### print("Training classifier") best_classifier = LinearSVC(C=1.178) pipeline = Pipeline([('vect', CountVectorizer(decode_error='ignore', analyzer='word', ngram_range=(1, 2), stop_words= None, max_features=None)),