def run(train_csv, train_tensor, test_csv, test_tensor, metric='euclidean'):
    """Train a distance-based VectorModel on GLC19 occurrences and print scores.

    Parameters
    ----------
    train_csv : str
        Path to the training occurrences CSV (';'-separated, quoted fields).
    train_tensor : str
        Directory containing the environmental patch tensors for training points.
    test_csv : str
        Path to the test occurrences CSV (same layout as ``train_csv``).
    test_tensor : str
        Directory containing the environmental patch tensors for test points.
    metric : str, default 'euclidean'
        Distance metric forwarded to ``VectorModel``.

    Side effects
    ------------
    Prints the Top-30 score, the MRR score, and the classifier parameters.
    """
    print("K means\n")

    def _load_dataset(csv_path, patches_dir):
        # Read one occurrences CSV and return (X, y): environmental feature
        # matrix and the species-id label vector (505 distinct labels).
        df = pd.read_csv(csv_path, sep=';', header='infer',
                         quotechar='"', low_memory=True)
        df = df[['Longitude', 'Latitude', 'glc19SpId', 'scName']]\
            .dropna(axis=0, how='all')\
            .astype({'glc19SpId': 'int64'})
        # Target: species identifiers.
        target_df = df['glc19SpId']
        # Environmental features extracted from the patch tensors.
        env_df = build_environmental_data(df[['Latitude', 'Longitude']],
                                          patches_dir=patches_dir)
        return env_df.values, target_df.values

    # The train and test sets are built identically — one helper, two calls.
    X_train, y_train = _load_dataset(train_csv, train_tensor)
    X_test, y_test = _load_dataset(test_csv, test_tensor)

    # Standardize features to zero mean / unit variance, fitted on TRAIN only.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    # BUG FIX: the original predicted on raw, unscaled X_test while the model
    # was trained on standardized features; test data must be transformed
    # with the scaler fitted on the training set.
    X_test = scaler.transform(X_test)

    classifier = VectorModel(metric=metric)
    classifier.fit(X_train, y_train)

    # Evaluation.
    y_predicted = classifier.predict(X_test)
    print(f'Top30 score:{classifier.top30_score(y_predicted, y_test)}')
    print(f'MRR score:{classifier.mrr_score(y_predicted, y_test)}')
    print('Params:', classifier.get_params())
# NOTE(review): this fragment begins mid-expression — the head of the first
# statement (presumably `df = pd.read_csv(...)` followed by a column
# selection, as in run() elsewhere in this file) is missing from the visible
# source. Confirm against version control before running. Code tokens below
# are unchanged; only formatting and comments were restored.
.dropna(axis=0, how='all')\
    .astype({'glc19SpId': 'int64'})
# target pandas series of the species identifiers (there are 505 labels)
target_df = df['glc19SpId']
# correspondence table between ids and the species taxonomic names
# (Taxref names with year of discoverie)
taxonomic_names = pd.read_csv('../data/occurrences/taxaName_glc19SpId.csv',
                              sep=';', header='infer', quotechar='"',
                              low_memory=True)
# building the environmental data
env_df = build_environmental_data(df[['Latitude', 'Longitude']],
                                  patches_dir='example_envtensors')
X = env_df.values
y = target_df.values
# Standardize the features by removing the mean and scaling to unit variance
# NOTE(review): the scaler is fitted on the FULL dataset before the
# train/test split below — that leaks test statistics into training;
# consider fitting on X_train only. Left unchanged here.
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Evaluate as the average accuracy on one train/split random sample:
print("Test nearest centroid model")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
classifier = NearestCentroidModel(metric='euclidean')
classifier.fit(X_train, y_train)
y_predicted = classifier.predict(X_test)
print(f'Top30 score:{classifier.top30_score(y_predicted, y_test)}')
print(f'MRR score:{classifier.mrr_score(y_predicted, y_test)}')
print('Params:', classifier.get_params())