Python UTILS.rename_cols Examples

Programming language: Python

Class/type: UTILS

Method/function: rename_cols

Examples at hotexamples.com: 3

Python UTILS.rename_cols - 3 examples found. These are the top-rated real-world Python examples of UTILS.rename_cols, extracted from open source projects.

Example #1

File: svc_agencies.py Project: GeoDaCenter/contracts_cleaning

def read_svc():
    '''
    Reads in the service agency addresses. Calls the COMPARE_ADDRESSES module to
    merge duplicate addresses per agency. Counts the number of service addresses
    per organization. Returns a dataframe.
    '''

    # Print progress report
    print('\nReading in service agencies')

    # Read in the service agencies, converting zip code to string
    df = pd.read_csv(SVC, converters={'ZipCode': str})

    # Append '_SVC' to all columns except CSDS_Svc_ID
    df = u.rename_cols(df, [x for x in df.columns if x != 'CSDS_Svc_ID'],
                       '_SVC')

    # Rename a column to prepare for linking
    df = df.rename(columns={'CSDS_Svc_ID': 'CSDS_Vendor_ID_LINK2'}, index=str)

    # Use the COMPARE_ADDRESSES module to clean up multiple strings for a single
    # address record
    key = 'CSDS_Vendor_ID_LINK2'
    target = 'Address_SVC'
    fixed_addresses = ca.fix_duplicate_addresses(df, key, target)

    # Drop duplicates based on the key and target fields
    fixed_addresses = fixed_addresses.drop_duplicates(subset=[key, target])

    return fixed_addresses
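
The source of UTILS.rename_cols is not shown on this page. Judging only from the call above and its comment ("Append '_SVC' to all columns except CSDS_Svc_ID"), it appears to take a dataframe, a list of column names, and a suffix. The following is a minimal sketch of a function with that behavior; the implementation and the sample data are assumptions, not the project's actual code.

```python
import pandas as pd


def rename_cols(df, cols, suffix):
    """Hypothetical stand-in for UTILS.rename_cols: append `suffix` to each
    column named in `cols` and return a new dataframe."""
    mapping = {col: f"{col}{suffix}" for col in cols}
    return df.rename(columns=mapping)


# Made-up sample data; only the column names echo the example above.
svc = pd.DataFrame({
    "CSDS_Svc_ID": ["A1", "A2"],
    "Address": ["1 Main St", "2 Oak Ave"],
    "ZipCode": ["60601", "60602"],
})

# Suffix every column except the ID column, as read_svc() does
renamed = rename_cols(svc, [c for c in svc.columns if c != "CSDS_Svc_ID"], "_SVC")
renamed_columns = list(renamed.columns)
print(renamed_columns)
```

Because DataFrame.rename returns a new object by default, the input dataframe keeps its original column names in this sketch.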

Example #2

def jwsim_contracts_irs(contracts, irs, suffix):
    '''
    Takes the contracts and IRS dataframes and returns a dataframe of records
    with matching names where the JW similarity is >= JWSIM_THRESH.
    '''

    # Rename the columns in IRS:
    irs = u.rename_cols(irs, irs.columns, suffix)

    # Restrict the contracts df to just those from IL
    contracts = contracts[contracts.CSDS_Contract_ID.str.startswith('IL')]

    # Take the cartesian product between the two; replace np.nan with ''
    # (the np.NaN alias was removed in NumPy 2.0)
    prod = mn.cart_prod(contracts, irs)
    prod = prod.replace(np.nan, '')

    # Print progress report
    print('Calculating Jaro-Winkler similarity on vendor names')

    # Compute the Jaro-Winkler similarity on the VendorName cols
    col1 = 'VendorName'
    arg = (prod, col1, col1 + suffix)
    jwsim = mn.parallelize(mn.jwsim, arg)

    # Return only the rows where JW similarity >= JWSIM_THRESH
    return jwsim[jwsim.JWSimilarity >= JWSIM_THRESH]
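
The helpers mn.cart_prod, mn.jwsim, and mn.parallelize are not shown here, so the sketch below only illustrates the overall pattern: pair every row of one dataframe with every row of the other, score each pair, and keep the pairs above a threshold. It uses pandas' cross merge (available in pandas 1.2 and later) and difflib.SequenceMatcher as a stand-in similarity measure, which is not Jaro-Winkler. The threshold value, the "_IRS" suffix, and the sample data are all assumptions.

```python
from difflib import SequenceMatcher

import pandas as pd

THRESH = 0.8  # hypothetical value, standing in for JWSIM_THRESH


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; this is NOT Jaro-Winkler."""
    return SequenceMatcher(None, a, b).ratio()


# Made-up sample data; the IRS column already carries an assumed suffix.
contracts = pd.DataFrame({"VendorName": ["ACME CORP", "BETA LLC"]})
irs = pd.DataFrame({"VendorName_IRS": ["ACME CORP", "GAMMA INC"]})

# Cartesian product: every contracts row paired with every IRS row
prod = contracts.merge(irs, how="cross")

# Score each pair of names
prod["Similarity"] = [
    similarity(a, b) for a, b in zip(prod["VendorName"], prod["VendorName_IRS"])
]

# Keep only the pairs at or above the threshold
matches = prod[prod["Similarity"] >= THRESH].reset_index(drop=True)
print(matches)
```

A cross merge grows as the product of the two row counts, which is presumably why the original restricts the contracts to IL first and runs the scoring in parallel.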

Example #3

File: svc_agencies.py Project: GeoDaCenter/contracts_cleaning

def linker():
    '''
    Reads in the linker file (to link HQ agencies to service agencies). Merges a
    copy of itself on cluster ID, then eliminates records that match on vendor
    ID (to produce only matches that have different vendor IDs). Returns a
    dataframe.
    '''

    # Read in the link dataframe
    link = read_linker()

    # Make two new dataframes by copying the link dataframe and renaming columns
    link1 = link.rename(columns={'VendorName': 'VendorName_LINK1'}, index=str)
    link2 = u.rename_cols(link, ['VendorName', 'CSDS_Vendor_ID'], '_LINK2')

    # Merge the two link dataframes together
    df = link1.merge(link2, how='left')

    # Drop self-matches and reset the index
    df = df[df['CSDS_Vendor_ID'] != df['CSDS_Vendor_ID_LINK2']].reset_index(
        drop=True)

    return df
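
The self-merge in linker() can be hard to picture without data. The sketch below replays the same steps on a tiny made-up table: rename columns in two copies, merge them on the columns they still share, then drop rows where a vendor is matched with itself. The column name ClusterID and all values are assumptions (the docstring only says "cluster ID"), and plain DataFrame.rename is used in place of u.rename_cols.

```python
import pandas as pd

# Made-up linker data; ClusterID is a hypothetical name for the cluster key.
link = pd.DataFrame({
    "ClusterID": [1, 1, 2],
    "VendorName": ["ACME HQ", "ACME WEST", "BETA"],
    "CSDS_Vendor_ID": ["V1", "V2", "V3"],
})

# Two renamed copies, mirroring link1 and link2 in linker()
link1 = link.rename(columns={"VendorName": "VendorName_LINK1"})
link2 = link.rename(columns={
    "VendorName": "VendorName_LINK2",
    "CSDS_Vendor_ID": "CSDS_Vendor_ID_LINK2",
})

# With no `on` argument, merge joins on the columns both frames share,
# which after renaming is only ClusterID.
df = link1.merge(link2, how="left")

# Drop self-matches and reset the index
df = df[df["CSDS_Vendor_ID"] != df["CSDS_Vendor_ID_LINK2"]].reset_index(drop=True)
pairs = list(zip(df["CSDS_Vendor_ID"], df["CSDS_Vendor_ID_LINK2"]))
print(pairs)
```

In this sample only cluster 1 has two distinct vendors, so the result keeps the V1/V2 pairing in both directions and discards the single-vendor cluster 2 entirely.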