from pydatavec import Schema, TransformProcess


def test_cat_to_int():
    schema = Schema()
    schema.add_categorical_column('cat', ['A', 'B', 'C'])
    tp = TransformProcess(schema)
    tp.categorical_to_integer('cat')
    assert tp.final_schema.get_column_type('cat') == 'integer'
    tp.to_java()
def test_schema():
    schema = Schema()
    schema.add_string_column('str1')
    schema.add_string_column('str2')
    schema.add_integer_column('int1')
    schema.add_integer_column('int2')
    schema.add_double_column('dbl1')
    schema.add_double_column('dbl2')
    schema.add_float_column('flt1')
    schema.add_float_column('flt2')
    schema.add_categorical_column('cat1', ['A', 'B', 'C'])
    schema.add_categorical_column('cat2', ['A', 'B', 'C'])
    schema.to_java()
os.remove(temp_filename)
download_file(url, temp_filename)
os.rename(temp_filename, filename)

# We use pyspark to filter empty lines
sc = pyspark.SparkContext(master='local[*]', appName='iris')
data = sc.textFile('iris.data')
filtered_data = data.filter(lambda d: len(d) > 0)

# Define Input Schema
input_schema = Schema()
input_schema.add_double_column('Sepal length')
input_schema.add_double_column('Sepal width')
input_schema.add_double_column('Petal length')
input_schema.add_double_column('Petal width')
input_schema.add_categorical_column(
    "Species", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# Define Transform Process
tp = TransformProcess(input_schema)
tp.one_hot("Species")

# Do the transformation on spark and convert to numpy
output = tp(filtered_data)
np_array = np.array([[float(i) for i in x.split(',')] for x in output])
x = np_array[:, :-3]
y = np_array[:, -3:]

# Build the Keras model
model = Sequential()
model.add(Dense(10, input_shape=(4,), activation='relu', name='fc1'))
model.add(Dense(10, activation='relu', name='fc2'))
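
# The excerpt above stops after the first two Dense layers. A minimal sketch of
# how the remaining Keras steps might look: an output layer sized to the three
# one-hot "Species" columns produced by tp.one_hot, compilation, and a short
# training run on the transformed arrays. The optimizer, loss, batch size, and
# epoch count below are illustrative assumptions, not taken from the original.
model.add(Dense(3, activation='softmax', name='output'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.fit(x, y, batch_size=10, epochs=50, verbose=1)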
from pydatavec import Schema, TransformProcess
from pydatavec import NotInSet, LessThan

# Let's define the schema of the data that we want to import.
# The order in which columns are defined here should match the order in which
# they appear in the input data.
input_schema = Schema()
input_schema.add_string_column("DateTimeString")
input_schema.add_string_column("CustomerID")
input_schema.add_string_column("MerchantID")
input_schema.add_integer_column("NumItemsInTransaction")
input_schema.add_categorical_column("MerchantCountryCode", ["USA", "CAN", "FR", "MX"])

# Some columns have restrictions on the values we consider valid:
# $0.0 or more, no maximum limit, no NaN and no Infinite values
input_schema.add_double_column(
    "TransactionAmountUSD", 0.0, None, False, False)
input_schema.add_categorical_column("FraudLabel", ["Fraud", "Legit"])

# Let's define some operations to execute on the data.
# We do this by defining a TransformProcess.
# At each step, we identify columns by the names we gave them in the input data schema above.
tp = TransformProcess(input_schema)
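
# The NotInSet and LessThan imports above are not used in this excerpt. Below is
# a sketch of how they might drive the TransformProcess, assuming tp.filter()
# removes rows that match the given condition (as DataVec's ConditionFilter
# does) and tp.remove_column() drops a column by name; these method names and
# argument orders are assumptions based on the imports, not confirmed by the
# excerpt.
tp.remove_column("CustomerID")
tp.remove_column("MerchantID")
# Remove all rows whose MerchantCountryCode is not USA or CAN
tp.filter(NotInSet("MerchantCountryCode", ["USA", "CAN"]))
# Remove rows with fewer than one item in the transaction
tp.filter(LessThan("NumItemsInTransaction", 1))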
#  SPDX-License-Identifier: Apache-2.0
################################################################################

'''
In this simple example we'll show how to combine multiple independent records by key.
Specifically, assume we have data like "person,country_visited,entry_time"
and we want to know how many times each person has entered each country.
'''

from pydatavec import Schema, TransformProcess

# Define the input schema
schema = Schema()
schema.add_string_column('person')
schema.add_categorical_column('country_visited', ['USA', 'Japan', 'China', 'India'])
schema.add_string_column('entry_time')

# Define the operations we want to do
tp = TransformProcess(schema)

# Parse date-time
# Format for parsing times is as per http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
tp.string_to_time('entry_time', 'YYYY/MM/dd')

# Take the "country_visited" column and expand it to a one-hot representation
# So "USA" becomes [1,0,0,0], "Japan" becomes [0,1,0,0], "China" becomes [0,0,1,0], etc.
tp.one_hot('country_visited')
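
# The docstring above says we want to count how many times each person has
# entered each country, but the excerpt stops after the one-hot step. One way
# to finish that, sketched under the assumption that TransformProcess exposes a
# reduce() helper that groups rows by a key column and applies an aggregation
# to the remaining columns (the exact signature is an assumption, not taken
# from the excerpt): after one_hot, summing the per-country indicator columns
# for each person yields that person's entry count per country.
tp.reduce('person', 'sum')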