# Step 1 — Indexing: map each categorical value to a numeric index.
# (StringIndexer also accepts numeric input columns, e.g. zipcode.)
conditionIndexer = StringIndexer(inputCol="condition", outputCol="condition_index")
gradeIndexer = StringIndexer(inputCol="grade", outputCol="grade_index")
zipcodeIndexer = StringIndexer(inputCol="zipcode", outputCol="zipcode_index")

# Step 2 — Encoding: turn each indexed categorical column into a one-hot vector.
encoder = OneHotEncoder(
    inputCols=["condition_index", "grade_index", "zipcode_index"],
    outputCols=["condition_vector", "grade_vector", "zipcode_vector"],
)

# Step 3 — Assembly: combine the selected feature columns into a single
# Spark ML "features" vector.
# Columns that are highly correlated with one another are deliberately
# left out. "waterfront" is already boolean-valued, so it needs no
# indexing/encoding and can be fed to the assembler directly.
assembler = VectorAssembler(
    inputCols=[
        "bedrooms",
        "bathrooms",
        "sqft_living",
        "sqft_above_percentage",
        "floors",
        "condition_vector",
        "grade_vector",
        "zipcode_vector",
        "waterfront",
    ],
    outputCol="features",
)

# Next: build a grid of hyperparameters so every permutation can be tested.