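Python OneHotEncoder.toPandas Examples

Programming language: Python

Namespace/package name: pyspark.ml.feature

Class/type: OneHotEncoder

Method/function: toPandas

Examples at hotexamples.com: 1

Python OneHotEncoder.toPandas: 1 example found. This is a real-world Python example involving pyspark.ml.feature.OneHotEncoder and toPandas, extracted from an open source project. Note that toPandas is a method of the pyspark.sql.DataFrame returned by the encoder's transform step, not of OneHotEncoder itself.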

Example #1

File: m2_demo02_RandomForest.py Project: GCPBigData/ds

# Use the new indexed field to obtain a one-hot-encoded field

# In[15]:

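from pyspark.ml.feature import OneHotEncoder

# In Spark 3.0+ OneHotEncoder is an Estimator and must be fitted first;
# in Spark 2.x it was a Transformer and transform() could be called directly.
encoder = OneHotEncoder(inputCol="WorkClass_index",
                        outputCol="WorkClass_encoded")
encodedDF = encoder.fit(indexedDF).transform(indexedDF)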

# #### A WorkClass_encoded field is created
# * This contains the one-hot-encoding for WorkClass
# * OneHotEncoder cannot operate directly on a column of string values; its input must be numeric category indices, so WorkClass_index is used as the input
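To make the index-then-encode idea concrete, here is a minimal plain-Python sketch of what the two steps compute conceptually. It is not Spark's implementation: the helper names `index_labels` and `one_hot` and the sample values are hypothetical. The indexing mimics the frequency-descending order that StringIndexer uses by default, and breaking ties alphabetically is an assumption of this sketch. The `drop_last` flag mirrors OneHotEncoder's `dropLast` behavior, where the last category is represented by an all-zero vector.

```python
from collections import Counter


def index_labels(values):
    """Map each distinct string to an integer index, most frequent first."""
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically to break ties
    # (the tie-breaking rule is an assumption of this sketch).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: idx for idx, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Return a dense 0/1 list for a single category index."""
    size = num_categories - 1 if drop_last else num_categories
    vector = [0] * size
    if index < size:
        vector[index] = 1  # with drop_last, the last category stays all zeros
    return vector


# Hypothetical sample values, not taken from the original dataset.
work_class = ["Private", "State-gov", "Private", "Self-emp", "Private", "State-gov"]

mapping = index_labels(work_class)
encoded = [one_hot(mapping[value], len(mapping)) for value in work_class]

for value, vector in zip(work_class, encoded):
    print(value, mapping[value], vector)
```

Spark stores the encoded value as a sparse vector rather than a Python list, but the positions of the ones follow the same idea.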

# In[16]:

# limit() first so that only five rows are collected to the driver
encodedDF.limit(5).toPandas()

# #### View the original and transformed fields together

# In[17]:

encodedDF.select('WorkClass', 'WorkClass_index',
                 'WorkClass_encoded').limit(5).toPandas()

# ### Transform the entire dataset
# * So far we have only transformed a single column
# * We need to perform this transformation for every categorical and non-numeric column
# * This will be simplified by using a Pipeline (a feature of Spark ML)

# #### First, split the data into training and test sets