# --- Feature engineering: encode categorical columns, assemble features ---
df3.show()
df3.printSchema()

# One-hot encode the indexed colour column. dropLast=False keeps one vector
# slot per colour level instead of dropping the final (reference) category.
df3 = OneHotEncoder(inputCol='color1', outputCol='color2',
                    dropLast=False).transform(df3)
df3.printSchema()

# Index the string label column 'type' into a numeric 'type1' column.
# NOTE(review): the indexer is fitted on df2 but applied to df3 — this only
# works because df3 carries the same 'type' values; confirm this is intended
# (fitting on df3 directly would be the conventional form).
df4 = StringIndexer(inputCol='type', outputCol='type1').fit(df2).transform(df3)
df4.show()
df4.printSchema()

# Assemble the model input columns into a single 'Features' vector column.
# NOTE(review): 'id' is included as a feature — a row identifier is usually
# not predictive; confirm this is intended before relying on the model.
df5 = VectorAssembler(
    inputCols=['id', 'bone_length', 'rotting_flesh', 'hair_length',
               'has_soul', 'color2'],
    outputCol='Features',
).transform(df4)
df5.show(truncate=False)
df5.printSchema()

# --------------------------------------------------------------------------
# Data processing complete.

# 6. Model building: train a random forest on the assembled features.
training = df5

from pyspark.ml.classification import RandomForestClassifier

df1 = RandomForestClassifier(featuresCol='Features', labelCol='type1',
                             numTrees=86, maxDepth=10)
model22 = df1.fit(training)

# Fix: the original bare `model22.getNumTrees` expression had no effect in a
# script; print the value so the inspection actually produces output.
print(model22.getNumTrees)

# Score the training set (prediction columns are appended to the frame).
training2 = model22.transform(training)
train_2 = train_1.withColumn("x", train_1["oldX"].cast("float")).drop("oldX") train_2 = train_2.withColumn( "label", train_1["oldLabel"].cast("float")).drop("oldLabel") train_2.cache() train_2.show() train_2.printSchema() train_2.printSchema() print(train_2.dtypes) train_2.describe().show() # Converting "features" column in a Vector column train_2 = VectorAssembler(inputCols=["x"], outputCol="feature").transform(train_2) train_2.printSchema() # Plotting Dataset f, axarr = plt.subplots(2, sharex=True) # Converting "features" DenseVector column to NPy Array npFeatures = np.array([]) for i in train_2.collect(): npFeatures = np.append(npFeatures, i['feature'].toArray()) # Converting "label" DenseVector column to NPy Array npLabels = np.array([]) for i in train_2.collect(): npLabels = np.append(npLabels, i['label']) axarr[0].plot(npFeatures, npLabels, label="Data", linewidth=2) # Pipeline: Polynomial expansion, Linear Regression and label vs. prediction charts for every degree for degree in [5, 6, 7]: