# MAGIC Reference: Decision Trees: https://en.wikipedia.org/wiki/Decision_tree_learning

# COMMAND ----------

# MAGIC %md
# MAGIC ### Decision Tree Models

# COMMAND ----------

from pyspark.ml import Pipeline
from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.ml.tuning import ParamGridBuilder

dt = DecisionTreeRegressor()
dt.setLabelCol("PE")
dt.setPredictionCol("Predicted_PE")
dt.setFeaturesCol("features")
dt.setMaxBins(100)

dtPipeline = Pipeline()
dtPipeline.setStages([vectorizer, dt])

# Let's just reuse our CrossValidator
crossval.setEstimator(dtPipeline)

paramGrid = ParamGridBuilder() \
  .addGrid(dt.maxDepth, range(2, 8)) \
  .build()
crossval.setEstimatorParamMaps(paramGrid)

dtModel = crossval.fit(trainingSet)

# COMMAND ----------
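# MAGIC %md
# MAGIC As a quick sanity check (a sketch, not part of the original flow), we can inspect what cross-validation picked: `CrossValidatorModel.avgMetrics` lines up one-to-one with the parameter maps in `paramGrid`, and `bestModel` is the pipeline refit with the winning parameters.

# COMMAND ----------

# Sketch: inspect the cross-validation results for dtModel fit above.
# avgMetrics[i] is the average metric for getEstimatorParamMaps()[i].
for params, metric in zip(dtModel.getEstimatorParamMaps(), dtModel.avgMetrics):
  print(params, metric)

# bestModel is the refit pipeline; its last stage is the fitted DecisionTreeRegressionModel.
bestTree = dtModel.bestModel.stages[-1]
print("Best tree depth: {}, nodes: {}".format(bestTree.depth, bestTree.numNodes))

# COMMAND ----------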
# MAGIC %md
# MAGIC In Spark, data is partitioned by row. So when the algorithm needs to make a split, each worker has to compute summary statistics for every feature at each candidate split point, and those statistics then have to be aggregated (via a tree reduce) before a split can be chosen.
# MAGIC
# MAGIC Think about it: what if worker 1 had the value `32` but none of the other workers did? How would you communicate how good a split at that value would be? To sidestep this, Spark has a `maxBins` parameter that discretizes continuous variables into buckets, and the number of buckets must be at least as large as the number of categories in any categorical feature.

# COMMAND ----------

# MAGIC %md
# MAGIC Let's go ahead and set `maxBins` to `40`.

# COMMAND ----------

dt.setMaxBins(40)

# COMMAND ----------

# MAGIC %md
# MAGIC Take two.

# COMMAND ----------

pipelineModel = pipeline.fit(trainDF)

# COMMAND ----------

# MAGIC %md
# MAGIC ## Visualize the Decision Tree
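# MAGIC
# MAGIC As a minimal sketch (assuming the fitted tree is the last stage of `pipelineModel` above), the model's `toDebugString` prints the learned split conditions node by node:

# COMMAND ----------

# Sketch: print the tree as text.
# Assumes pipelineModel's last stage is the fitted DecisionTreeRegressionModel.
treeModel = pipelineModel.stages[-1]
print(treeModel.toDebugString)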