from pyspark.ml.feature import StringIndexer

# Index a single string column into numeric label indices
indexer = StringIndexer(inputCol="label", outputCol="indexedLabel")
from pyspark.ml.feature import StringIndexer

# Since Spark 3.0, StringIndexer can index several columns in one pass
indexer = StringIndexer(inputCols=["label1", "label2"],
                        outputCols=["indexedLabel1", "indexedLabel2"])
from pyspark.ml.feature import StringIndexer

indexer = StringIndexer(inputCol="label", outputCol="indexedLabel")
data = spark.createDataFrame(
    [(0, "a"), (1, "b"), (2, "a"), (3, "c"), (4, "a")],
    ["id", "label"])
indexed = indexer.fit(data).transform(data)
indexed.describe().show()

In this example, we create a DataFrame with five rows and two columns ("id" and "label"), fit a StringIndexer on the "label" column, and transform it into a numeric "indexedLabel" column. Calling describe() on the resulting DataFrame ("indexed") prints summary statistics (count, mean, standard deviation, minimum, and maximum) for its numeric columns, including the new indices. Package library: PySpark's ml.feature package.
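By default (stringOrderType="frequencyDesc"), StringIndexer assigns index 0.0 to the most frequent label, with ties broken alphabetically. As a rough illustration of that assignment rule, here is a minimal pure-Python sketch that needs no Spark session; the function name string_index is invented for this example and is not part of the PySpark API:

```python
from collections import Counter

def string_index(values):
    """Mimic StringIndexer's default frequencyDesc ordering:
    most frequent label -> 0.0, ties broken alphabetically."""
    counts = Counter(values)
    # Sort labels by descending frequency, then alphabetically
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    mapping = {label: float(i) for i, label in enumerate(ordered)}
    return [mapping[v] for v in values]

labels = ["a", "b", "a", "c", "a"]
print(string_index(labels))  # [0.0, 1.0, 0.0, 2.0, 0.0]
```

With the data above, "a" appears three times and gets index 0.0, while "b" and "c" tie at one occurrence each and are ordered alphabetically, matching what the Spark example produces in its "indexedLabel" column.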