Example #1
def autolog():
    """
    Enables automatic logging of Spark datasource paths, versions (if applicable), and formats
    when they are read. This method is not threadsafe and assumes a
    `SparkSession
    <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession>`_
    already exists with the
    `mlflow-spark JAR
    <http://mlflow.org/docs/latest/tracking.html#automatic-logging-from-spark-experimental>`_
    attached. It should be called on the Spark driver, not on the executors (i.e. do not call
    this method within a function parallelized by Spark). This API requires Spark 3.0 or above.

    Datasource information is logged under the current active MLflow run. If no active run
    exists, datasource information is cached in memory & logged to the next-created active run
    (but not to successive runs). Note that autologging of Spark ML (MLlib) models is not currently
    supported via this API. Datasource-autologging is best-effort, meaning that if Spark is under
    heavy load or MLflow logging fails for any reason (e.g., if the MLflow server is unavailable),
    logging may be dropped.

    For any unexpected issues with autologging, check Spark driver and executor logs in addition
    to stderr & stdout generated from your MLflow code - datasource information is pulled from
    Spark, so logs relevant to debugging may show up amongst the Spark logs.

    .. code-block:: python
        :caption: Example

        import mlflow.spark
        import os
        import shutil
        from pyspark.sql import SparkSession
        # Create and persist some dummy data
        # Note: On environments like Databricks with pre-created SparkSessions,
        # ensure that the org.mlflow:mlflow-spark:1.11.0 JAR is attached as a
        # library to your cluster
        spark = (SparkSession.builder
                    .config("spark.jars.packages", "org.mlflow:mlflow-spark:1.11.0")
                    .master("local[*]")
                    .getOrCreate())
        df = spark.createDataFrame([
                (4, "spark i j k"),
                (5, "l m n"),
                (6, "spark hadoop spark"),
                (7, "apache hadoop")], ["id", "text"])
        import tempfile
        tempdir = tempfile.mkdtemp()
        df.write.csv(os.path.join(tempdir, "my-data-path"), header=True)
        # Enable Spark datasource autologging.
        mlflow.spark.autolog()
        loaded_df = spark.read.csv(os.path.join(tempdir, "my-data-path"),
                        header=True, inferSchema=True)
        # Call toPandas() to trigger a read of the Spark datasource. Datasource info
        # (path and format) is logged to the current active run, or the
        # next-created MLflow run if no run is currently active
        with mlflow.start_run() as active_run:
            pandas_df = loaded_df.toPandas()
        shutil.rmtree(tempdir)  # clean up tempdir
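
    After the run finishes, the logged datasource information can be inspected through the
    MLflow tracking API. A minimal sketch is shown below; the tag key ``sparkDatasourceInfo``
    is an assumption about how the mlflow-spark integration records this information and may
    differ across MLflow versions.

    .. code-block:: python
        :caption: Inspecting the logged datasource info (sketch)

        import mlflow

        # Fetch the completed run and look up the datasource tag written by autologging.
        # NOTE: the tag key "sparkDatasourceInfo" is an assumption, not a documented API.
        finished_run = mlflow.get_run(active_run.info.run_id)
        datasource_info = finished_run.data.tags.get("sparkDatasourceInfo")
        print(datasource_info)  # e.g. a path/format description if logging succeeded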
    """
    from mlflow import _spark_autologging

    _spark_autologging.autolog()
Example #2
def autolog():
    """
    Enables automatic logging of Spark datasource paths, versions (if applicable), and formats
    when they are read. This method is not threadsafe and assumes a
    `SparkSession
    <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession>`_
    already exists with the
    `mlflow-spark JAR
    <http://mlflow.org/docs/latest/tracking.html#automatic-logging-from-spark-experimental>`_
    attached. It should be called on the Spark driver, not on the executors (i.e. do not call
    this method within a function parallelized by Spark). This API requires Spark 3.0 or above.

    Datasource information is logged under the current active MLflow run. If no active run
    exists, datasource information is cached in memory & logged to the next-created active run
    (but not to successive runs). Note that autologging of Spark ML (MLlib) models is not currently
    supported via this API. Datasource-autologging is best-effort, meaning that if Spark is under
    heavy load or MLflow logging fails for any reason (e.g., if the MLflow server is unavailable),
    logging may be dropped.

    For any unexpected issues with autologging, check Spark driver and executor logs in addition
    to stderr & stdout generated from your MLflow code - datasource information is pulled from
    Spark, so logs relevant to debugging may show up amongst the Spark logs.

    .. code-block:: python
        :caption: Example

        import mlflow.spark
        import shutil
        from pyspark.sql import SparkSession
        # Create and persist some dummy data
        spark = (SparkSession.builder
                    .config("spark.jars.packages", "org.mlflow:mlflow-spark:1.11.0")
                    .getOrCreate())
        df = spark.createDataFrame([
                (4, "spark i j k"),
                (5, "l m n"),
                (6, "spark hadoop spark"),
                (7, "apache hadoop")], ["id", "text"])
        import tempfile
        tempdir = tempfile.mkdtemp()
        df.write.format("csv").save(tempdir)
        # Enable Spark datasource autologging.
        mlflow.spark.autolog()
        loaded_df = spark.read.format("csv").load(tempdir)
        # Call collect() to trigger a read of the Spark datasource. Datasource info
        # (path and format) is automatically logged to an MLflow run.
        loaded_df.collect()
        shutil.rmtree(tempdir) # clean up tempdir
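
    To attach the datasource information to a specific run rather than to the next-created one,
    start a run before triggering the read. A minimal sketch of this pattern (run it before the
    temporary data above is removed):

    .. code-block:: python
        :caption: Logging datasource info to an explicit run (sketch)

        import mlflow

        # With a run active, the datasource read below is logged to that run.
        # Assumes the CSV data written to `tempdir` above still exists.
        with mlflow.start_run() as active_run:
            spark.read.format("csv").load(tempdir).collect()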
    """
    from mlflow import _spark_autologging

    _spark_autologging.autolog()
Example #3
def autolog():
    """
    Enables automatic logging of Spark datasource paths, versions (if applicable), and formats
    when they are read. This method is not threadsafe and assumes a
    `SparkSession
    <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession>`_
    already exists with the
    `mlflow-spark JAR
    <http://mlflow.org/docs/latest/tracking.html#automatic-logging-from-spark-experimental>`_
    attached. It should be called on the Spark driver, not on the executors (i.e. do not call
    this method within a function parallelized by Spark). This API requires Spark 3.0 or above,
    but can be run on Spark 2.x environments with backports for compatibility with the
    mlflow-spark JAR (e.g. Databricks Runtime 6.0 and above).

    Datasource information is logged under the current active MLflow run, creating an active run
    if none exists. Note that autologging of Spark ML (MLlib) models is not currently supported
    via this API. Datasource-autologging is best-effort, meaning that if Spark is under heavy load
    or MLflow logging fails for any reason (e.g. if the MLflow server is unavailable), logging may
    be dropped.

    For any unexpected issues with autologging, check Spark driver and executor logs in addition
    to stderr & stdout generated from your MLflow code - datasource information is pulled from
    Spark, so logs relevant to debugging may show up amongst the Spark logs.

    >>> import mlflow.spark
    >>> import shutil
    >>> from pyspark.sql import SparkSession
    >>> # Create and persist some dummy data
    >>> spark = SparkSession.builder \
    ...     .config("spark.jars.packages", "org.mlflow:mlflow-spark:1.11.0").getOrCreate()
    >>> df = spark.createDataFrame([
    ...   (4, "spark i j k"),
    ...   (5, "l m n"),
    ...   (6, "spark hadoop spark"),
    ...   (7, "apache hadoop")], ["id", "text"])
    >>> import tempfile
    >>> tempdir = tempfile.mkdtemp()
    >>> df.write.format("csv").save(tempdir)
    >>> # Enable Spark datasource autologging.
    >>> mlflow.spark.autolog()
    >>> loaded_df = spark.read.format("csv").load(tempdir)
    >>> # Call collect() to trigger a read of the Spark datasource. Datasource info
    >>> # (path and format) is automatically logged to an MLflow run.
    >>> loaded_df.collect()
    >>> shutil.rmtree(tempdir) # clean up tempdir
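
    Because this variant creates an active run if none exists, it can be useful to close that
    run explicitly once the reads of interest have been logged. A minimal sketch, assuming the
    run created by autologging is still active at this point:

    >>> import mlflow
    >>> # Inspect, then close, the run that the datasource info was attached to.
    >>> run = mlflow.active_run()
    >>> if run is not None:
    ...     print(mlflow.get_run(run.info.run_id).data.tags)
    ...     mlflow.end_run()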
    """
    from mlflow import _spark_autologging
    _spark_autologging.autolog()