def _spark(cls, column, strftime_format, **kwargs):
    """Spark implementation: flag values parseable with ``strftime_format``.

    The format string is validated once on the driver by round-tripping the
    current timestamp through strftime/strptime — some directives (e.g. %D)
    can format a datetime but not parse one back. Per-value checks then run
    in a boolean UDF.

    Raises:
        ValueError: if ``strftime_format`` cannot round-trip a datetime.
    """
    try:
        # Round-trip "now" through the format to prove it both formats
        # and parses.
        formatted_now = datetime.strftime(datetime.now(), strftime_format)
        datetime.strptime(formatted_now, strftime_format)
    except ValueError as e:
        raise ValueError(f"Unable to use provided strftime_format: {str(e)}")

    def _parses_with_format(candidate):
        # Nulls never match.
        if candidate is None:
            return False
        try:
            datetime.strptime(candidate, strftime_format)
        except TypeError:
            # Non-string input is a caller error, not an unexpected value.
            raise TypeError(
                "Values passed to expect_column_values_to_match_strftime_format must be of type string.\nIf you want to validate a column of dates or timestamps, please call the expectation before converting from string format."
            )
        except ValueError:
            return False
        return True

    return F.udf(_parses_with_format, sparktypes.BooleanType())(column)
def _spark(cls, column, **kwargs):
    """Spark implementation: flag values whose string form is pure ASCII."""

    def _ascii_only(value):
        # NOTE(review): str() coerces non-strings first, so None becomes
        # "None" (which is ASCII) — presumably nulls are filtered upstream;
        # confirm against the expectation's null handling.
        return str(value).isascii()

    return F.udf(_ascii_only, sparktypes.BooleanType())(column)
def _spark(cls, column, **kwargs):
    """Spark implementation: flag values that parse as well-formed XML.

    A value is valid when ``etree.fromstring`` accepts it; any failure to
    parse marks it invalid.
    """

    def is_xml(val):
        try:
            # Parse purely for validation; the resulting document is unused.
            etree.fromstring(val)
            return True
        except Exception:
            # Narrowed from a bare ``except:`` so KeyboardInterrupt and
            # SystemExit are no longer swallowed; parse/type errors still
            # yield False as before.
            return False

    is_xml_udf = F.udf(is_xml, sparktypes.BooleanType())
    return is_xml_udf(column)
def _spark(cls, column, json_schema, **kwargs):
    """Spark implementation: flag values that parse as valid JSON.

    ``json_schema`` is accepted for signature compatibility with the
    schema-matching expectation but is not used here.
    """

    def is_json(val):
        try:
            json.loads(val)
            return True
        except (TypeError, ValueError):
            # Narrowed from a bare ``except:``. TypeError covers non-string
            # input (including None); ValueError covers json.JSONDecodeError
            # for malformed JSON — the only failures json.loads raises.
            return False

    is_json_udf = F.udf(is_json, sparktypes.BooleanType())
    return is_json_udf(column)
def _spark(cls, column, **kwargs):
    """Spark implementation: flag points within range of ``center_point``.

    kwargs:
        center_point: reference point handed to the projection helper.
        unit: "kilometers" or "miles" — the unit of ``range``.
        range: maximum allowed distance from ``center_point``.
        projection: "fcc" or "pythagorean" — distance approximation to use.

    Returns:
        A boolean Column: True where the projected distance is strictly
        less than the threshold. Returns None for an unrecognized
        projection (preserving the original implicit behavior).

    Raises:
        ValueError: for an unrecognized unit (the original crashed with an
            accidental NameError on the unbound ``distances`` variable).
    """
    center_point = kwargs.get("center_point")
    unit = kwargs.get("unit")
    # Renamed from ``range`` to avoid shadowing the builtin.
    distance_range = kwargs.get("range")
    projection = kwargs.get("projection")

    # Dispatch table removes the duplicated fcc/pythagorean branches.
    projections = {
        "fcc": fcc_projection,
        "pythagorean": pythagorean_projection,
    }
    if projection not in projections:
        return None
    measure = projections[projection]

    # Assumes the projection helpers return kilometers — TODO confirm.
    # Convert only the threshold for a miles caller. The original
    # multiplied BOTH the per-row distance and the threshold by 1.609344,
    # which cancels out, so the miles branch behaved identically to the
    # kilometers branch — presumably a bug.
    if unit == "miles":
        distance_range = distance_range * 1.609344
    elif unit != "kilometers":
        raise ValueError(f"Unknown unit: {unit}")

    distances = F.udf(
        lambda point, center=center_point: measure(point, center),
        sparktypes.FloatType(),
    )
    return F.when(distances(column) < distance_range, F.lit(True)).otherwise(
        F.lit(False)
    )
def _spark(cls, column, json_schema, **kwargs):
    """Spark implementation: flag values that conform to ``json_schema``."""

    def matches_json_schema(val):
        # Nulls never match.
        if val is None:
            return False
        try:
            val_json = json.loads(val)
            # jsonschema.validate raises on failure, so reaching the return
            # means validation succeeded.
            jsonschema.validate(val_json, json_schema)
            return True
        except jsonschema.ValidationError:
            return False
        # Removed the no-op ``except SchemaError: raise`` and bare
        # ``except: raise`` handlers — a bad schema or malformed JSON value
        # still propagates exactly as before.

    matches_json_schema_udf = F.udf(matches_json_schema, sparktypes.BooleanType())
    return matches_json_schema_udf(column)
def _spark(cls, column, xml_schema, **kwargs):
    """Spark implementation: flag values valid under ``xml_schema``.

    The schema is parsed and compiled once on the driver; a bad schema
    fails fast here with whatever ``etree`` raises (the original wrapped
    this in try/except blocks that only re-raised, a no-op now removed).
    """
    xmlschema_doc = etree.fromstring(xml_schema)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    def matches_xml_schema(val):
        # Nulls never match.
        if val is None:
            return False
        # A value that fails to parse propagates the error, as in the
        # original (its bare ``except: raise`` was a no-op).
        xml_doc = etree.fromstring(val)
        # Calling the compiled schema returns a boolean validity check.
        return xmlschema(xml_doc)

    matches_xml_schema_udf = F.udf(matches_xml_schema, sparktypes.BooleanType())
    return matches_xml_schema_udf(column)
def _spark(cls, column, **kwargs):
    """Spark implementation: flag values recognized as valid timezone names."""
    # Delegate the per-value check to the shared is_valid_timezone helper.
    return F.udf(is_valid_timezone, sparktypes.BooleanType())(column)