lakehouse_engine.transformers.custom_transformers

Custom transformers module.

"""Custom transformers module."""

from typing import Callable

from pyspark.sql import DataFrame


class CustomTransformers(object):
    """Class representing a CustomTransformers."""

    @staticmethod
    def custom_transformation(custom_transformer: Callable) -> Callable:
        """Execute a custom transformation provided by the user.

        This transformer can be very useful whenever the user cannot use our provided
        transformers, or they want to write complex logic in the transform step of the
        algorithm.

        .. warning:: Attention!
            Please bear in mind that the custom_transformer function provided
            as argument needs to receive a DataFrame and return a DataFrame,
            because that is how Spark's .transform method is able to chain the
            transformations.

        Example:
        ```python
        def my_custom_logic(df: DataFrame) -> DataFrame:
        ```

        Args:
            custom_transformer: custom transformer function. A Python function with all
                required PySpark logic provided by the user.

        Returns:
            Callable: the same function provided as parameter, in order to be called
                later in the TransformerFactory.

        """
        return custom_transformer

    @staticmethod
    def sql_transformation(sql: str) -> Callable:
        """Execute a SQL transformation provided by the user.

        This transformer can be very useful whenever the user wants to perform
        SQL-based transformations that are not natively supported by the
        lakehouse engine transformers.

        Args:
            sql: the SQL query to be executed. This can read from any table or
                view from the catalog, or any dataframe registered as a temp
                view.

        Returns:
            Callable: A function to be called in .transform() spark function.

        """

        def inner(df: DataFrame) -> DataFrame:
            # Only the session of the incoming DataFrame is used; the result
            # comes entirely from the SQL query.
            return df.sparkSession.sql(sql)

        return inner

class CustomTransformers:

Class representing a CustomTransformers.

@staticmethod
def custom_transformation(custom_transformer: Callable) -> Callable:

Execute a custom transformation provided by the user.

This transformer can be very useful whenever the user cannot use our provided transformers, or they want to write complex logic in the transform step of the algorithm.

Attention!

The custom_transformer function passed as the argument must receive a DataFrame and return a DataFrame, because that is how Spark's .transform method chains transformations.

Example:

def my_custom_logic(df: DataFrame) -> DataFrame:
Arguments:
  • custom_transformer: custom transformer function. A Python function with all required PySpark logic provided by the user.
Returns:

Callable: the same function provided as parameter, to be called later in the TransformerFactory.
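
The chaining contract described above can be illustrated without a Spark session. The sketch below is a pure-Python illustration, not the engine's code: `FakeDataFrame`, `keep_adults`, and `add_flag` are hypothetical stand-ins, and the module-level `custom_transformation` only mirrors the pass-through behaviour of the static method shown in the source listing. It shows why each function must take a DataFrame and return one: the output of one `.transform` call is the input of the next.

```python
from typing import Callable, List


class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (rows kept in a list)."""

    def __init__(self, rows: List[dict]):
        self.rows = rows

    def transform(
        self, func: Callable[["FakeDataFrame"], "FakeDataFrame"]
    ) -> "FakeDataFrame":
        # Same contract as DataFrame.transform: call func on self, return its result.
        return func(self)


def custom_transformation(custom_transformer: Callable) -> Callable:
    # Mirrors CustomTransformers.custom_transformation: the function is returned unchanged.
    return custom_transformer


def keep_adults(df: FakeDataFrame) -> FakeDataFrame:
    # Hypothetical user logic: filter rows.
    return FakeDataFrame([row for row in df.rows if row["age"] >= 18])


def add_flag(df: FakeDataFrame) -> FakeDataFrame:
    # Hypothetical user logic: add a column to every row.
    return FakeDataFrame([{**row, "checked": True} for row in df.rows])


people = FakeDataFrame([{"name": "Ana", "age": 31}, {"name": "Rui", "age": 12}])

# Each step receives the previous step's result, so the calls chain naturally.
result = (
    people
    .transform(custom_transformation(keep_adults))
    .transform(custom_transformation(add_flag))
)
print(result.rows)  # [{'name': 'Ana', 'age': 31, 'checked': True}]
```

With real PySpark the user function would use DataFrame operations such as `filter` or `withColumn` instead of list comprehensions, but the signature requirement is the same.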

@staticmethod
def sql_transformation(sql: str) -> Callable:

Execute a SQL transformation provided by the user.

This transformer can be very useful whenever the user wants to perform SQL-based transformations that are not natively supported by the lakehouse engine transformers.

Arguments:
  • sql: the SQL query to be executed. This can read from any table or view in the catalog, or from any DataFrame registered as a temp view.
Returns:

Callable: a function to be passed to Spark's .transform() method.
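
The following sketch illustrates the closure that `sql_transformation` returns, again without a real Spark session. `FakeSession` and `FakeDataFrame` are hypothetical stand-ins that record queries instead of running them, and the view name `orders_view` is invented for the example. The point it makes comes directly from the source listing: the inner function uses only the session of the incoming DataFrame, so the result depends entirely on the query, and the incoming data is visible to the query only if it was registered as a temp view beforehand.

```python
from typing import Callable, List


class FakeSession:
    """Hypothetical stand-in for SparkSession; records queries instead of running them."""

    def __init__(self) -> None:
        self.queries: List[str] = []

    def sql(self, query: str) -> "FakeDataFrame":
        self.queries.append(query)
        return FakeDataFrame(self, source=query)


class FakeDataFrame:
    """Hypothetical stand-in for DataFrame; remembers where it came from."""

    def __init__(self, session: FakeSession, source: str) -> None:
        self.sparkSession = session
        self.source = source

    def transform(self, func: Callable) -> "FakeDataFrame":
        # Same contract as DataFrame.transform: call func on self, return its result.
        return func(self)


def sql_transformation(sql: str) -> Callable:
    # Mirrors CustomTransformers.sql_transformation from the source listing.
    def inner(df: FakeDataFrame) -> FakeDataFrame:
        # The incoming DataFrame contributes only its session.
        return df.sparkSession.sql(sql)

    return inner


session = FakeSession()
orders = FakeDataFrame(session, source="orders_input")

query = "SELECT customer_id, SUM(amount) AS total FROM orders_view GROUP BY customer_id"
result = orders.transform(sql_transformation(query))

print(result.source == query)  # True: the result comes from the query, not from `orders`
print(session.queries)
```

In real PySpark, one way to make the incoming data available to the query is to call `createOrReplaceTempView` on it in an earlier step and reference that view name in the SQL.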