lakehouse_engine.transformers.custom_transformers
Custom transformers module.
"""Custom transformers module."""

from typing import Callable

from pyspark.sql import DataFrame


class CustomTransformers(object):
    """Class containing the custom transformers."""

    @staticmethod
    def custom_transformation(custom_transformer: Callable) -> Callable:
        """Execute a custom transformation provided by the user.

        This transformer is useful whenever the provided transformers do not
        cover the user's needs, or the transform step of the algorithm requires
        complex logic.

        .. warning:: Attention!
            The custom_transformer function provided as argument must receive
            a DataFrame and return a DataFrame, because that is how Spark's
            .transform method chains transformations.

        Example:
        ```python
        def my_custom_logic(df: DataFrame) -> DataFrame:
            ...
        ```

        Args:
            custom_transformer: custom transformer function. A Python function with
                all the required PySpark logic, provided by the user.

        Returns:
            Callable: the same function provided as parameter, in order to be called
                later in the TransformerFactory.

        """
        return custom_transformer

    @staticmethod
    def sql_transformation(sql: str) -> Callable:
        """Execute a SQL transformation provided by the user.

        This transformer is useful whenever the user wants to perform
        SQL-based transformations that are not natively supported by the
        lakehouse engine transformers.

        Args:
            sql: the SQL query to be executed. This can read from any table or
                view from the catalog, or any DataFrame registered as a temp
                view.

        Returns:
            Callable: a function to be passed to Spark's DataFrame.transform method.

        """

        def inner(df: DataFrame) -> DataFrame:
            # Only the session attached to df is used; the query itself
            # determines which tables or views are read.
            return df.sparkSession.sql(sql)

        return inner
Class containing the custom transformers.
Execute a custom transformation provided by the user.
This transformer is useful whenever the provided transformers do not cover the user's needs, or the transform step of the algorithm requires complex logic.
Attention!
The custom_transformer function provided as argument must receive a DataFrame and return a DataFrame, because that is how Spark's .transform method chains transformations.
Example:
def my_custom_logic(df: DataFrame) -> DataFrame:
    ...
Arguments:
- custom_transformer: custom transformer function. A Python function with all the required PySpark logic, provided by the user.
Returns:
Callable: the same function provided as parameter, in order to be called later in the TransformerFactory.
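As a minimal sketch of the contract described above, the snippet below re-states custom_transformation as a plain function and passes it a user-defined function. The name keep_positive_amounts and the column name amount are hypothetical and serve only as illustration. PySpark is imported lazily so the snippet can be loaded without a Spark installation.

```python
from typing import TYPE_CHECKING, Callable

if TYPE_CHECKING:  # only needed for the type hints, not at runtime
    from pyspark.sql import DataFrame


def custom_transformation(custom_transformer: Callable) -> Callable:
    """Plain-function copy of the static method: return the input unchanged."""
    return custom_transformer


def keep_positive_amounts(df: "DataFrame") -> "DataFrame":
    """Hypothetical user logic: keep only rows whose 'amount' is positive."""
    from pyspark.sql import functions as F  # lazy import, requires pyspark

    return df.filter(F.col("amount") > 0)


# The transformer does no work by itself; it hands back the very same object.
transformer = custom_transformation(keep_positive_amounts)
is_same_function = transformer is keep_positive_amounts
```

Given a DataFrame df with an amount column, df.transform(transformer) would then behave like keep_positive_amounts(df), which is why the function must accept and return a DataFrame.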
Execute a SQL transformation provided by the user.
This transformer is useful whenever the user wants to perform SQL-based transformations that are not natively supported by the lakehouse engine transformers.
Arguments:
- sql: the SQL query to be executed. This can read from any table or view from the catalog, or any DataFrame registered as a temp view.
Returns:
Callable: a function to be passed to Spark's DataFrame.transform method.
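One consequence of the implementation is worth illustrating: the returned function uses only the Spark session attached to the incoming DataFrame, so the query sees that DataFrame's rows only if it has been registered as a temp view. The sketch below re-states sql_transformation as a plain function. The helper run_sql_on and its view_name parameter are hypothetical and not part of the lakehouse engine.

```python
from typing import TYPE_CHECKING, Callable

if TYPE_CHECKING:  # only needed for the type hints, not at runtime
    from pyspark.sql import DataFrame


def sql_transformation(sql: str) -> Callable:
    """Plain-function copy of the static method shown in the module source."""

    def inner(df: "DataFrame") -> "DataFrame":
        # Only the session of df is used; the query decides what is read.
        return df.sparkSession.sql(sql)

    return inner


def run_sql_on(df: "DataFrame", view_name: str, sql: str) -> "DataFrame":
    """Hypothetical helper: expose df as a temp view, then apply the query."""
    df.createOrReplaceTempView(view_name)
    return df.transform(sql_transformation(sql))
```

For example, run_sql_on(df, "orders", "SELECT * FROM orders WHERE amount > 0") would register df under the assumed view name orders before running the query through .transform.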