Skip to content

Watermarker

Watermarker module.

Watermarker

Bases: object

Class containing all watermarker transformers.

Source code in mkdocs/lakehouse_engine/packages/transformers/watermarker.py
class Watermarker(object):
    """Class containing all watermarker transformers."""

    _logger = LoggingHandler(__name__).get_logger()

    @staticmethod
    def with_watermark(watermarker_column: str, watermarker_time: str) -> Callable:
        """Get the dataframe with watermarker defined.

        Args:
            watermarker_column: name of the input column to be considered for
                the watermarking. Note: it must be a timestamp.
            watermarker_time: time window to define the watermark value.

        Returns:
            A function to be executed on other transformers.

        {{get_example(method_name='with_watermark')}}
        """

        def inner(df: DataFrame) -> DataFrame:
            return df.withWatermark(watermarker_column, watermarker_time)

        return inner

with_watermark(watermarker_column, watermarker_time) staticmethod

Get the dataframe with watermarker defined.

Parameters:

Name Type Description Default
watermarker_column str

name of the input column to be considered for the watermarking. Note: it must be a timestamp.

required
watermarker_time str

time window to define the watermark value.

required

Returns:

Type Description
Callable

A function to be executed on other transformers.

View Example of with_watermark (See full example here)
21{
22    "function": "with_watermark",
23    "args": {
24        "watermarker_column": "date",
25        "watermarker_time": "2 days"
26    }
27}
Source code in mkdocs/lakehouse_engine/packages/transformers/watermarker.py
@staticmethod
def with_watermark(watermarker_column: str, watermarker_time: str) -> Callable:
    """Get the dataframe with watermarker defined.

    Args:
        watermarker_column: name of the input column to be considered for
            the watermarking. Note: it must be a timestamp.
        watermarker_time: time window to define the watermark value.

    Returns:
        A function to be executed on other transformers.

    {{get_example(method_name='with_watermark')}}
    """

    def inner(df: DataFrame) -> DataFrame:
        return df.withWatermark(watermarker_column, watermarker_time)

    return inner