SAP B4 extraction utils

Utilities module for SAP B4 extraction processes.

ADSOTypes

Bases: Enum

Standardise the types of ADSOs we can have for Extractions from SAP B4.

Source code in mkdocs/lakehouse_engine/packages/utils/extraction/sap_b4_extraction_utils.py
class ADSOTypes(Enum):
    """Standardise the types of ADSOs we can have for Extractions from SAP B4."""

    AQ: str = "AQ"
    CL: str = "CL"
    SUPPORTED_TYPES: list = [AQ, CL]
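
As a quick, hedged illustration (not part of the library), the enum can be used to validate a configured ADSO type before triggering an extraction. The import path below is inferred from the module name and is an assumption:

# Hedged sketch; the import path is assumed from the module shown above.
from lakehouse_engine.utils.extraction.sap_b4_extraction_utils import ADSOTypes

adso_type = "CL"  # hypothetical user-provided value

# SUPPORTED_TYPES is itself an enum member whose value is the list ["AQ", "CL"],
# so membership is checked against its .value.
if adso_type not in ADSOTypes.SUPPORTED_TYPES.value:
    raise NotImplementedError(f"Unsupported ADSO type: {adso_type}")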

SAPB4Extraction dataclass

Bases: JDBCExtraction

Configurations available for an Extraction from SAP B4.

It inherits from JDBCExtraction configurations, so it can use and/or overwrite those configurations.

These configurations cover:

  • latest_timestamp_input_col: the column containing the request timestamps in the dataset in latest_timestamp_data_location. Default: REQTSN.
  • request_status_tbl: the name of the SAP B4 table having information about the extraction requests. Composed of database.table. Default: SAPHANADB.RSPMREQUEST.
  • request_col_name: name of the column having the request timestamp to join with the request status table. Default: REQUEST_TSN.
  • data_target: the data target to extract from. Used in the join operation with the request status table.
  • act_req_join_condition: the join condition with the activation table can be changed using this property. Default: 'tbl.reqtsn = req.request_col_name'.
  • include_changelog_tech_cols: whether to include the technical columns (usually coming from the changelog table) or not.
  • extra_cols_req_status_tbl: columns to be added from request status table. It needs to contain the prefix "req.". E.g. "req.col1 as column_one, req.col2 as column_two".
  • request_status_tbl_filter: filter to use for filtering the request status table, influencing the calculation of the max timestamps and the delta extractions.
  • adso_type: the type of ADSO that you are extracting from. Can be "AQ" or "CL".
  • max_timestamp_custom_schema: the custom schema to apply on the calculation of the max timestamp to consider for the delta extractions. Default: timestamp DECIMAL(23,0).
  • default_max_timestamp: the timestamp to use as default, when it is not possible to derive one.
  • custom_schema: specify custom_schema for particular columns of the returned dataframe in the init/delta extraction of the source table.
Source code in mkdocs/lakehouse_engine/packages/utils/extraction/sap_b4_extraction_utils.py
@dataclass
class SAPB4Extraction(JDBCExtraction):
    """Configurations available for an Extraction from SAP B4.

    It inherits from JDBCExtraction configurations, so it can use
    and/or overwrite those configurations.

    These configurations cover:

    - latest_timestamp_input_col: the column containing the request timestamps
        in the dataset in latest_timestamp_data_location. Default: REQTSN.
    - request_status_tbl: the name of the SAP B4 table having information
        about the extraction requests. Composed of database.table.
        Default: SAPHANADB.RSPMREQUEST.
    - request_col_name: name of the column having the request timestamp to join
        with the request status table. Default: REQUEST_TSN.
    - data_target: the data target to extract from. Used in the join operation with
        the request status table.
    - act_req_join_condition: the join condition with the activation table
        can be changed using this property.
        Default: 'tbl.reqtsn = req.request_col_name'.
    - include_changelog_tech_cols: whether to include the technical columns
        (usually coming from the changelog table) or not.
    - extra_cols_req_status_tbl: columns to be added from request status table.
        It needs to contain the prefix "req.". E.g. "req.col1 as column_one,
        req.col2 as column_two".
    - request_status_tbl_filter: filter to use for filtering the request status table,
        influencing the calculation of the max timestamps and the delta extractions.
    - adso_type: the type of ADSO that you are extracting from. Can be "AQ" or "CL".
    - max_timestamp_custom_schema: the custom schema to apply on the calculation of
        the max timestamp to consider for the delta extractions.
        Default: timestamp DECIMAL(23,0).
    - default_max_timestamp: the timestamp to use as default, when it is not possible
        to derive one.
    - custom_schema: specify custom_schema for particular columns of the
        returned dataframe in the init/delta extraction of the source table.
    """

    latest_timestamp_input_col: str = "REQTSN"
    request_status_tbl: str = "SAPHANADB.RSPMREQUEST"
    request_col_name: str = "REQUEST_TSN"
    data_target: Optional[str] = None
    act_req_join_condition: Optional[str] = None
    include_changelog_tech_cols: Optional[bool] = None
    extra_cols_req_status_tbl: Optional[str] = None
    request_status_tbl_filter: Optional[str] = None
    adso_type: Optional[str] = None
    max_timestamp_custom_schema: str = "timestamp DECIMAL(23,0)"
    default_max_timestamp: str = "1970000000000000000000"
    custom_schema: str = "REQTSN DECIMAL(23,0)"
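
As a rough, hedged sketch (values are hypothetical, not from the library docs), a CL-type extraction configuration could look like the following; the inherited JDBCExtraction fields (connection URL, credentials, dbtable, etc.) are omitted here and would still need to be supplied in practice:

config = SAPB4Extraction(
    # Inherited JDBCExtraction fields (e.g. connection url, user, password,
    # dbtable) are omitted in this sketch and are required in a real setup.
    adso_type="CL",                    # extracting from a CL-type ADSO
    data_target="btable1",             # hypothetical data target
    include_changelog_tech_cols=True,  # add reqtsn/datapakid/record on init loads
)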

SAPB4ExtractionUtils

Bases: JDBCExtractionUtils

Utils for managing data extraction from SAP B4.

Source code in mkdocs/lakehouse_engine/packages/utils/extraction/sap_b4_extraction_utils.py
class SAPB4ExtractionUtils(JDBCExtractionUtils):
    """Utils for managing data extraction from SAP B4."""

    def __init__(self, sap_b4_extraction: SAPB4Extraction):
        """Construct SAPB4ExtractionUtils.

        Args:
            sap_b4_extraction: SAP B4 Extraction configurations.
        """
        self._LOGGER: Logger = LoggingHandler(__name__).get_logger()
        self._B4_EXTRACTION = sap_b4_extraction
        self._B4_EXTRACTION.request_status_tbl_filter = (
            self._get_req_status_tbl_filter()
        )
        self._MAX_TIMESTAMP_QUERY = f""" --# nosec
                (SELECT COALESCE(MAX({self._B4_EXTRACTION.request_col_name}),
                    {self._B4_EXTRACTION.default_max_timestamp}) as timestamp
                FROM {self._B4_EXTRACTION.request_status_tbl}
                WHERE {self._B4_EXTRACTION.request_status_tbl_filter})
            """  # nosec: B608
        super().__init__(sap_b4_extraction)

    @staticmethod
    def get_data_target(input_spec_opt: dict) -> str:
        """Get the data_target from the data_target option or derive it.

        By definition, data_target is the same for the table and the changelog
        table: it is the same string after ignoring everything before the last /
        and the first and last characters after it. E.g. for a dbtable
        /BIC/abtable12, the data_target would be btable1.

        Args:
            input_spec_opt: options from the input_spec.

        Returns:
            A string with the data_target.
        """
        exclude_chars = """["'\\\\]"""
        data_target: str = input_spec_opt.get(
            "data_target",
            re.sub(exclude_chars, "", input_spec_opt["dbtable"]).split("/")[-1][1:-1],
        )

        return data_target

    def _get_init_query(self) -> Tuple[str, str]:
        """Get a query to do an init load based on a ADSO on a SAP B4 system.

        Returns:
            A query to submit to SAP B4 for the initial data extraction. The query
            is enclosed in parentheses so that Spark treats it as a table and supports
            it in the dbtable option.
        """
        extraction_query = self._get_init_extraction_query()

        predicates_query = f"""
        (SELECT DISTINCT({self._B4_EXTRACTION.partition_column})
        FROM {self._B4_EXTRACTION.dbtable} t)
        """  # nosec: B608

        return extraction_query, predicates_query

    def _get_init_extraction_query(self) -> str:
        """Get the init extraction query based on current timestamp.

        Returns:
            A query to submit to SAP B4 for the initial data extraction.
        """
        changelog_tech_cols = (
            f"""{self._B4_EXTRACTION.extraction_timestamp}000000000 AS reqtsn,
                '0' AS datapakid,
                0 AS record,"""
            if self._B4_EXTRACTION.include_changelog_tech_cols
            else ""
        )

        extraction_query = f"""
                (SELECT t.*, {changelog_tech_cols}
                    CAST({self._B4_EXTRACTION.extraction_timestamp}
                        AS DECIMAL(15,0)) AS extraction_start_timestamp
                FROM {self._B4_EXTRACTION.dbtable} t
                )"""  # nosec: B608

        return extraction_query

    def _get_delta_query(self) -> Tuple[str, str]:
        """Get a delta query for an SAP B4 ADSO.

        An SAP B4 ADSO requires a join with a special type of table, often called
        the request status table (RSPMREQUEST), in which B4 tracks the timestamps,
        statuses and metrics associated with the several data loads performed
        into B4. Depending on the type of ADSO (AQ or CL), the join condition and
        the ADSO/table to extract from will be different.
        For AQ types, there is only the active table, from which we extract both
        inits and deltas, and this is also the table used to join with RSPMREQUEST
        to derive the next portion of the data to extract.
        For CL types, we have an active table/ADSO from which we extract the init
        and a changelog table from which we extract the delta portions of data.
        Depending on whether it is an init or a delta, one table or the other is
        used to join with RSPMREQUEST.

        The logic in this function ensures that we read from the source table the
        data that has arrived between the maximum timestamp available in our
        target destination and the max timestamp of the extractions performed and
        registered in the RSPMREQUEST table that match the filtering criteria.

        Returns:
            A query to submit to SAP B4 for the delta data extraction. The query
            is enclosed in parentheses so that Spark treats it as a table and supports
            it in the dbtable option.
        """
        if not self._B4_EXTRACTION.min_timestamp:
            from lakehouse_engine.io.reader_factory import ReaderFactory

            latest_timestamp_data_df = ReaderFactory.get_data(
                InputSpec(
                    spec_id="data_with_latest_timestamp",
                    data_format=self._B4_EXTRACTION.latest_timestamp_data_format,
                    read_type=ReadType.BATCH.value,
                    location=self._B4_EXTRACTION.latest_timestamp_data_location,
                )
            )
            min_timestamp = latest_timestamp_data_df.transform(
                Aggregators.get_max_value(
                    self._B4_EXTRACTION.latest_timestamp_input_col
                )
            ).first()[0]
        else:
            min_timestamp = self._B4_EXTRACTION.min_timestamp

        max_timestamp = (
            self._B4_EXTRACTION.max_timestamp
            if self._B4_EXTRACTION.max_timestamp
            else self._get_max_timestamp(self._MAX_TIMESTAMP_QUERY)
        )

        if self._B4_EXTRACTION.act_req_join_condition:
            join_condition = f"{self._B4_EXTRACTION.act_req_join_condition}"
        else:
            join_condition = f"tbl.reqtsn = req.{self._B4_EXTRACTION.request_col_name}"

        base_query = f""" --# nosec
        FROM {self._B4_EXTRACTION.changelog_table} AS tbl
        JOIN {self._B4_EXTRACTION.request_status_tbl} AS req
            ON {join_condition}
        WHERE {self._B4_EXTRACTION.request_status_tbl_filter}
            AND req.{self._B4_EXTRACTION.request_col_name} > {min_timestamp}
            AND req.{self._B4_EXTRACTION.request_col_name} <= {max_timestamp})
        """

        main_cols = f"""
            (SELECT tbl.*,
                CAST({self._B4_EXTRACTION.extraction_timestamp} AS DECIMAL(15,0))
                    AS extraction_start_timestamp
            """

        # We join the main columns considered for the extraction with
        # extra_cols_req_status_tbl that people might want to use, filtering to
        # only add the comma and join the strings, in case
        # extra_cols_req_status_tbl is not None or empty.
        extraction_query_cols = ",".join(
            filter(None, [main_cols, self._B4_EXTRACTION.extra_cols_req_status_tbl])
        )

        extraction_query = extraction_query_cols + base_query

        predicates_query = f"""
        (SELECT DISTINCT({self._B4_EXTRACTION.partition_column})
        {base_query}
        """

        return extraction_query, predicates_query

    def _get_req_status_tbl_filter(self) -> Any:
        """Get the filter to apply on the request status table.

        Returns the user-provided request_status_tbl_filter if set; otherwise,
        derives a default filter based on the configured adso_type (AQ or CL).
        """
        if self._B4_EXTRACTION.request_status_tbl_filter:
            return self._B4_EXTRACTION.request_status_tbl_filter
        else:
            if self._B4_EXTRACTION.adso_type == ADSOTypes.AQ.value:
                return f"""
                    STORAGE = 'AQ' AND REQUEST_IS_IN_PROCESS = 'N' AND
                    LAST_OPERATION_TYPE IN ('C', 'U') AND REQUEST_STATUS IN ('GG', 'GR')
                    AND UPPER(DATATARGET) = UPPER('{self._B4_EXTRACTION.data_target}')
                """
            elif self._B4_EXTRACTION.adso_type == ADSOTypes.CL.value:
                return f"""
                    STORAGE = 'AT' AND REQUEST_IS_IN_PROCESS = 'N' AND
                    LAST_OPERATION_TYPE IN ('C', 'U') AND REQUEST_STATUS IN ('GG')
                    AND UPPER(DATATARGET) = UPPER('{self._B4_EXTRACTION.data_target}')
                """
            else:
                raise NotImplementedError(
                    f"The requested ADSO Type is not fully implemented and/or tested."
                    f"Supported ADSO Types: {ADSOTypes.SUPPORTED_TYPES}"
                )
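
To make the query assembly above concrete: with the default configuration, the extraction query returned by _get_delta_query has roughly the shape below. The angle-bracket placeholders are filled from the configuration at runtime; this rendering is an illustration, not verbatim library output.

# (SELECT tbl.*,
#     CAST(<extraction_timestamp> AS DECIMAL(15,0)) AS extraction_start_timestamp
# FROM <changelog_table> AS tbl
# JOIN SAPHANADB.RSPMREQUEST AS req
#     ON tbl.reqtsn = req.REQUEST_TSN
# WHERE <request_status_tbl_filter>
#     AND req.REQUEST_TSN > <min_timestamp>
#     AND req.REQUEST_TSN <= <max_timestamp>)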

__init__(sap_b4_extraction)

Construct SAPB4ExtractionUtils.

Parameters:

Name               Type             Description                        Default
sap_b4_extraction  SAPB4Extraction  SAP B4 Extraction configurations.  required
Source code in mkdocs/lakehouse_engine/packages/utils/extraction/sap_b4_extraction_utils.py
def __init__(self, sap_b4_extraction: SAPB4Extraction):
    """Construct SAPB4ExtractionUtils.

    Args:
        sap_b4_extraction: SAP B4 Extraction configurations.
    """
    self._LOGGER: Logger = LoggingHandler(__name__).get_logger()
    self._B4_EXTRACTION = sap_b4_extraction
    self._B4_EXTRACTION.request_status_tbl_filter = (
        self._get_req_status_tbl_filter()
    )
    self._MAX_TIMESTAMP_QUERY = f""" --# nosec
            (SELECT COALESCE(MAX({self._B4_EXTRACTION.request_col_name}),
                {self._B4_EXTRACTION.default_max_timestamp}) as timestamp
            FROM {self._B4_EXTRACTION.request_status_tbl}
            WHERE {self._B4_EXTRACTION.request_status_tbl_filter})
        """  # nosec: B608
    super().__init__(sap_b4_extraction)
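
With the default configuration values, the max-timestamp query built in the constructor renders roughly as below; the WHERE clause comes from _get_req_status_tbl_filter and depends on the configured adso_type (an illustration, not verbatim output).

# (SELECT COALESCE(MAX(REQUEST_TSN), 1970000000000000000000) as timestamp
# FROM SAPHANADB.RSPMREQUEST
# WHERE <request_status_tbl_filter>)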

get_data_target(input_spec_opt) staticmethod

Get the data_target from the data_target option or derive it.

By definition, data_target is the same for the table and the changelog table: it is the same string after ignoring everything before the last / and the first and last characters after it. E.g. for a dbtable /BIC/abtable12, the data_target would be btable1.

Parameters:

Name            Type  Description                   Default
input_spec_opt  dict  options from the input_spec.  required

Returns:

Type  Description
str   A string with the data_target.

Source code in mkdocs/lakehouse_engine/packages/utils/extraction/sap_b4_extraction_utils.py
@staticmethod
def get_data_target(input_spec_opt: dict) -> str:
    """Get the data_target from the data_target option or derive it.

    By definition, data_target is the same for the table and the changelog
    table: it is the same string after ignoring everything before the last /
    and the first and last characters after it. E.g. for a dbtable
    /BIC/abtable12, the data_target would be btable1.

    Args:
        input_spec_opt: options from the input_spec.

    Returns:
        A string with the data_target.
    """
    exclude_chars = """["'\\\\]"""
    data_target: str = input_spec_opt.get(
        "data_target",
        re.sub(exclude_chars, "", input_spec_opt["dbtable"]).split("/")[-1][1:-1],
    )

    return data_target
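
For instance, applying the derivation rule above (a usage sketch; the results follow from the regex and slicing shown, and "my_target" is a hypothetical value):

SAPB4ExtractionUtils.get_data_target({"dbtable": "/BIC/abtable12"})
# -> "btable1"

# An explicitly provided data_target takes precedence over the derivation:
SAPB4ExtractionUtils.get_data_target(
    {"dbtable": "/BIC/abtable12", "data_target": "my_target"}
)
# -> "my_target"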