Condensers
Condensers module.
Condensers
Bases: object
Class containing all the functions to condense data for later merges.
Source code in mkdocs/lakehouse_engine/packages/transformers/condensers.py
condense_record_mode_cdc(business_key, record_mode_col, valid_record_modes, ranking_key_desc=None, ranking_key_asc=None)
classmethod
Condense Change Data Capture (CDC) based on record_mode strategy.
This type of CDC data is particularly seen in some CDC-enabled systems. Other systems may have different CDC strategies.
Parameters:

Name | Type | Description | Default
---|---|---|---
business_key | List[str] | The business key (logical primary key) of the data. | required
record_mode_col | str | Name of the record mode input column. | required
valid_record_modes | List[str] | Depending on the context, not all record modes may be considered for condensation. Use this parameter to skip those. | required
ranking_key_desc | Optional[List[str]] | In this type of CDC condensation the data needs to be in descending order in a certain way, using the columns specified in this parameter. | None
ranking_key_asc | Optional[List[str]] | In this type of CDC condensation the data needs to be in ascending order in a certain way, using the columns specified in this parameter. | None
Returns:

Type | Description
---|---
Callable | A function to be executed in the .transform() Spark function.
Example of condense_record_mode_cdc:
```json
{
  "function": "condense_record_mode_cdc",
  "args": {
    "business_key": [
      "salesorder",
      "item"
    ],
    "ranking_key_desc": [
      "extraction_timestamp",
      "actrequest_timestamp",
      "datapakid",
      "partno",
      "record"
    ],
    "record_mode_col": "recordmode",
    "valid_record_modes": [
      "",
      "N",
      "R",
      "D",
      "X"
    ]
  }
}
```
Source code in mkdocs/lakehouse_engine/packages/transformers/condensers.py
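The condensation logic above can be illustrated in plain Python (the real transformer returns a function for Spark's `.transform()`; the row data and helper below are hypothetical, used only to show the semantics: drop rows with invalid record modes, then keep the top-ranked row per business key):

```python
from itertools import groupby


def condense_cdc_sketch(rows, business_key, ranking_key_desc,
                        record_mode_col, valid_record_modes):
    """Keep only the latest valid change per business key (illustration only)."""
    # Skip rows whose record mode is not considered for condensation.
    rows = [r for r in rows if r[record_mode_col] in valid_record_modes]
    key = lambda r: tuple(r[c] for c in business_key)
    latest = []
    # Group rows by business key; within each group, the row ranked
    # first by the descending ranking key wins.
    for _, group in groupby(sorted(rows, key=key), key=key):
        latest.append(max(group, key=lambda r: tuple(r[c] for c in ranking_key_desc)))
    return latest


rows = [
    {"salesorder": 1, "item": 10, "extraction_timestamp": 1, "recordmode": "N", "qty": 5},
    {"salesorder": 1, "item": 10, "extraction_timestamp": 2, "recordmode": "R", "qty": 7},
    {"salesorder": 1, "item": 10, "extraction_timestamp": 3, "recordmode": "Z", "qty": 9},
]
condensed = condense_cdc_sketch(
    rows,
    business_key=["salesorder", "item"],
    ranking_key_desc=["extraction_timestamp"],
    record_mode_col="recordmode",
    valid_record_modes=["", "N", "R", "D", "X"],
)
print(condensed[0]["qty"])  # the latest row with a valid record mode wins
```

Note how the row with record mode "Z" is excluded before ranking, so the row with `extraction_timestamp` 2 is kept, not the newest one overall.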
group_and_rank(group_key, ranking_key, descending=True)
classmethod
Condense data based on a simple group by + take latest mechanism.
Parameters:

Name | Type | Description | Default
---|---|---|---
group_key | List[str] | List of column names to use in the group by. | required
ranking_key | List[str] | The data needs to be ordered using the columns specified in this parameter. | required
descending | bool | Whether the ranking considers descending order or not. Defaults to True. | True
Returns:

Type | Description
---|---
Callable | A function to be executed in the .transform() Spark function.
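The group by + take latest mechanism can be sketched in plain Python (the actual transformer returns a function for Spark's `.transform()`; the helper and sample rows below are hypothetical, shown only to illustrate the semantics):

```python
def group_and_rank_sketch(rows, group_key, ranking_key, descending=True):
    """Group rows by group_key and keep the top-ranked row per group."""
    best = {}
    for row in rows:
        gk = tuple(row[c] for c in group_key)
        rk = tuple(row[c] for c in ranking_key)
        if gk not in best:
            best[gk] = row
        else:
            current = tuple(best[gk][c] for c in ranking_key)
            # With descending=True the highest ranking key wins,
            # otherwise the lowest one does.
            if (rk > current) if descending else (rk < current):
                best[gk] = row
    return list(best.values())


rows = [
    {"id": "a", "version": 1, "value": "old"},
    {"id": "a", "version": 3, "value": "new"},
    {"id": "b", "version": 2, "value": "only"},
]
latest = group_and_rank_sketch(rows, group_key=["id"], ranking_key=["version"])
print(sorted(r["value"] for r in latest))  # ['new', 'only']
```

In Spark this would typically be a window function (rank over a partition by `group_key` ordered by `ranking_key`) followed by a filter on rank 1; the dictionary above is just the simplest equivalent.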