Writer
Defines abstract writer behaviour.
Writer
Bases: ABC
Abstract Writer class.
Source code in mkdocs/lakehouse_engine/packages/io/writer.py
__init__(output_spec, df, data=None)
Construct Writer instances.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
output_spec | OutputSpec | output specification to write data. | required |
df | DataFrame | dataframe to write. | required |
data | OrderedDict | ordered dict of all dataframes generated in previous steps, before the writer. | None |
Source code in mkdocs/lakehouse_engine/packages/io/writer.py
get_streaming_trigger(output_spec)
classmethod
Define which streaming trigger will be used.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
output_spec | OutputSpec | output specification. | required |

Returns:

Type | Description |
---|---|
Dict | A dict containing the streaming trigger. |
Source code in mkdocs/lakehouse_engine/packages/io/writer.py
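As a rough illustration of what a streaming-trigger builder like this can return, here is a minimal pure-Python sketch. The `OutputSpecSketch` class and its field names (`processing_time`, `trigger_once`, `available_now`) are assumptions for illustration, not the library's actual `OutputSpec` attributes; the dict keys match the keyword arguments accepted by Spark's `DataStreamWriter.trigger()`.

```python
# Hedged sketch of a streaming-trigger builder; field names are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OutputSpecSketch:
    """Minimal stand-in for the real OutputSpec (hypothetical fields)."""
    processing_time: Optional[str] = None
    trigger_once: bool = False
    available_now: bool = False


def get_streaming_trigger(output_spec: OutputSpecSketch) -> Dict:
    """Build the kwargs dict that would be passed to DataStreamWriter.trigger()."""
    trigger: Dict = {}
    if output_spec.trigger_once:
        trigger["once"] = True          # process all available data, then stop
    elif output_spec.available_now:
        trigger["availableNow"] = True  # Spark 3.3+ batched catch-up trigger
    elif output_spec.processing_time:
        trigger["processingTime"] = output_spec.processing_time
    return trigger


print(get_streaming_trigger(OutputSpecSketch(processing_time="30 seconds")))
# {'processingTime': '30 seconds'}
```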
get_transformed_micro_batch(output_spec, batch_df, batch_id, data)
classmethod
Get the result of the transformations applied to a micro batch dataframe.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
output_spec | OutputSpec | output specification associated with the writer. | required |
batch_df | DataFrame | batch dataframe (given from streaming foreachBatch). | required |
batch_id | int | id of the batch (given from streaming foreachBatch). | required |
data | OrderedDict | ordered dict of all dataframes generated in previous steps, made available to the micro batch transforms. | required |

Returns:

Type | Description |
---|---|
DataFrame | The transformed dataframe. |
Source code in mkdocs/lakehouse_engine/packages/io/writer.py
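The pattern this method describes can be sketched generically: apply a chain of transformations, in order, to the micro-batch dataframe. In this hedged sketch the transformations are a plain list of callables (an assumption for illustration; the real method derives them from the output specification), and plain Python lists stand in for Spark DataFrames.

```python
# Hedged sketch of micro-batch transformation; the 'transformers' parameter
# is a simplification standing in for the specs carried by the OutputSpec.
from typing import Any, Callable, List, Mapping


def get_transformed_micro_batch(
    transformers: List[Callable[[Any], Any]],
    batch_df: Any,
    batch_id: int,
    data: Mapping[str, Any],
) -> Any:
    """Apply each transformer in order to the micro-batch dataframe."""
    df = batch_df
    for transform in transformers:
        df = transform(df)  # each step could also consult batch_id or data
    return df


# Usage with plain lists standing in for Spark DataFrames:
result = get_transformed_micro_batch(
    [lambda d: [x * 2 for x in d], lambda d: [x + 1 for x in d]],
    [1, 2, 3],
    batch_id=0,
    data={},
)
print(result)  # [3, 5, 7]
```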
run_micro_batch_dq_process(df, dq_spec)
staticmethod
Run the data quality process in a streaming micro batch dataframe.
Iterates over the specs and performs the checks or analysis depending on the data quality specification provided in the configuration.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | the dataframe on which to run the dq process. | required |
dq_spec | List[DQSpec] | list of data quality specifications. | required |
Source code in mkdocs/lakehouse_engine/packages/io/writer.py
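The iterate-over-specs idea can be illustrated with a minimal sketch. `DQSpecSketch` and its fields are hypothetical stand-ins; the real engine dispatches each spec to its configured checks or analyses rather than calling a bare predicate.

```python
# Hedged sketch of the data quality loop; DQSpecSketch is illustrative only.
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class DQSpecSketch:
    """Hypothetical spec: a named check applied to the dataframe."""
    name: str
    check: Callable[[Any], bool]


def run_micro_batch_dq_process(df: Any, dq_specs: List[DQSpecSketch]) -> Any:
    """Run every configured check; raise on the first failure."""
    for spec in dq_specs:
        if not spec.check(df):
            raise ValueError(f"Data quality check failed: {spec.name}")
    return df  # dataframe is returned unchanged when all checks pass


rows = [{"id": 1}, {"id": 2}]
run_micro_batch_dq_process(rows, [DQSpecSketch("non_empty", lambda d: len(d) > 0)])
```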
write()
abstractmethod
Abstract method that concrete writers implement to write the dataframe.
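Being abstract, `write()` only defines the contract; a concrete writer subclasses `Writer` and provides the implementation. The sketch below mirrors the documented constructor signature with simplified stand-ins (the class names and the console "sink" are hypothetical, not part of the library).

```python
# Hedged sketch of the abstract contract; names other than write() and the
# (output_spec, df, data) signature are illustrative stand-ins.
from abc import ABC, abstractmethod
from typing import Any, Optional


class WriterSketch(ABC):
    """Minimal stand-in mirroring the documented constructor signature."""

    def __init__(self, output_spec: Any, df: Any, data: Optional[dict] = None):
        self._output_spec = output_spec
        self._df = df
        self._data = data

    @abstractmethod
    def write(self) -> None:
        """Write self._df according to self._output_spec."""


class ConsoleWriterSketch(WriterSketch):
    """Toy implementation: 'writes' by printing each row."""

    def write(self) -> None:
        for row in self._df:
            print(self._output_spec, row)


ConsoleWriterSketch("console", [{"id": 1}]).write()
```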
write_transformed_micro_batch(**kwargs)
staticmethod
Define how to write a streaming micro batch after transforming it.
This function must define an inner function that manipulates a streaming batch, and then return that function. Look for concrete implementations of this function for more clarity.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
kwargs | Any | any keyword arguments. | {} |

Returns:

Type | Description |
---|---|
Callable | A function to be executed in the foreachBatch Spark write method. |
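The "inner function" pattern the docstring describes is a closure factory: the outer call captures configuration from `kwargs` and returns the `(batch_df, batch_id)` callback that Spark's `DataStreamWriter.foreachBatch` invokes per micro batch. A hedged sketch, where the `sink` keyword argument is a hypothetical stand-in for a real write target:

```python
# Hedged sketch of the foreachBatch factory pattern; 'sink' is illustrative.
from typing import Any, Callable, List


def write_transformed_micro_batch(**kwargs: Any) -> Callable[[Any, int], None]:
    """Return the per-batch callback handed to DataStreamWriter.foreachBatch."""
    sink: List[Any] = kwargs["sink"]  # where each transformed batch lands

    def inner(batch_df: Any, batch_id: int) -> None:
        # A concrete writer would transform batch_df and persist it here.
        sink.append((batch_id, batch_df))

    return inner


out: List[Any] = []
callback = write_transformed_micro_batch(sink=out)
callback([1, 2], 0)  # simulate Spark invoking the callback per micro batch
callback([3], 1)
print(out)  # [(0, [1, 2]), (1, [3])]
```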