Filtered Full Load with Selective Replace
This scenario is very similar to the Filtered Full Load, but instead of replacing the entire table we replace only a subset of its partitions and leave the others untouched. This capability is very useful for backfilling scenarios.
As in the other cases, the ACON configuration should be executed with load_data using:
{
  "input_specs": [
    {
      "spec_id": "sales_source",
      "read_type": "batch",
      "data_format": "csv",
      "options": {
        "header": true,
        "delimiter": "|",
        "inferSchema": true
      },
      "location": "file:///app/tests/lakehouse/in/feature/full_load/with_filter_partition_overwrite/data"
    }
  ],
  "transform_specs": [
    {
      "spec_id": "filtered_sales",
      "input_id": "sales_source",
      "transformers": [
        {
          "function": "expression_filter",
          "args": {
            "exp": "date like '2016%'"
          }
        }
      ]
    }
  ],
  "output_specs": [
    {
      "spec_id": "sales_bronze",
      "input_id": "filtered_sales",
      "write_type": "overwrite",
      "data_format": "delta",
      "partitions": [
        "date",
        "customer"
      ],
      "location": "file:///app/tests/lakehouse/out/feature/full_load/with_filter_partition_overwrite/data",
      "options": {
        "replaceWhere": "date like '2016%'"
      }
    }
  ]
}
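The ACON above repeats the same expression in the expression_filter transformer and in replaceWhere. A minimal sketch of one way to keep the two in sync when backfilling different periods is shown below. The helper name build_backfill_acon and its parameters are hypothetical and not part of the engine; the commented-out import and load_data(acon=acon) call assume the lakehouse_engine package and a Spark session are available.

```python
import json


def build_backfill_acon(year: str, in_path: str, out_path: str) -> dict:
    """Build an ACON that overwrites only the data of one year (hypothetical helper)."""
    if not year.isdigit():
        raise ValueError(f"year must contain only digits, got {year!r}")
    # One predicate, reused for both the filter and replaceWhere,
    # so the rows written always fall inside the period being replaced.
    predicate = f"date like '{year}%'"
    return {
        "input_specs": [
            {
                "spec_id": "sales_source",
                "read_type": "batch",
                "data_format": "csv",
                "options": {"header": True, "delimiter": "|", "inferSchema": True},
                "location": in_path,
            }
        ],
        "transform_specs": [
            {
                "spec_id": "filtered_sales",
                "input_id": "sales_source",
                "transformers": [
                    {"function": "expression_filter", "args": {"exp": predicate}}
                ],
            }
        ],
        "output_specs": [
            {
                "spec_id": "sales_bronze",
                "input_id": "filtered_sales",
                "write_type": "overwrite",
                "data_format": "delta",
                "partitions": ["date", "customer"],
                "location": out_path,
                "options": {"replaceWhere": predicate},
            }
        ],
    }


acon = build_backfill_acon(
    "2016",
    "file:///app/tests/lakehouse/in/feature/full_load/with_filter_partition_overwrite/data",
    "file:///app/tests/lakehouse/out/feature/full_load/with_filter_partition_overwrite/data",
)
print(json.dumps(acon["output_specs"][0]["options"]))

# Assumption: running the load requires the engine and a Spark session.
# from lakehouse_engine.engine import load_data
# load_data(acon=acon)
```

Building the predicate once avoids the case where the filter and replaceWhere drift apart after one of them is edited.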
Relevant notes:
- The key option for this scenario in the ACON is replaceWhere, which we use to overwrite only a specific period of time, which in practice can match a subset of the table's partitions. Here it uses the same expression as the expression_filter transformer, so the data being written matches the period being replaced.
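To make the effect of a selective replace concrete, here is a small pure-Python model of the semantics, with no Spark or Delta involved. It is only a sketch: the function selective_overwrite and the sample rows are invented for illustration, and the check that rejects incoming rows outside the predicate reflects how Delta Lake is understood to behave by default, which is an assumption here.

```python
def selective_overwrite(target, incoming, matches):
    """Model an overwrite limited to the rows for which matches(row) is True."""
    # Assumed default behaviour: rows outside the predicate are rejected.
    outside = [row for row in incoming if not matches(row)]
    if outside:
        raise ValueError(f"{len(outside)} incoming row(s) do not match the predicate")
    # Rows outside the predicate are kept as they are.
    kept = [row for row in target if not matches(row)]
    # Every existing row inside the predicate is dropped and replaced by the incoming rows.
    return kept + list(incoming)


def in_2016(row):
    """Python equivalent of the expression: date like '2016%'."""
    return str(row["date"]).startswith("2016")


# Invented sample data, not taken from the test files referenced above.
target = [
    {"date": "20150101", "customer": "c1", "amount": 10},
    {"date": "20160101", "customer": "c1", "amount": 20},
    {"date": "20160102", "customer": "c2", "amount": 30},
    {"date": "20170101", "customer": "c1", "amount": 40},
]
incoming = [
    {"date": "20160101", "customer": "c1", "amount": 25},
]

result = selective_overwrite(target, incoming, in_2016)
for row in result:
    print(row)
```

In this model the 2015 and 2017 rows survive unchanged, while both existing 2016 rows are removed and only the incoming 2016 row remains. This is why a backfill input should contain the complete data for the period named in replaceWhere, not just the corrected rows.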