Filtered Full Load with Selective Replace

This scenario is very similar to the Filtered Full Load, but it replaces only a subset of the table's partitions and leaves the others untouched, instead of overwriting the entire table. This capability is particularly useful for backfilling. As in the other scenarios, the ACON configuration should be executed with load_data using:

from lakehouse_engine.engine import load_data
acon = {...}
load_data(acon=acon)

Example of ACON configuration:
{
  "input_specs": [
    {
      "spec_id": "sales_source",
      "read_type": "batch",
      "data_format": "csv",
      "options": {
        "header": true,
        "delimiter": "|",
        "inferSchema": true
      },
      "location": "file:///app/tests/lakehouse/in/feature/full_load/with_filter_partition_overwrite/data"
    }
  ],
  "transform_specs": [
    {
      "spec_id": "filtered_sales",
      "input_id": "sales_source",
      "transformers": [
        {
          "function": "expression_filter",
          "args": {
            "exp": "date like '2016%'"
          }
        }
      ]
    }
  ],
  "output_specs": [
    {
      "spec_id": "sales_bronze",
      "input_id": "filtered_sales",
      "write_type": "overwrite",
      "data_format": "delta",
      "partitions": [
        "date",
        "customer"
      ],
      "location": "file:///app/tests/lakehouse/out/feature/full_load/with_filter_partition_overwrite/data",
      "options": {
        "replaceWhere": "date like '2016%'"
      }
    }
  ]
}

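The configuration above repeats the same predicate in two places: the expression_filter transformer and the replaceWhere output option. Recent Delta Lake versions, by default, reject a replaceWhere write containing rows that fall outside the predicate, so keeping the two in sync matters. The sketch below shows one possible way to do that; `build_selective_replace_acon` and its parameters are hypothetical names introduced here for illustration and are not part of the lakehouse_engine API.

```python
from typing import Any, Dict


def build_selective_replace_acon(
    predicate: str, in_location: str, out_location: str
) -> Dict[str, Any]:
    """Hypothetical helper: build the ACON shown above from a single predicate."""
    return {
        "input_specs": [
            {
                "spec_id": "sales_source",
                "read_type": "batch",
                "data_format": "csv",
                "options": {"header": True, "delimiter": "|", "inferSchema": True},
                "location": in_location,
            }
        ],
        "transform_specs": [
            {
                "spec_id": "filtered_sales",
                "input_id": "sales_source",
                "transformers": [
                    # Keep only the rows that are going to be rewritten.
                    {"function": "expression_filter", "args": {"exp": predicate}}
                ],
            }
        ],
        "output_specs": [
            {
                "spec_id": "sales_bronze",
                "input_id": "filtered_sales",
                "write_type": "overwrite",
                "data_format": "delta",
                "partitions": ["date", "customer"],
                "location": out_location,
                # Same predicate as the filter: only matching data is replaced.
                "options": {"replaceWhere": predicate},
            }
        ],
    }


acon = build_selective_replace_acon(
    predicate="date like '2016%'",
    in_location="file:///app/tests/lakehouse/in/feature/full_load/with_filter_partition_overwrite/data",
    out_location="file:///app/tests/lakehouse/out/feature/full_load/with_filter_partition_overwrite/data",
)
# Running it needs lakehouse_engine and a Spark session:
# from lakehouse_engine.engine import load_data
# load_data(acon=acon)
```

To backfill a different period, only the predicate argument would need to change, for example "date like '2017%'".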
Relevant notes:
  • The key option for this scenario in the ACON is replaceWhere, which restricts the overwrite to the data matching its predicate, here a specific period of time. In practice, that predicate usually matches only a subset of the table's partitions, so the remaining partitions are left untouched.
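The effect of a selective replace can be illustrated with a small pure-Python simulation. This is only a conceptual sketch of the semantics, not how Delta Lake or the lakehouse engine implements it; the `selective_replace` function, the sample rows, and the `amount` column are made up for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def selective_replace(
    table: List[Row], new_rows: List[Row], matches: Callable[[Row], bool]
) -> List[Row]:
    """Replace only the rows of `table` selected by `matches` with `new_rows`."""
    # Incoming rows must all satisfy the predicate, mirroring the idea that
    # the written data has to fall inside the replaced slice.
    stray = [row for row in new_rows if not matches(row)]
    if stray:
        raise ValueError(f"rows outside the replace predicate: {stray}")
    # Rows outside the predicate are kept exactly as they were.
    kept = [row for row in table if not matches(row)]
    return kept + new_rows


def is_2016(row: Row) -> bool:
    # Plays the role of the predicate "date like '2016%'".
    return row["date"].startswith("2016")


table = [
    {"date": "20150101", "customer": "c1", "amount": "10"},
    {"date": "20160101", "customer": "c1", "amount": "20"},
    {"date": "20160201", "customer": "c2", "amount": "30"},
]
backfill = [{"date": "20160101", "customer": "c1", "amount": "25"}]

result = selective_replace(table, backfill, is_2016)
for row in result:
    print(row)
```

The 2015 row survives unchanged, while both original 2016 rows are dropped and replaced by the single backfilled row.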