Filtered Full Load

This scenario is very similar to the full load, but instead of loading the complete dataset it filters the data coming from the source. As in the other cases, the ACON configuration should be executed with `load_data`:

```python
from lakehouse_engine.engine import load_data
acon = {...}
load_data(acon=acon)
```
Example of ACON configuration:
```json
{
  "input_specs": [
    {
      "spec_id": "sales_source",
      "read_type": "batch",
      "data_format": "csv",
      "options": {
        "header": true,
        "delimiter": "|",
        "inferSchema": true
      },
      "location": "file:///app/tests/lakehouse/in/feature/full_load/with_filter/data"
    }
  ],
  "transform_specs": [
    {
      "spec_id": "filtered_sales",
      "input_id": "sales_source",
      "transformers": [
        {
          "function": "expression_filter",
          "args": {
            "exp": "date like '2016%'"
          }
        }
      ]
    }
  ],
  "output_specs": [
    {
      "spec_id": "sales_bronze",
      "input_id": "filtered_sales",
      "write_type": "overwrite",
      "data_format": "parquet",
      "location": "file:///app/tests/lakehouse/out/feature/full_load/with_filter/data"
    }
  ]
}
```

Relevant notes:
  • As seen in the ACON, the filtering capability is provided by a transformer called expression_filter, which accepts a custom Spark SQL filter expression.
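
To make the semantics of the `exp` argument concrete: `date like '2016%'` is a Spark SQL `LIKE` predicate where the trailing `%` matches any suffix, so it keeps rows whose `date` column starts with `2016`. The sketch below emulates that prefix-matching behavior in plain Python on hypothetical sample rows (no Spark dependency); the row data and helper name are illustrative, not part of the engine's API.

```python
# Hypothetical sample rows, standing in for the CSV source data.
rows = [
    {"date": "2016-01-15", "amount": 10},
    {"date": "2017-03-02", "amount": 20},
    {"date": "2016-11-30", "amount": 30},
]

def like_prefix(value: str, prefix: str) -> bool:
    """Emulate the SQL predicate `value LIKE 'prefix%'` (prefix match)."""
    return value.startswith(prefix)

# Equivalent of the ACON filter: "date like '2016%'".
filtered = [row for row in rows if like_prefix(row["date"], "2016")]
```

In Spark itself this corresponds to `df.filter("date like '2016%'")`; the transformer simply applies the expression you supply to the input DataFrame.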