Dataset optimizer
Module with dataset optimizer terminator.
DatasetOptimizer
Bases: object
Class with dataset optimizer terminator.
Source code in mkdocs/lakehouse_engine/packages/terminators/dataset_optimizer.py
optimize_dataset(db_table=None, location=None, compute_table_stats=True, vacuum=True, vacuum_hours=720, optimize=True, optimize_where=None, optimize_zorder_col_list=None, debug=False)
classmethod
Optimize a dataset by applying a predefined set of optimizations.
Most of the time the dataset is a table, but it can also be a purely file-based dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_table | Optional[str] | | None |
location | Optional[str] | dataset/table filesystem location. | None |
compute_table_stats | bool | to compute table statistics or not. | True |
vacuum | bool | (delta lake tables only) whether to vacuum the delta lake table or not. | True |
vacuum_hours | int | (delta lake tables only) number of hours to consider in vacuum operation. | 720 |
optimize | bool | (delta lake tables only) whether to optimize the table or not. Custom optimize parameters can be supplied through ExecEnv (Spark) configs. | True |
optimize_where | Optional[str] | expression to use in the optimize function. | None |
optimize_zorder_col_list | Optional[List[str]] | (delta lake tables only) list of columns to consider in the zorder optimization process. Custom optimize parameters can be supplied through ExecEnv (Spark) configs. | None |
debug | bool | flag indicating if we are just debugging this for local tests and therefore pass through all the exceptions to perform some assertions in local tests. | False |
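To make the interplay of these parameters more concrete, the sketch below lists the Spark SQL / Delta Lake statements that a run with a given set of flags could plausibly translate to. It is not the library's implementation: `sketch_optimize_statements` is a hypothetical helper, the table name `sales.orders` is made up, and the assumption that statistics are only computed for catalog tables (not for location-only datasets) is ours. Only the `ANALYZE TABLE`, `VACUUM ... RETAIN ... HOURS` and `OPTIMIZE ... WHERE ... ZORDER BY` statement forms are standard Spark SQL / Delta Lake syntax.

```python
from typing import List, Optional


def sketch_optimize_statements(
    db_table: Optional[str] = None,
    location: Optional[str] = None,
    compute_table_stats: bool = True,
    vacuum: bool = True,
    vacuum_hours: int = 720,
    optimize: bool = True,
    optimize_where: Optional[str] = None,
    optimize_zorder_col_list: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: list the SQL a run with these flags might issue."""
    if not db_table and not location:
        raise ValueError("Provide either db_table or location.")

    # Path-based Delta datasets are addressed as delta.`<path>` in SQL.
    target = db_table if db_table else f"delta.`{location}`"
    statements: List[str] = []

    # Assumption: statistics are only computed for catalog tables.
    if compute_table_stats and db_table:
        statements.append(f"ANALYZE TABLE {db_table} COMPUTE STATISTICS")

    # Remove files older than the retention window (Delta Lake only).
    if vacuum:
        statements.append(f"VACUUM {target} RETAIN {vacuum_hours} HOURS")

    # Compact files, optionally restricted by a predicate and Z-ordered.
    if optimize:
        sql = f"OPTIMIZE {target}"
        if optimize_where:
            sql += f" WHERE {optimize_where}"
        if optimize_zorder_col_list:
            sql += f" ZORDER BY ({', '.join(optimize_zorder_col_list)})"
        statements.append(sql)

    return statements


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    for stmt in sketch_optimize_statements(
        db_table="sales.orders",
        optimize_where="order_date >= '2024-01-01'",
        optimize_zorder_col_list=["customer_id"],
    ):
        print(stmt)
```

The keyword arguments mirror the signature of `optimize_dataset` (except `debug`), so the same values could be passed to the real classmethod; what it actually executes depends on the library version and on any ExecEnv (Spark) configs in effect.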