Schema utils
Utilities to facilitate dataframe schema management.
SchemaUtils
Bases: object
Schema utils that help retrieve and manage schemas of dataframes.
Source code in mkdocs/lakehouse_engine/packages/utils/schema_utils.py
from_dict(struct_type)
staticmethod
Get a spark schema from a dict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
struct_type | `dict` | dict containing a spark schema structure. Check here. | required |
Returns:
Type | Description |
---|---|
`StructType` | Spark schema struct type. |
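As an illustration of the kind of dict this method expects, here is a minimal sketch using the JSON layout that Spark itself uses to serialize a `StructType`. The column names and the `field_names` helper are hypothetical, and using `StructType.fromJson` for the conversion is an assumption about how such a dict can be turned into a schema, not a statement about the internals of `from_dict`.

```python
# Dict in the JSON layout Spark uses for a serialized StructType.
# The column names below are made up for illustration.
schema_dict = {
    "type": "struct",
    "fields": [
        {"name": "order_id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "amount", "type": "double", "nullable": True, "metadata": {}},
    ],
}


def field_names(struct: dict) -> list:
    """Hypothetical helper: list the top-level field names of a schema dict."""
    return [field["name"] for field in struct["fields"]]


# Optional conversion with plain PySpark, skipped when PySpark is not installed.
try:
    from pyspark.sql.types import StructType

    spark_schema = StructType.fromJson(schema_dict)
except ImportError:
    spark_schema = None
```

With the lakehouse engine available, the same `schema_dict` would presumably be passed as `struct_type` to `SchemaUtils.from_dict`.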
from_file(file_path, disable_dbfs_retry=False)
staticmethod
Get a spark schema from a file (spark StructType json file) in a file system.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | `str` | path of the file in a file system. Check here. | required |
disable_dbfs_retry | `bool` | optional flag to disable the retry through DBFS file storage. | `False` |
Returns:
Type | Description |
---|---|
`StructType` | Spark schema struct type. |
from_file_to_dict(file_path, disable_dbfs_retry=False)
staticmethod
Get a dict with the spark schema from a file in a file system.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | `str` | path of the file in a file system. Check here. | required |
disable_dbfs_retry | `bool` | optional flag to disable the retry through DBFS file storage. | `False` |
Returns:
Type | Description |
---|---|
`Any` | Spark schema in a dict. |
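To make the file-based variants more concrete, the sketch below writes a Spark `StructType` JSON file to a temporary local directory and reads it back into a dict with the standard library. `read_schema_dict`, the file name, and the column names are hypothetical; the sketch covers only a local file system and does not reproduce any DBFS handling or retry behaviour of the engine.

```python
import json
import os
import tempfile


def read_schema_dict(file_path: str) -> dict:
    """Hypothetical local stand-in: load a StructType JSON file into a dict."""
    with open(file_path, encoding="utf-8") as handle:
        return json.load(handle)


# Example schema in Spark's StructType JSON layout (made-up columns).
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "order_id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "amount", "type": "double", "nullable": True, "metadata": {}},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    schema_path = os.path.join(tmp_dir, "orders_schema.json")
    # Write the schema file that a call such as from_file_to_dict would point at.
    with open(schema_path, "w", encoding="utf-8") as handle:
        json.dump(example_schema, handle)
    loaded_schema = read_schema_dict(schema_path)
```

The resulting dict has the same shape that `from_dict` takes as `struct_type`.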
from_input_spec(input_spec)
classmethod
Get a spark schema from an input specification.
This covers scenarios where the schema is provided as part of the input specification of the algorithm. The schema can come either from the table named in the input specification (enforce_schema_from_table) or from the dict with the spark schema that is also provided there.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_spec | `InputSpec` | input specification. | required |
Returns:
Type | Description |
---|---|
`Optional[StructType]` | spark schema struct type. |
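The selection between the two schema sources can be sketched as follows. `SimpleInputSpec` is a hypothetical, heavily reduced stand-in for `InputSpec`; the `schema` attribute name and the precedence of the table over the dict are assumptions made for illustration only. Returning `None` when neither source is set mirrors the `Optional[StructType]` return type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleInputSpec:
    """Hypothetical, reduced stand-in for InputSpec (not the real class)."""

    enforce_schema_from_table: Optional[str] = None
    schema: Optional[dict] = None  # assumed attribute name for the schema dict


def pick_schema_source(spec: SimpleInputSpec) -> Optional[str]:
    """Report which source a schema would be taken from, or None if there is none."""
    if spec.enforce_schema_from_table:
        # Assumed precedence: a table to inherit the schema from wins.
        return "table"
    if spec.schema:
        return "dict"
    return None
```

For example, `pick_schema_source(SimpleInputSpec(schema={"type": "struct", "fields": []}))` selects the dict, while an empty `SimpleInputSpec()` yields `None`.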
from_table_schema(table)
staticmethod
Get a spark schema from a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table | `str` | table name from which to inherit the schema. | required |
Returns:
Type | Description |
---|---|
`StructType` | Spark schema struct type. |
schema_flattener(schema, prefix=None, level=1, max_level=None, shorten_names=False, alias=True, num_chars=7, ignore_cols=None)
staticmethod
Recursive method to flatten the schema of the dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | `StructType` | schema to be flattened. | required |
prefix | `str` | prefix of the struct to get the value for. Only relevant to the internal recursive logic. | `None` |
level | `int` | depth level in the schema being flattened. Only relevant to the internal recursive logic. | `1` |
max_level | `int` | level up to which the schema should be flattened. Default: None. | `None` |
shorten_names | `bool` | whether to shorten the names of the prefixes of the fields being flattened. Default: False. | `False` |
alias | `bool` | whether to define an alias for the columns being flattened. Default: True. | `True` |
num_chars | `int` | number of characters to keep when shortening the names of the fields. Default: 7. | `7` |
ignore_cols | `List` | columns that should not be flattened. Default: None. | `None` |
Returns:
Type | Description |
---|---|
`List` | List of the flattened columns, for use inside a function called in the .transform() spark function. |
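The recursion described by these parameters can be sketched in plain Python over a schema dict (the Spark `StructType` JSON layout) instead of a real `StructType`. This is not the engine's implementation: `flatten_schema_dict`, the example columns, the `(path, output_name)` return shape, the underscore-joined alias, and the exact `max_level` semantics are all assumptions chosen to illustrate the idea.

```python
from typing import List, Optional, Tuple


def flatten_schema_dict(
    schema: dict,
    prefix: Optional[str] = None,
    level: int = 1,
    max_level: Optional[int] = None,
    shorten_names: bool = False,
    alias: bool = True,
    num_chars: int = 7,
    ignore_cols: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Return (dotted_path, output_name) pairs for the flattened columns."""
    ignore_cols = ignore_cols or []
    cols: List[Tuple[str, str]] = []
    for field in schema["fields"]:
        name = f"{prefix}.{field['name']}" if prefix else field["name"]
        field_type = field["type"]
        is_struct = isinstance(field_type, dict) and field_type.get("type") == "struct"
        can_descend = max_level is None or level <= max_level
        if is_struct and name not in ignore_cols and can_descend:
            # Recurse into the nested struct, one level deeper.
            cols += flatten_schema_dict(
                field_type, name, level + 1, max_level,
                shorten_names, alias, num_chars, ignore_cols,
            )
        elif alias and prefix:
            parts = name.split(".")
            if shorten_names:
                # Shorten only the prefix parts, keep the leaf name intact.
                parts = [part[:num_chars] for part in parts[:-1]] + [parts[-1]]
            cols.append((name, "_".join(parts)))
        else:
            cols.append((name, name))
    return cols


def _leaf(name: str, type_name: str) -> dict:
    return {"name": name, "type": type_name, "nullable": True, "metadata": {}}


def _struct(name: str, fields: list) -> dict:
    return {"name": name, "type": {"type": "struct", "fields": fields},
            "nullable": True, "metadata": {}}


# Made-up nested schema: id, customer.name, customer.address.city
nested_schema = {
    "type": "struct",
    "fields": [
        _leaf("id", "long"),
        _struct("customer", [
            _leaf("name", "string"),
            _struct("address", [_leaf("city", "string")]),
        ]),
    ],
}

flattened = flatten_schema_dict(nested_schema)
```

In a Spark job, each pair would correspond to a column expression and its alias in a `select`; here the pairs are kept as plain strings so the sketch runs without a Spark session.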