Generic
Generic (i.e. format-independent) logic for describing individual files and merging descriptions of individual files into descriptions of classes of files.
GROUPPAT
module-attribute
GROUPPAT = re.compile('(.*?)((?:_|\\d)*\\d+)$')
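Regular expression that splits a name into a stem and a trailing suffix of digits and underscores (ending in a digit); this is the suffix pattern used to auto-recognize repeated objects.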
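To make the pattern concrete, here is a small illustrative helper that applies the same regular expression to a few names. The helper name split_stem and the sample names are hypothetical; only the pattern string comes from this module.

```python
import re
from typing import Optional, Tuple

# Same pattern string as the GROUPPAT module attribute above.
GROUPPAT = re.compile('(.*?)((?:_|\\d)*\\d+)$')


def split_stem(name: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (stem, suffix) if name ends in a digit/underscore run, else None."""
    m = GROUPPAT.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)


for name in ["FLUX_1", "COL12", "HDU_2_3", "PRIMARY"]:
    print(name, "->", split_stem(name))
# FLUX_1 -> ('FLUX', '_1')
# COL12 -> ('COL', '12')
# HDU_2_3 -> ('HDU', '_2_3')
# PRIMARY -> None
```

Because the stem group is lazy, the suffix absorbs the entire trailing run of digits and underscores, and a name ending in an underscore (e.g. "SCI_") does not match at all.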
FileDescription
dataclass
FileDescription(fn: Path, standard: str | None = None, objects: list[dict[str, Any]] | None = None, errors: list[str] = list(), warnings: list[Warning] = list())
Simple dataclass for holding file descriptions.
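The list() defaults shown in the signature presumably correspond to field(default_factory=list), since a dataclass rejects a bare mutable default such as []. The sketch below (a stand-in class named FileDescriptionSketch, not the real definition) shows why this matters: each instance gets its own errors and warnings lists.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class FileDescriptionSketch:
    """Hypothetical stand-in mirroring the FileDescription signature."""
    fn: Path
    standard: Optional[str] = None
    objects: Optional[List[Dict[str, Any]]] = None
    # default_factory builds a fresh list for every instance
    errors: List[str] = field(default_factory=list)
    warnings: List[Warning] = field(default_factory=list)


a = FileDescriptionSketch(Path("a.fits"))
b = FileDescriptionSketch(Path("b.fits"))
a.errors.append("missing header keyword")
print(a.errors)  # ['missing header keyword']
print(b.errors)  # []
```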
assign_ordered_stemgroups
assign_ordered_stemgroups(objs: list[dict], stems: Collection[str]) -> tuple[list[dict], dict[str, int], str | None]
Heuristically group object/column names by stemming likely 'repeated' names (suffixed with numbers).
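A rough sketch of the stemming step follows, assuming each object dict carries its name under a "name" key and that only stems listed in stems should be tagged. The helper tag_stems and its per-stem count are hypothetical illustrations, not the actual behavior of assign_ordered_stemgroups.

```python
import re
from collections import Counter
from typing import Collection, Dict, List, Tuple

GROUPPAT = re.compile('(.*?)((?:_|\\d)*\\d+)$')


def tag_stems(objs: List[dict], stems: Collection[str]) -> Tuple[List[dict], Dict[str, int]]:
    """Hypothetical helper: tag objects whose name stem is in `stems`, and count them per stem."""
    tagged: List[dict] = []
    counts: Counter = Counter()
    for obj in objs:
        new = dict(obj)  # copy so the caller's dicts are not mutated
        m = GROUPPAT.match(obj["name"])  # assumption: names live under "name"
        if m is not None and m.group(1) in stems:
            new["stem"] = m.group(1)
            counts[m.group(1)] += 1
        tagged.append(new)
    return tagged, dict(counts)


objs = [{"name": "PRIMARY"}, {"name": "SCI_1"}, {"name": "SCI_2"}, {"name": "ERR_1"}]
tagged, counts = tag_stems(objs, {"SCI"})
print(counts)  # {'SCI': 2}
```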
chunk_repeated_ordered_objects
chunk_repeated_ordered_objects(objlists: list[list[dict]]) -> tuple[list[list[dict]], str | None]
Find groups of 'repeated' ordered objects (HDUs or columns) shared among all HDU lists or schemata described in objlists. Only 'repetitions' defined by a variable numeric/underscore suffix on a common stem are detected, and they must be ordered consistently with respect to the other HDUs/columns in every list in objlists.
Returns:
- objlists_mutated (list[list[dict]]) – objlists, but with "group_id" and "stem" added where relevant; or, if grouping failed, None
- failure (str | None) – string describing the failure if grouping failed; None if it succeeded
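The "consistently ordered" requirement can be illustrated with a simplified check: replace each suffixed name by its stem, merge consecutive runs of the same stem, and require every list to collapse to the same layout. This is a hypothetical sketch operating on bare name lists, not the module's implementation; it does not assign group_id values.

```python
import re
from itertools import groupby
from typing import List, Optional

GROUPPAT = re.compile('(.*?)((?:_|\\d)*\\d+)$')


def stem_of(name: str) -> str:
    """Stem of a suffixed name, or the name itself if it has no numeric suffix."""
    m = GROUPPAT.match(name)
    return m.group(1) if m else name


def collapsed_layout(names: List[str]) -> List[str]:
    """Map names to stems, then merge consecutive runs of the same stem."""
    return [key for key, _ in groupby(stem_of(n) for n in names)]


def check_consistent_order(namelists: List[List[str]]) -> Optional[str]:
    """Return None if every list collapses to the same layout, else a failure string."""
    layouts = [collapsed_layout(names) for names in namelists]
    for i, layout in enumerate(layouts[1:], start=1):
        if layout != layouts[0]:
            return f"list {i} has layout {layout}, expected {layouts[0]}"
    return None


ok = [
    ["PRIMARY", "SCI_1", "SCI_2", "TABLE"],
    ["PRIMARY", "SCI_1", "TABLE"],
    ["PRIMARY", "SCI_1", "SCI_2", "SCI_3", "TABLE"],
]
print(check_consistent_order(ok))  # None
print(check_consistent_order([["PRIMARY", "SCI_1", "TABLE"], ["PRIMARY", "TABLE", "SCI_1"]]))
```

Collapsing runs lets lists with different numbers of repeats (one, two, or three SCI HDUs above) still agree, while a repeated group that moves relative to its neighbors is reported as a failure.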
sanitize_object_description
sanitize_object_description(obj: dict) -> dict
Remove the temporary identifiers added by the unify_descriptions() functions, sanitizing a description for use in constructing Filetypes and DataObjects.
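A minimal sketch of such sanitization, assuming the temporary identifiers are the "group_id" and "stem" keys mentioned under chunk_repeated_ordered_objects (the real set of keys may differ) and that they can appear at any nesting depth, e.g. inside a schema. The name strip_temp_keys is hypothetical.

```python
from typing import Any

# Assumption: these are the temporary keys to remove; the real set may differ.
TEMP_KEYS = frozenset({"group_id", "stem"})


def strip_temp_keys(obj: Any) -> Any:
    """Hypothetical: recursively copy obj, dropping TEMP_KEYS from every dict."""
    if isinstance(obj, dict):
        return {k: strip_temp_keys(v) for k, v in obj.items() if k not in TEMP_KEYS}
    if isinstance(obj, list):
        return [strip_temp_keys(v) for v in obj]
    return obj


desc = {"name": "SCI", "stem": "SCI", "group_id": 0, "schema": [{"name": "FLUX", "stem": "FLUX"}]}
print(strip_temp_keys(desc))  # {'name': 'SCI', 'schema': [{'name': 'FLUX'}]}
```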
unify_column
unify_column(existing: dict, new: dict) -> tuple[dict, list | None]
Attempt to 'unify' two column descriptions.
Returns:
- column (dict) – dict describing the column unified from existing and new
- failures (list | None) – list of failures if unification failed; empty list if it succeeded
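As an illustration of the failures-list convention, the sketch below compares two column dicts on "dtype" and "ndim" (the properties unify_schema is documented to check) and collects every mismatch rather than stopping at the first. The key names and the helper unify_column_sketch are assumptions.

```python
from typing import Dict, List, Tuple


def unify_column_sketch(existing: Dict, new: Dict) -> Tuple[Dict, List[str]]:
    """Hypothetical: unify two column dicts by requiring matching dtype and ndim."""
    failures: List[str] = []
    for key in ("dtype", "ndim"):  # assumption: columns carry these keys
        if existing.get(key) != new.get(key):
            failures.append(f"{key} mismatch: {existing.get(key)!r} vs {new.get(key)!r}")
    # An empty failures list signals success, matching the convention above.
    return dict(existing), failures


col_a = {"name": "FLUX", "dtype": "f4", "ndim": 1}
col_b = {"name": "FLUX", "dtype": "f8", "ndim": 1}
print(unify_column_sketch(col_a, col_b)[1])  # ["dtype mismatch: 'f4' vs 'f8'"]
```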
unify_name
unify_name(existing: dict, new: dict) -> tuple[dict, str | None]
Attempt to unify two name specifications.
Returns:
- name (dict) – dict giving the name specification unified from existing and new
- failure (str | None) – string describing the failure if unification failed; None if it succeeded
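The format of a name specification is not documented here, so the following is a hypothetical sketch: a spec is either {"name": ...} for a fixed name or {"stem": ...} for a family of suffixed names. Identical specs unify trivially; two names that share a GROUPPAT stem unify to a stem spec; anything else fails.

```python
import re
from typing import Optional, Tuple

GROUPPAT = re.compile('(.*?)((?:_|\\d)*\\d+)$')


def unify_name_sketch(existing: dict, new: dict) -> Tuple[dict, Optional[str]]:
    """Hypothetical: unify {"name": ...} / {"stem": ...} name specs."""
    if existing == new:
        return dict(existing), None
    stems = []
    for spec in (existing, new):
        if "stem" in spec:
            stems.append(spec["stem"])
        else:
            m = GROUPPAT.match(spec["name"])
            stems.append(m.group(1) if m else None)  # None: no numeric suffix
    if stems[0] is not None and stems[0] == stems[1]:
        return {"stem": stems[0]}, None
    return {}, f"cannot unify names {existing} and {new}"


print(unify_name_sketch({"name": "SCI_1"}, {"name": "SCI_2"}))  # ({'stem': 'SCI'}, None)
```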
unify_obj_description
unify_obj_description(existing: dict, new: dict) -> tuple[dict, str | None]
Attempt to 'unify' two object descriptions.
Returns:
- unified (dict) – dict describing the object unified from existing and new
- failure (str | None) – string describing the failure if unification failed; None if it succeeded
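One simple way to picture object unification is a shallow merge that fails on conflicting values. The sketch below is a hypothetical illustration of that idea only; it does not unify names or schemata the way the documented functions do.

```python
from typing import Optional, Tuple


def unify_obj_sketch(existing: dict, new: dict) -> Tuple[dict, Optional[str]]:
    """Hypothetical: merge two descriptions; any key present in both must agree."""
    unified = dict(existing)
    for key, value in new.items():
        if key in unified and unified[key] != value:
            return {}, f"conflicting {key!r}: {unified[key]!r} vs {value!r}"
        unified.setdefault(key, value)  # keys only in `new` are added
    return unified, None


print(unify_obj_sketch({"name": "SCI", "type": "image"}, {"name": "SCI", "ndim": 2}))
# ({'name': 'SCI', 'type': 'image', 'ndim': 2}, None)
```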
unify_object_lists
unify_object_lists(objlists: list[list[dict]]) -> tuple[dict | None, str | None]
Attempt to 'unify' an arbitrary number of object lists, including unifying any schemata in those objects.
Returns:
- objects (dict | None) – dict of unified objects (suitable for use in constructing DataObjects after passing through sanitize_object_description()) if unification succeeded; None if it failed
- failure (str | None) – string describing the failure if unification failed; None if it succeeded
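Unifying an arbitrary number of lists can be sketched as a fold: start from the first list and unify it position by position with each subsequent list, stopping at the first failure. This hypothetical sketch assumes repeated objects have already been chunked so that all lists have equal length, takes the pairwise unifier as a parameter (a trivial equality check by default), and keys the result by position, which may not match the real function.

```python
from typing import Callable, Dict, List, Optional, Tuple

PairUnifier = Callable[[dict, dict], Tuple[dict, Optional[str]]]


def unify_equal(existing: dict, new: dict) -> Tuple[dict, Optional[str]]:
    """Trivial pairwise unifier: objects must be identical."""
    if existing != new:
        return {}, f"{existing} != {new}"
    return dict(existing), None


def unify_lists_sketch(
    objlists: List[List[dict]], unify_pair: PairUnifier = unify_equal
) -> Tuple[Optional[Dict[int, dict]], Optional[str]]:
    if not objlists:
        return None, "no object lists given"
    lengths = {len(objs) for objs in objlists}
    if len(lengths) != 1:
        return None, f"object lists differ in length: {sorted(lengths)}"
    unified = [dict(obj) for obj in objlists[0]]
    for objs in objlists[1:]:
        for i, obj in enumerate(objs):
            unified[i], failure = unify_pair(unified[i], obj)
            if failure is not None:
                return None, f"object {i}: {failure}"
    # Assumption: key the result by position.
    return dict(enumerate(unified)), None


print(unify_lists_sketch([[{"name": "A"}], [{"name": "A"}]]))  # ({0: {'name': 'A'}}, None)
```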
unify_schema
unify_schema(existing: dict, new: list[dict]) -> tuple[dict, str | None]
Attempt to 'unify' two schemata, looking for repeated and variably-named columns, checking dtype and ndim match, etc.
Returns:
- unified (dict) – schema created by unifying existing and new; or, if unification failed, an empty dict
- failure (str | None) – string describing the failure if unification failed; None if it succeeded
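The dtype/ndim check can be sketched as follows, assuming the existing schema is a dict mapping column names to column dicts and new is a list of column dicts with a "name" key. This hypothetical sketch covers only the "checking dtype and ndim match" part; it does not detect repeated or variably-named columns.

```python
from typing import Dict, List, Optional, Tuple


def unify_schema_sketch(
    existing: Dict[str, dict], new: List[dict]
) -> Tuple[Dict[str, dict], Optional[str]]:
    """Hypothetical: require identical column names, then matching dtype and ndim."""
    new_by_name = {col["name"]: col for col in new}
    if set(new_by_name) != set(existing):
        differing = sorted(set(existing) ^ set(new_by_name))
        return {}, f"column sets differ: {differing}"
    for name, col in new_by_name.items():
        for key in ("dtype", "ndim"):
            if existing[name].get(key) != col.get(key):
                return {}, f"column {name!r}: {key} mismatch"
    # On failure an empty dict is returned, mirroring the convention above.
    return {name: dict(col) for name, col in existing.items()}, None


schema = {"FLUX": {"name": "FLUX", "dtype": "f4", "ndim": 1}}
print(unify_schema_sketch(schema, [{"name": "FLUX", "dtype": "f4", "ndim": 2}]))
# ({}, "column 'FLUX': ndim mismatch")
```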