Generic

Generic (i.e. format-independent) logic for describing individual files and merging descriptions of individual files into descriptions of classes of files.

GROUPPAT module-attribute

GROUPPAT = re.compile('(.*?)((?:_|\\d)*\\d+)$')

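Pattern for the suffixes we auto-recognize on repeated objects: a trailing run of digits and underscores that ends in a digit, preceded by an arbitrary stem.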
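As an illustration of what the pattern accepts, the minimal sketch below re-creates the regex shown above (rather than importing it, since the module's import path is not given on this page) and splits a few made-up names into stem and suffix. The helper name split_suffix and the sample names are hypothetical and are not part of the module.

```python
import re

# Same pattern as the GROUPPAT attribute shown above, written as a raw string.
GROUPPAT = re.compile(r'(.*?)((?:_|\d)*\d+)$')


def split_suffix(name):
    """Return (stem, suffix) if name ends in a recognized suffix, else None.

    Hypothetical helper for illustration only.
    """
    match = GROUPPAT.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_suffix("FLUX_1"))       # ('FLUX', '_1')
print(split_suffix("BAND12"))       # ('BAND', '12')
print(split_suffix("IMAGE_01_02"))  # ('IMAGE', '_01_02')
print(split_suffix("TIME"))         # None
```

Because the stem group is lazy and the suffix must end in a digit, the suffix takes the longest trailing run of digits and underscores, and names without a trailing digit do not match at all.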

FileDescription dataclass

FileDescription(fn: Path, standard: str | None = None, objects: list[dict[str, Any]] | None = None, errors: list[str] = list(), warnings: list[Warning] = list())

Simple dataclass for holding file descriptions.

assign_ordered_stemgroups

assign_ordered_stemgroups(objs: list[dict], stems: Collection[str]) -> tuple[list[dict], dict[str, int], str | None]

Heuristically group object/column names by stemming likely 'repeated' names (suffixed with numbers).
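The general idea of stemming can be sketched as follows. This is not the implementation of assign_ordered_stemgroups: it ignores ordering, does not take the stems argument, and works on bare name strings rather than object dicts. The function group_by_stem and the sample column names are hypothetical; only the regex is taken from the GROUPPAT attribute documented on this page.

```python
import re
from collections import defaultdict

# Same pattern as the documented GROUPPAT attribute, written as a raw string.
GROUPPAT = re.compile(r'(.*?)((?:_|\d)*\d+)$')


def group_by_stem(names):
    """Map each stem to the names sharing it, keeping only stems seen more than once.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for name in names:
        match = GROUPPAT.match(name)
        # Names without a numeric suffix act as their own stem.
        stem = match.group(1) if match else name
        groups[stem].append(name)
    return {stem: members for stem, members in groups.items() if len(members) > 1}


repeated = group_by_stem(["TIME", "FLUX_1", "FLUX_2", "ERR_1", "ERR_2", "QUALITY"])
print(repeated)  # {'FLUX': ['FLUX_1', 'FLUX_2'], 'ERR': ['ERR_1', 'ERR_2']}
```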

chunk_repeated_ordered_objects

chunk_repeated_ordered_objects(objlists: list[list[dict]]) -> tuple[list[list[dict]], str | None]

Find groups of 'repeated' ordered objects (HDUs or columns) shared among all HDU lists or schemata described in objlists. Detection is limited to 'repetitions' defined by a variable numeric / underscore pattern suffixed to some stem and consistently ordered with respect to other HDUs / columns across objlists.

Returns:

  • objlists_mutated ( list[list[dict]] ) –

    objlists, but with "group_id" and "stem" added where relevant; or, if grouping failed, None

  • failure ( str | None ) –

    string describing failure if grouping failed; None if it succeeded

sanitize_object_description

sanitize_object_description(obj: dict) -> dict

Remove the temporary identifiers added by the unify_descriptions() functions, so that a description can be used to construct Filetypes and DataObjects.

unify_column

unify_column(existing: dict, new: dict) -> tuple[dict, list | None]

Attempt to 'unify' two column descriptions.

Returns:

  • column ( dict ) –

    dict describing column unified from existing and new

  • failures ( list | None ) –

    list of failures if unification failed; empty list if it succeeded
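To make the (column, failures) return shape concrete, here is a minimal sketch of one kind of check. It is an assumption-laden stand-in and not the behavior of unify_column itself: the unify_schema description on this page mentions checking that dtype and ndim match, but the dict keys "dtype" and "ndim", the failure message format, and the function name unify_column_sketch are all hypothetical.

```python
def unify_column_sketch(existing, new):
    """Compare two column descriptions and collect mismatches.

    Hypothetical stand-in: the keys "dtype" and "ndim" are assumed, not
    taken from the documented module.
    """
    failures = []
    for key in ("dtype", "ndim"):
        if existing.get(key) != new.get(key):
            failures.append(
                f"{key} mismatch: {existing.get(key)!r} != {new.get(key)!r}"
            )
    # In this sketch the existing description is returned unchanged either way;
    # callers decide what to do by checking whether failures is empty.
    return dict(existing), failures


col_a = {"name": "FLUX", "dtype": "float32", "ndim": 1}
col_b = {"name": "FLUX", "dtype": "float64", "ndim": 1}

column, failures = unify_column_sketch(col_a, col_b)
if failures:
    print("could not unify:", failures)
```

Returning an empty list on success lets callers test the result with a plain truthiness check.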

unify_name

unify_name(existing: dict, new: dict) -> tuple[dict, str | None]

Attempt to unify two name specifications.

Returns:

  • name ( dict ) –

    dict giving name specification unified from existing and new

  • failure ( str | None ) –

    string describing failure if unification failed; None if it succeeded

unify_obj_description

unify_obj_description(existing: dict, new: dict) -> tuple[dict, str | None]

Attempt to 'unify' two object descriptions.

Returns:

  • unified ( dict ) –

    dict describing object unified from existing and new

  • failure ( str | None ) –

    string describing failure if unification failed; None if it succeeded

unify_object_lists

unify_object_lists(objlists: list[list[dict]]) -> tuple[dict | None, str | None]

Attempt to 'unify' an arbitrary number of object lists, including unifying any schemata in those objects.

Returns:

  • objects ( dict | None ) –

    dict of unified objects (suitable for use in constructing DataObjects after passing through sanitize_object_description()) if unification succeeded; None if it failed

  • failure ( str | None ) –

    string describing failure if unification failed; None if it succeeded

unify_schema

unify_schema(existing: dict, new: list[dict]) -> tuple[dict, str | None]

Attempt to 'unify' two schemata, looking for repeated and variably-named columns, checking dtype and ndim match, etc.

Returns:

  • unified ( dict ) –

    schema created by unifying existing and new; or, if unification failed, an empty dict

  • failure ( str | None ) –

    string describing failure if unification failed; None if it succeeded