Basic transfer procedure
This represents the simplest case. Further configuration is possible for most of these steps.
Please refer to mast-upload's help and the complete label schema description
for a full description of functionality and options.
Complete the label
Once you receive a skeleton label from MAST:
- Add filetype name patterns and standards to skeleton label (see "Completing a label" below)
- Run
mast-upload populate-label /path/to/data/files /path/to/label_with_filetypes.yml, where /path/to/data/files is the root directory of a tree that contains your files (or a representative subset) - Read the populated label ({dataset}-{delivery_id}-populated.yml) and confirm that it accurately represents your files, or make edits if necessary
- run
mast-upload validate-all /path/to/data/files {dataset}-{delivery_id}-populated.ymlto confirm that the label passes validation against your files - Send the populated label to MAST for verification and possible further refinements.
Send a sample
Once MAST has accepted your label and informed you that you are authorized to send a sample:
- Select a representative sample of your data set and place it in its own directory tree (or manually prepare / edit an index file to list just those files)
- Run
mast-upload index /path/to/sample/data/files --make-checksums --output sample-index-filename.csvto produce an index for those files. Check the index to verify that it lists the files you intend to send. - Run
mast-upload transfer /path/to/sample/data/files /path/to/label_confirmed_by_mast.yml sample-index-filename.csv --sampleto upload those files to MAST. Wait for the transfer and validation to complete. If the transfer is interrupted or if you need to adjust some files to pass validation, run the command again to resume.
Send full data set
Once your sample has passed validation and been reviewed by MAST:
- Run
mast-upload index /path/to/all/data/files --make-checksums --output full-index-filename.csvto produce an index for your full data set. Check the index to verify that it lists the files you intend to send. - Run
mast-upload transfer /path/to/all/data/files /path/to/label_confirmed_by_mast.yml full-index-filename.csvto upload those files to MAST. Wait for the transfer and validation to complete. If the transfer is interrupted or if you need to adjust some files to pass validation, run the command again to resume.
Completing a label
Minimal label example
Here is a minimal example of a legal label. The skeleton label you receive from MAST will contain at least this information:
dataset: some-dataset
delivery_id: 1234
time:
delivery_start_date: 2026-11-14
delivery_meta:
schema_version: 0.1.0
However, for actual use, a "filetypes" section should almost always be added. Doing this is described in the next section.
Populating a label's filetypes section
Filenames and standards
The initial label you receive does not include descriptions of the specific kinds of files you plan to deliver. To
support validation, you will need to add these descriptions to the label. mast-upload includes commands to help you
build out more complete descriptions for data files, but you need to provide it with filename patterns and standards so
that it knows what files belong to what filetypes.
The simplest valid filetype description consists of a short name for the filetype, a filename pattern, and a standard name. "filename" is a regular expression matching filenames of that filetype. It should uniquely specify the filenames of that filetype: overlaps between filetypes are invalid. For FITS, ASDF, or Parquet files (which support data-level validation) "standard" must be, respectively, "fits", "asdf", or "parquet". For other kinds of files (which do not support data-level validation), "standard" can be any reasonable short identifier for the file format.
Here is a simple example of a filetypes section for a delivery that consists of a single kind of FITS file called " observation" along with PDF documentation files. The regular expressions here can be very broad because the dataset structure is simple:
filetypes:
observation:
filename: .*\.fits
standard: fits
docs:
filename: .*\.pdf
standard: pdf
If you instead had two types of FITS files distinguished by "cal" and "obs" prefixes, a single parquet file named " catalog.parquet", and PDF documentation files, you might instead write:
filetypes:
observation:
filename: obs.*\.fits
standard: fits
calibration:
filename: cal.*\.fits
standard: fits
catalog:
filename: catalog.parquet
standard: parquet
docs:
filename: .*\.pdf
standard: pdf
For ASDF, FITS, and Parquet files, even simple descriptions like this are enough to allow the validator to check that your files conform to their general format standard.
Specifying data objects
For more detailed validation of your specific ASDF, FITS, or Parquet file layouts, you must populate the objects
section of a filetype. This section describes the specific data objects, like tables and arrays, that appear within a
filetype. You can write the objects section manually, but it is usually easier to do it with the help of
mast-upload populate-label.
After filling out filetypes with filename patterns as described in the previous section, run
mast-upload populate-label <source> <label>, where <source> is either the root directory of a tree under which your
files are located or a prefix on an S3 bucket under which your files are located. S3 addresses must be preceded with '
s3://'.
populate-label will scan files associated with each filetype in your label in order to attempt to populate each
filetype's objects section (if it is of a standard that supports data validation).
<source> may contain either your complete data set or a representative subset of files for quicker analysis.
Optionally, you may also pass --output <output> to populate-label, where <output> is the name of the new populated
label file you wish to write (otherwise it writes to label.yml).
What is a 'filetype'?
A filetype is a group of files with a reasonably consistent structure: for instance, a group of FITS files with HDUs of the same extension and data types in the same order, a group of Parquet files that share a table schema, or a group of ASDF files with tables or arrays of the same types at the same paths.
The label schema also permits optional and variably repeated columns and objects, but populate-label will not
automatically describe every legal variable structure due to uncertainty about whether variability is "correct" or
unintentional. Specifically, it will only automatically detect variable numbers of repetitions when they follow the
common naming convention <CONSISTENT_NAME>_<VARIABLE_NUMBER>. It will detect columns or HDUs of consistent type and
HDU order but variable name. It will also detect 'repeated' ASDF nodes
defined by <CONSISTENT_NAME>_<VARIABLE_NUMBER> in the final element of the path.
Note that populate-label does not describe every single object in ASDF files; because of ASDF's flexibility, there is
too much ambiguity about what is "data" and what is "metadata", and populate-label does not intend to fully
recreate an ASDF schema. It describes arrays and tables of common types. If an ASDF filetype stores its data objects as
types populate-label does not recognize, you must manually describe those objects if you wish to validate that
filetype at object level.
Troubleshooting
If populate-label fails, it is possible that you accidentally described multiple distinct filetypes as one, or that
your filetype has a structure populate-label will not describe. You might want to try populate-label on a smaller
subset of your files to help disambiguate these issues.
Additional requirements for the validator, in particular metadata that must be present, may also be manually specified
in the label, but are not automatically filled in by populate-label.
Toy example
This repository includes a small 'toy' dataset and skeleton label you can use to
try the local commands of mast-upload.
For instance, from repository root:
mast-upload populate-label extras/fast_toy_dataset/data/ extras/fast_toy_dataset/toy-1.yml
will fill out extras/fast_toy_dataset/toy-1.yml and write the populated label as
toy-1-populated.yml.
mast-upload validate-all extras/fast_toy_dataset/data/ toy-1-populated.yml will then validate
the contents of extras/fast_toy_dataset/data against toy-1-populated.yml (this command
should report that all files are valid).
mast-upload validate-all extras/fast_toy_dataset/bad_data/ toy-1-populated.yml will then validate
the contents of extras/fast_toy_dataset/bad_data against toy-1-populated.yml (this command
should report validation failures for all files).
mast-upload index extras/fast_toy_dataset/data/ --make-checksums will index the contents
of extras/fast_toy_dataset/data and write an index.csv file containing paths and checksums.
mast-upload report-filetypes extras/fast_toy_dataset/data/ toy-1-populated.yml will write a
filetypes.csv file containing paths and assigned filetypes.