The BioSimulators format for data outlines how simulation results for SED reports and plots should be encoded into Hierarchical Data Format 5 (HDF5). These conventions can capture multidimensional reports and plots, reports and plots whose data sets have different shapes and data types, and reports whose data sets have repeated labels.

Data for reports and plots of simulation results should be saved in HDF5 according to the following conventions:

  • Paths of reports and plots: Within the HDF5 file, each report/plot should be saved to a path equal to the combination of (a) the relative path of the parent SED-ML document within the parent COMBINE/OMEX archive and (b) the id of the report/plot. For example, a report with id time_course_results in a SED-ML file located at ./path/to/experiment.sedml should be saved to the path path/to/experiment.sedml/time_course_results.
  • Data set shapes: For SED reports, the rows of each HDF5 dataset should correspond to the SED data sets (sedml:dataSet) specified in the SED-ML definition of the report (e.g., the time symbol, specific model variables). For SED plots, the rows of each HDF5 dataset should correspond to the SED data generators (sedml:dataGenerator) specified in the SED-ML definition of the plot (e.g., the time symbol, specific model variables).

    • sedml:task:
      • Steady-state simulations: The rows of HDF5 data sets should be scalars.
      • One-step simulations: The rows of HDF5 data sets should be tuples of the start and end points of the simulation.
      • Time course simulations: The rows of HDF5 data sets should be vectors with length equal to the number of steps of the time course + 1.
      • Simulations of spatial models: The rows of HDF5 data sets should be matrices whose dimensions represent space and time.
    • sedml:repeatedTask: The first dimension of each row should represent the iterations of the repeated task that produced its values. The second dimension of each row should represent the individual sub-tasks of the repeated task. The results of sub-tasks should be ordered in the order in which the sub-tasks were executed (i.e., by their order attributes). If repeated tasks are nested within repeated tasks, the subsequent dimensions should alternate between representing the iterations and sub-tasks of the nested repeated tasks. The final dimensions of each row should be encoded as above for sedml:task. For example, for a repeated task of non-spatial time course simulations, each row should have a single additional dimension of length equal to the number of steps of the time course + 1.

    If the rows of an HDF5 data set have different shapes, the rows should be reshaped into a consistent shape by right-padding their values with NaN.

  • Metadata for reports and plots: The following metadata should be encoded into attributes of the corresponding HDF5 dataset.

    • Type of the output: The type of the output (Report, Plot2D, Plot3D) should be encoded into the key _type.
    • Complete id of the output: The complete id of the output (combination of the location of the parent SED-ML file of the output (omex-manifest:content/@location) within its parent COMBINE archive and the SED-ML id of the output (sed:output/@id)) should be encoded into the key uri.
    • Id of the output: The SED-ML id of the output (sed:output/@id) should be encoded into the key sedmlId.
    • Name of the output: The name of the output (sed:output/@name) should be encoded into the key sedmlName.
    • Ids of rows (SED data sets or data generators): For reports, the ids of the data sets should be encoded into the key sedmlDataSetIds. The value of this key should be an array of the ids of the data sets, in the order in which the data sets were defined in their parent SED document. For plots, the ids of the data generators should be encoded into the key sedmlDataSetIds. The value of this key should be an array of the ids of the data generators, in the order in which the data generators were defined in their parent SED document.
    • Names of rows (SED data sets or data generators): For reports, the names of the data sets should be encoded into the key sedmlDataSetNames. For plots, the names of the data generators should be encoded into the key sedmlDataSetNames. The value of this key should be an array of the names of the data sets/generators, in the order in which the data sets/generators were defined in their parent SED document.
    • Labels of rows (SED data sets or data generators): For reports, the labels of the data sets should be encoded into the key sedmlDataSetLabels. For plots, the ids of the data generators should be encoded into the key sedmlDataSetLabels. The value of this key should be an array of the labels (reports) or ids (plots), in the order in which the data sets/generators were defined in their parent SED document.
    • Data types of SED data sets/generators: The data types of the data sets (reports) or data generators (plots) should be encoded into the key sedmlDataSetDataTypes. The value of this key should be an array of the data types of the data sets/generators, in the order in which the data sets/generators were defined in their parent SED document. The data type of each data set should be described using a NumPy dtype (e.g., int64) to indicate a data set whose value is non-null or __None__ to indicate a data set whose value is null.
    • Shapes of SED data sets/generators: The shapes of the data sets (reports) or data generators (plots) should be encoded into the key sedmlDataSetShapes. The value of this key should be an array with one entry per data set/generator, each a comma-separated list of the dimensions of that data set/generator (e.g., 14 for a vector of 14 values). The entries should be listed in the order in which the data sets/generators were defined in their parent SED document.
  • Metadata for SED-ML files: The following metadata should be encoded into attributes of the parent groups of HDF5 datasets which represent SED-ML files and their parent directories within their parent COMBINE archives.

    • Complete id of the COMBINE archive location: The location of each SED-ML file and the location of each parent directory of each SED-ML file within their parent COMBINE archive (omex-manifest:content/@location) should be encoded into the keys uri and combineArchiveLocation.
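The path and padding conventions above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the BioSimulators utils implementation: the helper names report_path and pad_rows are hypothetical, the SED-ML location is assumed to use / separators, and the rows passed to pad_rows are assumed to already have the same number of dimensions (e.g., the iteration and sub-task dimensions of a repeated task followed by the time dimension).

```python
import posixpath

import numpy as np


def report_path(sedml_location: str, output_id: str) -> str:
    """HDF5 path of a report/plot: SED-ML location within the archive joined with the output id."""
    # normpath drops a leading "./", e.g. "./path/to/experiment.sedml" -> "path/to/experiment.sedml"
    return posixpath.join(posixpath.normpath(sedml_location), output_id)


def pad_rows(rows):
    """Stack rows of different shapes into one float64 array, right-padding with NaN.

    Hypothetical helper. All rows must have the same number of dimensions; each
    dimension is padded at its end up to the largest extent of that dimension
    among the rows.
    """
    arrays = [np.asarray(row, dtype=np.float64) for row in rows]
    if not arrays:
        raise ValueError("at least one row is required")
    ndim = arrays[0].ndim
    if any(a.ndim != ndim for a in arrays):
        raise ValueError("all rows must have the same number of dimensions")
    # Largest extent of each dimension across all rows
    target = tuple(max(a.shape[d] for a in arrays) for d in range(ndim))
    padded = np.full((len(arrays),) + target, np.nan)
    for i, a in enumerate(arrays):
        # Copy the row into the leading corner; everything beyond it remains NaN
        padded[(i,) + tuple(slice(0, n) for n in a.shape)] = a
    return padded
```

Because NaN is a floating-point value, the padded array is stored as float64 in this sketch; the original data type of each row is recorded separately in the sedmlDataSetDataTypes attribute.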

Several example reports are available here.

Below is a graphical illustration of the organization of an HDF5 file for a SED report with id report-1 defined in a SED-ML file located at experiment-1/batch-1/simulation-1.sedml within a COMBINE/OMEX archive.

Path of the HDF5 dataset for the SED report

experiment-1/batch-1/simulation-1.sedml/report-1

HDF5 dataset for the SED report

time 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0
metabolite_a 4 8 9 4 8 9 4 8 9 4 8 NaN NaN NaN
metabolite_b 3 1 5 4 3 6 7 5 4 NaN NaN NaN NaN NaN
sum_metabolite_a_b 7 4 3 3 3 4 5 6 6 NaN NaN NaN NaN NaN
ratio_flux_c_d 1.0 6.0 5.0 5.0 5.0 5.0 6.0 6.0 6.0 5.0 5.0 5.0 4.0 4.0

Attributes of the HDF5 dataset for the SED report

  • _type: Report
  • uri: experiment-1/batch-1/simulation-1.sedml/report-1
  • sedmlId: report-1
  • sedmlName: Report 1
  • sedmlDataSetIds: time, metabolite_a, metabolite_b, sum_metabolite_a_b, ratio_flux_c_d
  • sedmlDataSetLabels: Time, Metabolite A, Metabolite B, Sum of metabolites A and B, Flux ratio of reactions C and D
  • sedmlDataSetDataTypes: float64, int64, int64, int64, float64
  • sedmlDataSetShapes: 14, 11, 9, 9, 14
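As a hedged sketch, the sedmlDataSetDataTypes and sedmlDataSetShapes values for the example report could be derived from its unpadded rows with NumPy. The function name data_type_and_shape_attrs is hypothetical, and encoding the shape of a null row as an empty string is an assumption that the conventions above do not specify.

```python
import numpy as np

# Data type marker used by the conventions for rows whose value is null
NULL_DATA_TYPE = "__None__"


def data_type_and_shape_attrs(rows):
    """Return (sedmlDataSetDataTypes, sedmlDataSetShapes) for a list of unpadded rows.

    A non-null row is described by its NumPy dtype name (e.g. 'int64') and by its
    shape written as a comma-separated list of dimension lengths (e.g. '14' or '2,3,11').
    """
    data_types, shapes = [], []
    for row in rows:
        if row is None:
            data_types.append(NULL_DATA_TYPE)
            shapes.append("")  # assumption: no shape is specified for null rows
        else:
            arr = np.asarray(row)
            data_types.append(arr.dtype.name)
            shapes.append(",".join(str(n) for n in arr.shape))
    return data_types, shapes


# Unpadded rows of the example report (trailing NaN padding removed)
rows = [
    np.arange(14, dtype=np.float64),                               # time
    np.array([4, 8, 9, 4, 8, 9, 4, 8, 9, 4, 8], dtype=np.int64),   # metabolite_a
    np.array([3, 1, 5, 4, 3, 6, 7, 5, 4], dtype=np.int64),         # metabolite_b
    np.array([7, 4, 3, 3, 3, 4, 5, 6, 6], dtype=np.int64),         # sum_metabolite_a_b
    np.array([1.0, 6.0, 5.0, 5.0, 5.0, 5.0, 6.0,
              6.0, 6.0, 5.0, 5.0, 5.0, 4.0, 4.0]),                 # ratio_flux_c_d
]
data_types, shapes = data_type_and_shape_attrs(rows)
```

Here data_types is ['float64', 'int64', 'int64', 'int64', 'float64'] and shapes is ['14', '11', '9', '9', '14'], obtained by counting the values in each row of the example table before padding.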

Attributes of the HDF5 groups for the SED-ML file and its parent subdirectories

  • experiment-1: HDF5 group for the grandparent directory of the SED-ML file
    • uri: experiment-1
    • combineArchiveLocation: experiment-1
  • experiment-1/batch-1: HDF5 group for the parent directory of the SED-ML file
    • uri: experiment-1/batch-1
    • combineArchiveLocation: experiment-1/batch-1
  • experiment-1/batch-1/simulation-1.sedml: HDF5 group for the SED-ML file
    • uri: experiment-1/batch-1/simulation-1.sedml
    • combineArchiveLocation: experiment-1/batch-1/simulation-1.sedml
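The set of groups that receive uri and combineArchiveLocation attributes can be computed from the SED-ML location alone. A minimal sketch, assuming the location uses / separators; the helper names parent_group_paths and group_attrs are hypothetical:

```python
import posixpath


def parent_group_paths(sedml_location: str):
    """List the HDF5 groups for each parent directory of a SED-ML file, outermost first,
    followed by the group for the SED-ML file itself."""
    location = posixpath.normpath(sedml_location)  # drops a leading "./"
    parts = location.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def group_attrs(sedml_location: str):
    """Map each group path to its attributes; uri and combineArchiveLocation are equal."""
    return {
        path: {"uri": path, "combineArchiveLocation": path}
        for path in parent_group_paths(sedml_location)
    }
```

For experiment-1/batch-1/simulation-1.sedml this yields the three groups listed above, each with matching uri and combineArchiveLocation values.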

Below are helpful tools for building reports of simulation results:

  • BioSimulators utils is a Python library which provides functions for generating reports to the above specifications.
  • h5py is a high-level Python library for reading and writing HDF5 files.
  • HDF5 libraries for C, C++, and Java.