Description of dmdScheme

dmdScheme Version v0.9.8

Rainer M Krug Rainer@uzh.ch

2019-05-24

Introduction

This scheme provides a framework for describing aquatic microcosm experiments, in the sense used in the papers listed in the [References] section.

It aims to describe the data in a structured way and to make the data findable through the provided metadata.

This scheme aims neither to cover the actual analysis of the extracted data nor to give all the information needed to re-run the experiment.

Terminology and Conventions

Terminology

Conventions

Relationship to other schemes

The dmdScheme is a metadata scheme for ecological microcosm experiments. As these experiments produce essentially ecological data, other schemes geared towards ecological data also come to mind. One widely used scheme is the eml scheme.

The main difference between the dmdScheme and other ecological metadata schemes is that the dmdScheme was developed specifically for one type of experiment. This specificity went together with the objective of keeping the scheme simple to fill in and to understand. The result is a metadata scheme that contains all the information necessary to describe the data generated in these aquatic ecological microcosm experiments, while being simple enough for the researcher to fill in without too much time. It should be possible to store the metadata contained in the dmdScheme in, e.g., the aforementioned eml scheme, but filling in the eml scheme would require a much larger investment of time, as it is by definition much more general.

The dmdScheme is neither intended nor suited for experiments outside of ecological and aquatic microcosm experiments.

Structure of the dmdScheme

The dmdScheme provides metadata for a data file bundle. A data bundle is an archive (e.g. tar.gz or zip) consisting of multiple data files and one file with the metadata (TODO: we have to decide on the format of this file; should it be a text file?). If the data files represent tabular data, they should be in csv format; otherwise, any open format can be used.
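As a rough illustration of such a bundle, the following Python sketch packs a set of data files and one metadata file into a zip archive. The helpers `build_bundle` and `list_bundle` are hypothetical and not part of the dmdScheme package, and because the format of the metadata file is still undecided, the sketch accepts any file for it.

```python
import zipfile
from pathlib import Path


def build_bundle(bundle_path, data_files, metadata_file):
    """Pack data files and exactly one metadata file into a zip archive.

    Hypothetical helper for illustration; not part of the dmdScheme package.
    """
    bundle_path = Path(bundle_path)
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for data_file in data_files:
            # Store each file flat, under its base name only.
            bundle.write(data_file, arcname=Path(data_file).name)
        bundle.write(metadata_file, arcname=Path(metadata_file).name)
    return bundle_path


def list_bundle(bundle_path):
    """Return the sorted names of all files stored in the bundle."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return sorted(bundle.namelist())
```

Storing the files flat keeps the data file names in the archive identical to the names one would record in the metadata, which is a design choice of this sketch rather than a requirement of the scheme.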

It is a property set (dmdSchemeSet) which contains several sets of data properties (dmdSchemeData), which are tables of metadata.

These data properties are:

  1. Experiment: General information about the experiment. Each dataset can only have one.
  2. Species: Species used in the experiment. Each dataset can have multiple of these.
  3. Treatment: Description of the treatments and their differences. Each dataset can have multiple of these.
  4. Measurement: Measurement methodology and parameters. Each dataset can have multiple of these.
  5. DataExtraction: Methodology used to extract the data to be analysed from the raw data resulting from the Measurement. Each dataset can have multiple of these.
  6. TableMetaData: Metadata for each data file. For a data file with tabular data, this is a description of the columns and a specification of the column that holds the parameters differing between treatments; for other formats, e.g. video, a short description of the video and the name of the treatment.

It is important to note that

  1. All data properties can contain multiple rows of metadata (i.e. are repeatable), with the exception of Experiment, which can contain only one row of metadata.
  2. Many value properties have an attribute suggestedValues. Any value can be entered but, if possible, one should choose a value from the list.
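The two rules above can be sketched as simple checks in Python. The functions `check_row_counts` and `check_suggested` are hypothetical names chosen for this example, and the representation of a table as a list of row dictionaries is an assumption, not the package's data model.

```python
def check_row_counts(tables):
    """Return a list of problems with the number of rows per table.

    `tables` maps a table name to a list of row dictionaries (assumed layout).
    Only Experiment is restricted; all other tables may have any number of rows.
    """
    problems = []
    experiment_rows = tables.get("Experiment", [])
    if len(experiment_rows) != 1:
        problems.append(
            f"Experiment must have exactly one row, found {len(experiment_rows)}"
        )
    return problems


def check_suggested(value, suggested_values):
    """Return a note if `value` is not among the suggested values, else None.

    Suggested values are advisory, so this never rejects a value.
    """
    if suggested_values and value not in suggested_values:
        return f"'{value}' is not a suggested value; consider one of {sorted(suggested_values)}"
    return None
```

Returning a note instead of raising an error reflects that any value may be entered; a suggested value is merely preferred.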

The dmdScheme

This document was then re-ordered and thinned, which resulted in the initial versions of the dmdScheme.

Here we describe the structure of the dmdScheme.

dmdScheme Structure

Experiment

| propertySet | valueProperty | unit | type | suggestedValues | Description | DATA_v0.9.5 |
|---|---|---|---|---|---|---|
| Experiment | name | NA | character | NA | The name of the experiment. | ASR-expt1 |
| NA | temperature | NA | character | treatment, in degrees celsius, measurement | Temperature used for all treatments. If different between treatments, use “treatment” and specify in the Treatment sheet. | 20 |
| NA | light | NA | character | treatment, light, dark, cycle, e.g. 16:8 LD | Light used for all treatments. If different between treatments, use “treatment” and specify in the Treatment sheet. | semi-ambient |
| NA | humidity | NA | character | treatment, relative humidity in % | Humidity used for all treatments. If different between treatments, use “treatment” and specify in the Treatment sheet. | ambient |
| NA | incubator | NA | character | none, bench | What type of incubator is used. | not given here |
| NA | container | NA | character | NA | What type of container is used. | Duran type bottle, red lids, 250ml |
| NA | microcosmVolume | ml | numeric | NA | Volume of the microcosm container. Not the volume of the culture medium! | 100 |
| NA | mediaType | NA | character | NA | NA | PPM |
| NA | mediaConcentration | g/l | numeric | NA | NA | 0.55 |
| NA | cultureConditions | NA | character | axenic, dirty, clean | Conditions of the cultures for all treatments. | dirty |
| NA | comunityType | NA | character | treatment, single trophic level, multiple trophic level | Characterisation of the microbe community. | initially unknown |
| NA | mediaAdditions | NA | character | NA | NA | Wheat seeds added on specific dates, see file wheat_seed_additions.csv |
| NA | duration | days | integer | NA | Length of the experiment in days. This should only include the time in which the measurements were taken! | 100 |
| NA | comment | NA | character | NA | Additional features of the Experiment you want to provide | NA |
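
The type column of the Experiment table declares how each value should be read. A minimal Python sketch of such a check follows. The function `conforms` is hypothetical and not part of the dmdScheme package, and treating the text NA as a missing value that is valid for every type is an assumption of this example.

```python
def conforms(value, declared_type):
    """Check a text value against the declared type of its value property."""
    if value == "NA":
        # Assumption: NA marks a missing value and is accepted for every type.
        return True
    if declared_type == "integer":
        parser = int
    elif declared_type == "numeric":
        parser = float
    else:
        # character and all remaining types are not checked in this sketch.
        return True
    try:
        parser(value)
    except ValueError:
        return False
    return True
```

With the example data, the duration value 100 passes as an integer and the mediaConcentration value 0.55 passes as numeric, whereas 0.55 would not pass as an integer.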

Treatment

| propertySet | Treatments | …3 | …4 |
|---|---|---|---|
| valueProperty | treatmentID | treatmentLevelHeight | comment |
| unit | NA | NA | NA |
| type | character | character | character |
| suggestedValues | species, temperatur, light, initial density, comunity composition, densities, dispersal, viscosity, disturbance, communityType | value, variable: freetext | NA |
| Description | ID of the treatment described in this row. Each treatmentId can occur multiple times as it can contain multiple treatment levels. | The value of the parameter if the parameter is constant over time, or a description of the variability. If unit is speciesId, comma separated list of all species in the treatment. | NA |
| DATA | Lid_treatment | Loose | NA |
| MULTIPLE ROWS | Lid_treatment | Tight | NA |
| NA | species_1 | tt_1, unknown | NA |
| NA | species_2 | unknown | NA |
| NA | species_3 | tt_1 | NA |
Measurement

| propertySet | Measurement | …3 | …4 | …5 | …6 | …7 | …8 | …9 | …10 | …11 |
|---|---|---|---|---|---|---|---|---|---|---|
| valueProperty | measurementID | variable | method | unit | object | noOfSamplesInTimeSeries | samplingVolume | dataExtractionID | measuredFrom | comment |
| unit | NA | NA | NA | NA | NA | NA | ml | NA | NA | NA |
| type | character | character | character | character | character | integer | numeric | character | character | character |
| suggestedValues | NA | O2 concentration, video, manual count, abundance, DNA | presens Optode, microscopy | %, mmol, count | species, OUT, gene, community, particles | NA | NA | NA | NA | NA |
| Description | Id of the Measurement process. This includes methodology and variables. Each measurementId specifies one Measurement process and must be unique in this column. Should be in the mapping column in the DataFileMetaData tab. | The variable measured. | Name of the method used. | Unit of the measured variable. | The object measured. E.g. species in the case of manual count, gene for genetic analysis, particle for particle counters. | Total number of all samples in the time series. | The sampling volume. If e.g. the atmosphere in the container is sampled (oxygen measurements), then enter 0. Please use NA if the sampling volume is variable. | As used in the sheet DataExtraction, column dataExtractionID. | If measured from the experiment, raw; else the measurementId (first column) of the Measurement it is based on. | NA |
| DATA | oxygen concentration | DO | presens Optode | % | community | 50 | 0 | none | raw | NA |
| MULTIPLE ROWS | abundance | abundance | molecular | count | species | 6 | 0.5 | Mol_Analy_pipeline1 | sequenceData | NA |
| NA | smell | smell | nose | rotten eggs or not | community | 6 | 0 | none | raw | NA |
| NA | sequenceData | DNA | NGS | Nucleotide | DNA fragment | 6 | 0 | none | raw | NA |
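
The measuredFrom column links a derived measurement back to the measurement it is based on, ending at raw. The following Python sketch follows such links. The function `trace_to_raw` is a hypothetical helper, and the dictionary simply copies the measuredFrom values of the example rows in the Measurement table.

```python
def trace_to_raw(measurement_id, measured_from):
    """Follow measuredFrom links from a measurement until "raw" is reached.

    `measured_from` maps each measurementID to its measuredFrom value.
    Returns the chain of measurementIDs, starting with `measurement_id`.
    """
    chain = []
    current = measurement_id
    while current != "raw":
        if current not in measured_from:
            raise KeyError(f"unknown measurementID: {current}")
        if current in chain:
            raise ValueError(f"circular measuredFrom reference at: {current}")
        chain.append(current)
        current = measured_from[current]
    return chain


# measuredFrom values copied from the example rows of the Measurement table.
example = {
    "oxygen concentration": "raw",
    "abundance": "sequenceData",
    "smell": "raw",
    "sequenceData": "raw",
}

print(trace_to_raw("abundance", example))  # ['abundance', 'sequenceData']
```

The explicit checks for unknown IDs and circular references are additions of this sketch; the scheme description itself only states what the column should contain.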

DataExtraction

| propertySet | DataExtraction | …3 | …4 | …5 | …6 |
|---|---|---|---|---|---|
| valueProperty | dataExtractionID | method | parameter | value | comment |
| unit | NA | NA | NA | NA | NA |
| type | character | character | character | character | character |
| suggestedValues | NA | bemovi x.y.z | NA | NA | NA |
| Description | Name of the DataExtraction process. This includes methodology and variables. Each name specifies one extraction process and can occur multiple times in the case of multiple parameters in the analysis. | Method used for the DataExtraction process. If possible including version (in the case of R packages). | Parameter in the analysis. Only needs to be specified if it varies from the default. | Value of the parameter (you can enter a number or a word). | NA |
| DATA | Mol_Analy_pipeline1 | NA | NA | NA | See description in file xxx.yyy |
| MULTIPLE ROWS | NA | NA | NA | NA | NA |

DataFile

| propertySet | DataFileMetaData | …3 | …4 | …5 | …6 | …7 | …8 |
|---|---|---|---|---|---|---|---|
| valueProperty | dataFileName | columnName | columnData | mappingColumn | type | description | comment |
| unit | NA | NA | NA | NA | NA | NA | NA |
| type | character | character | character | character | character | character | character |
| allowedValues | NA | NA | ID, Treatment, Measurement, Species, other | NA | integer, numeric, character, logical, datetime, date, time | NA | NA |
| Description | The name of the data set. | Name of column in the data file. Each column in the data file needs to be documented! Or NA if it is for the whole data file and not specified in the dataFileName. | The type of the data in the column. ID: ID field (unique ID of unit of replication); Treatment: specifies treatment; Measurement: contains measurements; Species: contains species; other: other type of data. | columnData = Treatment: treatmentID as in the Treatment tab; columnData = Species: treatmentID referring to species composition as in the Treatment tab; columnData = Measurement: measurementID as in the Measurement tab; otherwise: NA. | Type of the column. | If column contains measurement: General description. If type is datetime, date, or time, give the order of year month day hour minute second as e.g. ymdhms, ymd, or hms. (Do not give any other information, e.g. give nothing about how months are entered (e.g. number or name), or how years, months, day, etc. are separated.) | NA |
| DATA | dissolved_oxygen_measures.csv | Jar_ID | ID | NA | character | NA | NA |
| MULTIPLE ROWS | dissolved_oxygen_measures.csv | DO | Measurement | oxygen concentration | numeric | NA | NA |
| NA | dissolved_oxygen_measures.csv | Unit_1 | other | NA | character | NA | NA |
| NA | dissolved_oxygen_measures.csv | Mode | other | NA | character | NA | NA |
| NA | dissolved_oxygen_measures.csv | Location | other | NA | character | NA | NA |
| NA | dissolved_oxygen_measures.csv | Date_time | other | NA | datetime | ymdhms | NA |
| NA | dissolved_oxygen_measures.csv | Lid_treatment | Treatment | Lid_treatment | character | NA | NA |
| NA | dissolved_oxygen_measures.csv | Jar_type | other | NA | character | NA | NA |
| NA | dissolved_oxygen_measures.csv | Jar_ID | ID | NA | character | NA | NA |
| NA | smell.csv | NA | Species | species_1 | character | NA | NA |
| NA | smell.csv | smell | Measurement | smell | character | NA | NA |
| NA | smell.csv | Date | other | NA | datetime | ymdhms | NA |
| NA | smell.csv | Lid_treatment | Treatment | Lid_treatment | character | NA | NA |
| NA | smell.csv | Jar_type | other | NA | character | NA | NA |
| NA | abundances.csv | NA | Species | species_3 | character | NA | NA |
| NA | abundances.csv | Jar_ID | ID | NA | character | NA | NA |
| NA | abundances.csv | Date_time | other | NA | datetime | ymdhms | NA |
| NA | abundances.csv | Lid_treatment | Treatment | Lid_treatment | character | NA | NA |
| NA | abundances.csv | Jar_type | other | NA | character | NA | NA |
| NA | abundances.csv | count_number | Measurement | abundance | numeric | NA | NA |
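
The mappingColumn entries refer back to IDs defined in the Treatment and Measurement tables, so they can be cross-checked. The following Python sketch shows one way to do that. The function `check_mapping` is a hypothetical helper, and representing each row as a dictionary with the keys columnData and mappingColumn is an assumption of this example.

```python
def check_mapping(rows, treatment_ids, measurement_ids):
    """Return the DataFileMetaData rows whose mappingColumn cannot be resolved.

    Each row is a dict with the keys columnData and mappingColumn.
    """
    unresolved = []
    for row in rows:
        kind, target = row["columnData"], row["mappingColumn"]
        if kind in ("Treatment", "Species"):
            # Both refer to a treatmentID in the Treatment table.
            ok = target in treatment_ids
        elif kind == "Measurement":
            ok = target in measurement_ids
        else:
            # ID and other columns carry no reference, so NA is expected.
            ok = target == "NA"
        if not ok:
            unresolved.append(row)
    return unresolved


# IDs copied from the example Treatment and Measurement tables.
treatment_ids = {"Lid_treatment", "species_1", "species_2", "species_3"}
measurement_ids = {"oxygen concentration", "abundance", "smell", "sequenceData"}
```

Applied to the example rows of the DataFile table, every mappingColumn entry resolves against these two sets of IDs.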

A property set which contains data properties must be seen as a collection of tables, and each table must therefore have the same number of entries for each of its properties.

The structure as of 2019-05-24 16:34:28 GMT is as follows:

Example data

An xml file with the example data can be downloaded from here.

XSD Grammar

The xsd grammar has been generated using xmlgrid.net. You can download it from here (right mouse click, then Save Linked Content).