The data is supplied by the research groups and commercial companies that participate in this initiative. Each dataset is filtered and converted to a uniform format in order to facilitate future handling, import, etc. At the moment, this is done manually under human supervision. If you are interested in contributing data to this database, please contact: dmpassos@ualg.pt

We can provide you with a simple .xlsx template for filling in some of the requested metadata.

The guidelines for the datasets are the following:

Specs for the Cereal NIR Database:
Input Type (X): NIR reflectance (original, unprocessed spectra)
Range: 400 – 2500 nm (or subset)
Spectral (optical) resolution: Preferably < 15 nm (acceptable up to 20 nm)
Target Measurements (Y): protein and/or moisture
Sample cereal (whole grain): maize (corn), rice, wheat, barley, sorghum, millet, oats and rye (other available cereals can be considered as well)
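
As an illustration only, the specs above could be checked before submission with a small helper like the one below. This is a minimal sketch, not an official validator for this database: the function name `check_specs` and its arguments are hypothetical, and it assumes the contributor already knows the wavelength axis (in nm) and the spectral resolution (in nm) of the instrument.

```python
# Limits copied from the specs; names are hypothetical, for illustration.
ALLOWED_CEREALS = {"maize", "rice", "wheat", "barley",
                   "sorghum", "millet", "oats", "rye"}
MIN_NM, MAX_NM = 400.0, 2500.0
PREFERRED_RES_NM, MAX_RES_NM = 15.0, 20.0


def check_specs(wavelengths_nm, resolution_nm, cereal):
    """Return a list of issues found; an empty list means none were found."""
    issues = []
    if not wavelengths_nm:
        issues.append("no wavelengths given")
    elif min(wavelengths_nm) < MIN_NM or max(wavelengths_nm) > MAX_NM:
        issues.append("wavelengths outside 400-2500 nm")
    if resolution_nm > MAX_RES_NM:
        issues.append("resolution coarser than 20 nm")
    elif resolution_nm >= PREFERRED_RES_NM:
        issues.append("resolution acceptable but not preferred (15-20 nm)")
    if cereal not in ALLOWED_CEREALS:
        # Other cereals can still be considered, so this is only a flag.
        issues.append(f"cereal '{cereal}' is not in the listed set")
    return issues


print(check_specs([900, 1700], 5, "wheat"))
print(check_specs([350, 1700], 25, "quinoa"))
```

An unlisted cereal is reported as a flag rather than an error, since other available cereals can be considered as well.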

Data format: .csv (using a comma ',' as the column separator) with the following columns:
ID Spectrometer Cereal Variety Country Year Moisture Protein w1 ... wn
where [w1, w2, w3, ..., wn] are the wavelengths. The 1st row contains the column names and the following rows contain the samples.
For formatting purposes, when only one of the target values (e.g. Protein) is available, the other target column (e.g. Moisture) should still be included and filled with zeros. This simplifies data reading/import pipelines because the first 8 columns are always the same, and the wavelengths run from the 9th column to the last column.
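
The layout described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the import pipeline actually used for this database: the function name `read_cereal_csv` is hypothetical, the sample row is invented purely for illustration, and it assumes that the wavelength column names are numeric values in nm. Note that the 9th column corresponds to index 8 when counting from zero.

```python
import csv
import io

# The eight fixed leading columns named in the guidelines.
FIXED_COLUMNS = ["ID", "Spectrometer", "Cereal", "Variety",
                 "Country", "Year", "Moisture", "Protein"]


def read_cereal_csv(text):
    """Parse CSV text into (wavelengths, samples).

    wavelengths: floats taken from the header after the fixed columns.
    samples: dicts with 'meta' (fixed columns, as strings) and
             'spectrum' (list of floats).
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    n_fixed = len(FIXED_COLUMNS)
    if header[:n_fixed] != FIXED_COLUMNS:
        raise ValueError(f"unexpected leading columns: {header[:n_fixed]}")
    # Assumption: wavelength column names are numeric (nm).
    wavelengths = [float(w) for w in header[n_fixed:]]

    samples = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        meta = dict(zip(FIXED_COLUMNS, (v.strip() for v in row[:n_fixed])))
        spectrum = [float(v) for v in row[n_fixed:]]
        if len(spectrum) != len(wavelengths):
            raise ValueError(f"row {meta['ID']}: wrong number of spectral values")
        samples.append({"meta": meta, "spectrum": spectrum})
    return wavelengths, samples


# Invented example: Moisture was not measured, so it is filled with 0.
EXAMPLE = """ID,Spectrometer,Cereal,Variety,Country,Year,Moisture,Protein,900,905,910
s1,Hamamatsu_C11482GA,wheat,durum,Belgium,2020,0,12.5,0.41,0.42,0.44
"""

wavelengths, samples = read_cereal_csv(EXAMPLE)
print(wavelengths, samples[0]["meta"]["Protein"], samples[0]["spectrum"])
```

Because a zero in a target column only marks a missing measurement, a downstream pipeline would probably want to exclude those zeros from model training; how that is handled is left to the user.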

Mandatory metadata (in a separate .xlsx or .txt file):
  • Spectrometer Make/Model  [ex: Hamamatsu_C11482GA]
  • Wavelength Range (nm)  [ex: 900 – 1700 nm]
  • Spectral Resolution (nm)  [ex: 5 nm]
  • Number of Scans per Spectrum  [ex: 10]
  • Grain Type  [ex: wheat] (lowercase)
  • Variety  [ex: durum] (lowercase)
  • Origin (country)  [ex: Belgium] (first letter uppercase)
  • Year  [ex: 2020, 2021]
  • Reference Method (e.g., Kjeldahl for protein, oven drying for moisture)  [ex: kjeldahl]
  • Units for Protein and Moisture  [ex: % protein]
  • Citation  [DOI of paper or source group, e.g. CEOT-UAlg (for sensAIfood)]
  • Authorship  [ex: CEOT-UAlg]
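
A contributor might check that all mandatory fields are filled in before sending a dataset. The sketch below is only an illustration under stated assumptions: the dictionary keys are hypothetical short names for the items listed above (the real template may use different labels), and `missing_metadata` is not part of any existing tool.

```python
# Hypothetical keys, one per mandatory metadata item listed above.
REQUIRED_FIELDS = [
    "spectrometer", "wavelength_range_nm", "spectral_resolution_nm",
    "scans_per_spectrum", "grain_type", "variety", "origin", "year",
    "reference_method", "units", "citation", "authorship",
]


def missing_metadata(metadata):
    """Return the required fields that are absent or empty in `metadata`."""
    return [field for field in REQUIRED_FIELDS
            if str(metadata.get(field, "")).strip() == ""]


# Example values taken from the bracketed examples in the list above.
metadata = {
    "spectrometer": "Hamamatsu_C11482GA",
    "wavelength_range_nm": "900 - 1700",
    "spectral_resolution_nm": 5,
    "grain_type": "wheat",
    "variety": "durum",
    "origin": "Belgium",
}
print(missing_metadata(metadata))
```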
Supplementary metadata (in a separate .txt or .xlsx file):
  • Detector Type  [ex: InGaAs]
  • Geometry  [ex: diffuse reflectance, interactance]
  • Reference Material  [ex: Spectralon, gold mirror]
  • Software Used for Acquisition  [name and version]
  • Sampling Notes  [information about the data structure, e.g. duplicated measurements per sample, different batches, so that we know how to split the data properly]
 

NOTE: For the initial experimental phase of this database, we permitted a few datasets where Number of Scans per Spectrum and Variety are defined as unknown (since this information was not recorded by the initial contributors). In future contributions we should be stricter with these parameters.