# README: ANN Training & Evaluation Data for MI Prediction

This README file describes the complete dataset collection used in the study *"Deep learning prediction of noise-driven nonlinear instabilities in fibre optics"*. The dataset consists of both **numerical (GNLSE simulations)** and **experimental (real-time DFT)** data, across **two and four seed configurations**, to train and evaluate artificial neural networks (ANNs) for predicting spectral features and correlation structures induced by modulation instability (MI).

---

## Dataset Overview

| Dataset Name           | Source        | Seeds | Cases   | Spectral Points | Correlation Map Size | Seed Parameters Shape | Traces / Realizations |
| ---------------------- | ------------- | ----- | ------- | --------------- | -------------------- | --------------------- | --------------------- |
| Simulation - 2 Seeds   | GNLSE         | 2     | 90,000  | 1024            | 128 x 128            | [90000 x 4]           | 500                   |
| Simulation - 4 Seeds   | GNLSE         | 4     | 105,000 | 1024            | 128 x 128            | [105000 x 8]          | 500                   |
| Experimental - 2 Seeds | Real-time DFT | 2     | 60,000  | 82              | 82 x 82              | [60000 x 4]           | 1000                  |
| Experimental - 4 Seeds | Real-time DFT | 4     | 60,000  | 82              | 82 x 82              | [60000 x 8]           | 1000                  |

All data are stored in `double` precision and grouped by type:

- /correlation/: spectral correlation maps
- /spectrum/: averaged spectral intensity
- /seed/: input seed parameters

---

## File Structure Example

    /correlation/
        Correlation_Map           [Cases x N x N]   double
        Correlation_Wavelength    [1 x N]           double
    /spectrum/
        Spectrum_Intensity_avg    [Cases x M]       double
        Spectrum_Wavelength       [1 x M]           double
    /seed/
        Seed_parameters           [Cases x P]       double

- N = 128 for simulated data, 82 for experimental data
- M = 1024 for simulated data, 82 for experimental data
- P = 4 for 2 seeds, 8 for 4 seeds

---

## Dataset Descriptions

### /correlation/Correlation_Map

- Size: See table above
- Units: Arbitrary Units
- Range: [-1, 1]
- Description: Pairwise spectral correlation map per seeding case.
  Maps are computed over ensemble realizations (500 for simulations, 1000 for experiments).

### /correlation/Correlation_Wavelength

- Units: Nanometers (nm)
- Description: Wavelength axis for the correlation maps.

### /spectrum/Spectrum_Intensity_avg

- Units: Arbitrary Units
- Description: Averaged output spectrum per seeding case, calculated over ensemble realizations.

### /spectrum/Spectrum_Wavelength

- Units: Nanometers (nm)
- Description: Wavelength axis corresponding to the averaged spectra.

### /seed/Seed_parameters

- 2 Seeds: [λ₁, λ₂, φ₁, φ₂]
- 4 Seeds: [λ₁, λ₂, λ₃, λ₄, φ₁, φ₂, φ₃, φ₄]
- Units:
  - Wavelengths (λ): nanometers (nm)
  - Phases (φ): radians (rad)
- Description: Input seed configuration per case.

---

## Usage Tips

You can load the datasets using:

### Python (h5py):

    import h5py

    # Open read-only; the file is closed automatically when the block exits.
    with h5py.File('your_dataset.h5', 'r') as f:
        intensity = f['/spectrum/Spectrum_Intensity_avg'][:]
        correlation = f['/correlation/Correlation_Map'][:]
        seeds = f['/seed/Seed_parameters'][:]

### MATLAB:

    spectrum = h5read('your_dataset.h5', '/spectrum/Spectrum_Intensity_avg');
    correlation = h5read('your_dataset.h5', '/correlation/Correlation_Map');
    seeds = h5read('your_dataset.h5', '/seed/Seed_parameters');

---

### License & Citation

Please cite the corresponding paper (Y. Boussafa et al., "Deep learning prediction of noise-driven nonlinear instabilities in fibre optics", submitted, 2025) and the dataset DOI (10.5281/zenodo.15179897) when using this data in publications.

---

For access to the full raw data (e.g., individual GNLSE realizations, full DFT traces), please contact the authors. These datasets are downsampled and windowed to maintain ANN compatibility and reduce storage requirements.
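
---

### Example: working with the loaded arrays (illustrative sketch)

As a complement to the Usage Tips, the following is a minimal sketch of two steps that typically follow loading. The function names `split_seed_parameters` and `ensemble_statistics` are hypothetical helpers, not part of the dataset or of any library. The first splits `Seed_parameters` into wavelengths and phases, assuming the column layout documented above. It also transposes the array if the parameter axis arrives first, since h5py and MATLAB report HDF5 dimensions in opposite order. The second illustrates how an averaged spectrum and a correlation map could be derived from per-case realizations. It assumes a Pearson correlation, which this README does not confirm, and runs on synthetic data because the raw realizations are not included in the dataset.

```python
import numpy as np


def split_seed_parameters(seeds):
    """Split Seed_parameters into (wavelengths_nm, phases_rad).

    Assumes the documented column layout: all wavelengths first, then all
    phases, i.e. P = 4 for 2 seeds and P = 8 for 4 seeds.
    """
    seeds = np.asarray(seeds, dtype=float)
    if seeds.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {seeds.shape}")
    # h5py and MATLAB report HDF5 dimensions in opposite order, so the
    # parameter axis may arrive first; transpose to [Cases x P] if so.
    if seeds.shape[1] not in (4, 8) and seeds.shape[0] in (4, 8):
        seeds = seeds.T
    n_params = seeds.shape[1]
    if n_params not in (4, 8):
        raise ValueError(f"expected 4 or 8 seed parameters, got {n_params}")
    n_seeds = n_params // 2
    return seeds[:, :n_seeds], seeds[:, n_seeds:]


def ensemble_statistics(realizations):
    """Return (mean spectrum, correlation map) for one seeding case.

    `realizations` is a hypothetical [Realizations x N] intensity array.
    Pearson correlation is an assumption, not a formula from the README.
    """
    realizations = np.asarray(realizations, dtype=float)
    mean_spectrum = realizations.mean(axis=0)
    # rowvar=False: columns (wavelength bins) are treated as the variables.
    correlation_map = np.corrcoef(realizations, rowvar=False)
    return mean_spectrum, correlation_map


# Synthetic placeholder values, not taken from the dataset.
demo_seeds = np.array([
    [1540.0, 1560.0, 0.0, np.pi / 2],
    [1545.0, 1555.0, 1.0, 2.0],
    [1550.0, 1552.0, 0.5, 1.5],
])
wavelengths_nm, phases_rad = split_seed_parameters(demo_seeds)
print(wavelengths_nm.shape, phases_rad.shape)  # (3, 2) (3, 2)

rng = np.random.default_rng(0)
demo_realizations = rng.random((500, 16))  # 500 shots, 16 spectral bins
mean_spectrum, correlation_map = ensemble_statistics(demo_realizations)
print(mean_spectrum.shape, correlation_map.shape)  # (16,) (16, 16)
```

With the real files, it is worth printing the shape of each array right after loading and comparing it with the Dataset Overview table before applying either helper.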