# Supplementary code and data

Associated with the paper:
> Delforge, D., Muñoz-Carpena, R., Van Camp, M., Vanclooster, M. (2020). A parsimonious empirical approach to streamflow recession analysis and forecasting (accepted at Water Resources Research, 29-01-2020).

## Code files

`edm_cmd.py`
: Python 3 implementation of the Convergent Cross Mapping/Simplex algorithm. See the documentation inside the file for a description and references. The script can also run EDM-Simplex from the command line, provided the input files `streamflow_data.csv` and `recession_df.csv` are in the same directory:
 ```
 python edm_cmd.py m kn L tp tw Nsam ref h
 ```
where the arguments m, kn, L, tp, tw, Nsam, ref, h should be replaced by their values. See the `GSA_results.csv` description below for more information. 
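
As a rough illustration, the same command can be assembled from Python, for example to launch several runs from a script. The helper name `build_edm_command` and the argument values below are hypothetical placeholders, not settings taken from the paper; only the argument order follows the usage line above.

```python
import sys

# Argument order expected by edm_cmd.py, as given in the usage line above.
ARG_ORDER = ("m", "kn", "L", "tp", "tw", "Nsam", "ref", "h")


def build_edm_command(params, script="edm_cmd.py"):
    """Return the argv list for one EDM-Simplex run (hypothetical helper)."""
    missing = [name for name in ARG_ORDER if name not in params]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return [sys.executable, script] + [str(params[name]) for name in ARG_ORDER]


# Placeholder values chosen only for illustration.
example = {"m": 3, "kn": 1, "L": 100, "tp": 1, "tw": 0, "Nsam": 10, "ref": 2, "h": 0}
cmd = build_edm_command(example)
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the two input CSV files.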

`Example_notebook.ipynb`
: Jupyter notebook showing an example of the application of the EDM-Simplex method and an example of how to analyze the outputs of the global sensitivity analysis. Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text: [https://jupyter.org/](https://jupyter.org/). Jupyter is part of the Python scientific distribution provided by Anaconda: [https://www.anaconda.com/](https://www.anaconda.com/)

`Example_notebook.html`
: HTML version of the notebook. It can be opened in any web browser.

## Data files

`streamflow_data.csv`
: Daily streamflow data in m³/s. The first column is a timestamp (UTC+0); the other columns are labeled with the names of the gauging stations: S1, S2, S3. The data are shared courtesy of the Services Publics de Wallonie and are available through the Aqualim data portal (http://aqualim.environnement.wallonie.be). The Aqualim reference codes of the three stations referred to in the paper are L6310 (station S1), L6650 (station S2), and L6360 (station S3).
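
A minimal sketch of reading this layout with the Python standard library, assuming a comma-separated file with a header row and ISO-formatted timestamps. The inline sample rows and the header name `timestamp` are made up for illustration and are not taken from the real file; `read_streamflow` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime

# Made-up rows mimicking the described layout (timestamp, then S1, S2, S3).
SAMPLE = """timestamp,S1,S2,S3
2010-01-01,1.20,0.80,2.10
2010-01-02,1.10,0.75,2.00
"""


def read_streamflow(handle):
    """Return (timestamps, {station: [flow in m3/s]}) from an open CSV file."""
    reader = csv.reader(handle)
    header = next(reader)
    stations = header[1:]  # every column after the first is a station
    times, flows = [], {name: [] for name in stations}
    for row in reader:
        if not row:
            continue  # skip blank lines
        times.append(datetime.fromisoformat(row[0]))
        for name, value in zip(stations, row[1:]):
            # Empty cells are treated as missing values (an assumption).
            flows[name].append(float(value) if value else float("nan"))
    return times, flows


times, flows = read_streamflow(io.StringIO(SAMPLE))
print(times[0].date(), flows["S1"])
```

With the real file, the same function could be called on `open("streamflow_data.csv", newline="")`, adjusting the timestamp parsing if the stored format differs.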
 
`recession_df.csv`
: Dataframe of binary data labelling recession points (True values). It has the following columns:
- timestamp: timestamp (UTC+0) of the day indexing the data.
- EDM.S1: EDM recession points for station S1.
- EDM.S2: EDM recession points for station S2.
- EDM.S3: EDM recession points for station S3.
- KIR.S1: KIR recession points for station S1.
- KIR.S2: KIR recession points for station S2.
- KIR.S3: KIR recession points for station S3.
- VOG.S1: VOG recession points for station S1.
- VOG.S2: VOG recession points for station S2.
- VOG.S3: VOG recession points for station S3.
- BRU.S1: BRU recession points for station S1.
- BRU.S2: BRU recession points for station S2.
- BRU.S3: BRU recession points for station S3.
- DSF.S1: Decreasing streamflow points for station S1.
- DSF.S2: Decreasing streamflow points for station S2.
- DSF.S3: Decreasing streamflow points for station S3.
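
For instance, the recession points of one method and station can be picked out by filtering on the corresponding column. The sketch below assumes that the binary values are stored as the strings `True` and `False`, and it uses made-up rows with only a subset of the columns; the helper `recession_days` is hypothetical.

```python
import csv
import io

# Made-up rows with a subset of the columns described above.
SAMPLE = """timestamp,EDM.S1,DSF.S1
2010-01-01,False,True
2010-01-02,True,True
2010-01-03,True,True
2010-01-04,False,False
"""


def recession_days(handle, method, station):
    """Return the timestamps flagged True in column '<method>.<station>'."""
    column = f"{method}.{station}"
    reader = csv.DictReader(handle)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"no column named {column!r}")
    return [row["timestamp"] for row in reader
            if row[column].strip().lower() == "true"]


edm_days = recession_days(io.StringIO(SAMPLE), "EDM", "S1")
dsf_days = recession_days(io.StringIO(SAMPLE), "DSF", "S1")
print(edm_days)
print(len(dsf_days))
```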

`GSA_results.csv`
: Results of the global sensitivity analysis. The file contains 110592 entries (36864 simulations for each of the hydrograph series S1, S2, and S3) and 11 columns:
- index: index of the simulation
- m: embedding dimension.
- kn: number of additional nearest-neighbor states, such that the total number of neighbors is k = m + kn.
- L: size of the bootstrapped samples (library of states).
- tp: prediction horizon in days.
- tw: Theiler time exclusion window in days. The actual window size is given by tw*2 + 1.
- Nsam: Number of bootstrap samples.
- REF: Integer referring to the recession extraction method. With respect to the paper, 1 is BRU, 2 is EDM, 3 is KIR, and 4 is VOG. 
- h: Integer defining the number of days truncated from the head of each recession segment.
- station: the code of the station for which the hydrograph recession is forecast (S1, S2 or S3).
- n_pred: total number of recession points being predicted.
- n_lib: size of the total library (without considering the Theiler window) of potential nearest-neighbor states.
- nse: Nash-Sutcliffe efficiency computed between the median EDM-Simplex output and the observations.
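
The derived quantities mentioned in the column descriptions can be sketched as follows: the total number of neighbors k = m + kn, the actual Theiler window size tw*2 + 1, the method name behind the REF integer, and the Nash-Sutcliffe efficiency. The function names and sample numbers are hypothetical, and `nse` implements the textbook definition, which is not necessarily the exact computation performed in `edm_cmd.py`.

```python
# Mapping of the REF integer to the method names used in the paper.
REF_METHODS = {1: "BRU", 2: "EDM", 3: "KIR", 4: "VOG"}


def describe_run(m, kn, tw, ref):
    """Derive k, the Theiler window size, and the method name for one run."""
    return {
        "k": m + kn,           # total number of nearest neighbors
        "window": 2 * tw + 1,  # actual exclusion window in days
        "method": REF_METHODS[ref],
    }


def nse(observed, simulated):
    """Textbook Nash-Sutcliffe efficiency.

    Computed as 1 minus the sum of squared errors divided by the sum of
    squared deviations of the observations from their mean.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread  # undefined if all observations are equal


print(describe_run(m=3, kn=1, tw=2, ref=2))
print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

A value of 1 indicates a perfect match, while 0 means the prediction performs no better than the mean of the observations.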

