Discovering stochastic dynamical equations from ecological time series data, together with an easy-to-use Python package.
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Bruckner, D. B., Danny Raj, M., & Guttal, V., “Discovering stochastic dynamical equations from ecological time series data”, arXiv preprint arXiv:2205.02645, to appear in The American Naturalist.
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Bruckner, David B., Danny Raj, M., & Guttal, V. (2024). PyDaddy: A Python Package for Discovering SDEs from Time Series Data (Version 1.1.1) [Computer software]. https://github.com/tee-lab/PyDaddy
PyDaddy is an open-source package and a key contribution of the manuscript Nabeel et al., arXiv:2205.02645. The basic scientific premise of the package is to discover the nature of stochasticity in ecological time series datasets. It is well known that stochasticity can affect the dynamics of ecological systems in counter-intuitive ways. Without knowing the equations that govern the dynamics of populations or ecosystems (typically stochastic differential equations, or SDEs), it is challenging to determine the impact of randomness on real datasets. In the manuscript and the accompanying package, we introduce a methodology that turns time series data of state variables into SDEs. This approach merges traditional stochastic calculus with modern equation-discovery techniques. We showcase the generality of our method through various applications and discuss its limitations and potential pitfalls, offering diagnostic measures to address these challenges.
PyDaddy is a comprehensive and easy-to-use Python package for discovering data-derived stochastic differential equations from time series data. PyDaddy takes the time series of a state variable \(x\), either a scalar or a 2-dimensional vector, as input and discovers an SDE of the form:
\[ \frac{dx}{dt} = f(x) + g(x) \cdot \eta(t) \]
where \(\eta(t)\) is Gaussian white noise. The function \(f\) is called the drift, and governs the deterministic part of the dynamics. \(g^2\) is called the diffusion and governs the stochastic part of the dynamics.
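To make the roles of \(f\) and \(g^2\) concrete, here is a minimal NumPy sketch that does not use PyDaddy itself. It simulates an SDE of the above form with the Euler-Maruyama scheme and then estimates the drift and diffusion as binned conditional moments of the increments, \(f(x) \approx \langle \Delta x \mid x \rangle / \Delta t\) and \(g^2(x) \approx \langle \Delta x^2 \mid x \rangle / \Delta t\). The function names, the example model (\(f(x) = -x\), \(g(x) = 0.5\)), the Itô interpretation, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_sde(f, g, x0, dt, n_steps, rng):
    """Euler-Maruyama integration of dx = f(x) dt + g(x) dW (Ito form)."""
    x = np.empty(n_steps)
    x[0] = x0
    # Wiener increments: Gaussian with variance dt
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)
    for i in range(n_steps - 1):
        x[i + 1] = x[i] + f(x[i]) * dt + g(x[i]) * dw[i]
    return x

def binned_drift_diffusion(x, dt, n_bins=20, min_count=1000):
    """Estimate f(x) and g^2(x) as conditional moments of the increments."""
    dx = np.diff(x)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin index of the state at the start of each increment
    idx = np.clip(np.digitize(x[:-1], edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    drift = np.full(n_bins, np.nan)
    diffusion = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = dx[idx == b]
        if sel.size >= min_count:  # skip sparsely populated bins
            drift[b] = sel.mean() / dt
            diffusion[b] = (sel ** 2).mean() / dt
    return centers, drift, diffusion

# Hypothetical example model: linear drift, constant noise strength
rng = np.random.default_rng(0)
dt = 0.01
x = simulate_sde(lambda s: -s, lambda s: 0.5, x0=0.0, dt=dt,
                 n_steps=200_000, rng=rng)
centers, drift, diffusion = binned_drift_diffusion(x, dt)

ok = np.isfinite(drift)
slope = np.polyfit(centers[ok], drift[ok], 1)[0]
print("fitted drift slope (true value -1):", slope)
print("mean diffusion estimate (true g^2 = 0.25):", np.nanmean(diffusion))
```

The binned estimates are noisy where data are sparse, which is why poorly populated bins are skipped here.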
Figure: An example summary plot generated by PyDaddy for a vector time series dataset.
PyDaddy also provides a range of functionality, such as equation learning for the drift and diffusion functions using sparse regression and a suite of diagnostic functions. For more details on how to use the package, check out the example notebooks and documentation.
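As an illustration of what sparse regression means in this setting, the sketch below fits a polynomial to noisy samples of a drift function using sequentially thresholded least squares, one common sparse-regression scheme. It is a generic NumPy example under stated assumptions, not PyDaddy's implementation or API: the function `stlsq`, the polynomial library, the threshold, and the synthetic target \(f(x) = x - x^3\) are all hypothetical choices.

```python
import numpy as np

def stlsq(library, target, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    Fit by least squares, zero out coefficients smaller than `threshold`,
    then refit using only the surviving library terms. Repeat.
    """
    coeffs = np.linalg.lstsq(library, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coeffs) < threshold
        coeffs[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        coeffs[keep] = np.linalg.lstsq(library[:, keep], target, rcond=None)[0]
    return coeffs

# Synthetic stand-in for drift estimates: f(x) = x - x^3 plus noise
rng = np.random.default_rng(1)
x = rng.uniform(-1.5, 1.5, size=5000)
y = x - x**3 + rng.normal(0.0, 0.1, size=x.size)

# Polynomial library with columns 1, x, x^2, x^3, x^4
library = np.vander(x, N=5, increasing=True)
coeffs = stlsq(library, y, threshold=0.1)
print("coefficients for [1, x, x^2, x^3, x^4]:", coeffs)
```

The threshold controls the trade-off between sparsity and fit: a larger value removes more terms, while a very small one reduces the method to ordinary least squares.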
The workflow of the package is summarised by the schematic given below, which is also Fig. 1 of the manuscript (https://arxiv.org/abs/2205.02645).
A detailed workflow of the package, along with detailed instructions on its various features, is included as Supplementary Information Section S2 of the manuscript.
The PyDaddy package documentation covers the package in detail.
Figure: Schematic illustration of PyDaddy functionality.
We provide a number of easy-to-use scripts to help with learning and using the package, and to make our manuscript easily reproducible.
First, we emphasise that PyDaddy can be run online on Google Colab, without installing it on your local machine. To run PyDaddy on Colab, open a new notebook, paste the following code into a notebook cell, and run it:
%pip install git+https://github.com/tee-lab/PyDaddy.git
This sets up PyDaddy in the notebook environment.
Several example scripts and Jupyter notebooks are provided to help you familiarize yourself with the features and functionality of PyDaddy. These can be executed on Colab. In the list below, we give the path to each notebook as well as a link to the Google Colab notebook; the latter does not require installing either Python or the package on your system.
(See below for Notebooks 7 and 8).
There are also two notebooks that use PyDaddy to discover SDEs from real-world datasets.
The zipped folder of code and data is structured as follows:
PyDaddy is available both on PyPI and Anaconda Cloud, and can be installed on any system with a Python 3 environment. If you don’t have Python 3 installed on your system, we recommend using Anaconda or Miniconda. See the PyDaddy package documentation for detailed installation instructions.
To install the latest stable release version of PyDaddy, use:
pip install pydaddy
To install the latest development version of PyDaddy, use:
pip install git+https://github.com/tee-lab/PyDaddy.git
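After running either install command, a quick way to check that the package is visible to your Python environment is to query its installed version through the standard library. This is a small sketch, assuming the distribution is registered under the name `pydaddy` (as in the `pip install pydaddy` command above); the helper `installed_version` is a hypothetical name used only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

pydaddy_version = installed_version("pydaddy")
if pydaddy_version is None:
    print("PyDaddy is not installed in this environment.")
else:
    print("PyDaddy version:", pydaddy_version)
```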
For more information about PyDaddy, check out the package documentation.
If you are using this package in your research, please cite the repository and the associated paper as follows:
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Bruckner, David B., Danny Raj, M., & Guttal, V. (2024). PyDaddy: A Python Package for Discovering SDEs from Time Series Data (Version 1.1.1) [Computer software]. https://github.com/tee-lab/PyDaddy
Nabeel, A., Karichannavar, A., Palathingal, S., Jhawar, J., Bruckner, D. B., Danny Raj, M., & Guttal, V., “Discovering stochastic dynamical equations from ecological time series data”, arXiv preprint arXiv:2205.02645, to appear in The American Naturalist.
This study was partially funded by the Science and Engineering Research Board, Department of Science and Technology, Government of India, through support to Vishwesha Guttal.
PyDaddy is distributed under the GNU General Public License v3.0.