pyesmda.ESMDA¶
- class pyesmda.ESMDA(obs: numpy.ndarray[Any, numpy.dtype[numpy.float64]], m_init: numpy.ndarray[Any, numpy.dtype[numpy.float64]], cov_obs: Union[numpy.ndarray[Any, numpy.dtype[numpy.float64]], scipy.sparse._csr.csr_matrix], forward_model: Callable[[...], numpy.ndarray[Any, numpy.dtype[numpy.float64]]], forward_model_args: Sequence[Any] = (), forward_model_kwargs: Optional[Dict[str, Any]] = None, n_assimilations: int = 4, cov_obs_inflation_factors: Optional[Sequence[float]] = None, cov_mm_inflation_factors: Optional[Sequence[float]] = None, dd_correlation_matrix: Optional[Union[numpy.ndarray[Any, numpy.dtype[numpy.float64]], scipy.sparse._csr.csr_matrix]] = None, md_correlation_matrix: Optional[Union[numpy.ndarray[Any, numpy.dtype[numpy.float64]], scipy.sparse._csr.csr_matrix]] = None, m_bounds: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, save_ensembles_history: bool = False, seed: Optional[int] = None, is_forecast_for_last_assimilation: bool = True, batch_size: int = 5000, is_parallel_analyse_step: bool = True)[source]¶
Bases:
object
Ensemble Smoother with Multiple Data Assimilation.
Implement the ES-MDA as proposed by Emerick, A. A. and A. C. Reynolds [Emerick and Reynolds, 2013, Emerick and Reynolds, 2013].
- Attributes
d_dim (int) – Number of observation values \(N_{obs}\), and consequently of predicted values.
obs (NDArrayFloat) – Obsevrations vector with dimensions (\(N_{obs}\)).
cov_obs (NDArrayFloat) – Covariance matrix of observed data measurement errors with dimensions (\(N_{obs}\), \(N_{obs}\)). Also denoted \(R\).
d_obs_uc (NDArrayFloat) – Vectors of pertubed observations with dimension (\(N_{e}\), \(N_{obs}\)).
d_pred (NDArrayFloat) – Vectors of predicted values (one for each ensemble member) with dimensions (\(N_{e}\), \(N_{obs}\)).
d_history (List[NDArrayFloat]) – List of vectors of predicted values obtained at each assimilation step.
m_prior – Vectors of parameter values (one vector for each ensemble member) used in the last assimilation step. Dimensions are (\(N_{e}\), \(N_{m}\)).
m_bounds (NDArrayFloat) – Lower and upper bounds for the \(N_{m}\) parameter values. Expected dimensions are (\(N_{m}\), 2) with lower bounds on the first column and upper on the second one.
m_history (List[NDArrayFloat]) – List of successive m_prior.
cov_md (NDArrayFloat) – Cross-covariance matrix between the forecast state vector and predicted data. Dimensions are (\(N_{m}, N_{obs}\)).
cov_obs (csr_matrix) – Autocovariance matrix of predicted data. Dimensions are (\(N_{obs}, N_{obs}\)).
cov_mm (NDArrayFloat) – Autocovariance matrix of estimated parameters. Dimensions are (\(N_{m}, N_{m}\)).
forward_model (callable) – Function calling the non-linear observation model (forward model) for all ensemble members and returning the predicted data for each ensemble member.
forward_model_args (Tuple[Any]) – Additional args for the callable forward_model.
forward_model_kwargs (Dict[str, Any]) – Additional kwargs for the callable forward_model.
n_assimilations (int) – Number of data assimilations (\(N_{a}\)).
cov_obs_inflation_factors (List[float]) – List of multiplication factor used to inflate the covariance matrix of the measurement errors.
cov_mm_inflation_factors (List[float]) – List of factors used to inflate the adjusted parameters covariance among iterations: Each realization of the ensemble at the end of each update step i, is linearly inflated around its mean. See [Anderson, 2007].
dd_correlation_matrix (Optional[Union[NDArrayFloat, csr_matrix]]) – Correlation matrix based on spatial and temporal distances between observations and observations \(\rho_{DD}\). It is used to localize the autocovariance matrix of predicted data by applying an elementwise multiplication by this matrix. Expected dimensions are (\(N_{obs}\), \(N_{obs}\)). A sparse matrix format can be provided to save some memory.
md_correlation_matrix (Optional[Union[NDArrayFloat, csr_matrix]]) – Correlation matrix based on spatial and temporal distances between parameters and observations \(\rho_{MD}\). It is used to localize the cross-covariance matrix between the forecast state vector (parameters) and predicted data by applying an elementwise multiplication by this matrix. Expected dimensions are (\(N_{m}\), \(N_{obs}\)). . A sparse matrix format can be provided to save some memory.
save_ensembles_history (bool) – Whether to save the history predictions and parameters over the assimilations.
rng (np.random.Generator) – The random number generator used in the predictions perturbation step.
is_forecast_for_last_assimilation (bool) – Whether to compute the predictions for the ensemble obtained at the last assimilation step.
batch_size (int) – Number of parameters that are assimilated at once. This option is available to overcome memory limitations when the number of parameters is large. In that case, the size of the covariance matrices tends to explode and the update step must be performed by chunks of parameters.
is_parallel_analyse_step (bool, optional) – Whether to use parallel computing for the analyse step if the number of batch is above one. The default is True.
n_batches (int) – Number of batches required during the update step.
- __init__(obs: numpy.ndarray[Any, numpy.dtype[numpy.float64]], m_init: numpy.ndarray[Any, numpy.dtype[numpy.float64]], cov_obs: Union[numpy.ndarray[Any, numpy.dtype[numpy.float64]], scipy.sparse._csr.csr_matrix], forward_model: Callable[[...], numpy.ndarray[Any, numpy.dtype[numpy.float64]]], forward_model_args: Sequence[Any] = (), forward_model_kwargs: Optional[Dict[str, Any]] = None, n_assimilations: int = 4, cov_obs_inflation_factors: Optional[Sequence[float]] = None, cov_mm_inflation_factors: Optional[Sequence[float]] = None, dd_correlation_matrix: Optional[Union[numpy.ndarray[Any, numpy.dtype[numpy.float64]], scipy.sparse._csr.csr_matrix]] = None, md_correlation_matrix: Optional[Union[numpy.ndarray[Any, numpy.dtype[numpy.float64]], scipy.sparse._csr.csr_matrix]] = None, m_bounds: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, save_ensembles_history: bool = False, seed: Optional[int] = None, is_forecast_for_last_assimilation: bool = True, batch_size: int = 5000, is_parallel_analyse_step: bool = True) None [source]¶
Construct the instance.
- Parameters
obs (NDArrayFloat) – Obsevrations vector with dimension \(N_{obs}\).
m_init (NDArrayFloat) – Initial ensemble of parameters vector with dimensions (\(N_{e}\), \(N_{m}\)).
cov_obs (NDArrayFloat) – Covariance matrix of observed data measurement errors with dimensions (\(N_{obs}\), \(N_{obs}\)). Also denoted \(R\). It can be a numpy array or a sparse matrix (scipy.linalg).
forward_model (callable) – Function calling the non-linear observation model (forward model) for all ensemble members and returning the predicted data for each ensemble member.
forward_model_args (Optional[Tuple[Any]]) – Additional args for the callable forward_model. The default is None.
forward_model_kwargs (Optional[Dict[str, Any]]) – Additional kwargs for the callable forward_model. The default is None.
n_assimilations (int, optional) – Number of data assimilations (\(N_{a}\)). The default is 4.
cov_obs_inflation_factors (Optional[Sequence[float]]) – Multiplication factor used to inflate the covariance matrix of the measurement errors. Must match the number of data assimilations (\(N_{a}\)). The default is None.
cov_mm_inflation_factors (Optional[Sequence[float]]) – List of factors used to inflate the adjusted parameters covariance among iterations: Each realization of the ensemble at the end of each update step i, is linearly inflated around its mean. Must match the number of data assimilations (\(N_{a}\)). See [Anderson, 2007]. If None, the default is 1.0. at each iteration (no inflation). The default is None.
dd_correlation_matrix (Optional[NDArrayFloat]) – Correlation matrix based on spatial and temporal distances between observations and observations \(\rho_{DD}\). It is used to localize the autocovariance matrix of predicted data by applying an elementwise multiplication by this matrix. Expected dimensions are (\(N_{obs}\), \(N_{obs}\)). The default is None.
md_correlation_matrix (Optional[NDArrayFloat]) – Correlation matrix based on spatial and temporal distances between parameters and observations \(\rho_{MD}\). It is used to localize the cross-covariance matrix between the forecast state vector (parameters) and predicted data by applying an elementwise multiplication by this matrix. Expected dimensions are (\(N_{m}\), \(N_{obs}\)). The default is None.
m_bounds (Optional[NDArrayFloat], optional) – Lower and upper bounds for the \(N_{m}\) parameter values. Expected dimensions are (\(N_{m}\), 2) with lower bounds on the first column and upper on the second one. The default is None.
save_ensembles_history (bool, optional) – Whether to save the history predictions and parameters over the assimilations. The default is False.
seed (Optional[int]) – Seed for the white noise generator used in the perturbation step. If None, the default
numpy.random.default_rng()
is used. The default is None.is_forecast_for_last_assimilation (bool, optional) – Whether to compute the predictions for the ensemble obtained at the last assimilation step. The default is True.
batch_size (int) – Number of parameters that are assimilated at once. This option is available to overcome memory limitations when the number of parameters is large. In that case, the size of the covariance matrices tends to explode and the update step must be performed by chunks of parameters. The default is 5000.
is_parallel_analyse_step (bool, optional) – Whether to use parallel computing for the analyse step if the number of batch is above one. It relies on concurrent.futures multiprocessing. The default is True.
Public methods summary
__delattr__
(name, /)Implement delattr(self, name).
__dir__
()Default dir() implementation.
__eq__
(value, /)Return self==value.
__format__
(format_spec, /)Default object formatter.
__ge__
(value, /)Return self>=value.
__getattribute__
(name, /)Return getattr(self, name).
__gt__
(value, /)Return self>value.
__hash__
()Return hash(self).
__init__
(obs, m_init, cov_obs, forward_model)Construct the instance.
__init_subclass__
This method is called when a class is subclassed.
__le__
(value, /)Return self<=value.
__lt__
(value, /)Return self<value.
__ne__
(value, /)Return self!=value.
__new__
(**kwargs)__reduce__
()Helper for pickle.
__reduce_ex__
(protocol, /)Helper for pickle.
__repr__
()Return repr(self).
__setattr__
(name, value, /)Implement setattr(self, name, value).
__sizeof__
()Size of object in memory, in bytes.
__str__
()Return str(self).
__subclasshook__
Abstract classes can override this to customize issubclass().
_analyse
(inflation_factor)Analysis step of the ES-MDA.
_apply_bounds
(m_pred)Apply bounds constraints to the adjusted parameters.
Approximate the covariance matrices.
Forecast step of ES-MDA.
_get_batch_m_update
(index, inflation_factor)_local_analyse
(inflation_factor)Analysis step of the ES-MDA.
_pertrub
(inflation_factor)Perturbation of the observation vector step of ES-MDA.
Set the number of assimilations to perform.
_solve
()Solve the optimization problem with ES-MDA algorithm.
Solve the optimization problem with ES-MDA algorithm.
Set the inflation factors the covariance matrix of the measurement errors.
solve
()Solve the optimization problem with ES-MDA algorithm.
Methods definition
- _analyse(inflation_factor: float) numpy.ndarray[Any, numpy.dtype[numpy.float64]] [source]¶
Analysis step of the ES-MDA.
Update the vector of model parameters using
\[m^{l+1}_{j} = m^{l}_{j} + C^{l}_{MD}\left(C^{l}_{DD}+\alpha_{l+1} C_{D}\right)^{-1} \left(d^{l}_{uc,j} - d^{l}_{j} \right), \textrm{for } j=1,2,...,N_{e}.\]Notes
To avoid the inversion of \(\left(C^{l}_{DD}+\alpha_{l+1} C_{D}\right)\), the product \(\left(C^{l}_{DD}+\alpha_{l+1} C_{D}\right) ^{-1} \left(d^{l}_{uc,j} - d^{l}_{j} \right)\) is solved linearly as \(A^{-1}b = x\) which is equivalent to solve \(Ax = b\).
- _apply_bounds(m_pred: numpy.ndarray[Any, numpy.dtype[numpy.float64]]) numpy.ndarray[Any, numpy.dtype[numpy.float64]] [source]¶
Apply bounds constraints to the adjusted parameters.
- _approximate_covariance_matrices() None [source]¶
Approximate the covariance matrices.
The covariance matrices \(C^{l}_{MD}\) and \(C^{l}_{DD}\) are approximated from the ensemble in the standard way of EnKF [Evensen, 2007, Aanonsen et al., 2009]:
\[C^{l}_{MD} = \frac{1}{N_{e} - 1} \sum_{j=1}^{N_{e}}\left(m^{l}_{j} - \overline{m^{l}}\right)\left(d^{l}_{j} - \overline{d^{l}} \right)^{T}\]\[C^{l}_{DD} = \frac{1}{N_{e} - 1} \sum_{j=1}^{N_{e}}\left(d^{l}_{j} -\overline{d^{l}} \right)\left(d^{l}_{j} - \overline{d^{l}} \right)^{T}\]with \(\overline{m^{l}}\) and \(\overline{d^{l}}\), the the ensemble means, at iteration \(l\), of parameters and predictions, respectively.
If defined by the user, covariances localization is applied by element-wise multiplication (Schur product or Hadamard product) of the original covariance matrix and a distance dependent correlation function that smoothly reduces the correlations between points for increasing distances and cuts off long-range correlations above a specific distance:
\[\tilde{C}^{l}_{MD} = \rho_{MD} \odot C^{l}_{MD}\]\[\tilde{C}^{l}_{DD} = \rho_{DD} \odot C^{l}_{DD}\]with \(\odot\) the element wise multiplication.
- _forecast() None [source]¶
Forecast step of ES-MDA.
Run the forward model from time zero until the end of the historical period from time zero until the end of the historical period to compute the vector of predicted data
\[d^{l}_{j}=g\left(m^{l}_{j}\right),\textrm{for }j=1,2,...,N_{e},\]where \(g(·)\) denotes the nonlinear observation model, i.e., \(d^{l}_{j}\) is the \(N_{d}\)-dimensional vector of predicted data obtained by running the forward model reservoir simulation with the model parameters given by the vector \(m^{l}_{j}\) from time zero. Note that we use \(N_{d}\) to denote the total number of measurements in the entire history.
- _local_analyse(inflation_factor: float) numpy.ndarray[Any, numpy.dtype[numpy.float64]] [source]¶
Analysis step of the ES-MDA.
Update the vector of model parameters using
\[m^{l+1}_{j} = m^{l}_{j} + C^{l}_{MD}\left(C^{l}_{DD}+\alpha_{l+1} C_{D}\right)^{-1} \left(d^{l}_{uc,j} - d^{l}_{j} \right), \textrm{for } j=1,2,...,N_{e}.\]Notes
To avoid the inversion of \(\left(C^{l}_{DD}+\alpha_{l+1} C_{D}\right)\), the product \(\left(C^{l}_{DD}+\alpha_{l+1} C_{D}\right) ^{-1} \left(d^{l}_{uc,j} - d^{l}_{j} \right)\) is solved linearly as \(A^{-1}b = x\) which is equivalent to solve \(Ax = b\).
- _pertrub(inflation_factor: float) None [source]¶
Perturbation of the observation vector step of ES-MDA.
Perturb the vector of observations
\[d^{l}_{uc,j} = d_{obs} + \sqrt{\alpha_{l+1}}C_{D}^{1/2}Z_{d}, \textrm{for } j=1,2,...,N_{e},\]where \(Z_{d} \sim \mathcal{N}(O, I_{N_{d}})\).
Notes
To get reproducible behavior, use a seed when creating the ESMDA instance.
- set_cov_obs_inflation_factors(a: Optional[Sequence[float]]) None [source]¶
Set the inflation factors the covariance matrix of the measurement errors.
- property cov_mm: numpy.ndarray[Any, numpy.dtype[numpy.float64]]¶
Get the estimated parameters autocovariance matrix. It is a read-only attribute.
The covariance matrice \(C^{l}_{MM}\) is approximated from the ensemble in the standard way of EnKF [Evensen, 2007, Aanonsen et al., 2009]:
\[C^{l}_{MM} = \frac{1}{N_{e} - 1} \sum_{j=1}^{N_{e}}\left(m^{l}_{j} - \overline{m^{l}}\right)\left(m^{l}_{j} - \overline{m^{l}} \right)^{T}\]with \(\overline{m^{l}}\), the parameters ensemble means, at iteration \(l\).
- property cov_mm_inflation_factors: List[float]¶
Get the inlfation factors for the adjusted parameters covariance matrix.
Covariance inflation is a method used to counteract the tendency of ensemble Kalman methods to underestimate the uncertainty because of either undersampling, inbreeding, or spurious correlations [Todaro, 2021]. The spread of the ensemble is artificially increased before the assimilation of the observations, according to scheme introduced by [Anderson, 2007]:
\[m^{l}_{j} \leftarrow r^{l}\left(m^{l}_{j} - \frac{1}{N_{e}} \sum_{j}^{N_{e}}m^{l}_{j}\right) + \frac{1}{N_{e}}\sum_{j}^{N_{e}}m^{l}_{j}\]where \(r\) is the inflation factor for the assimilation step \(l\).
- property cov_obs: scipy.sparse._csr.csr_matrix¶
Get the observation errors covariance matrix.
- property cov_obs_inflation_factors: List[float]¶
Get the inlfation factors for the covariance matrix of the measurement errors.
Single and multiple data assimilation are equivalent for the linear-Gaussian case as long as the factor \(\alpha_{l}\) used to inflate the covariance matrix of the measurement errors satisfy the following condition:
\[\sum_{l=1}^{N_{a}} \frac{1}{\alpha_{l}} = 1\]In practise, \(\alpha_{l} = N_{a}\) is a good choice [Emerick and Reynolds, 2013].
- property dd_correlation_matrix: Optional[scipy.sparse._csr.csr_matrix]¶
Get the observations-observations localization matrix.
- property m_bounds: numpy.ndarray[Any, numpy.dtype[numpy.float64]]¶
Get the parameter errors covariance matrix.
- property md_correlation_matrix: Optional[scipy.sparse._csr.csr_matrix]¶
Get the parameters-observations localization matrix.