Privacy-Preserving Distributed Learning for Renewable Energy Forecasting

Data exchange between multiple renewable energy power plant owners can lead to an improvement in forecast skill thanks to the spatio-temporal dependencies in time series data. However, owing to competitive business factors, these owners might be unwilling to share their data. In order to tackle this privacy issue, this paper formulates a novel privacy-preserving framework that combines data transformation techniques with the alternating direction method of multipliers. This approach not only estimates the model in a distributed fashion but also protects data privacy, the coefficients, and the covariance matrix. In addition, asynchronous communication between peers is addressed during model fitting, and two different collaborative schemes are considered: centralized and peer-to-peer. The results for a solar energy dataset show that the proposed method is robust to privacy breaches and communication failures, and delivers a forecast skill comparable to a model without privacy protection.


I. INTRODUCTION
The forecast skill of renewable energy sources (RES) has improved over the past two decades through R&D activities across the complete model chain, i.e., from numerical weather predictions (NWP) to statistical learning methods that convert weather variables into power forecasts [1]. The need to bring forecast skill to significantly higher levels is widely recognized in the majority of roadmaps that deal with high RES integration scenarios for the next decades. This is expected not only to facilitate RES integration in system operation and electricity markets but also to reduce the need for flexibility and the associated investment costs of remedies that aim to hedge RES variability and uncertainty, such as storage, demand response, and others.
In this context, intraday and hour-ahead electricity markets are becoming increasingly important to handle RES uncertainty and thus accurate hours-ahead forecasts are essential. Recent findings showed that feature engineering, combined with statistical models, can extract relevant information from spatially distributed weather and RES power time series and improve hours-ahead forecast skill [1]. Indeed, for very short-term lead times (from 15 minutes to 6 hours ahead), the vector autoregressive (VAR) model, when compared to univariate time series models, has shown competitive results for wind [2] and solar [3] power forecasting. Alternative models are also being applied to this problem, most notably deep learning techniques such as convolutional neural networks or long short-term memory networks [4]. While there may always be a debate about the interest and relevance of statistical modeling vs. machine learning approaches, VAR models have the advantages of flexibility, interpretability, acceptability by practitioners, as well as robustness in terms of forecast skill.
Data privacy is a critical barrier to the application of collaborative forecasting models. Although multivariate time series models offer forecast skill improvement, the lack of privacy-preserving mechanisms makes data owners unwilling to cooperate. For instance, in the VAR model, the covariates are the lags of the target variable of each RES site, which means that agents (or data owners) cannot provide covariates without also providing their power measurements.
To the best of our knowledge, only three works have proposed privacy-preserving approaches for RES forecasting. Zhang and Wang described a privacy-preserving approach for wind power forecasting with off-site time series, which combined ridge linear quantile regression with the alternating direction method of multipliers (ADMM) [9]. However, privacy with ADMM is not always guaranteed since it requires intermediate calculations, allowing curious competitors to recover the data after several iterations [10]. Moreover, the central node can also recover the original and private data. Sommer et al. [11] considered an encryption layer, which consists of multiplying the data by a random matrix. However, the focus of this work was not data privacy but rather online learning, and the private data are revealed to the central agent who performs intermediary
computations. Berdugo et al. described a method based on local and global analog-search (i.e., template matching) that uses solar power time series from neighboring sites [12]. However, agents only share reference time-stamps and normalized weights of the analogs identified by the neighbors, hence forecast error is only indirectly reduced. In this paper, we also use ADMM as a central framework for distributed learning and forecasting, in view of its flexibility in terms of communication setup for all agents involved, the possibility to add a privacy-preserving layer, as well as the promising resulting forecast skill documented in the literature.
A literature analysis in [10] of privacy-preserving techniques for VAR grouped these techniques as (a) data transformation, such as the generation of random matrices that pre- or post-multiply the data [13] or principal component analysis with differential privacy [14]; (b) secure multi-party computation, such as linear algebra protocols [15] or homomorphic encryption (encrypting the original data in a way that arithmetic operations in the public space do not compromise the encryption [16]); and (c) decomposition-based methods such as ADMM [17] or the distributed Newton-Raphson method [18]. The main conclusions were that data transformation requires a trade-off between privacy and accuracy; that secure multi-party computations either result in computationally demanding techniques or do not fully preserve privacy in VAR models; and that decomposition-based methods rely on iterative processes such that, after a number of iterations, the agents have enough information to recover private data.
With our focus on privacy-preserving protocols for very short-term forecasting with the VAR model, the main research outcome from this work is a novel combination of data transformation and decomposition-based methods so that the VAR model is fitted in another feature space without decreasing the forecast skill (which contrasts with [12]). The main advantage of this combination is that the ADMM algorithm is not affected and therefore: (a) asynchronous communication between peers can be addressed while fitting the model; (b) a flexible privacy-preserving collaborative model can be implemented using two different schemes, centralized communication with a neutral node and peer-to-peer communication, and in a way that original data cannot be recovered by central node or peers (this represents a more robust approach compared to the ADMM implementation in [9], [11]).
The remainder of this paper is organized as follows: Section II describes the distributed learning framework. Section III describes the VAR model and the coefficients' estimators. Section IV formulates a novel privacy-preserving LASSO-VAR model. Then, a case study with solar energy data is considered in Section V. The work concludes in Section VI.

II. DISTRIBUTED LEARNING FRAMEWORK
This section discusses the distributed learning framework that enables different agents or data owners (e.g., RES power plant owners, market players, forecasting service providers) to exploit geographically distributed time series data (power and/or weather measurements, NWP, etc.) and improve forecast skill while keeping data private. In this context, data privacy can refer either to commercially sensitive data from grid-connected RES power plants or to personal data (e.g., under the European Union General Data Protection Regulation) from households with RES technology. Distributed learning (or collaborative forecasting) means that, instead of the agents sharing their data, the model fitting problem is solved in a distributed manner. Two collaborative schemes are possible: centralized communication with a central node (central hub) and peer-to-peer communication (P2P).
In the central hub model, the scope of the calculations performed by the agents is limited to their local data, and the only information transmitted to the central node consists of statistics, e.g., average values or local data multiplied by locally estimated coefficients. The central node is responsible for combining these local estimators and, when considering iterative solvers like ADMM, for coordinating the individual optimization processes that solve the main optimization problem. The central node can be either a transmission/distribution system operator (TSO/DSO) or a forecasting service provider. The TSO or DSO could operate a platform that promotes collaboration between competing RES power plants in order to improve forecasting accuracy and reduce system balancing costs. Alternatively, the forecasting service provider could host the central node and make available APIs and protocols for information (not data) exchange between different data owners during model fitting, receiving a payment for this service.
In the P2P scheme, the agents likewise conduct a local computation of their estimators, but share this information with their peers, meaning that each agent acts as both agent and central node. While P2P tends to be more robust (i.e., fewer points of failure), it is usually difficult to make it as efficient as the central hub model in terms of communication costs: with n agents, each agent communicates with the remaining n − 1.
The P2P model is suitable for data owners that do not want to rely (or trust) upon a neutral agent. Potential business models could be: P2P forecasting between prosumers or RES power plants [19]; smart cities characterized by an increasing number of sensors and devices installed at houses, buildings, and transportation network [20].
In order to make these collaborative schemes feasible, the following fundamental principles must be respected: (a) ensure improvement in forecast skill, compared to a scenario without collaboration; (b) guarantee data privacy, i.e., agents and the central node cannot have access to (or recover) original data; (c) consider synchronous and asynchronous communication between agents. The formulation that will be described in Section IV fully guarantees these three core principles.

III. BACKGROUND: VECTOR AUTOREGRESSIVE MODEL
This section summarizes the VAR model, as well as the most common model fitting algorithms. Throughout this paper, matrices are represented by bold uppercase letters, vectors by bold lowercase letters, and scalars by lowercase letters. Also, a = [a_1, a_2] represents a column vector, while the column-wise operation between two vectors or matrices is denoted as [a, b] or [A, B], respectively.

A. VAR Model Formulation
Let {y_t}_{t=1}^T be an n-dimensional multivariate time series, where n is the number of data owners. Then, {y_t}_{t=1}^T follows a VAR model with p lags, denoted by VAR_n(p), when

y_t = η + Σ_{ℓ=1}^{p} y_{t−ℓ} B^{(ℓ)} + ε_t, for t = 1, . . ., T,

where η = [η_1, . . ., η_n] is the constant intercept (row) vector, η ∈ R^n; B^{(ℓ)} represents the coefficient matrix at lag ℓ = 1, . . ., p, B^{(ℓ)} ∈ R^{n×n}, and the coefficient associated with lag ℓ of time series i, to estimate time series j, is at position (i, j) of B^{(ℓ)}, for i, j = 1, . . ., n; and ε_t = [ε_{1,t}, . . ., ε_{n,t}], ε_t ∈ R^n, denotes a white noise vector that is independent and identically distributed with mean zero and nonsingular covariance matrix. For simplicity, y_t is assumed to follow a centered process, η = 0, i.e., a vector of zeros of appropriate dimension. A VAR_n(p) model can be written in matrix form as

Y = ZB + E,

where Y = [y_1, . . ., y_T], B = [B^{(1)}, . . ., B^{(p)}], Z = [z_1, . . ., z_T], and E = [ε_1, . . ., ε_T] are obtained by joining the vectors row-wise and define, respectively, the T × n response matrix, the np × n coefficient matrix, the T × np covariate matrix, and the T × n error matrix, with z_t = [y_{t−1}, . . ., y_{t−p}].
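As a concrete illustration, the stacking of a multivariate series into the matrices Y and Z can be sketched as follows (a minimal Python sketch with simulated data; the function name and toy dimensions are illustrative, not from the paper):

```python
import numpy as np

def build_var_matrices(y, p):
    """Stack a multivariate series y (T_total x n) into the matrix form
    Y = Z B + E of a VAR(p): each row of Z is z_t = [y_{t-1}, ..., y_{t-p}]."""
    T_total = y.shape[0]
    Y = y[p:]                                # (T x n) response, T = T_total - p
    Z = np.hstack([y[p - l: T_total - l] for l in range(1, p + 1)])  # (T x n*p)
    return Y, Z

rng = np.random.default_rng(0)
y = rng.standard_normal((100, 3))            # toy 3-dimensional series
Y, Z = build_var_matrices(y, p=2)
print(Y.shape, Z.shape)                      # -> (98, 3) (98, 6)
```

The first n columns of Z hold lag 1, the next n columns lag 2, and so on, matching the row-block structure of B.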

B. VAR Model Estimation
Usually, when the number of covariates, np, is substantially smaller than the number of records, T, the VAR model is estimated through multivariate least squares,

B̂ = argmin_B ||Y − ZB||_2^2,

where ||·||_r represents both vector and matrix L_r norms. However, as the number of data owners increases, as well as the number of lags, it becomes indispensable to use regularization techniques, such as LASSO, aiming to introduce sparsity into the coefficient matrix estimated by the model. In the standard LASSO-VAR approach, the coefficients are estimated by

B̂ = argmin_B (1/2)||Y − ZB||_2^2 + λ||B||_1,   (4)

where λ > 0 is a scalar penalty parameter. The LASSO penalty is convenient when handling high-dimensional data since the penalty function shrinks some of the coefficients to zero, performing variable selection. Instead of assuming that all lagged multivariate time series contribute to the model, this framework extracts, with small computational effort, the predictors with the strongest contribution to forecasting the target variable. As shown in [6], [7], the introduction of sparsity in the model's coefficients can find sub-groups of spatio-temporal dependency between RES power plants, enabling the application of the LASSO-VAR model to a large spatial region. The outcome is an interpretable model, in terms of spatial and temporal dependency, which avoids noisy estimates and unstable forecasts. Some alternatives to LASSO are the partial spectral coherence with the Bayesian information criterion [6] or a penalty term based on the correlation among the time series rate-of-change [21], among others.
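The estimator in (4) can be approximated with a simple proximal-gradient (ISTA) loop; the sketch below is a toy under stated assumptions (ISTA rather than the ADMM solver discussed next, synthetic data, an illustrative λ), meant only to show how the soft-thresholding step produces exact zeros in the coefficient matrix:

```python
import numpy as np

def soft_threshold(x, tau):
    # prox of tau*||.||_1: shrinks entries toward zero, zeroing the small ones
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lasso_var_ista(Y, Z, lam, iters=500):
    """Proximal-gradient (ISTA) sketch of the LASSO-VAR estimator:
    minimize 0.5*||Y - Z B||_F^2 + lam*||B||_1."""
    step = 1.0 / np.linalg.norm(Z, 2) ** 2   # 1/L, L = Lipschitz constant
    B = np.zeros((Z.shape[1], Y.shape[1]))
    for _ in range(iters):
        grad = Z.T @ (Z @ B - Y)
        B = soft_threshold(B - step * grad, step * lam)
    return B

rng = np.random.default_rng(1)
Z = rng.standard_normal((200, 10))
B_true = np.zeros((10, 2)); B_true[:3] = 1.0     # sparse ground truth
Y = Z @ B_true + 0.1 * rng.standard_normal((200, 2))
B_hat = lasso_var_ista(Y, Z, lam=5.0)
print(int((B_hat == 0).sum()), "coefficients shrunk exactly to zero")
```

The coefficients of the irrelevant covariates end up exactly at zero, which is the variable-selection behavior described above.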
Despite the many benefits, the LASSO regularization term makes the loss function in (4) non-differentiable, limiting the variety of optimization techniques that can be employed. In this domain, ADMM is a popular and computationally efficient technique allowing parallel estimation for data divided by records or features, which is an appealing property when designing a privacy-preserving approach.

1) Distributed ADMM and LASSO-VAR:
When defining a VAR model, each time series is collected by a specific data owner, meaning that the data are divided by features: Y_{A_i} and Z_{A_i} denote the target and covariate matrices for the i-th data owner, respectively, as illustrated in Fig. 1. Consequently, the problem in (4) can be re-written as

B̂ = argmin (1/2)||Y − Σ_{i=1}^{n} Z_{A_i} B_{A_i}||_2^2 + λ Σ_{i=1}^{n} ||B_{A_i}||_1.   (5)

This decomposition of the loss function allows parallel computation of B_{A_i}, with the ADMM solution provided by the system of equations (6); see [7]. 2) Privacy Issues: In the collaboration schemes of Section II, each agent determines and transmits (6a), and then it is up to the central agent or the peers (depending on the adopted structure) to compute the quantities in (6b) and (6c). Although there is no direct exchange of private data, the computation of (6b) and (6c) provides indirect information about these data, meaning that confidentiality breaches can occur after a number of iterations. The term "confidentiality breach" means the reconstruction of the entire private dataset by another party.
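A minimal sketch of ADMM for a lasso problem split by features, in the spirit of (5)-(6), is shown below. It follows the generic sharing formulation from the ADMM literature (Boyd et al.), so the update formulas and variable names (`zbar`, `u`) come from that template rather than from equations (6a)-(6c) themselves; each agent's block `A_i` plays the role of Z_{A_i}, and only the products A_i x_i are exchanged:

```python
import numpy as np

def soft(x, tau):
    # elementwise soft-thresholding (prox of the L1 norm)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def feature_split_admm(b, A_list, lam=0.1, rho=1.0, iters=200, inner=50):
    """Sharing-ADMM sketch for a lasso problem split by features:
    minimize 0.5*||sum_i A_i x_i - b||^2 + lam * sum_i ||x_i||_1.
    Each agent updates x_i using only its own block A_i; peers/central
    node only ever see the products A_i x_i."""
    N, T = len(A_list), b.shape[0]
    xs = [np.zeros(A.shape[1]) for A in A_list]
    zbar, u, Axbar = np.zeros(T), np.zeros(T), np.zeros(T)
    for _ in range(iters):
        for i, A in enumerate(A_list):
            v = A @ xs[i] + zbar - Axbar - u      # local target for agent i
            x, t = xs[i], 1.0 / np.linalg.norm(A, 2) ** 2
            for _ in range(inner):                # inner ISTA solve
                x = soft(x - t * (A.T @ (A @ x - v)), t * lam / rho)
            xs[i] = x
        Axbar = sum(A @ x for A, x in zip(A_list, xs)) / N
        zbar = (b + rho * (Axbar + u)) / (N + rho)
        u = u + Axbar - zbar
    return xs

rng = np.random.default_rng(5)
A_list = [rng.standard_normal((60, 4)) for _ in range(3)]
x_true = [np.array([1.0, 0.0, -1.0, 0.0]) for _ in range(3)]
b = sum(A @ x for A, x in zip(A_list, x_true)) + 0.05 * rng.standard_normal(60)
xs = feature_split_admm(b, A_list)
resid = np.linalg.norm(sum(A @ x for A, x in zip(A_list, xs)) - b)
print("relative residual:", round(resid / np.linalg.norm(b), 3))
```

The agents update in parallel (each `v` uses only the previous iteration's aggregate `Axbar`), which is the property that the privacy discussion above builds on: the iterates, not the raw data, are communicated.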
To reduce the possibility of such confidentiality breaches, recent work combined distributed ADMM with differential privacy, which consists of adding random noise (with certain statistical properties) to the data itself or coefficients [22], [23]. However, these mechanisms can deteriorate the performance of the model even under moderate privacy guarantees [10].

IV. PRIVACY-PRESERVING DISTRIBUTED LASSO-VAR
This section describes the novel privacy-preserving collaborative forecasting method, which combines multiplicative randomization of the data (Section IV-A) with the distributed ADMM for the generalized LASSO-VAR model (Section IV-B). Communication issues (Section IV-E) are also addressed since they are common in distributed systems.

A. Data Transformation With Multiplicative Randomization
Multiplicative randomization of the data [24] consists of multiplying the data matrix X ∈ R^{T×ns} by full-rank perturbation matrices. If a perturbation matrix M ∈ R^{T×T} pre-multiplies X, i.e., MX, the records are randomized. On the other hand, if a perturbation matrix Q ∈ R^{ns×ns} post-multiplies X, i.e., XQ, then the features are randomized. The challenges related to such transformations are two-fold: (i) M and Q act as algebraic encryption keys and consequently should be fully unknown to the agents; (ii) the data transformations need to preserve the relationships between the original time series. When X is divided by features, as is the case with the matrices Z and Y when defining VAR models, Q can be constructed as a block-diagonal matrix, see (7), where the diagonal blocks Q_{A_i} ∈ R^{s×s} are privately defined by agent i = 1, . . ., n. Then, agents post-multiply their data without sharing Q_{A_i}, since by (7) the product XQ is assembled from the local products X_{A_i}Q_{A_i}. Unfortunately, the same reasoning is not possible when defining M, because every element of column j of M multiplies every element of row j of X (containing data from every agent). Therefore, the challenge is to define a random matrix M that is unknown to each individual agent but at the same time built by all agents.
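The local nature of the feature randomization in (7) can be checked numerically: with a block-diagonal Q, each agent's product X_{A_i}Q_{A_i} assembles into XQ without any agent revealing its block (toy dimensions and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, s, n = 50, 3, 4                       # records, features per agent, agents
X_blocks = [rng.standard_normal((T, s)) for _ in range(n)]   # private data
Q_blocks = [rng.standard_normal((s, s)) for _ in range(n)]   # private keys

# Assemble the global matrices only to verify the identity; in the actual
# protocol neither X nor Q is ever held by a single party.
X = np.hstack(X_blocks)
Q = np.zeros((n * s, n * s))
for i, Qi in enumerate(Q_blocks):
    Q[i * s:(i + 1) * s, i * s:(i + 1) * s] = Qi

local = np.hstack([Xi @ Qi for Xi, Qi in zip(X_blocks, Q_blocks)])
assert np.allclose(local, X @ Q)         # feature randomization is fully local
print("block-diagonal post-multiplication verified")
```

A random Gaussian block is invertible with probability one, so this construction also satisfies the invertibility requirement discussed below.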
We propose to define M as the product of matrices generated privately by the agents,

M = M_{A_1} M_{A_2} · · · M_{A_n},

where M_{A_i} ∈ R^{T×T} is privately defined by agent i. This means that M is jointly built by all agents while remaining unknown to each of them. Some linear algebra-based protocols exist for secure matrix products, but they were designed for matrices with independent observations and have been proven to fail when applied to matrices such as Z and Y (see [10] for a proof). The calculation of MX_{A_i} is described in Algorithm 1 (Data Encryption).
The privacy of this protocol depends on r, which is chosen according to the number of unique values in X_{A_i}. The optimal value of r is discussed in Proposition 1 of Appendix B.

B. Formulation of the Collaborative Forecasting Model
When applying the ADMM algorithm, the protocol presented in the previous section should be applied to transform matrices Z and Y in such a way that: (i) the estimated coefficients do not coincide with the originals, instead they are a secret transformation of them, (ii) agents are unable to recover the private data through the exchanged information, and (iii) cross-correlations cannot be obtained, i.e., agents are unable to recover Z Z nor Y Y.
To fulfill these requirements, both the covariate and target matrices are transformed through multiplicative noise. Both M and Q must be invertible, which is ensured if M_{A_i} and Q_{A_i} are invertible for i = 1, . . ., n.
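The effect of the two transformations on the fitted coefficients can be verified numerically for the unpenalized least-squares case (dropping the penalty is an assumption of this sketch, made for clarity): with an orthogonal M and an invertible Q, fitting on (MZQ, MY) yields transformed coefficients from which the plain fit is recovered as B̂ = Q B̃:

```python
import numpy as np

rng = np.random.default_rng(3)
T, d = 120, 6
Z = rng.standard_normal((T, d))
B = rng.standard_normal((d, 2))
Y = Z @ B + 0.01 * rng.standard_normal((T, 2))

Q = rng.standard_normal((d, d))                   # invertible w.p. 1
M, _ = np.linalg.qr(rng.standard_normal((T, T)))  # orthogonal: M.T @ M = I

# Fit on the doubly transformed data (M Z Q, M Y) ...
B_tilde = np.linalg.lstsq(M @ Z @ Q, M @ Y, rcond=None)[0]
# ... and compare with the plain least-squares fit on (Z, Y):
B_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]
assert np.allclose(Q @ B_tilde, B_hat, atol=1e-6)
print("coefficients recovered from the transformed problem")
```

The exchanged estimates B̃ are thus a secret transformation of the true coefficients: only an agent holding its own Q_{A_i} can undo its block.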
1) Formulation: Let ZQ be the covariate matrix obtained through (7) and Y the target matrix. The covariate matrix ZQ is divided by features, and the optimization problem which allows recovering the solution of (5) is

argmin (1/2)||Y − Σ_{i=1}^{n} Z_{A_i} Q_{A_i} B_{A_i}||_2^2 + λ Σ_{i=1}^{n} ||Q_{A_i} B_{A_i}||_1.   (12)

After a little algebra, the relation between the ADMM solutions of (5) and (12) is that each block of the former is obtained by pre-multiplying the corresponding block of the latter by Q_{A_i}, suggesting coefficient privacy since the original B is no longer exchanged. However, the limitations identified in previous work [10] for (5) remain valid for (12): a curious agent can obtain both Y and ZQ, and because Y and Z share a large proportion of values, Z can also be recovered. Taking MZQ as the covariate matrix and MY as the target, the ADMM solution for the corresponding optimization problem coincides with that of (12) when M is orthogonal, since the orthogonality of M ensures that, while computing B_{A_i}, the quadratic term is unchanged, i.e., ||M(Y − ZQB)||_2^2 = ||Y − ZQB||_2^2. We deal with this limitation by using the matrix Q_{A_i} Z_{A_i}^T M^{-1}, obtained through the encryption protocol, in the local updates. Algorithm 2 summarizes our proposal for estimating a privacy-preserving LASSO-VAR model.
The matrix Q_{A_i} Z_{A_i}^T M^{-1} is obtained by adapting the protocol in (10)-(11). In this case, the value of r is more restrictive because we need to ensure that agent i does not obtain both Y_{A_i}^T M^{-1} and MY_{A_i}; otherwise, the covariance and cross-correlation matrices are again vulnerable. Let us assume that Z_{A_i} and Q_{A_i} contain u unique unknown values and that Y_{A_i} has v unique unknown values that are not in Z_{A_i}. Then, privacy is ensured by computing MZ_{A_i}Q_{A_i} and Q_{A_i}Z_{A_i}^T M^{-1} using the smallest natural number r such that √(Tp − u) < r < T/2 and r > p (see Proposition 2 in Appendix B for the determination of the optimal r). Appendix C presents an analysis of data privacy for scenarios without and with collusion between agents (data owners) during encrypted data exchange.

Algorithm 2: Synchronous Privacy-Preserving LASSO-VAR.
Input: randomized data MZ_{A_i}Q_{A_i} and MY. Each agent i = 1, . . ., n initializes its local variables; then, while the stopping criteria are not satisfied, each agent updates B^{k+1}_{A_i} through an inner ADMM cycle, MZ_{A_i}Q_{A_i}B^{k+1}_{A_i} is shared with the peers or the central node, who computes (17)-(19), and the iteration counter is updated, k = k + 1.
Finally, it is important to underline that Algorithm 2 can be applied to both the central hub and P2P schemes without any modification, depending on who (central node or peers, respectively) receives MZ_{A_i}Q_{A_i}B^{k+1}_{A_i} and computes (17)-(19).

2) Malicious Agents:
The proposed approach assumes that agents should only trust themselves, requiring control mechanisms to detect when agents share wrong estimates of their coefficients, compromising the global model. Since MY and MZQB^k can be known by the agents without exposing private data, a malicious agent can be detected through the analysis of the global error ||MY − MZQB^k||_2^2. That is, during the iterative process, this global error should converge smoothly, as depicted in Fig. 2 (left plot), and the same is expected for the individual errors, ∀i. In the example of Fig. 2, two agents are assumed to add random noise to their coefficients. This results in the erratic curve for the global error shown in Fig. 2. An analysis of the individual errors, in Fig. 2 (right plot), shows that all agents have smooth curves, except the two who shared distorted information.
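The detection idea can be sketched as follows; the uptick-counting rule and its threshold are illustrative assumptions, not the paper's exact criterion:

```python
import numpy as np

def flag_erratic(err_curves, max_upticks=2):
    """Toy detector: a well-behaved agent's error curve should decrease
    almost monotonically over the iterations; count upward jumps per agent
    and flag those exceeding a tolerance."""
    flagged = []
    for i, e in enumerate(err_curves):
        upticks = int(np.sum(np.diff(e) > 0))
        if upticks > max_upticks:
            flagged.append(i)
    return flagged

rng = np.random.default_rng(4)
iters = np.arange(50)
smooth = [2.0 * np.exp(-0.1 * iters) for _ in range(4)]        # honest agents
noisy = 2.0 * np.exp(-0.1 * iters) + rng.uniform(0, 1, 50)     # distorted one
print(flag_erratic(smooth + [noisy]))   # -> [4]
```

In practice the error sequences would be the ||MY − MZQB^k|| curves observed during fitting, which all parties can compute without accessing private data.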

C. Tuning of Hyper-Parameters
Since the ADMM solutions for (4) and (14) are the same, agents can tune the hyper-parameters (ρ and λ) by applying common techniques, such as grid-search cross-validation, Nelder-Mead optimization, or Bayesian optimization, to minimize the loss function in (14). This requires the definition of fitting and validation datasets and the corresponding encryption by Algorithm 1, taking into account that, for each fitting and validation pair, the matrix Q_{A_i} needs to be the same, but all the other perturbation matrices should be changed to keep the data private.

D. Computational Complexity
Typically, the computational complexity of an algorithm is estimated by the number of required floating-point operations (defined as one addition, subtraction, multiplication, or division of two floating-point numbers). When compared to the existing distributed ADMM literature applied to the LASSO-VAR model (e.g., [7], [25]), the computational complexity of the ADMM algorithm remains almost the same: only p²n extra floating-point operations come from considering A_i in lines 10 and 11 of Algorithm 2. However, there is also the computational cost of the data transformation, performed before running the ADMM algorithm. Table I summarizes the floating-point operations necessary to encrypt the data matrices Z_{A_i} and Y_{A_i}. The computational time for this data encryption is expected to increase linearly with the number of agents and quadratically with the number of records. A numerical analysis was performed by simulating data from VAR models with n ∈ {10, 100, 200, . . ., 1600}, T ∈ {10 000, 15 000} and p = 5. Fig. 3 summarizes the mean running times using an i7-8750H @ 2.20 GHz with 16 GB of RAM. To properly analyze the mean time per ADMM iteration, the computational times for the cycle between lines 6 and 15 of Algorithm 2 (coefficients' update) are measured assuming that the n agents update in parallel. For example, for a case with 10 000 records and 500 agents, the data encryption takes around 15 minutes, and Algorithm 2 then takes around 10 seconds per iteration.

E. Asynchronous Communication
When applying the proposed method, the matrices in (17)-(19) combine the solutions of all data owners, meaning that the "slowest" agent dictates the duration of each iteration. Since delays and failures may occur due to computation or communication issues, the proposed algorithm should be robust to such scenarios; otherwise, convergence to the optimal solution may require too much time. The proposed approach deals with these issues by considering the last information sent by each agent, but different strategies are followed according to the adopted collaborative scheme.
Regarding the centralized scheme, let Ω^k_i be the set of iterations at which agent i communicated its information, up to the current iteration k. After receiving the local contributions, the central agent computes H^k and U^k in (18)-(19) by using the most recent contribution received from each agent, as in (16). For the P2P approach, let Λ^k_i be the set of agents sharing information computed at iteration k with agent i, i.e., Λ^k_i = {j : agent j sent MZ_{A_j}Q_{A_j}B^k_{A_j} to agent i}. After computing and sharing the local updates, an extra communication round is performed, through which agent i can obtain the missing information whenever Λ^k_i ≠ Λ^k_j for some pair i, j.
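The "use the last information sent" strategy for the central hub can be sketched as follows (the class and method names are hypothetical, and the aggregation is a simple average standing in for (17)-(19)):

```python
import numpy as np

class AsyncAggregator:
    """Central-hub sketch: keep each agent's most recent contribution
    (e.g. its product M Z_Ai Q_Ai B_Ai) and aggregate with stale values
    whenever an agent fails to communicate in the current iteration."""
    def __init__(self, n, dim):
        self.latest = {i: np.zeros(dim) for i in range(n)}
    def receive(self, agent, contribution):
        self.latest[agent] = contribution
    def aggregate(self):
        return sum(self.latest.values()) / len(self.latest)

agg = AsyncAggregator(n=3, dim=2)
agg.receive(0, np.array([1.0, 1.0]))
agg.receive(1, np.array([2.0, 2.0]))
agg.receive(2, np.array([3.0, 3.0]))
# iteration k+1: agent 2 fails to communicate; its last message is reused
agg.receive(0, np.array([4.0, 4.0]))
agg.receive(1, np.array([5.0, 5.0]))
print(agg.aggregate())    # -> [4. 4.] (average of 4, 5 and the stale 3)
```

The iteration proceeds without waiting for the slow agent, at the cost of aggregating slightly outdated information.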

V. CASE STUDY

A. Data Description and Experimental Setup
The proposed algorithm is applied to forecast solar power up to 6 hours ahead. The data is publicly available in [10] and consists of hourly time series of solar power from 44 micro-generation units, located in a Portuguese city, covering the period from February 1, 2011 to March 6, 2013. Since the VAR model requires the data to be stationary, the solar power is normalized through a clear-sky model, which gives an estimate of the solar power in clear-sky conditions at any given time [26]. This clear-sky model is fully data-driven and does not require any site-specific information (coordinates, rated power, etc.) since it estimates the clear-sky power time series exclusively from historical on-site power observations. Also, night-time hours are excluded by removing data for which the solar zenith angle is larger than 90°. Based on previous work [3], a LASSO-VAR model to forecast y_{i,t+h} at time t (using lags t − 1, t − 2, and t + h − 23) is evaluated with a sliding window of one month, and the model's fitting period consists of 12 months, h ≤ 6.
It is important to note that the LASSO-VAR model can be applied to both solar and wind power time series without any modification (see [7] for wind power forecasting), although a different set of lags should be selected for wind power. Furthermore, when compared to wind power, solar power is more challenging from a privacy perspective because lags 1 and 2 are zero during the first daylight hours, i.e., there are fewer unknown data, which makes it easier to recover the original data. In our protocol, this means more restrictive values for u and v, which are crucial when defining r and r′, as stated in Proposition 2.
To simulate the proposed approach under asynchronous conditions, communication failures are modeled through Bernoulli random variables F_{it} with failure probability p_i, F_{it} ∼ Bern(p_i), for each agent i = 1, . . ., n at each communication time t.
The performance of the models is assessed through the normalized root mean squared error (NRMSE), calculated for agent i and lead time h, with h = 1, . . ., 6, where ŷ_{i,t+h} represents the forecast generated at time t. The ADMM process stops when the convergence criterion is satisfied by all agents, with ε being the tolerance parameter.
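A minimal NRMSE sketch is shown below; the normalization constant is an assumption of this sketch (e.g., installed capacity or observed range), since the exact normalizer is not restated here:

```python
import numpy as np

def nrmse(y_true, y_pred, norm):
    """NRMSE sketch: RMSE divided by a reference scale `norm`
    (the choice of normalizer is an assumption, e.g. installed capacity)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / norm

y = np.array([0.2, 0.5, 0.8])
yhat = np.array([0.3, 0.4, 0.8])
print(round(nrmse(y, yhat, norm=1.0), 4))   # -> 0.0816
```

The relative improvement reported later is then computed by comparing the NRMSE of a benchmark against the NRMSE of the proposed model.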

B. Benchmark Models
The persistence and LASSO-autoregressive (LASSO-AR) models are implemented to assess the impact of collaboration relative to a model without collaboration. Two persistence models are considered: ŷ_{i,t+h} = y_{i,t} (last measured power) and ŷ_{i,t+h} = y_{i,t+h−23} (power measured 24 hours before).
The analog method described in [12] was also implemented as a benchmark model because: (a) it is the only work in the RES forecasting literature that implements collaborative forecasting without data disclosure; (b) when the forecasting algorithm was designed, a trade-off between accuracy and privacy was necessary and the choice was privacy over accuracy.
Firstly, agent i searches for the k situations most similar to the current power production values y_{i,t−ℓ+1}, . . ., y_{i,t}, with similarity measured through the Euclidean distance. Secondly, the k most similar situations (called analogs) are weighted according to the corresponding Euclidean distance; agent i attributes the weight w_{A_i}(a) to analog a. The forecast for h steps ahead is obtained by applying the computed weights to the h values registered immediately after the k analogs. The collaboration between agents requires the exchange of the time indexes of the selected analogs and the corresponding weights. Two analogs belong to the same global situation if they occur at the same or close timestamps. Agent i scores analog a, observed at timestamp t_a, by combining its own weight with the neighbors' weights for close timestamps, see (21), where α is the weight given to neighbor information, j indexes the analogs from other agents, registered at timestamps t_j, and I(t_a, t_j) is the indicator function taking value 1 if |t_j − t_a| ≤ δ, with δ being the maximum time difference for two analogs to be considered part of the same global situation. The results in the next section will show that our approach does not degrade accuracy (the same results as a LASSO-VAR without privacy constraints are obtained), while offering robustness in terms of data privacy.
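The local analog search can be sketched as follows (the window length, inverse-distance weighting, and tie handling are illustrative choices, not necessarily those of [12]):

```python
import numpy as np

def analog_forecast(history, current, k=3, window=2, h=1):
    """Local analog-search sketch: find the k past windows most similar to
    the current one (Euclidean distance), weight them by inverse distance,
    and average the values observed h steps after each analog."""
    T = len(history)
    dists, futures = [], []
    for t in range(window, T - h + 1):
        past = history[t - window:t]
        dists.append(np.linalg.norm(past - current))
        futures.append(history[t + h - 1])
    idx = np.argsort(dists)[:k]                  # the k closest analogs
    w = 1.0 / (np.array(dists)[idx] + 1e-9)      # inverse-distance weights
    return float(np.sum(w * np.array(futures)[idx]) / np.sum(w))

history = np.array([0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1])
print(analog_forecast(history, current=np.array([0.1, 0.2]), k=1))  # -> 0.1
```

In the collaborative version, only the analog timestamps and normalized weights would be exchanged between agents, never the power values themselves.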

C. Numerical Results
To assess the quality of the proposed collaborative forecasting model, the synchronous LASSO-VAR is compared with the benchmark models. Both the central hub and P2P models have the same accuracy when considering synchronous communication.
The hyper-parameters ρ and λ were determined by cross-validation (12 folds) on the initial model's fitting dataset, considering the values ρ, λ ∈ {0.5, 1, 2, 3, 4, 5, 10, 15, 20, 25}. Fig. 4 illustrates the results in terms of NRMSE for h = 1. Table II presents the NRMSE for all agents, distinguishing between lead times. In general, the smaller the forecasting horizon, the larger the NRMSE improvement, i.e., (NRMSE_Bench. − NRMSE_VAR)/NRMSE_Bench. · 100%. Besides, since the proposed LASSO-VAR and the LASSO-AR models have similar NRMSE for h > 3, the Diebold-Mariano test [27] is applied to test the superiority of the proposal, assuming a significance level of 5%. This test showed that the improvement is statistically significant for all horizons. It is important to note that the decrease in the improvement is explained by the cross-correlation between the geographically distributed time series data. Since the dataset is from a small municipality in Portugal, the highest improvement is expected for the first lead times (in particular the first one), where the cross-dependencies between time series have the most effect. However, this depends on the geographical layout and the distance between power plants. For instance, in [7], the results for wind power plants show the highest improvement for the second lead time; in the test case of western Denmark [28], the highest cross-dependency between two groups of wind farms was observed for lag two. Fig. 5 depicts the relative improvement in terms of NRMSE for the 44 agents. According to the Diebold-Mariano test, the LASSO-VAR model outperforms the benchmarks in all lead times for at least 25 of the 44 agents. Indeed, some agents contribute to improving the competitors' forecasts without any benefit to their own forecasting accuracy. Even if privacy is ensured, such agents can be unwilling to collaborate, which motivates data monetization through data markets [29].
For asynchronous communication, equal failure probabilities p_i are assumed for all agents. Since a specific p_i can generate various distinct failure sequences, 20 simulations were performed for each p_i ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. Table III shows the mean NRMSE improvement for different failure probabilities p_i, i = 1, . . ., n. In general, the greater the p_i, the smaller the improvement. Although the model's accuracy decreases slightly, the LASSO-VAR model continues to outperform the AR model for both collaborative schemes, which demonstrates high robustness to communication failures. Fig. 6 depicts the evolution of the loss while fitting the LASSO-VAR model, considering p_i ∈ {0.5, 0.9}. For the centralized approach, the loss tends to stabilize around larger values. In general, the results are better for the P2P scheme since, in the centralized approach, if an agent fails, the algorithm proceeds with no chance of obtaining its information, whereas in P2P that information can still reach part of the network through the remaining peers. Table IV presents the mean running times and the number of iterations of both the non-distributed and distributed approaches. The proposed schemes require longer execution times since they require estimating B^k_{A_i} through a second ADMM cycle (Algorithm 2). However, the non-distributed LASSO-VAR requires more iterations to converge (ε = 5 × 10⁻⁴).

VI. CONCLUSION
RES forecast skill can be improved by combining data from multiple geographical locations. One of the simplest and most effective collaborative models for very short-term forecasting is the vector autoregressive model. However, different data owners might be unwilling to share their time series data. In order to ensure data privacy, this work combined the advantages of the ADMM decomposition method with data encryption through linear transformations. It is important to underline that the coefficient matrix obtained with the privacy-preserving protocol is the same as the one obtained without any privacy protection.
This novel method also included an asynchronous distributed ADMM algorithm, making it possible to update the forecast model based on information from a subset of agents and improving the computational efficiency of the proposed model. The mathematical formulation is flexible enough to be applied in two different collaboration schemes (central hub model and P2P) and paves the way for learning models distributed by features instead of by observations.
The results obtained for a solar energy dataset show that the privacy-preserving LASSO-VAR model delivers a forecast skill comparable to that of a model without privacy protection, and outperforms a state-of-the-art method based on analog search. Furthermore, it exhibits high robustness to communication failures, in particular under the P2P scheme.
Two aspects not addressed in this paper are uncertainty forecasting and the application to non-linear models (and consequently longer lead times), which we plan to investigate in forthcoming work. Nevertheless, uncertainty forecasts can be readily generated by transforming the original data using a logit-normal distribution [6]. The proposed privacy-preserving protocol can be applied to non-linear regression by extending the additive model structure to a multivariate setting [30] or by local linear smoothing [31]. However, the extension to other non-linear multivariate models, such as long short-term memory networks and variants, requires further research and significant changes to the protocol. For instance, the rectifier (ReLU), an activation function commonly applied in neural networks and defined as f(x) = max(0, x), has the problem that f(MZQB) ≠ Mf(ZQB).
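The non-commutativity of the ReLU with the linear encryption can be checked numerically. In this minimal sketch, A and M are random stand-ins for the product ZQB and for an invertible encryption matrix, respectively.

```python
import numpy as np

def relu(x):
    """Rectifier f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # stand-in for the product ZQB
M = rng.normal(size=(4, 4))   # stand-in for an invertible encryption matrix

# Linear operations commute with the encryption, M @ (A @ B) == (M @ A) @ B,
# but the element-wise maximum does not:
print(np.allclose(relu(M @ A), M @ relu(A)))  # False in general
```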

APPENDIX A STANDARD ADMM AND LASSO-VAR
The ADMM solution for (4) is obtained by splitting the variable B into two variables (B and H) and adding the constraint H = B. Then, based on the augmented Lagrangian of (22), the solution is provided by the following system of equations (see [7]):

B^{k+1} = (Z^T Z + ρI)^{-1} (Z^T X + ρ(H^k − U^k)),
H^{k+1} = S_{λ/ρ}(B^{k+1} + U^k),
U^{k+1} = U^k + B^{k+1} − H^{k+1},

where U is the scaled dual variable associated with the constraint H = B, I is the identity matrix of proper dimension, and S_{λ/ρ} is the soft-thresholding operator. Consequently, (6a) can be estimated by adapting (22).
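These updates can be implemented in a few lines of dense linear algebra. The sketch below follows the standard scaled-dual ADMM for the LASSO; the synthetic data, hyper-parameter values and iteration count are illustrative only, not taken from the paper.

```python
import numpy as np

def soft_threshold(V, tau):
    """Element-wise soft-thresholding operator S_tau."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def admm_lasso(Z, X, lam=0.1, rho=1.0, n_iter=200):
    """Scaled-dual ADMM for min 0.5 * ||X - Z B||_F^2 + lam * ||B||_1."""
    p, k = Z.shape[1], X.shape[1]
    B = np.zeros((p, k)); H = np.zeros((p, k)); U = np.zeros((p, k))
    G = np.linalg.inv(Z.T @ Z + rho * np.eye(p))  # cached for every B-update
    ZtX = Z.T @ X
    for _ in range(n_iter):
        B = G @ (ZtX + rho * (H - U))            # ridge-type update
        H = soft_threshold(B + U, lam / rho)     # sparsity-inducing update
        U = U + B - H                            # scaled dual update
    return H

# Synthetic check: a sparse coefficient matrix should be recovered.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 5))
B_true = np.array([[1.0], [0.0], [0.0], [2.0], [0.0]])
X = Z @ B_true + 0.01 * rng.normal(size=(200, 1))
B_hat = admm_lasso(Z, X, lam=0.5)
```

Returning H rather than B gives the exactly sparse iterate produced by the soft-thresholding step.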

APPENDIX B OPTIMAL VALUE OF r
Proposition 1: Let X_{A_i} ∈ R^{T×s} be the sensitive data of agent i, with u unique values, and M_{A_j} ∈ R^{T×T} be the private encryption matrix of agent j. If the agents compute M_{A_j} X_{A_i} by applying the protocol in (10)-(11), then two invertible matrices D_{A_i} ∈ R^{r×r} and C_{A_i} ∈ R^{T×(r−s)} are generated by agent i and data privacy is ensured for √(Ts − u) < r < T.

Proof: Agent j receives W_{A_i} = [X_{A_i} C_{A_i}]D_{A_i} ∈ R^{T×r} and does not know X_{A_i} ∈ R^{T×s}, C_{A_i} ∈ R^{T×(r−s)} and D_{A_i} ∈ R^{r×r}. Although X_{A_i} ∈ R^{T×s}, we assume this matrix has u unique values whose positions are known by all agents (when defining a VAR model with p consecutive lags, Z_{A_i} has T + p − 1 unique values, see Fig. 1), meaning there are fewer values to recover.
Given that, agent j receives Tr values and wants to determine u + T(r − s) + r² unknowns. The solution of the inequality Tr < u + T(r − s) + r², in r, determines that the data of agent i is protected when r > √(Ts − u).
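This counting argument can be verified numerically for every admissible r. The sizes below are illustrative only, not taken from the case study.

```python
import math

T, s, u = 100, 3, 102            # illustrative: T samples, s columns, u unique values
r_lower = math.isqrt(T * s - u)  # r must strictly exceed sqrt(Ts - u)

for r in range(r_lower + 1, T):
    received = T * r                    # entries of W_{A_i} seen by agent j
    unknowns = u + T * (r - s) + r**2   # entries of X_{A_i}, C_{A_i}, D_{A_i}
    assert received < unknowns          # fewer equations than unknowns: protected
```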
Proposition 2: Let X_{A_i} ∈ R^{T×s} and G_{A_i} ∈ R^{T×g} be private data matrices, such that X_{A_i} has u unique values to recover and G_{A_i} has v unique values that are not in X_{A_i}. Assume the protocol in (10)-(11) is applied to compute MX_{A_i}, X_{A_i}M^{-1} and MG_{A_i}, with M as defined in (8). Then, to ensure privacy while computing MX_{A_i} and X_{A_i}M^{-1}, the protocol requires

√(Ts − u) < r < T/2 ∧ r > s. (26)

In addition, to compute MG_{A_i}, the protocol should take

√(Tg − v) < r′ < T − 2r. (27)

Proof: i) To compute MX_{A_i}, the i-th agent shares W_{A_i} = [X_{A_i} C_{A_i}]D_{A_i}. ii) X_{A_i}M^{-1} is computed using the matrix W_{A_i} defined before. Since M^{-1} = M^{-1}_{A_n} · · · M^{-1}_{A_1}, the n-th agent computes W_{A_i}M^{-1}_{A_n}. Then, the process repeats until the 1-st agent receives W_{A_i}M^{-1}_{A_n} · · · M^{-1}_{A_2}. Again, the j-th agent receives Tr values related to the unknown data of the i-th agent.
In summary, the n-th agent receives Tr values and does not know u + T(r − s) + r² values (from X_{A_i}, C_{A_i}, D_{A_i}). The solution of Tr < u + T(r − s) + r² allows to infer that X_{A_i} is protected if r > √(Ts − u). On the other hand, the i-th agent receives 2Tr values (MW_{A_i}, W_{A_i}M^{-1}) and does not know the T² entries of M, which implies r < T/2.
iii) Finally, to compute MG_{A_i}, the i-th agent should define new matrices C′_{A_i} ∈ R^{T×(r′−g)} and D′_{A_i} ∈ R^{r′×r′}, sharing W′_{A_i} = [G_{A_i} C′_{A_i}]D′_{A_i}. The computation of MW′_{A_i} provides Tr′ new values, meaning that after computing MX_{A_i}, X_{A_i}M^{-1} and MG_{A_i}, the n-th agent has Tr + Tr′ values and does not know u + T(r − s) + r² + v + T(r′ − g) + r′² values. The solution of the inequality Tr + Tr′ < u + T(r − s) + r² + v + T(r′ − g) + r′² allows to infer that r′ > √(Ts − u − r² + Tg − v), which is guaranteed by r′ > √(Tg − v) since r² > Ts − u. On the other hand, the i-th agent receives 2Tr + Tr′ values and does not know the T² entries of M, meaning that r′ < T − 2r.

APPENDIX C PRIVACY ANALYSIS
The proposed approach requires agents to encrypt their data and then exchange that encrypted data. This appendix section analyzes the global exchange of information. First, we show that the proposed privacy protocol is secure in a scenario without collusion, i.e., no alliances between agents (data owners) to determine the private data. Then, we analyze how many agents have to collude for a privacy breach to occur.

A. No Collusion Between Agents
While encrypting sensitive data X_{A_i} ∈ R^{T×s} and G_{A_i} ∈ R^{T×g}, such that X_{A_i} has u unique values to recover and G_{A_i} has v unique values that are not in X_{A_i}, the 1-st agent obtains MW_{A_i}, W_{A_i}M^{-1} and MW′_{A_i}, ∀i, which provides 2nTr + nTr′ values. At this stage, the agent does not know the T² entries of M nor the (n − 1)[u + v + T(r − s) + r² + T(r′ − g) + r′²] values of the other agents. Then, while fitting the LASSO-VAR model, the 1-st agent can recover MX ∈ R^{T×ns} and MG ∈ R^{T×ng}, as shown in [10]. That said, the 1-st agent receives 2nTr + nTr′ + nTs + nTg values, and a confidentiality breach occurs if T(2nr + nr′ + ns + ng) ≥ T² + (n − 1)[u + v + T(r − s) + r² + T(r′ − g) + r′²]. After a little algebra, it is possible to verify that, taking (26), ∃r′ in (27) such that the previous inequality is not satisfied.

B. Collusion Between Agents
A set C of agents can come together to recover the data of the remaining competitors. This collusion assumes that such agents are willing to share their private data. Let c be the number of colluding agents. In this scenario, the objective is to determine M ∈ R^{T×T}, knowing MW_{A_i} ∈ R^{T×r}, W_{A_i}M^{-1} ∈ R^{r×T}, MW′_{A_i} ∈ R^{T×r′}, MX_{A_i} ∈ R^{T×s}, and MG_{A_i} ∈ R^{T×g}, i ∈ C.
Mathematically, this means the colluders can recover the T² values of M by solving cT(2r + r′ + s + g) equations, which is only possible for c ≥ T/(2r + r′ + s + g).
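For illustration, the minimum coalition size follows directly from this count. The sizes below are hypothetical, not taken from the case study; r2 stands for r′.

```python
import math

# Illustrative sizes (hypothetical): T samples, encryption widths r and r',
# s columns of X and g columns of G per agent.
T, r, r2, s, g = 200, 20, 30, 3, 2

equations_per_colluder = T * (2 * r + r2 + s + g)   # values each colluder contributes
c_min = math.ceil(T**2 / equations_per_colluder)    # coalition size needed to solve for M
print(c_min)  # 3 colluders for these sizes
```

Larger r and r′ thus raise communication cost but also raise the number of agents that must collude before M, and hence the private data, can be recovered.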