ISAAC, SUPSI
IDSIA, SUPSI
2025-07-08
Nonlinear forecast reconciliation becomes necessary when the relationships among variables are governed by non-additive constraints, for instance ratios, maxima, or physical laws such as the power-flow equations in energy systems.
These nonlinear structures frequently arise in energy networks, transport systems, and engineering domains, where ensuring physically consistent forecasts improves operational reliability and supports better planning and control.
We present empirical results comparing four nonlinear reconciliation methods in the following simulation study.
Let \(B_1\) and \(B_2\) be two independent (bottom) time series related to the single upper series \(U\) in the following way: \[ U=B_1^2+B_2^2 \]
Data Generation: The synthetic data are generated by sampling \(B_1\) and \(B_2\) as stationary \(AR(1)\) processes, i.e., with \(0<\phi_1,\phi_2<1\),
\[
b_{1,t} = \phi_1b_{1,t-1}+\epsilon_{1,t}\\
b_{2,t} = \phi_2b_{2,t-1}+\epsilon_{2,t}
\]
where \(\epsilon_{1,t}\) and \(\epsilon_{2,t}\) are independent random noise terms.
After an 80/20 train-test split, we independently generate base forecasts for each series using a random forest regressor with an \(L_1\) loss criterion.
These forecasts are deterministic, so we obtain the base forecast distribution using a residual bootstrap technique.
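A minimal Python sketch of this pipeline follows. The AR coefficients, noise scale, series length, and number of bootstrap samples are illustrative assumptions, as they are not specified above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Two independent stationary AR(1) bottom series (phi values are illustrative)
T, phi1, phi2 = 1000, 0.7, 0.5
b1, b2 = np.zeros(T), np.zeros(T)
for t in range(1, T):
    b1[t] = phi1 * b1[t - 1] + rng.normal()
    b2[t] = phi2 * b2[t - 1] + rng.normal()
u = b1**2 + b2**2  # the nonlinear aggregation constraint

split = int(0.8 * T)  # 80/20 train-test split

def base_forecast_samples(series, n_samples=500):
    """One-step-ahead point forecasts from a random forest with L1 loss,
    turned into base forecast samples via a residual bootstrap."""
    X, y = series[:-1].reshape(-1, 1), series[1:]  # lag-1 feature
    model = RandomForestRegressor(criterion="absolute_error", random_state=0)
    model.fit(X[:split - 1], y[:split - 1])
    point = model.predict(X[split - 1:])             # test-set point forecasts
    residuals = y[:split - 1] - model.predict(X[:split - 1])
    boot = rng.choice(residuals, size=(n_samples, len(point)))
    return point + boot                              # shape: (n_samples, n_test)

b1_hat, b2_hat, u_hat = (base_forecast_samples(s) for s in (b1, b2, u))
```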
The nonlinear bottom-up approach is the simplest and most straightforward way to reconcile a base forecast distribution, and thus serves as a natural baseline for comparing the other methods.
The function \(f_u(\cdot)\) is applied directly to all the independent (bottom) level base forecast samples:
\(\boldsymbol{\tilde{u}}^{(i)}_{t+h}=f_u(\boldsymbol{\hat{b}}^{(i)}_{t+h})\)
\(\boldsymbol{\tilde{b}}^{(i)}_{t+h}=\boldsymbol{\hat{b}}^{(i)}_{t+h}\)
Thus, \(\boldsymbol{\tilde{y}}^{(i)}_{t+h} = [\boldsymbol{\tilde{u}}^{(i)T}_{t+h},\boldsymbol{\tilde{b}}^{(i)T}_{t+h}]^T\) are samples from the reconciled distribution \(\tilde{\pi}\).
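Continuing the sketch above, NL-BU amounts to a few lines, with \(f_u\) being the toy constraint \(U = B_1^2 + B_2^2\):

```python
def f_u(b1, b2):
    """Toy aggregation constraint U = B1^2 + B2^2."""
    return b1**2 + b2**2

# Reconciled samples: upper level recomputed from the bottoms, bottoms unchanged
y_tilde = np.stack([f_u(b1_hat, b2_hat), b1_hat, b2_hat], axis=0)
```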
The projection method extends (Hyndman et al. 2011) to nonlinear surfaces and, following (Panagiotelis et al. 2023), reconciles each sample point in the probabilistic case.
The optimization problem becomes
\[
\boldsymbol{\tilde{y}}_{t+h} = \arg\min_\boldsymbol{y}(\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h})^T(\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h})
\]
subject to the constraint: \(g(\boldsymbol{y})=0\).
For linear reconciliation the constraint is \(C\boldsymbol{y}=0\) and a closed-form solution exists: \(\boldsymbol{\tilde{y}}_{t+h} = S (S^T S)^{-1}S^T{\boldsymbol{\hat{y}}_{t+h}}\), where \(S\) is the summing matrix.
For nonlinear reconciliation no such closed-form solution is available, so we adopt a Newton-Raphson approach, obtaining the minimum as a stationary point of the Lagrangian \({\small \mathcal{L}= (\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h})^T(\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h}) + \lambda^T g(\boldsymbol{y})}\) (details below).
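As a quick sanity check, the projection of each sample can also be computed with a generic constrained solver. The sketch below uses SciPy's SLSQP on the toy constraint rather than the Newton-Raphson scheme presented here (the explicit iteration is given below); it is an alternative implementation, not the method of the slides.

```python
import numpy as np
from scipy.optimize import minimize

def project_sample(y_hat):
    """Euclidean projection of one sample y_hat = [u, b1, b2] onto the
    coherent surface g(y) = u - b1^2 - b2^2 = 0."""
    g = {"type": "eq", "fun": lambda y: y[0] - y[1]**2 - y[2]**2}
    res = minimize(lambda y: np.sum((y - y_hat)**2), x0=y_hat,
                   method="SLSQP", constraints=[g])
    return res.x
```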
For linear reconciliation through projections, a reduction in the score against the base forecast is guaranteed by Pythagoras' theorem.
Does such a result exist in the nonlinear case? Let’s have a look at our example.
We observe similar results in the probabilistic setting as well.
This importance sampling algorithm is a natural extension of the BUIS algorithm developed in (Zambon, Azzimonti, and Corani 2024) for linear reconciliation.
Starting with the base forecast distribution of the independent (bottom) level, \(\hat\nu_b\), as the proposal distribution and \(\tilde\nu\) as the target distribution, this method performs reconciliation via conditioning.
The algorithm keeps the assumption of conditional independence between the independent (bottom) and dependent (upper) levels, i.e., \(\hat{\boldsymbol{B}}_{t+h}\) and \(\hat{\boldsymbol{U}}_{t+h}\) are conditionally independent.
After reconciling via conditioning, samples from the reconciled distribution are obtained using importance sampling:
Sampling: Draw samples \(\{b^{(i)}\}_{i=1}^N\) from \(\hat\nu_b\).
Compute weights: For each sample \(b^{(i)}\), compute the importance weight:
\[
w^{(i)} = \frac{\hat{\pi}(f_u(b^{(i)}), b^{(i)})}{\hat{\pi}_b(b^{(i)})} = \hat{\pi}_u(f_u(b^{(i)}))
\]
Normalization: Normalize the weights:
\[
\tilde{w}^{(i)} = \frac{w^{(i)}}{\sum_{j=1}^N w^{(j)}}
\]
Approximation: The reconciled distribution \(\tilde\nu\) is approximated by the weighted samples \(\{ (b^{(i)}, \tilde{w}^{(i)}) \}_{i=1}^N\).
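A sketch of these steps for a single forecast step follows. It assumes the upper base density \(\hat{\pi}_u\) is approximated by a Gaussian KDE fitted on the upper base forecast samples (an implementation choice, not prescribed above), and turns the weighted samples into an unweighted set by multinomial resampling.

```python
import numpy as np
from scipy.stats import gaussian_kde

def reconcile_buis(b1_s, b2_s, u_s, rng):
    """b1_s, b2_s, u_s: 1-D arrays of base forecast samples for one step."""
    pi_u = gaussian_kde(u_s)                 # stand-in for the upper base density
    w = pi_u(b1_s**2 + b2_s**2)              # w_i = pi_u(f_u(b_i))
    w = w / w.sum()                          # normalised weights
    idx = rng.choice(len(w), size=len(w), p=w)   # resample by weight
    b1_r, b2_r = b1_s[idx], b2_s[idx]
    return np.stack([b1_r**2 + b2_r**2, b1_r, b2_r], axis=0)
```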
In (Corani et al. 2020), a correspondence was shown between Bayesian forecast reconciliation and the linear-Gaussian Kalman filter.
We propose a similar idea for the nonlinear case, where we use an unscented Kalman filter (UKF) to obtain the posterior distribution (reconciled) from the prior (base) belief.
Basic assumption: the independent (bottom) time series are jointly Gaussian, \[ p(\boldsymbol{B}_{t+h}|[\boldsymbol{b}_1,\boldsymbol{b}_2,...,\boldsymbol{b}_t]) = \mathcal{N}(\boldsymbol{\hat{b}}_{t+h},\boldsymbol{\hat{\Sigma}}_{B,h}) \] \[ \boldsymbol{\hat{U}}_{t+h}=f_u(\boldsymbol{B}_{t+h})+\boldsymbol{\varepsilon}^u_{t+h}, \] \[ \boldsymbol{\varepsilon}^u_{t+h} \sim \mathcal{N}(\boldsymbol{0},\boldsymbol{\hat{\Sigma}}_{U,h}). \]
To obtain: posterior reconciled distribution, \(p(\boldsymbol{B}_{t+h}|\boldsymbol{\hat{U}}_{t+h}=\boldsymbol{\hat{u}}_{t+h})\).
The posterior (reconciled) distribution of the independent (bottom) series is approximately multivariate Gaussian; with the Kalman gain \(K\), predicted measurement \(\boldsymbol{\hat{u}}^-_{t+h}\), and innovation covariance \(S_u\) computed via the unscented transform (detailed below), it has
Mean: \[ \boldsymbol{\tilde{b}}_{t+h} = \boldsymbol{\hat{b}}_{t+h} + K(\boldsymbol{\hat{u}}_{t+h}-\boldsymbol{\hat{u}}^-_{t+h}) \]
Covariance matrix: \[ \boldsymbol{\tilde{\Sigma}}_{B,h} = \boldsymbol{\hat{\Sigma}}_{B,h} - KS_uK^T \]
A sample-based propagation strategy is then applied to the reconciled independent (bottom) samples to derive an empirical distribution for the dependent (upper) levels, since no closed-form solution is available and the result depends on the choice of \(f_u\).
We compare the accuracy of the reconciliation methods against the base forecast distribution as well as against the baseline reconciliation method, NL-BU.
The energy score, a proper scoring rule, is used for the comparison against the base forecasts; the log score, which is proper on the coherent surface, is used to compare the reconciliation methods against the baseline.
We compute the log score only for the independent (bottom) level.
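For reference, a minimal implementation of the standard sample-based estimator of the energy score (lower is better):

```python
import numpy as np

def energy_score(samples, y_obs):
    """Monte Carlo energy score: E||X - y|| - 0.5 E||X - X'||.
    samples: (N, d) forecast draws; y_obs: (d,) realisation."""
    n = len(samples)
    term1 = np.mean(np.linalg.norm(samples - y_obs, axis=1))
    pair_dists = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    return term1 - pair_dists.sum() / (2 * n**2)
```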
In this simulation study, UKF and NL-OP showed a significant increase in accuracy over both the base and the baseline forecasts.
| Reconciliation method | ES | ES improvement vs. base (%) | Test points improved (%) | LS |
|---|---|---|---|---|
| base (unreconciled) | 0.089598 | NA | NA | NA |
| NL-BU (baseline) | 0.090328 | -0.815 | 45.23 | -1.702547 |
| NL-BUIS | 0.089954 | -0.397 | 50.75 | -1.596634 |
| NL-OP | 0.080033 | 10.68 | 67.34 | -2.208334 |
| UKF | 0.078319 | 12.59 | 62.31 | -2.379933 |
Validate the methods on real-world use cases.
Currently we are working with Swiss mortality data, which serves as an excellent avenue for reconciliation, with a ratio as the nonlinear function.
Develop a mathematical proof of the score-reduction result, which was empirically found to hold for surfaces with constant curvature.
We have applied the projection method to the Swiss-Grid data, where we aimed to satisfy the power-flow equations.
In the latter case, an increase in accuracy was observed after reconciliation for small grids.
A new scoring function based on geodesic distance is under study for measuring accuracy, since it appears a more natural choice for comparing distances between coherent points.
Thank You
We obtain the minimum by defining the Lagrangian: \({\small \mathcal{L}= (\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h})^T W_h(\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h}) + \lambda^T g(\boldsymbol{y})}\)
and then finding a stationary point, defined by \(\partial_{\boldsymbol{y}} \mathcal{L} = 0\) and \(\partial_{\lambda} \mathcal{L} = 0\), using Newton-Raphson iterations.
In vector form, with \(J\) the Jacobian of \(g\), this can be written as \[ G(\boldsymbol{y}, \lambda)= \begin{bmatrix} \partial_{\boldsymbol{y}} \mathcal{L}\\ \partial_{\lambda} \mathcal{L} \end{bmatrix} = \begin{bmatrix} 2W_h(\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h}) + J^T \lambda\\ g(\boldsymbol{y}) \end{bmatrix} = 0 \]
With the Newton-Raphson update \({\small G(\boldsymbol{y}_k, \lambda_k) = -J_G \begin{bmatrix} d\boldsymbol{y}\\ d\lambda \end{bmatrix}}\), where \(J_G\) is the Jacobian of \(G\), making the expression for \(G\) explicit shows that the displacement vector \([d\boldsymbol{y}^T, d\lambda^T]^T\) solves \[ \begin{bmatrix} 2W_h(\boldsymbol{y}-\boldsymbol{\hat{y}}_{t+h}) + J^T \lambda\\ g(\boldsymbol{y}) \end{bmatrix} = -\begin{bmatrix} 2W_h & J^T\\ J & 0 \end{bmatrix} \begin{bmatrix} d\boldsymbol{y}\\ d\lambda \end{bmatrix} \]
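A compact sketch of this iteration for the toy constraint \(g(\boldsymbol{y}) = u - b_1^2 - b_2^2\), taking \(W_h = I\) for simplicity (the weight matrix is not specified here):

```python
import numpy as np

def nr_project(y_hat, tol=1e-10, max_iter=50):
    """Newton-Raphson projection of y_hat = [u, b1, b2] onto
    g(y) = u - b1^2 - b2^2 = 0, with W_h = I."""
    y, lam = y_hat.astype(float).copy(), np.zeros(1)
    for _ in range(max_iter):
        g = np.array([y[0] - y[1]**2 - y[2]**2])
        J = np.array([[1.0, -2 * y[1], -2 * y[2]]])     # Jacobian of g
        G = np.concatenate([2 * (y - y_hat) + J.T @ lam, g])
        if np.linalg.norm(G) < tol:
            break
        # KKT system exactly as displayed above
        KKT = np.block([[2 * np.eye(3), J.T], [J, np.zeros((1, 1))]])
        step = np.linalg.solve(KKT, -G)
        y, lam = y + step[:3], lam + step[3:]
    return y
```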
The first step of the UKF is to construct a set of deterministically chosen sample points, called Sigma points, that exactly capture the mean and covariance of the prior Gaussian distribution of the independent (bottom) series.
Let \(\lambda = \alpha^2(n_b+\kappa)-n_b\), and \(\gamma = \sqrt{n_b+\lambda}\), where \(\alpha\) controls the dispersion of the Sigma points.
The Sigma points \(\chi_i\) are then computed in the following way:
\[
\chi_0 = \boldsymbol{\hat{b}}_{t+h},\\
\chi_i = \boldsymbol{\hat{b}}_{t+h} + \gamma\left[\sqrt{\boldsymbol{\hat{\Sigma}}_{B,h}}\right]_i, \quad i=1,\dots,n_b,\\
\chi_{i+n_b} = \boldsymbol{\hat{b}}_{t+h} - \gamma\left[\sqrt{\boldsymbol{\hat{\Sigma}}_{B,h}}\right]_i, \quad i=1,\dots,n_b,
\]
where \(\sqrt{M}\) is the Cholesky factor of the matrix \(M\) and \([\cdot]_i\) denotes its \(i\)-th column.
We then compute the predicted measurement mean, covariance, and cross-covariance. Denote \(\boldsymbol{z}_i = f_u(\chi_i)\), and let \(W^i_m, W^i_c\) be the standard unscented-transform weights (\(W^0_m = \lambda/(n_b+\lambda)\), \(W^0_c = W^0_m + 1 - \alpha^2 + \beta\), and \(W^i_m = W^i_c = \tfrac{1}{2(n_b+\lambda)}\) for \(i \geq 1\), with \(\beta\) typically \(2\) for Gaussian priors); then
\[
\boldsymbol{\hat{u}}^-_{t+h} = \sum^{2n_b}_{i=0}W^i_m \boldsymbol{z}_i\\
S_u = \boldsymbol{\hat{\Sigma}}_{U,h} + \sum^{2n_b}_{i=0} W^i_c(\boldsymbol{z}_i-\boldsymbol{\hat{u}}^-_{t+h})(\boldsymbol{z}_i-\boldsymbol{\hat{u}}^-_{t+h})^T \\
P_{b,u} = \sum^{2n_b}_{i=0}W^i_c(\chi^\space_i-\boldsymbol{\hat{b}}_{t+h})(\boldsymbol{z}_i-\boldsymbol{\hat{u}}^-_{t+h})^T
\]
Thus, the Kalman gain is \(K = P_{b,u}S^{-1}_u\).
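Putting the pieces together, a minimal sketch of one UKF reconciliation step for the toy model (the weight formulas and \(\beta = 2\) follow the standard unscented-transform formulation, an assumption not spelled out above):

```python
import numpy as np

def ukf_reconcile(b_hat, Sigma_b, u_hat, sigma_u2, alpha=1e-3, beta=2.0, kappa=0.0):
    """One UKF reconciliation step for the toy model u = b1^2 + b2^2.
    b_hat: (2,) bottom mean; Sigma_b: (2, 2) bottom covariance;
    u_hat: scalar upper base forecast; sigma_u2: its variance."""
    n = len(b_hat)
    lam = alpha**2 * (n + kappa) - n
    gamma = np.sqrt(n + lam)
    L = np.linalg.cholesky(Sigma_b)

    # Sigma points and standard unscented-transform weights
    chi = np.vstack([b_hat, b_hat + gamma * L.T, b_hat - gamma * L.T])  # (2n+1, n)
    w_m = np.full(2 * n + 1, 1 / (2 * (n + lam))); w_m[0] = lam / (n + lam)
    w_c = w_m.copy(); w_c[0] += 1 - alpha**2 + beta

    # Propagate through f_u and form the predicted-measurement moments
    z = chi[:, 0]**2 + chi[:, 1]**2              # z_i = f_u(chi_i)
    u_pred = w_m @ z
    S_u = sigma_u2 + w_c @ (z - u_pred)**2       # innovation covariance
    P_bu = (w_c * (z - u_pred)) @ (chi - b_hat)  # (n,) cross-covariance

    # Kalman update
    K = P_bu / S_u
    b_rec = b_hat + K * (u_hat - u_pred)
    Sigma_rec = Sigma_b - np.outer(K, K) * S_u
    return b_rec, Sigma_rec
```

Samples drawn from \(\mathcal{N}(\boldsymbol{\tilde{b}}_{t+h}, \boldsymbol{\tilde{\Sigma}}_{B,h})\) and pushed through \(f_u\) then yield the empirical distribution of the dependent (upper) level.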