Power Transformer Stability #440

Open
veni-vidi-vici-dormivi opened this issue May 6, 2024 · 13 comments
Labels
enhancement New feature or request topic-MESMER_M

Comments

@veni-vidi-vici-dormivi
Collaborator

veni-vidi-vici-dormivi commented May 6, 2024

I played around a bit with our power transformer. Try the following example:

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
from mesmer.mesmer_m.power_transformer import PowerTransformerVariableLambda, lambda_function

np.random.seed(0)
n_years = 10_000

# make skewed data
skew = -5
local_monthly_residuals = scipy.stats.skewnorm.rvs(skew, size=n_years)
yearly_T = np.random.randn(n_years)

# find coefficients
pt = PowerTransformerVariableLambda()
pt.coeffs_ = pt._yeo_johnson_optimize_lambda(local_monthly_residuals, yearly_T)

# transform data
lmbdaleft = lambda_function(pt.coeffs_, yearly_T)
transformed = pt._yeo_johnson_transform(local_monthly_residuals, lmbdaleft)

# plot
fig, axs = plt.subplots(3)
axs[0].hist(local_monthly_residuals, bins=100)
axs[0].set_title('original (left skewed) data, skew = {}'.format(round(scipy.stats.skew(local_monthly_residuals), 2)))
axs[1].hist(transformed, bins=100)
axs[1].set_title('transformed data, skew = {}'.format(round(scipy.stats.skew(transformed),2)))
axs[2].hist(local_monthly_residuals-transformed, bins=100)
axs[2].set_title('difference')
plt.tight_layout()
plt.show()

[figure: histograms of the original (left-skewed) data, the transformed data, and their difference]
The data is nicely de-skewed.

But let's try it with 250 data points, which is how many datapoints we have for each month and gridcell. The output looks like this:
[figure: the same three histograms for n_years = 250]
The power transformer fails to normalize the data.

Arguably, our data is probably never skewed this heavily, but I wanted to leave this here for future reference in case we want to revisit it.

Edited to not use n_years × 12 samples, since this is not what we have in the application.

@veni-vidi-vici-dormivi veni-vidi-vici-dormivi added wontfix This will not be worked on topic-MESMER_M labels May 6, 2024
@veni-vidi-vici-dormivi
Collaborator Author

Apparently it's not only the sample size; the optimization is generally rather sensitive to initial conditions. Similar things happen for some variations of the bounds, the first guess, or the yearly_T.

@veni-vidi-vici-dormivi veni-vidi-vici-dormivi changed the title Estimation of coefficients for power transformer depends heavily on sample size Estimation of coefficients for power transformer depends heavily on initial conditions May 6, 2024
@mathause
Member

mathause commented May 6, 2024

Interesting - what coeffs do you get out?

@veni-vidi-vici-dormivi
Collaborator Author

veni-vidi-vici-dormivi commented May 7, 2024

In the first case the coefficients come out around (0, 0.1) and in the second around (0.96, 0.013). Here I set the bounds to (0, 1) for the first and to (-0.1, 0.1) for the second coefficient; the first guess is (1, 0). When I do the same without any bounds, even the example with 10,000 samples is much less normalized, with coefficients coming out at (0.6, 0.2). When I try with bounds of (0, 100) and (-100, 100), the data is more skewed after the transform than before, and the coefficients are (1.4, 96.2).

@veni-vidi-vici-dormivi
Collaborator Author

Maybe we should check the scipy.optimize.OptimizeResult.success attribute...
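
A minimal sketch of what such a check could look like (the wrapper name and warning text are made up, not part of the current code):

import warnings

from scipy.optimize import minimize


def _optimize_lambda_checked(neg_log_likelihood, first_guess, bounds):
    # keep the full OptimizeResult instead of only `.x` so that `.success` can be inspected
    res = minimize(
        neg_log_likelihood,
        first_guess,
        bounds=bounds,
        method="SLSQP",
    )
    if not res.success:
        warnings.warn(f"Lambda optimization did not converge: {res.message}")
    return res.x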

@veni-vidi-vici-dormivi
Collaborator Author

@mathause and I looked into this a little more.

The first interesting thing is that, as Mathias mentioned already, lambda does not need to be between 0 and 2. Thus, our lambda function is maybe not very well suited to estimate lambda. We should look into the possible benefits of making it a rational function (having possible $\xi_0 < 0$) or maybe a completely different lambda function.
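
For reference, the current lambda function is, as far as I can tell, the logistic form

$$\lambda(T) = \frac{2}{1 + \xi_0 \exp(\xi_1 T)},$$

which stays within $(0, 2)$ as long as $\xi_0 > 0$; allowing $\xi_0 < 0$ would drop that guarantee.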

Secondly, the definition of the loss function

n_samples = local_monthly_residuals.shape[0]
loglikelihood = (
    -n_samples / 2 * np.log(transformed_local_monthly_resids.var())
)
loglikelihood += (
    (lambdas - 1)
    * np.sign(local_monthly_residuals)
    * np.log1p(np.abs(local_monthly_residuals))
).sum()

is not intuitively clear to us. It is, however, the same as in sklearn, but in our opinion not the same as in the original paper:

[image: log-likelihood expression from the original Yeo-Johnson paper]

@mathause
Member

mathause commented May 8, 2024

Additional comments:

@mathause
Member

Can you try your example with n_years = 250 again, commenting out the jac=rosen_der part? That just worked for me.

@veni-vidi-vici-dormivi
Collaborator Author

True, works for me too. But it's at least not a universal fix because when I do it with skew = 5 it does not work anymore.

@mathause
Member

Can you try with the following diff (from main):

diff --git a/mesmer/mesmer_m/power_transformer.py b/mesmer/mesmer_m/power_transformer.py
index d746921..a3020cc 100644
--- a/mesmer/mesmer_m/power_transformer.py
+++ b/mesmer/mesmer_m/power_transformer.py
@@ -144,7 +144,7 @@ class PowerTransformerVariableLambda(PowerTransformer):
         local_yearly_T = local_yearly_T[~np.isnan(local_yearly_T)]
 
         # choosing bracket -2, 2 like for boxcox
-        bounds = np.c_[[0, -0.1], [1, 0.1]]
+        bounds = np.c_[[0, -0.1], [np.inf, 0.1]]
         # first guess is that data is already normal distributed
         first_guess = np.array([1, 0])
 
@@ -153,7 +153,7 @@ class PowerTransformerVariableLambda(PowerTransformer):
             first_guess,
             bounds=bounds,
             method="SLSQP",
-            jac=rosen_der,
+            # jac=rosen_der,
         ).x
 
     def _yeo_johnson_transform(self, local_monthly_residuals, lambdas):

@veni-vidi-vici-dormivi
Collaborator Author

veni-vidi-vici-dormivi commented May 15, 2024

Aha! I had already tried the infinite upper bound, but not without rosen_der; that seems to work well! (The infinite upper bound should have been implemented in #427.)

@veni-vidi-vici-dormivi
Collaborator Author

I actually had a feeling that the PRs above would not be the end of this. It turns out that using method = "SLSQP" leads to numerical instabilities in the coefficients (different results between operating systems, see #430). It happens here in the code:

return minimize(
    _neg_log_likelihood,
    first_guess,
    bounds=bounds,
    method="SLSQP",
).x

The scipy optimize.minimize documentation says that bounds can be used with the methods Nelder-Mead, L-BFGS-B, TNC, SLSQP, Powell, trust-constr, and COBYLA. However, SLSQP, trust-constr, and COBYLA are later listed under "constrained minimization", while the others are listed under "bound-constrained minimization". I think the constrained methods are preferred when one passes an actual constraint to the constraints parameter of optimize.minimize.

Concerning the bound-constrained methods, Nelder-Mead is listed as the most robust one, but "if numerical computation of derivative can be trusted, other algorithms using the first and/or second derivatives information might be preferred for their better performance in general".

I am not exactly sure whether that is the case here. We minimize the negative log-likelihood, which takes in the variance of the transformed residuals (a constant), the original residuals, and lambda (derived from a logistic function). In general, it is not guaranteed that the negative log-likelihood is twice differentiable (see here).

I tried the other options:

  • TNC is the slowest of the four options,
  • L-BFGS-B is also slower than Nelder-Mead (this would be the default option),
  • Powell is faster than Nelder-Mead (and also does not need a differentiable function).

All of them are actually faster than SLSQP. Moreover, testing on the coarse grid data I get several warnings with SLSQP that the coefficients went outside the bounds, which does not seem to be a problem with the other methods.

I think our best option for now is to use Nelder-Mead, because it seems to be the safest/most robust one judging from the documentation, and it solves the instability issue. Of course it would be better to read into all of this and make a more informed choice, but at the moment I think it is not worth the time.
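
For reference, the change would just swap the method in the call quoted above (a sketch, assuming we keep the current bounds and first guess):

return minimize(
    _neg_log_likelihood,
    first_guess,
    bounds=bounds,
    # Nelder-Mead supports bounds and avoids the cross-platform instability
    method="Nelder-Mead",
).x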

@mathause
Member

mathause commented May 23, 2024

Thanks for looking into this! I tried to figure out the difference between bounds and constraints. If I understand correctly, bounds restrict the values individual params can take, while constraints restrict the values of all params (e.g. x[0] + x[1] < 50 or so). So we need bounds but not constraints (although I think bounds could be expressed as constraints).
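
To make the distinction concrete, a small toy example (nothing from our code, just plain scipy usage):

from scipy.optimize import minimize


def objective(x):
    return (x[0] - 2) ** 2 + (x[1] - 3) ** 2


# bounds: each parameter is restricted individually (what we need for the coefficients)
res_bounds = minimize(objective, x0=[0.5, 0.0], bounds=[(0, 1), (-0.1, 0.1)])

# constraints: a joint restriction on the parameters, e.g. x[0] + x[1] <= 50
res_constr = minimize(
    objective,
    x0=[0.5, 0.0],
    constraints={"type": "ineq", "fun": lambda x: 50 - (x[0] + x[1])},
)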


Not for now, but one thing we could play around with: would it be more efficient to reformulate the logistic lambda function such that no bounds are needed? I.e. write it as $2 / (1 + \exp(- (\beta_0 + \beta_1 \cdot x)))$ instead of $2 / (1 + \beta_0 \cdot \exp(\beta_1 \cdot x))$ and then use an unbounded optimization? (As this formulation ensures the expression cannot be negative.)
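
A minimal sketch of that reformulation (the function name is made up and this is untested against our pipeline):

import numpy as np


def lambda_function_reparam(coeffs, yearly_T):
    # reparameterized logistic: 2 / (1 + exp(-(beta_0 + beta_1 * T)))
    # stays within (0, 2) for any real beta_0, beta_1, so no bounds are needed
    beta_0, beta_1 = coeffs
    return 2.0 / (1.0 + np.exp(-(beta_0 + beta_1 * yearly_T)))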

@veni-vidi-vici-dormivi
Collaborator Author

That's a very interesting point! I'll leave the issue open for now.

@veni-vidi-vici-dormivi veni-vidi-vici-dormivi changed the title Estimation of coefficients for power transformer depends heavily on initial conditions Power Transformer Stability May 24, 2024
@veni-vidi-vici-dormivi veni-vidi-vici-dormivi added the enhancement New feature or request label Jun 21, 2024