Power Transformer Stability #440
Apparently it's not only the sample size; the fit is generally rather sensitive to initial conditions. Similar things happen for some variations of the bounds, the first guess, or the `yearly_T`.
Interesting - what coefficients do you get?
In the first case around (0, 0.1) and in the second (0.96, 0.013). In this case I set the bounds to (0, 1) for the first and to (-0.1, 0.1) for the second coefficient; the first guess is (1, 0). When I do the same without any bounds, for example, the example with 10,000 samples is also much less normalized, with coefficients coming out to be (0.6, 0.2). When I try with bounds of (0, 100) and (-100, 100), the data is more skewed after the transform than before and the coefficients are (1.4, 96.2).
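To make the sensitivity reproducible, here is a minimal sketch (my reconstruction, not the mesmer implementation): a logistic lambda over a toy `yearly_T` covariate, skew-normal residuals, and a sklearn-style profile negative log-likelihood, minimized with SLSQP under different bounds. The data, covariate, and loss details are assumptions for illustration:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

n = 250
yearly_T = np.linspace(0.0, 1.0, n)  # hypothetical yearly temperature covariate
residuals = stats.skewnorm.rvs(a=5, size=n, random_state=0)  # toy skewed data

def neg_log_likelihood(coeffs):
    # logistic lambda as discussed above: lambda stays in (0, 2) for coeffs[0] > 0
    lambdas = 2 / (1 + coeffs[0] * np.exp(coeffs[1] * yearly_T))
    transformed = np.array(
        [stats.yeojohnson(np.array([x]), lmbda=lam)[0] for x, lam in zip(residuals, lambdas)]
    )
    # sklearn-style profile negative log-likelihood with per-sample lambdas
    loglike = -n / 2 * np.log(transformed.var())
    loglike += ((lambdas - 1) * np.sign(residuals) * np.log1p(np.abs(residuals))).sum()
    return -loglike

# the fitted coefficients change substantially with the bounds alone
for bounds in ([(0, 1), (-0.1, 0.1)], [(0, 100), (-100, 100)]):
    res = minimize(neg_log_likelihood, x0=[1, 0], bounds=bounds, method="SLSQP")
    print(bounds, "->", np.round(res.x, 3))
```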
Maybe we should check the lambda function.
@mathause and I looked into this a little more. The first interesting thing is that, as Mathias mentioned already, lambda does not need to be between 0 and 2. Thus, our lambda function is maybe not very well suited to estimate lambda; we should look into the possible benefits of making it a rational function (allowing a range beyond (0, 2)).

Secondly, the definition of the loss function in mesmer/mesmer/mesmer_m/power_transformer.py, lines 127 to 135 in 7e7ac26, is not intuitively clear to us. It is, however, the same as in sklearn, but in our opinion not the same as in the original paper: [screenshot of the log-likelihood definition from the original Yeo & Johnson paper]
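For reference, the constant-lambda profile log-likelihood from Yeo & Johnson (2000), up to additive constants and with $\hat{\sigma}^2$ the variance of the transformed data $\psi(\lambda, x_i)$, reads (as we understand it):

$$
\ell_p(\lambda) = -\frac{n}{2}\,\log\hat{\sigma}^2 \;+\; (\lambda - 1)\sum_{i=1}^{n}\operatorname{sign}(x_i)\,\log\bigl(|x_i| + 1\bigr)
$$

With a sample-dependent $\lambda_i$ as in our case, the Jacobian term presumably becomes $\sum_i (\lambda_i - 1)\operatorname{sign}(x_i)\log(|x_i| + 1)$; whether that is the right generalization is part of what is unclear.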
Can you try your example with an infinite upper bound for the first coefficient?
True, works for me too. But it's at least not a universal fix, because when I do it with a slightly different example it still fails.
Can you try with the following diff (from main)?

```diff
diff --git a/mesmer/mesmer_m/power_transformer.py b/mesmer/mesmer_m/power_transformer.py
index d746921..a3020cc 100644
--- a/mesmer/mesmer_m/power_transformer.py
+++ b/mesmer/mesmer_m/power_transformer.py
@@ -144,7 +144,7 @@ class PowerTransformerVariableLambda(PowerTransformer):
         local_yearly_T = local_yearly_T[~np.isnan(local_yearly_T)]
 
         # choosing bracket -2, 2 like for boxcox
-        bounds = np.c_[[0, -0.1], [1, 0.1]]
+        bounds = np.c_[[0, -0.1], [np.inf, 0.1]]
 
         # first guess is that data is already normal distributed
         first_guess = np.array([1, 0])
@@ -153,7 +153,7 @@ class PowerTransformerVariableLambda(PowerTransformer):
             first_guess,
             bounds=bounds,
             method="SLSQP",
-            jac=rosen_der,
+            # jac=rosen_der,
         ).x
 
     def _yeo_johnson_transform(self, local_monthly_residuals, lambdas):
```
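One note on why dropping `jac=rosen_der` helps: `rosen_der` is the exact gradient of scipy's Rosenbrock benchmark function, not of the negative log-likelihood being minimized, so SLSQP was being handed derivative information for an unrelated function:

```python
import numpy as np
from scipy.optimize import rosen, rosen_der

# rosen / rosen_der are scipy's Rosenbrock test function and its gradient;
# they have nothing to do with the Yeo-Johnson loss, so passing rosen_der
# as `jac` feeds the optimizer a gradient of the wrong function.
x = np.array([1.0, 0.0])
print(rosen(x))      # 100.0: Rosenbrock value at (1, 0)
print(rosen_der(x))  # gradient of the Rosenbrock function, not of our loss
```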
Aha! I had already tried the infinite upper bound, but not without `rosen_der` - that seems to work well! (The infinite upper bound should have been implemented in #427.)
I actually had a feeling that the PRs above were not the end of the story here. It turns out that the minimization in mesmer/mesmer/mesmer_m/power_transformer.py, lines 152 to 157 in 7377946, is still not robust.
The scipy `optimize.minimize` documentation says that, concerning the bound-constraint methods, Nelder-Mead is listed as the most robust one, but "if numerical computation of derivative can be trusted, other algorithms using the first and/or second derivatives information might be preferred for their better performance in general". I am not exactly sure if this is the case here. We minimize the negative log-likelihood, which takes in the variance of the transformed residuals (a constant), the original residuals, and lambda (derived from a logistic function). In general, it is not guaranteed that the negative log-likelihood is twice differentiable (see here). I tried the other bound-constraint options, and all of them are actually faster than Nelder-Mead.

Still, I think our best option for now is to use Nelder-Mead, because it seems to be the safest/most robust one judging from the documentation and it solves the instability issue. Of course, it would be better to read into all of this and make a more informed choice, but at the moment I think it is not worth the time.
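A rough sketch of the kind of comparison described, on a toy loss rather than the mesmer one (timings and winners will differ on the real data; bounds support for Nelder-Mead and Powell needs a reasonably recent scipy):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

residuals = stats.skewnorm.rvs(a=5, size=250, random_state=0)  # toy skewed data

def neg_log_likelihood(coeffs):
    # toy stand-in: constant lambda from the logistic, no covariate
    lam = 2 / (1 + coeffs[0] * np.exp(coeffs[1]))
    transformed = stats.yeojohnson(residuals, lmbda=lam)
    n = residuals.size
    loglike = -n / 2 * np.log(transformed.var())
    loglike += (lam - 1) * (np.sign(residuals) * np.log1p(np.abs(residuals))).sum()
    return -loglike

bounds = [(0, np.inf), (-0.1, 0.1)]
for method in ["Nelder-Mead", "L-BFGS-B", "TNC", "Powell", "SLSQP"]:
    res = minimize(neg_log_likelihood, x0=[1, 0], bounds=bounds, method=method)
    print(f"{method:12s} coeffs={np.round(res.x, 3)} success={res.success}")
```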
Thanks for looking into this! I tried to figure out the difference between bounds and constraints. If I understand correctly, bounds restrict the values individual params can take, while constraints restrict the values of all params together (e.g. a relation like `x[0] + x[1] <= 1`).

Not for now, but one thing we could play around with: would it be more efficient to reformulate the logistic regression such that no bounds are needed? I.e. write the first coefficient as an exponential so its positivity bound is absorbed into the parameterization.
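A sketch of what that reparameterization could look like, assuming the logistic form lambda = 2 / (1 + xi0 * exp(xi1 * T)) with the bound xi0 > 0 (the exact form in mesmer may differ). Substituting xi0 = exp(a) removes the bound, since exp(a) > 0 for every real a:

```python
import numpy as np

def lambda_bounded(xi0, xi1, yearly_T):
    # current form: needs the bound xi0 > 0 to keep lambda in (0, 2)
    return 2 / (1 + xi0 * np.exp(xi1 * yearly_T))

def lambda_unbounded(a, xi1, yearly_T):
    # reparameterized with xi0 = exp(a): 2 / (1 + exp(a + xi1 * T));
    # a can be any real number and lambda still stays in (0, 2)
    return 2 / (1 + np.exp(a + xi1 * yearly_T))

# the two forms agree when xi0 = exp(a)
T = np.linspace(-2, 2, 5)
print(np.allclose(lambda_bounded(np.exp(0.3), 0.05, T),
                  lambda_unbounded(0.3, 0.05, T)))  # True
```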
That's a very interesting point! I'll leave the issue open for now.
I played around a bit with our power transformer. Try the following example:
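The original snippet was lost here; below is a minimal reconstruction of the kind of example described, using `scipy.stats.skewnorm` for the skewed data and sklearn's `PowerTransformer` as a stand-in for our `PowerTransformerVariableLambda` (the exact call and parameters are assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

n_samples = 10_000  # de-skews nicely; set to 250 to reproduce the failure below
skewed = stats.skewnorm.rvs(a=10, size=n_samples, random_state=0)

# stand-in for mesmer's PowerTransformerVariableLambda (constant lambda here)
pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(skewed.reshape(-1, 1)).ravel()

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(skewed, bins=50)
axes[0].set_title(f"original, skew = {stats.skew(skewed):.2f}")
axes[1].hist(transformed, bins=50)
axes[1].set_title(f"transformed, skew = {stats.skew(transformed):.2f}")
plt.show()
```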
The data is nicely de-skewed.
But let's try it with 250 data points, which is how many data points we have for each month and gridcell. The output looks like this:
[figure: histogram of the transformed data with 250 samples; the distribution remains visibly skewed]
The power transformer fails to normalize the data.
Arguably, our data is probably never this heavily skewed, but I wanted to leave this here for future reference, in case we want to revisit it.
(edited to not use years × 12 samples, since this is not what we have in the application)