MESMER-X: better loss function for other coefficients #582

Open
veni-vidi-vici-dormivi opened this issue Dec 17, 2024 · 0 comments
To find a good first guess for parameters other than location and scale we use the following loss function:

```python
def fg_fun_others(self, x_others, margin0=0.05):
    # preparing support
    x = np.copy(self.fg_coeffs)
    x[self.fg_ind_others] = x_others
    distrib = self.expr_fit.evaluate(x, self.data_pred)
    bot, top = distrib.support()
    val_bot = np.min(self.data_targ - bot)
    val_top = np.min(top - self.data_targ)
    # preparing margin on support
    m = np.mean(self.data_targ)
    s = np.std(self.data_targ - m)
    # optimization
    if val_bot < 0:
        # limit of val_bot --> 0- = 1/margin0*s
        return np.exp(-val_bot) * 1 / (margin0 * s)
    elif val_top < 0:
        # limit of val_top --> 0+ = 1/margin0*s
        return np.exp(-val_top) * 1 / (margin0 * s)
    else:
        return 1 / (val_bot + margin0 * s) + 1 / (val_top + margin0 * s)
```

First, we calculate the differences between all samples and the bounds of the distribution's support, and of these differences we take the minimum on each side. Take for example a truncated normal distribution:

```python
import numpy as np
import scipy

rng = np.random.default_rng(0)
n = 251
pred = np.ones(n)

loc = 0.0
scale = 0.1
a = -1.2  # nr of stds from loc at which to truncate
b = 1.2
targ = scipy.stats.truncnorm.rvs(loc=loc, scale=scale, a=a, b=b, size=n, random_state=rng)
```

The parameters a and b give the lower and upper bound of the distribution (in terms of the number of stds from loc) and thus define the so-called support of the distribution. To optimize for a and b, we calculate how far the most extreme samples lie from the bounds of the distribution that results from the chosen parameters.
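
As a minimal sketch of that distance computation for the truncnorm example (the guessed truncation points `a_guess`/`b_guess` are made up here and stand in for the parameters being optimized):

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
targ = scipy.stats.truncnorm.rvs(a=-1.2, b=1.2, loc=0.0, scale=0.1, size=251, random_state=rng)

# hypothetical first guess for the truncation points, narrower than the truth
a_guess, b_guess = -0.8, 0.8
distrib = scipy.stats.truncnorm(a=a_guess, b=b_guess, loc=0.0, scale=0.1)

bot, top = distrib.support()   # bounds of the guessed support: (-0.08, 0.08)
val_bot = np.min(targ - bot)   # distance of the smallest sample to the lower bound
val_top = np.min(top - targ)   # distance of the largest sample to the upper bound
print(val_bot, val_top)        # negative values mean samples fall outside the support
```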

If the smallest sample is smaller than the lower bound of the resulting distribution, the difference we calculated (val_bot) will be negative. In this case our loss is $\propto$ np.exp(-val_bot) (I am ignoring the margin for now for simplicity). This should adjust a such that the difference decreases exponentially and the lower bound moves closer to this most extreme sample, until the sample lies inside the support of the distribution. The same holds for the upper bound. These are the first two cases in the code block above.

If both differences are positive, meaning that all samples already lie within the support of the distribution, the loss is 1 / (val_bot + margin0 * s) + 1 / (val_top + margin0 * s), so larger differences give a smaller loss. This means we actually optimize for the support of the distribution being as large as possible.
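
To illustrate both regimes on the truncnorm example, here is a sketch with the loss rewritten as a standalone helper (`old_loss` is my name; the truncation points are the free parameters and loc/scale are held fixed):

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
targ = scipy.stats.truncnorm.rvs(a=-1.2, b=1.2, loc=0.0, scale=0.1, size=251, random_state=rng)
margin0 = 0.05
s = np.std(targ - np.mean(targ))

def old_loss(a, b):
    bot, top = scipy.stats.truncnorm(a=a, b=b, loc=0.0, scale=0.1).support()
    val_bot, val_top = np.min(targ - bot), np.min(top - targ)
    if val_bot < 0:
        return np.exp(-val_bot) / (margin0 * s)   # exponential penalty: lower bound too high
    elif val_top < 0:
        return np.exp(-val_top) / (margin0 * s)   # exponential penalty: upper bound too low
    return 1 / (val_bot + margin0 * s) + 1 / (val_top + margin0 * s)

print(old_loss(-0.5, 0.5))   # too-narrow support: exponential branch
for half_width in (1.5, 3.0, 10.0):
    # all samples inside the support: the loss keeps shrinking as the support grows
    print(half_width, old_loss(-half_width, half_width))
```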

Now I see several problems with this:

  1. The definition with if cases is not optimal. (1) As long as val_bot is negative, coefficients that only influence val_top have no effect on the loss (see the sketch after this list). Sometimes only the bottom support (the first if statement) gets optimized, because it is the first condition we check. (2) Jumping between cases can lead to local minima: if, say, within three iterations we switch from case 1 to case 2 and back to case 1, the loss in case 2 can be smaller than the losses of case 1 in the other two iterations, so that the minimization terminates (this happened for the truncated normal I tested).
  2. I am not sure about optimizing for a large support. For a GEV (where the shape parameter also controls the upper or lower bound of the support) this might be less sensitive, but for the truncated normal it is essential to find these bounds (I see that this might not be a use case for us). Moreover, a loss function without any local or global minimum, like this one, can lead to non-convergence.
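
For point (1), a quick check on the truncnorm example (a sketch, reusing the same made-up setup as above): while val_bot is negative, changing b does not change the loss at all.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
targ = scipy.stats.truncnorm.rvs(a=-1.2, b=1.2, loc=0.0, scale=0.1, size=251, random_state=rng)
margin0 = 0.05
s = np.std(targ - np.mean(targ))

# a = -0.5 leaves the smallest sample below the lower bound (val_bot < 0),
# so only the first branch is ever evaluated and b drops out of the loss
for b in (1.0, 5.0, 50.0):
    bot, top = scipy.stats.truncnorm(a=-0.5, b=b, loc=0.0, scale=0.1).support()
    val_bot, val_top = np.min(targ - bot), np.min(top - targ)
    if val_bot < 0:
        loss = np.exp(-val_bot) / (margin0 * s)
    elif val_top < 0:
        loss = np.exp(-val_top) / (margin0 * s)
    else:
        loss = 1 / (val_bot + margin0 * s) + 1 / (val_top + margin0 * s)
    print(b, loss)   # the same loss for every b
```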

I want to propose a new loss function that is continuous and does not favor a large support, namely simply a quadratic function in two dimensions, i.e. a paraboloid. This would optimize for the distances of the most extreme samples to the bounds becoming 0 (+ some margin maybe?), thus bringing the bounds of the distribution close to the most extreme samples.

This would look as follows:

```python
if bot == -np.inf:
    return worst_diff_top**2
elif top == np.inf:
    return worst_diff_bot**2
else:
    return worst_diff_bot**2 + worst_diff_top**2
```

These are again three cases, since if one side of the distribution is infinite, we should only optimize the other end. I see that this can still lead to jumping between cases and potential local minima arising from that, but I would hope it is much rarer, since one bound being infinite should be a rather "stable" property of a distribution. I tested this for the GEV and the truncnorm and it works more reliably than the old version (if not always better in the sense that the coefficients come out closer to the real ones, since some local minima were apparently pretty good).
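
For reference, a rough standalone version of what such a loss could look like for the truncnorm test (the function name and the use of the truncation points as free parameters are just for this sketch; in MESMER-X the coefficients would still go through expr_fit.evaluate as before, with loc and scale held fixed at their first guess):

```python
import numpy as np
import scipy.stats
from scipy.optimize import minimize

rng = np.random.default_rng(0)
targ = scipy.stats.truncnorm.rvs(a=-1.2, b=1.2, loc=0.0, scale=0.1, size=251, random_state=rng)

def fg_fun_others_quadratic(x_others):
    a, b = x_others
    if a >= b:                       # invalid support, reject this point
        return np.inf
    bot, top = scipy.stats.truncnorm(a=a, b=b, loc=0.0, scale=0.1).support()
    worst_diff_bot = np.min(targ - bot)
    worst_diff_top = np.min(top - targ)
    if bot == -np.inf:
        return worst_diff_top**2
    elif top == np.inf:
        return worst_diff_bot**2
    return worst_diff_bot**2 + worst_diff_top**2

res = minimize(fg_fun_others_quadratic, x0=[-3.0, 3.0], method="Nelder-Mead")
print(res.x)   # ends up close to the truncation points implied by the most extreme samples
```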
