To find a good first guess for parameters other than location and scale, we use the following loss function:

mesmer/mesmer/mesmer_x/train_l_distrib_mesmerx.py, lines 1173 to 1193 in 72d83a5
First, we calculate the differences between all samples and the bounds of the distribution's support, and of these differences we take the minimum values. Take for example a truncated normal distribution:
```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

n = 251
pred = np.ones(n)

loc = 0.0
scale = 0.1
a = -1.2  # nr of stds from loc at which to truncate
b = 1.2
targ = scipy.stats.truncnorm.rvs(a=a, b=b, loc=loc, scale=scale, size=n, random_state=rng)
```
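The minimum differences described above could be computed as follows. This is my reconstruction of the quantities `val_bot` and `val_top` referenced below, not the exact code in `train_l_distrib_mesmerx.py`:

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
targ = scipy.stats.truncnorm.rvs(a=-1.2, b=1.2, loc=0.0, scale=0.1, size=251, random_state=rng)

# support of the candidate distribution for the current parameters
bot, top = scipy.stats.truncnorm.support(a=-1.2, b=1.2, loc=0.0, scale=0.1)

# signed distances of the most extreme samples to the bounds;
# a negative value means that sample lies outside the support
val_bot = np.min(targ - bot)  # distance of the smallest sample to the lower bound
val_top = np.min(top - targ)  # distance of the largest sample to the upper bound
```

Since these samples are drawn from the candidate distribution itself, both distances come out non-negative here; during optimization they can be negative.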
The parameters a and b give you the lower and upper bound of the distribution (in terms of the number of stds from loc) and thus define the so-called support of the distribution. To optimize for a and b, we calculate how far the most extreme samples lie from the bounds of the distribution that results from the chosen parameters.
If the smallest sample is smaller than the lower bound of the resulting distribution, the difference we calculated (val_bot) will be negative. In this case our loss is $\propto$ np.exp(-val_bot) (I am ignoring the margin for now for simplicity). This should result in a being adjusted such that the difference decreases exponentially and the lower bound moves closer to this most extreme sample, until that sample lies inside the support of the distribution. The same is true for the upper bound. These are the first two cases in the code block above.
If both differences are positive, meaning that all samples already lie within the support of the distribution, our loss is 1 / (val_bot + margin0 * s) + 1 / (val_top + margin0 * s), so larger differences lead to a smaller loss. This means we actually optimize for the support of the distribution being as large as possible.
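Putting the three cases together, the current loss behaves roughly as in the following sketch (a hypothetical reconstruction using the names from above; the defaults for margin0 and s are made up for illustration):

```python
import numpy as np

def old_loss(val_bot, val_top, margin0=0.05, s=1.0):
    # sketch of the case-based first-guess loss, not the exact implementation
    if val_bot < 0:
        # smallest sample below the lower bound: exponential penalty
        return np.exp(-val_bot)
    if val_top < 0:
        # largest sample above the upper bound: exponential penalty
        return np.exp(-val_top)
    # all samples inside the support: loss shrinks as the support grows
    return 1 / (val_bot + margin0 * s) + 1 / (val_top + margin0 * s)

# widening the support from distances (0.1, 0.1) to (1.0, 1.0)
# strictly decreases the loss, i.e. a larger support is always preferred
assert old_loss(1.0, 1.0) < old_loss(0.1, 0.1)
```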
Now I see several problems with this:
The definition with if cases is not optimal. (1) As long as val_bot is negative, coefficients that influence only val_top have no effect. Sometimes only the bottom support (the first if statement) will be optimized, since this is the first condition we check. (2) Jumping between cases can lead to local minima. For example, if within three iterations we switch from case 1 to case 2 and back to case 1, it can happen that the loss for case 2 is smaller than the loss for case 1 in the other two iterations, so that the minimization terminates (this happened for the truncated normal I tested).
I am not sure about optimizing for a large support. For a GEV (where the shape parameter also controls the upper or lower bound of the support) this might be less sensitive, but for the truncated normal it is essential to find these bounds (I see that this might not be a use case for us). Moreover, a loss function without any local or global minimum, like this one, can lead to non-convergence.
I want to propose a new loss function that is continuous and does not favor a large support, namely simply a quadratic function in two dimensions, i.e. a paraboloid. This would optimize the distances of the most extreme samples to the bounds towards 0 (plus some margin, maybe?), thus making the bounds of the distribution close to the most extreme samples.
It is again three cases, since if one side of the distribution is infinite, we should only optimize the other end. I see that this can still lead to jumping between cases and potential local minima arising from that, but I would hope it is much rarer, as I would expect that one bound being infinite is a rather "stable" property of a distribution. I tested this for the GEV and the truncnorm, and it works more reliably than the old version (if not always better in the sense that the coefficients come out closer to the real ones, since some local minima were apparently pretty good).
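A minimal sketch of what the proposed quadratic loss could look like (my own illustration of the idea, not the exact proposed code; val_bot is +inf when the lower bound of the support is -inf, and likewise for val_top):

```python
import numpy as np

def quadratic_loss(val_bot, val_top):
    # paraboloid in (val_bot, val_top) with its minimum where both
    # distances to the support bounds are zero
    if np.isinf(val_bot):  # lower bound of the support is -inf
        return val_top**2
    if np.isinf(val_top):  # upper bound of the support is +inf
        return val_bot**2
    return val_bot**2 + val_top**2

# minimal when the most extreme samples sit exactly on the bounds ...
assert quadratic_loss(0.0, 0.0) == 0.0
# ... and a too-wide support is penalized the same as samples outside it
assert quadratic_loss(1.0, 0.0) == quadratic_loss(-1.0, 0.0) == 1.0
```

Unlike the exponential/reciprocal version, this function is continuous across the case boundary at val_bot = 0 (and val_top = 0) and has a unique minimum, which should help the optimizer converge.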