Alpha normalize: [0, 1] linear transformation of density ratio to [alpha, 1] range. #20

mierzejk · 2023-08-24T18:57:05Z

Synopsis

The pull request introduces a new function, helpers.alpha_normalize(values: ndarray, alpha: float) -> ndarray, which raises the lower bound (infimum) of input values in the range [0, 1] from 0 to alpha, mapping them onto [alpha, 1] by applying a nearly$^1$ linear transformation.

Rationale

There are many scenarios in which the estimated density ratio is further processed on the logarithmic scale: to increase numerical stability by replacing a product of conditionally independent variates with a sum of their logarithms (or a quotient with a difference), or for the sake of clarity when plotting the respective probability density functions. The alpha-relative density ratio estimator yields results in the [0, alpha^-1] range, and since 0 is not in the domain of the logarithm, it must be handled before the logarithmic transformation is applied. The alpha_normalize function does exactly that by applying the following linear transformation to input values in the range of [0, 1]:

$$x' = (1 - \alpha) \cdot x + \alpha = x + \alpha \cdot (1 - x)$$

where alpha is the normalization term, a small real number (technically rational, because it is implemented as a floating point number).

The [0, 1] range where the function is applied has been selected to modify the estimated density ratio as little as possible, and in particular to leave the upper bound (supremum) alpha^-1 unchanged, since changing it would contravene the properties of the alpha-relative density ratio estimator. Additionally, the value 1, i.e. log(1)=0 on the logarithmic scale, may carry a specific meaning in some models (Probabilistic Record Linkage, for instance), hence input values at and above this point are not modified.
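The plain linear part of the transformation can be sketched as below. This is a minimal illustration, not the code from this pull request: the name `alpha_normalize_sketch` is hypothetical, and the sketch leaves out the tie-breaking and the warnings described under Implementation specifics.

```python
import numpy as np


def alpha_normalize_sketch(values: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical sketch: map values in [0, 1) onto [alpha, 1), keep the rest as-is."""
    values = np.asarray(values, dtype=float)
    result = values.copy()
    # 1 maps to itself under the formula, so only values below 1 need touching.
    mask = values < 1.0
    # x' = x + alpha * (1 - x), which equals (1 - alpha) * x + alpha
    result[mask] = values[mask] + alpha * (1.0 - values[mask])
    return result


ratios = np.array([0.0, 0.5, 1.0, 4.0])
normalized = alpha_normalize_sketch(ratios, alpha=0.25)
# 0.0 -> 0.25, 0.5 -> 0.625, while 1.0 and 4.0 (= alpha^-1) are left unchanged
print(normalized)
```

With alpha = 0.25 the upper bound alpha^-1 = 4 is left as it was, which is the property the [0, 1] restriction is meant to protect.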

Implementation specifics

To preserve vital estimator properties, there are 2 invariants always met by the function results:

The number of unique values must not change.
The order of output values remains the same as in the input argument (as determined by numpy.argsort).

$^1$ These invariants are the reason the transformation is only nearly linear. Due to floating-point arithmetic, two consecutive, unequal input numbers may be transformed to the same outcome; to satisfy the constraints listed above and still yield different values, numpy.nextafter is employed in the direction of 0.
By virtue of this implementation approach, in extreme cases there is a possibility of output values that:

are less than alpha, or
are nonpositive.

Should it happen, a relevant warning is issued.
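The collision described in the footnote, and the reason an output can fall below alpha, can be reproduced with a small sketch. The helper `invariants_hold` and the chosen inputs are hypothetical, and the single `numpy.nextafter` call only assumes that the smaller of two colliding values is nudged towards 0; the tie-breaking logic in the pull request itself is not shown here.

```python
import numpy as np


def invariants_hold(before: np.ndarray, after: np.ndarray) -> bool:
    """Hypothetical check: same number of unique values and same argsort order."""
    same_unique_count = np.unique(before).size == np.unique(after).size
    same_order = np.array_equal(
        np.argsort(before, kind="stable"), np.argsort(after, kind="stable")
    )
    return bool(same_unique_count and same_order)


alpha = 0.5
values = np.array([0.0, 1e-20])           # two distinct inputs
naive = values + alpha * (1.0 - values)   # both round to exactly 0.5

# Nudge the smaller result one representable step towards 0 to keep it distinct.
fixed = naive.copy()
fixed[0] = np.nextafter(naive[0], 0.0)

print(invariants_hold(values, naive))  # False: two unique inputs became one
print(invariants_hold(values, fixed))  # True
print(fixed[0] < alpha)                # True: the nudged value is below alpha
```

The last line shows the "less than alpha" case from the list above: keeping the values distinct costs one floating-point step below the nominal lower bound.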

`DensityRatio.alpha_normalize` function

The function simply calls helpers.alpha_normalize, passing its alpha field value as the second argument (the normalization term). This leads to the [alpha, alpha^-1] range of the estimated density ratio values, which transforms to [log(alpha), -log(alpha)] on the logarithmic scale, with log(1)=0 exactly in the middle. Such a range can render some probability density graphs (e.g. Fellegi-Sunter log-likelihood) really lucid and comprehensible.
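The symmetry of the resulting range on the logarithmic scale can be checked numerically. The alpha value below is an arbitrary assumption chosen for illustration.

```python
import numpy as np

alpha = 0.25  # hypothetical normalization term
bounds = np.array([alpha, 1.0, 1.0 / alpha])  # lower bound, midpoint, upper bound
log_bounds = np.log(bounds)

# log(alpha) and log(alpha^-1) mirror each other around log(1) = 0.
print(log_bounds)
print(np.isclose(log_bounds[0], -log_bounds[2]))
```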

Closing remarks

Both alpha_normalize functions are supplementary and optional; users can do without them and stick to the raw compute_density_ratio outcome. The only place where the alpha_normalize function has been introduced into the original code flow is the alpha_KL_divergence function. Because the divergence numerator makes use of numpy.log, a 0 in the estimated density ratio results in the invalid -inf Kullback–Leibler divergence value.
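The effect of a single zero on a logarithm-based aggregate can be seen in a short sketch. It does not reproduce the library's alpha_KL_divergence formula; the mean of logarithms, the sample ratios, and the alpha value are assumptions used only to show how -inf propagates and how raising zeros to alpha avoids it.

```python
import numpy as np

ratio = np.array([0.0, 0.5, 2.0])  # hypothetical density ratio estimates

# numpy.log(0) is -inf (with a divide-by-zero warning, silenced here).
with np.errstate(divide="ignore"):
    logs = np.log(ratio)
print(logs.mean())  # -inf: one zero spoils the whole aggregate

# Apply x' = x + alpha * (1 - x) to values below 1 before taking the logarithm.
alpha = 0.1  # hypothetical normalization term
guarded = np.where(ratio < 1.0, ratio + alpha * (1.0 - ratio), ratio)
print(np.isfinite(np.log(guarded)).all())  # True
```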


mierzejk · 2023-08-27T22:12:51Z

Please note that all commits covered by this pull request are also included in #23 Aggregated pull request.

mierzejk added 6 commits August 23, 2023 12:15

alpha_normalize: normalize densratio values < 1 so the minimum value …

391da6d

…to replace 0 is symmetrical to alpha^-1 with respect to the natural logarithm.

Always print 'Normalized vector contains…' warnings.

fe01dda

Refactoring of warning. Add tests.

9e08899

Refactoring of warnings. Add tests.

5f7d387

Merge remote-tracking branch 'origin/alpha_normalize' into alpha_norm…

a9228d9

…alize

KL_divergence numerator guard.

451d657

mierzejk marked this pull request as draft August 24, 2023 19:25

Handle alpha_KL_divergence '[not calculated]' str value for verbose l…

8dedd4d

…ogging.

mierzejk marked this pull request as ready for review August 24, 2023 19:47

mierzejk marked this pull request as draft August 27, 2023 19:55

Update README with section on DensityRatio.alpha_normalize function.

330342b

mierzejk marked this pull request as ready for review August 27, 2023 21:27

mierzejk mentioned this pull request Aug 27, 2023

Aggregated pull request #23

Open
