[Draft] pseudo-p significance calculation #281
base: main
Conversation
Codecov Report

Attention: Patch coverage is 0% with 62 lines in your changes missing coverage.

@@ Coverage Diff @@
##             main    #281    +/-   ##
=======================================
- Coverage    73.6%   72.3%   -1.4%
=======================================
  Files          24      25      +1
  Lines        3316    3378     +62
  Branches      520     529      +9
=======================================
  Hits         2441    2441
- Misses        708     770     +62
  Partials      167     167
OK, done here on the logic & implementation. Thank you @JosiahParry for getting the ball rolling here 😄 Very much appreciated!

I've re-implemented the percentile-based two-sided test from scratch. I don't think we should expose the folded version in the user classes: the percentile will always equal the folded version for symmetric distributions, but the folded version becomes a directed test as skew increases. I think that the (over+under)/all is also the intended estimand of the two-sided test.

If other maintainers approve these four options (greater, lesser, two-sided, and directed) for the user classes and a folded option for this function only (for replication/testing purposes), I can start propagating this across the user classes.
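To make the distinction concrete, here is a minimal sketch (not the PR's implementation; the names and the exact folding construction are assumptions) comparing a percentile-based two-sided p-value with a folded one on a skewed reference distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
reference = rng.gamma(2.0, size=9999)  # a deliberately skewed reference distribution
stat = 5.0                             # a hypothetical observed statistic

# Percentile-based two-sided p-value: double the smaller tail probability.
percentile = (reference < stat).mean()
p_percentile = 2 * min(percentile, 1 - percentile)

# "Folded" two-sided p-value: fold the reference about its center and compare
# absolute deviations (one common construction; the PR's folding may differ).
center = reference.mean()
p_folded = (np.abs(reference - center) >= abs(stat - center)).mean()

# For a symmetric reference these agree; under skew they diverge, and the
# folded version behaves more like a directed test.
print(p_percentile, p_folded)
```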
Sweet. I presume the stuff in main gets moved to a test file or something, but this is great.
> If other maintainers approve these four options (greater, lesser, two-sided, and directed) for the user classes and a folded option for this function only (for replication/testing purposes) I can start propagating this across the user classes.
+1
percentile = (reference_distribution < test_stat).mean(axis=1)
bounds = np.column_stack((1 - percentile, percentile)) * 100
bounds.sort(axis=1)
lows, highs = np.row_stack(
This may be able to be vectorised, but I couldn't quickly figure that out. The following did not generate the same results as below:
stats.scoreatpercentile(reference_distribution, bounds, axis=1)
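One possible vectorisation, sketched here under the assumption that `bounds` holds per-row [low, high] percentiles, is to sort each row once and index it with nearest-rank positions via `np.take_along_axis`. Note this uses nearest-rank rather than linear interpolation, so it will not exactly reproduce `stats.scoreatpercentile`:

```python
import numpy as np

rng = np.random.default_rng(0)
reference_distribution = rng.normal(size=(5, 999))  # hypothetical (n_obs, n_sim) replicates
bounds = np.tile([2.5, 97.5], (5, 1))               # hypothetical per-row percentile pairs

sorted_ref = np.sort(reference_distribution, axis=1)
n_sim = sorted_ref.shape[1]
# Convert percentiles to nearest-rank integer positions per row.
ranks = np.clip(np.round(bounds / 100 * (n_sim - 1)).astype(int), 0, n_sim - 1)
lows, highs = np.take_along_axis(sorted_ref, ranks, axis=1).T
```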
I am still working on this, but I recall now why the implementation of an "alternative" argument was a bit trickier than I expected... because we allow the user to "discard" the random replicates, rather than store them, we have to push the significance logic all the way down to the conditional randomization engine.

It seems clear to me that:

If 1-4 are correct, this means we don't need to change any of the numba code. The correction can be calculated as 2*p_sim - 1/(nsim + 1) after the fact.

So, if we implement our current test for local stats without flipping (as greater), generate 1 - p_sim (as lesser), and implement the above correction for the two-sided test (as two-sided), is that OK w/ other maintainers?
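A minimal sketch of that recipe (the helper name and the clamping are my assumptions, not the PR's code): start from the 'greater' pseudo-p and derive the other alternatives after the fact, leaving the numba code untouched:

```python
import numpy as np

def derive_p(p_sim, nsim, alternative="two-sided"):
    # p_sim is assumed to be the 'greater' pseudo-p, (M + 1) / (nsim + 1).
    if alternative == "greater":
        return p_sim
    if alternative == "lesser":
        return 1 - p_sim
    if alternative == "directed":
        # run both one-sided tests, keep the smaller p-value
        return np.minimum(p_sim, 1 - p_sim)
    if alternative == "two-sided":
        # double the smaller tail, then correct for counting the observed
        # statistic twice; clamp at 1.0
        tail = np.minimum(p_sim, 1 - p_sim)
        return np.minimum(2 * tail - 1 / (nsim + 1), 1.0)
    raise ValueError(f"unknown alternative: {alternative!r}")
```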
This is too much stats for me to say anything useful.

Same for me.
Since the topic is over my head, my approval is based on a general review.
One further wrinkle as well: some global Moran tests already support directed testing via a binary option. Notably, if two_tailed=False, the reported p-value is halved. For us to roll out the testing across all the classes, we need to consider whether this option should be deprecated in favor of an explicit "alternative" option. Right now, there's no way to force a direction on these tests.
I think this is OK. One thing to check is if:
Sure, that is what I initially thought & what @JosiahParry suggested. The reason why I'm thinking it's actually 2*p_sim - 1/(nsim + 1): thinking another way, in the percentile-based version of the test, you compute the percentile of the observed statistic in the reference distribution, as in the simulation code above.
@ljwolf I was looking at the discussions in this PR and the other related issue. The correction for the two-sided test is 2*p_sim - 1/(nsim + 1).
Thank you for the explanation @ljwolf. I think I'm almost there/onboard! It's worth calling out explicitly that this formula can result in a p-value > 1.0, which should also be handled, e.g.:

p_sim = 0.65
nsim = 999
(p_corrected = (2*p_sim - (1/(nsim + 1))))
#> [1] 1.299
if (p_corrected > 1) {
  1.0
} else {
  p_corrected
}
#> [1] 1

Additionally, would you mind elaborating why it is 1/(nsim + 1) that we subtract?

calc_p_sim <- function(p_sim, nsim) {
  (p_corrected = (2*p_sim - (1/(nsim + 1))))
  if (p_corrected > 1) {
    1.0
  } else {
    p_corrected
  }
}
calc_p_sim(0.05, 49)
#> [1] 0.08
calc_p_sim(0.05, 99)
#> [1] 0.09
calc_p_sim(0.05, 999)
#> [1] 0.099
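For what it's worth, a Python counterpart of the clamped correction might look like this (a sketch; the function name is hypothetical):

```python
def correct_two_sided(p_sim, nsim):
    # apply the two-sided correction, clamping the result at 1.0
    return min(2 * p_sim - 1 / (nsim + 1), 1.0)

correct_two_sided(0.65, 999)  # 1.0 (uncorrected value would be 1.299)
correct_two_sided(0.05, 999)  # 0.099
```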
the directed p-value is half of the two-sided p-value, and corresponds to running the
lesser and greater tests, then picking the smaller significance value. This is not advised.
Note that this will no longer be true if the adjustment is added.
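A small numeric check of that point (the counts and variable names here are illustrative only): without the adjustment the two-sided p-value is exactly double the directed one, but the adjustment breaks the exact halving:

```python
nsim = 999
n_extreme = 20                                # simulations at least as large as the stat
p_greater = (n_extreme + 1) / (nsim + 1)      # 0.021
p_directed = min(p_greater, 1 - p_greater)    # 0.021
p_two_sided = 2 * p_directed                  # 0.042: exactly twice p_directed
p_adjusted = 2 * p_directed - 1 / (nsim + 1)  # 0.041: no longer exactly twice
```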
Pseudo-p values are calculated using the formula (M + 1) / (R + 1), where R is the number of simulations
and M is the number of times that the simulated value was equal to, or more extreme than, the observed test statistic.
TODO: describe the adjustment here
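As a reference for that description, a minimal sketch of the (M + 1) / (R + 1) computation for the 'greater' alternative (the array shapes are assumptions):

```python
import numpy as np

def pseudo_p_greater(test_stat, reference_distribution):
    # test_stat: (n_obs,); reference_distribution: (n_obs, R) replicates.
    R = reference_distribution.shape[1]
    # M counts replicates equal to or more extreme than the observed stat.
    M = (reference_distribution >= test_stat[:, None]).sum(axis=1)
    return (M + 1) / (R + 1)
```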
One of 'two-sided', 'lesser', or 'greater'. Indicates the alternative hypothesis.
- 'two-sided': the observed test statistic is an extreme value of the reference distribution.
- 'lesser': the observed test statistic is small relative to the reference distribution.
- 'greater': the observed test statistic is large relative to the reference distribution.
- 'directed': the observed test statistic is either small or large relative to the reference distribution.
This states that the valid arguments are 'two-sided', 'lesser', or 'greater'; however, there is also a 'directed' argument documented as well, though it is unclear to me how 'directed' and 'two-sided' are different.
The formula 2*p_sim - 1/(nsim + 1) is based on the equation for the two-sided case. @ljwolf please correct me if I'm wrong.
Yep, exactly right! @JosiahParry, think about it in terms of the percentiles; maybe that will be clearer.

The correction term adjusts for double counting the "observed" stat: if you just did 2*p_sim, that computes the number of "outside" simulated statistics, plus the observed stat twice (as if we see it at both percentile p and 1-p). However, we actually only see the observed stat once, at percentile p; the two-tailed boundary in the "other" tail is derived from 1-p.

Like, imagine the number of simulations is 100. Sort them from smallest to largest. If 20 simulations are as big as the test statistic, then the test statistic is at the 80th percentile. At the (100-80)th percentile, there are n*(1-.8) smaller observations. So, we see 20 more extreme large observations, 1 test statistic, and 20 more extreme smaller observations (as if the test statistic were in the other tail). 2*p_sim*(n+1) would count 20 more extreme large stats, 20 more extreme small stats, and two test statistics: one too many. So, we need to deflate that count by 1, which deflates the p-value by 1/(n+1).
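That counting argument can be verified directly (a worked check using the same numbers as above):

```python
import numpy as np

nsim = 100
n_larger = 20                           # simulations as big as the test statistic
p_sim = (n_larger + 1) / (nsim + 1)     # 21/101, the 'greater' pseudo-p

naive = 2 * p_sim                       # counts 42 of 101: the observed stat twice
corrected = 2 * p_sim - 1 / (nsim + 1)  # counts 41 of 101: 20 + 20 + 1
assert np.isclose(corrected, 41 / 101)
```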
For some reason this keeps reminding me of Downey's Euro Problem example, probably just because it reminds me that "the statistician"'s default test is always two-sided.
This PR drafts a function calculate_significance() to provide a consistent way to calculate pseudo-p values from a reference distribution. It is based on the discussion at #199.
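As a sketch of how the drafted helper might be called (the module path and keyword name are assumptions based on this thread, not the final API):

```python
import numpy as np
from esda.significance import calculate_significance  # assumed location

rng = np.random.default_rng(42)
test_stat = rng.normal(size=10)                      # hypothetical observed local stats
reference_distribution = rng.normal(size=(10, 999))  # hypothetical conditional replicates

p = calculate_significance(test_stat, reference_distribution, alternative="two-sided")
```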