-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/new default xi #800
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one small change to make sure the unit test is using the new defaults defined in the code rather than overwriting them.
@@ -493,7 +493,7 @@ | |||
'parameters': { | |||
'min_samples': 'auto', | |||
'max_eps': None, | |||
'xi': 0.11, | |||
'xi': 0.15, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just remove 'xi'
from this dictionary altogether, that way it is testing against the default as set in the code. Same for all the other instances below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed all instances of specifying xi in this test, such that the default is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now added the value of xi back in to the comparison dictionary, as otherwise it was failing the unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A problem was found where the new default xi
was not being implemented properly in the unit tests. Something is not being passed as expected. Therefore changed back to specifying xi
in the unit tests for now, but the underlying problem needs investigating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes all look good, merge when you are ready.
Changed the default value of xi used in sklearn's OPTIC algorithm from 0.05 to 0.15. The value of xi approximately controls the size of the clusters, with a small xi leading to larger clusters and a larger xi leading to smaller clusters. While 0.05 is the standard value, as recommended in the original OPTICS paper, this value can incorrectly include obvious outliers when the size of each cluster is very small, as often occurs in Zooniverse projects.
The value of 0.15 was chosen after tests with the real data from PRINT project found that obvious outliers (by visual inspection) where identified, while minimising the differences with the previous value of 0.05.
Note that this branch uses the updated OPTICS algorithm, where the
_predecessor_correction
function had a bug fixed. This bug in fact inadvertently helped remove outliers, including in the unit tests used here. This is how the problem with a too low xi value for the use case here was first identified.