Feature/new default xi #800

Merged: 6 commits, Feb 3, 2025

Jacks0nJ
Collaborator

Changed the default value of xi used in sklearn's OPTICS algorithm from 0.05 to 0.15. The value of xi roughly controls the size of the clusters: a small xi leads to larger clusters and a larger xi to smaller ones. While 0.05 is the standard value, as recommended in the original OPTICS paper, it can incorrectly include obvious outliers when each cluster is very small, as often occurs in Zooniverse projects.
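A minimal sketch of how the effect of xi described above could be checked, assuming scikit-learn and NumPy are installed. The toy data, the `min_samples=3` setting and the `cluster_labels` helper are hypothetical and are not taken from this project; the actual labels depend on the data, so no expected output is shown.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Hypothetical toy data: two tight groups of points plus one stray point.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(5, 2))
group_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(5, 2))
stray = np.array([[1.0, 1.0]])
points = np.vstack([group_a, group_b, stray])


def cluster_labels(xi):
    """Fit OPTICS with the given xi and return one label per point (-1 = noise)."""
    # min_samples=3 is an assumption for this toy example only.
    model = OPTICS(min_samples=3, max_eps=np.inf, xi=xi)
    return model.fit(points).labels_


# Compare the old default (0.05) with the new default (0.15).
labels_by_xi = {xi: cluster_labels(xi) for xi in (0.05, 0.15)}
for xi, labels in labels_by_xi.items():
    print(f"xi={xi}: labels={labels.tolist()}")
```

Comparing the two label arrays point by point shows which points change cluster membership, or become noise, when xi is raised.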

The value of 0.15 was chosen after tests with real data from the PRINT project found that obvious outliers (by visual inspection) were identified, while minimising the differences from the previous value of 0.05.

Note that this branch uses the updated OPTICS algorithm, in which a bug in the _predecessor_correction function has been fixed. That bug had in fact inadvertently helped remove outliers, including in the unit tests used here, which is how the problem of xi being too low for this use case was first identified.

@Jacks0nJ Jacks0nJ requested a review from CKrawczyk January 20, 2025 17:47
Collaborator

@CKrawczyk CKrawczyk left a comment

Just one small change to make sure the unit test is using the new defaults defined in the code rather than overwriting them.

@@ -493,7 +493,7 @@
 'parameters': {
 'min_samples': 'auto',
 'max_eps': None,
-'xi': 0.11,
+'xi': 0.15,
Collaborator

Can you just remove 'xi' from this dictionary altogether? That way it tests against the default as set in the code. The same goes for all the other instances below.
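A minimal sketch of the testing pattern suggested here, using only hypothetical names (`DEFAULT_XI`, `resolve_parameters`, `test_default_xi_is_used`) rather than the project's real code. The call under test omits xi, and the expected dictionary refers to the default constant instead of repeating a literal value.

```python
# Hypothetical names throughout; this is not the project's real code.
DEFAULT_XI = 0.15  # the new default introduced by this pull request


def resolve_parameters(min_samples="auto", max_eps=None, xi=DEFAULT_XI):
    """Return the parameter dictionary a reducer might report back."""
    return {"min_samples": min_samples, "max_eps": max_eps, "xi": xi}


def test_default_xi_is_used():
    # xi is not passed, so the result must carry the default set in the code.
    result = resolve_parameters()
    expected = {"min_samples": "auto", "max_eps": None, "xi": DEFAULT_XI}
    assert result == expected


test_default_xi_is_used()
```

Written this way, a future change to the default only needs to be made in one place, and the test fails if the default is not actually passed through.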

Collaborator Author

I have removed all instances of specifying xi in this test, such that the default is used.

Collaborator Author

I have now added the value of xi back in to the comparison dictionary, as otherwise it was failing the unit tests.

Collaborator Author

@Jacks0nJ Jacks0nJ Jan 27, 2025

A problem was found where the new default xi was not being applied properly in the unit tests; something is not being passed as expected. I have therefore changed back to specifying xi in the unit tests for now, but the underlying problem needs investigating.

Collaborator

@CKrawczyk CKrawczyk left a comment

These changes all look good, merge when you are ready.

@Jacks0nJ Jacks0nJ merged commit 1415dd9 into feature/update-dependencies Feb 3, 2025
4 checks passed
@Jacks0nJ Jacks0nJ deleted the feature/new_default_xi branch February 3, 2025 11:35
2 participants