Commit ad56b1d

Implement SparseKDE, QuickShift and add H2O-BLYP-Piglet dataset (#222)
* Add the class `SparseKDE`, located at `src/skmatter/utils/_sparsekde.py`. It mitigates the high cost of KDE on large datasets by evaluating the KDE only at selected data points (e.g. grid points chosen by farthest point sampling). The class takes the original dataset as a parameter and fits the model using the sampled grid points. The corresponding tests are in `tests/test_neighbors.py`.
* Add the class `QuickShift` in `src/skmatter/clustering/_quick_shift.py`, implementing the quick shift clustering algorithm, with corresponding tests in `tests/test_clustering.py`.
* Add the H2O-BLYP-Piglet dataset, containing 27233 hydrogen bonds with 3D descriptors and weights. The corresponding tests are in `tests/test_datasets.py`.
* Add two auxiliary functions, `effdim` and `oas`, stored in `src/skmatter/utils/_sparsekde.py`, with corresponding tests in `tests/test_neighbors.py`.
* Add two distance metrics compatible with periodic boundary conditions (PBC), `pairwise_euclidean_distances` and `pairwise_mahalanobis_distances`, stored in `src/skmatter/metrics/pairwise.py`, with corresponding tests in `tests/test_metrics.py`.
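The sparse-KDE idea described in the first bullet can be illustrated with a minimal NumPy sketch. This is not the `SparseKDE` implementation from the commit: the function names `farthest_point_sampling` and `sparse_gaussian_kde`, the fixed isotropic bandwidth, and the nearest-grid-point weighting are all assumptions made for illustration.

```python
import numpy as np


def farthest_point_sampling(X, n_points, start=0):
    """Greedy FPS: repeatedly pick the sample farthest from those already chosen."""
    selected = [start]
    # Distance of every sample to its nearest selected point so far.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(n_points - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)


def sparse_gaussian_kde(X, grid, queries, bandwidth):
    """Gaussian KDE that places kernels only on the grid points (hypothetical helper).

    Each grid point is weighted by the fraction of samples in X for which it
    is the nearest grid point, so the cost scales with len(grid), not len(X).
    """
    dist = np.linalg.norm(X[:, None, :] - grid[None, :, :], axis=2)
    owner = dist.argmin(axis=1)
    weights = np.bincount(owner, minlength=len(grid)) / len(X)

    dim = X.shape[1]
    norm = (2.0 * np.pi * bandwidth**2) ** (-dim / 2.0)
    sq = ((queries[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    return norm * (np.exp(-0.5 * sq / bandwidth**2) @ weights)


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
grid_idx = farthest_point_sampling(X, 50)
grid = X[grid_idx]
density = sparse_gaussian_kde(X, grid, grid, bandwidth=0.5)
```

Here the density is evaluated on the 50 grid points only, while all 2000 samples still contribute through the grid weights.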
1 parent 3c784c9 commit ad56b1d

29 files changed: +2251 -1 lines changed

CHANGELOG

+7
Original file line number | Diff line number | Diff line change
@@ -16,6 +16,13 @@ The rules for CHANGELOG file:
1616
- Updating ``FPS`` to allow a numpy array of ints as an initialize parameter (#145)
1717
- Supported Python versions are now ranging from 3.9 - 3.12.
1818
- Updating ``skmatter.datasets`` submodule to support sklearn 1.5.0 (#229)
19+
- Add `SparseKDE` class (#222)
20+
- Add `QuickShift` class (#222)
21+
- Add an example on how to conduct PAMM algorithm with `SparseKDE` and `QuickShift`
22+
(#222)
23+
- Add H2O-BLYP-Piglet dataset (#222)
24+
- Add two distance metrics that support the periodic boundary condition,
25+
`periodic_pairwise_euclidean_distances` and `pairwise_mahalanobis_distances` (#222)
1926

2027
0.2.0 (2023/08/24)
2128
------------------

docs/src/bibliography.rst

+6
@@ -6,6 +6,12 @@ References
66
"Principal covariates regression: Part I. Theory", Chemom. intell. lab. syst. 14
77
(1992) 155-164 https://doi.org/10.1016/0169-7439(92)80100-I
88
9+
.. [Gasparotto2014]
10+
Piero Gasparotto, Michele Ceriotti,
11+
"Recognizing molecular patterns by machine learning: An agnostic structural
12+
definition of the hydrogen bond", J. Chem. Phys., 141 (17): 174110.
13+
https://doi.org/10.1063/1.4900655.
14+
915
.. [Imbalzano2018]
1016
Giulio Imbalzano, Andrea Anelli, Daniele Giofré, Sinja Klees, Jörg Behler, and
1117
Michele Ceriotti, “Automatic selection of atomic fingerprints and reference

docs/src/conf.py

+1-1
@@ -54,7 +54,7 @@
5454
"sphinx_toggleprompt",
5555
]
5656

57-
example_subdirs = ["pcovr", "selection", "regression", "reconstruction"]
57+
example_subdirs = ["pcovr", "selection", "regression", "reconstruction", "neighbors"]
5858
sphinx_gallery_conf = {
5959
"filename_pattern": "/*",
6060
"examples_dirs": [f"../../examples/{p}" for p in example_subdirs],

docs/src/references/clustering.rst

+11
@@ -0,0 +1,11 @@
1+
Clustering
2+
==========
3+
4+
.. automodule:: skmatter.clustering
5+
6+
.. _quick-shift-api:
7+
8+
Quick Shift
9+
------------
10+
11+
.. autoclass:: skmatter.clustering.QuickShift
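
As a rough illustration of the quick shift idea documented here, the sketch below links each point to its nearest neighbor of higher density within a distance cutoff and labels every point by the root it eventually reaches. It is a simplified sketch under stated assumptions, not the `QuickShift` class itself: the function name `quick_shift_sketch`, its arguments, and the precomputed `density` array are hypothetical.

```python
import numpy as np


def quick_shift_sketch(X, density, cutoff):
    """Return, for each point, the index of the density maximum it shifts to."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    parent = np.arange(n)
    for i in range(n):
        # Candidates: strictly denser points no farther than the cutoff.
        higher = np.flatnonzero((density > density[i]) & (dist[i] <= cutoff))
        if higher.size:
            parent[i] = higher[np.argmin(dist[i, higher])]
    # Follow the links until every point sits on a root. Links always go to
    # strictly higher density, so there are no cycles and the loop terminates.
    roots = parent.copy()
    while True:
        nxt = parent[roots]
        if np.array_equal(nxt, roots):
            break
        roots = nxt
    return roots


X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
density = np.array([1.0, 2.0, 1.0, 1.0, 3.0, 1.0])
labels = quick_shift_sketch(X, density, cutoff=1.0)
```

With these toy inputs the two groups of points are separated by more than the cutoff, so each group collapses onto its own local density maximum.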

docs/src/references/datasets.rst

+2
@@ -5,6 +5,8 @@ Datasets
55

66
.. include:: ../../../src/skmatter/datasets/descr/degenerate_CH4_manifold.rst
77

8+
.. include:: ../../../src/skmatter/datasets/descr/h2o-blyp-piglet.rst
9+
810
.. include:: ../../../src/skmatter/datasets/descr/nice_dataset.rst
911

1012
.. include:: ../../../src/skmatter/datasets/descr/who_dataset.rst

docs/src/references/index.rst

+2
@@ -10,7 +10,9 @@ API Reference
1010
preprocessing
1111
selection
1212
linear_models
13+
clustering
1314
decomposition
1415
metrics
16+
neighbors
1517
datasets
1618
utils

docs/src/references/metrics.rst

+15
@@ -40,3 +40,18 @@ Component-wise Prediction Rigidity
4040
----------------------------------
4141

4242
.. autofunction:: skmatter.metrics.componentwise_prediction_rigidity
43+
44+
45+
.. _pairwise-euclidian-api:
46+
47+
Pairwise Euclidean Distances
48+
----------------------------
49+
50+
.. autofunction:: skmatter.metrics.periodic_pairwise_euclidean_distances
51+
52+
.. _pairwise-mahalanobis-api:
53+
54+
Pairwise Mahalanobis Distance
55+
-----------------------------
56+
57+
.. autofunction:: skmatter.metrics.pairwise_mahalanobis_distances
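
To make the notion of a periodic distance concrete, here is a minimal sketch of the minimum image convention for an orthorhombic cell. The helper names (`_wrap`, `periodic_euclidean_sketch`, `mahalanobis_sketch`) and the `cell_length` argument are assumptions for illustration and do not reproduce the signatures of the skmatter functions. Wrapping each component independently before applying a non-diagonal inverse covariance is a simplification.

```python
import numpy as np


def _wrap(delta, cell_length):
    """Apply the minimum image convention component-wise (orthorhombic cell)."""
    if cell_length is None:
        return delta
    return delta - cell_length * np.round(delta / cell_length)


def periodic_euclidean_sketch(X, Y, cell_length=None):
    """Pairwise Euclidean distances between rows of X and Y under PBC."""
    delta = _wrap(X[:, None, :] - Y[None, :, :], cell_length)
    return np.sqrt((delta**2).sum(axis=2))


def mahalanobis_sketch(X, Y, cov_inv, cell_length=None):
    """Pairwise Mahalanobis distances sqrt(d^T cov_inv d) for one inverse covariance."""
    delta = _wrap(X[:, None, :] - Y[None, :, :], cell_length)
    sq = np.einsum("ijk,kl,ijl->ij", delta, cov_inv, delta)
    return np.sqrt(np.maximum(sq, 0.0))


cell = np.array([1.0, 1.0])
X = np.array([[0.1, 0.0]])
Y = np.array([[0.9, 0.0]])
d_plain = periodic_euclidean_sketch(X, Y)
d_euc = periodic_euclidean_sketch(X, Y, cell)
d_mah = mahalanobis_sketch(X, Y, np.eye(2), cell)
```

Without the cell the two points are 0.8 apart; with a unit cell the nearest periodic image is only 0.2 away. With an identity inverse covariance the Mahalanobis distance reduces to the Euclidean one.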

docs/src/references/neighbors.rst

+16
@@ -0,0 +1,16 @@
1+
Neighbors
2+
=========
3+
4+
.. automodule:: skmatter.neighbors
5+
6+
.. _sparse-kde-api:
7+
8+
Sparse Kernel Density Estimation
9+
--------------------------------
10+
11+
.. autoclass:: skmatter.neighbors.SparseKDE
12+
:show-inheritance:
13+
14+
.. automethod:: fit
15+
.. automethod:: score_samples
16+
.. automethod:: score

docs/src/references/utils.rst

+11
@@ -30,3 +30,14 @@ Random Partitioning with Overlaps
3030
---------------------------------
3131

3232
.. autofunction:: skmatter.model_selection.train_test_split
33+
34+
35+
Effective Dimension of Covariance Matrix
36+
----------------------------------------
37+
38+
.. autofunction:: skmatter.utils.effdim
39+
40+
Oracle Approximating Shrinkage
41+
------------------------------
42+
43+
.. autofunction:: skmatter.utils.oas
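
The two quantities documented above can be sketched in plain NumPy. Both functions below are illustrative assumptions rather than the skmatter code: `effdim_sketch` uses one common definition of effective dimension (the exponential of the Shannon entropy of the normalized eigenvalues), and `oas_sketch` follows the oracle approximating shrinkage formula of Chen et al. (2010) for `n` samples in `p` dimensions. The actual `effdim` and `oas` signatures and conventions may differ.

```python
import numpy as np


def effdim_sketch(cov):
    """Effective dimension as exp(entropy) of the normalized eigenvalue spectrum."""
    eigval = np.linalg.eigvalsh(cov)
    eigval = np.clip(eigval, 0.0, None)
    prob = eigval / eigval.sum()
    prob = prob[prob > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-(prob * np.log(prob)).sum()))


def oas_sketch(cov, n):
    """Shrink a sample covariance toward a scaled identity (OAS, Chen et al. 2010)."""
    p = cov.shape[0]
    tr = np.trace(cov)
    tr2 = np.trace(cov @ cov)
    num = (1.0 - 2.0 / p) * tr2 + tr**2
    den = (n + 1.0 - 2.0 / p) * (tr2 - tr**2 / p)
    # A zero denominator means cov is already a multiple of the identity.
    rho = 1.0 if den <= 0 else min(num / den, 1.0)
    shrunk = (1.0 - rho) * cov + rho * (tr / p) * np.eye(p)
    return shrunk, rho


full = effdim_sketch(np.eye(3))
flat = effdim_sketch(np.diag([1.0, 0.0, 0.0]))
shrunk, rho = oas_sketch(np.diag([2.0, 1.0]), n=100)
```

An isotropic covariance has effective dimension equal to its size, while a rank-one covariance has effective dimension one. The shrinkage step preserves the trace and moves the eigenvalues toward their mean.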

docs/src/tutorials.rst

+1
@@ -6,3 +6,4 @@
66
examples/selection/index
77
examples/regression/index
88
examples/reconstruction/index
9+
examples/neighbors/index

examples/neighbors/README.rst

+2
@@ -0,0 +1,2 @@
1+
Neighbors
2+
=========
