Some general points about sample handling #459
Labels: Discussion, Infrastructure: Miscellaneous, SOAP BPNN
1. System indexing
Currently, a prediction by `SoapBpnn` produces a TensorMap where the system indices run from `0, ..., N_batch`. These indices do not match the actual indices of the systems as they are defined in the dataset.

This is not especially problematic, but I noticed that the metadata checks in `TensorMapLoss` only check the length of the samples rather than full equivalence, whereas full equivalence is checked for properties and components.

To have full equivalence, one could simply re-index the "system" sample dimension of the model predictions to match the indices of the systems in the batch. This would not be particularly heavy computationally, but might break things if the former convention is assumed in other places in the code.
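The re-indexing idea can be sketched in plain Python. This is only an illustration under stated assumptions: samples are represented as `(system, atom)` tuples rather than metatensor `Labels`, and the helpers `reindex_systems` and `samples_match` are hypothetical names, not functions from the code base.

```python
def reindex_systems(samples, batch_system_ids):
    """Map batch-local system indices (0, 1, ...) to dataset indices.

    `samples` is a list of (system, atom) tuples standing in for the
    ["system", "atom"] sample labels of a prediction (an assumption).
    `batch_system_ids[i]` is the dataset index of the i-th system in the batch.
    """
    return [(batch_system_ids[system], atom) for system, atom in samples]


def samples_match(predicted, target):
    """Full equivalence: same entries in the same order, not just same length."""
    return len(predicted) == len(target) and all(
        p == t for p, t in zip(predicted, target)
    )


# Prediction on a batch of two systems, indexed 0 and 1 by the model.
predicted_samples = [(0, 0), (0, 1), (1, 0)]
# The same systems carry (made-up) indices 17 and 42 in the dataset.
target_samples = [(17, 0), (17, 1), (42, 0)]

# A length-only check passes even though the system indices disagree.
length_only_ok = len(predicted_samples) == len(target_samples)
# A full check fails before re-indexing and passes afterwards.
before = samples_match(predicted_samples, target_samples)
after = samples_match(reindex_systems(predicted_samples, [17, 42]), target_samples)
```

The length-only comparison cannot tell the two sets of samples apart, while the full comparison only succeeds once the "system" dimension has been mapped onto the dataset indices.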
2. Selected samples/atoms
I originally noticed the above problem when attempting to make predictions on a subset of samples using a `SOAP-BPNN` model for spherical targets. Currently, the forward methods of the classes `SoapBpnn` and `TensorBasis` have an argument `selected_atoms`, which is passed to the SOAP calculator as `selected_samples`. However, this parameter is not currently exposed to the user.

This also links to issue #458. If the spherical target is an electron density for which independent models per block are initialised, one could in principle compute SOAP descriptors with different hypers per atom type (utilising the `selected_samples` functionality of `featomic` to only compute, for instance, descriptors for nitrogen). These descriptors would then be passed through different models with separate weights, before being joined at the output level to form a prediction on the full basis.

3. Dataloader joining
The dataloaders used in the SOAP-BPNN `trainer` module use the metatensor-learn `Dataloader` class. These accept the keyword argument `join_kwargs`, which is passed to the `metatensor.join` operation when minibatches are compiled. Currently, these `join_kwargs` are not exposed to the user, nor are any passed by default.

For instance, model predictions carry around the sample dimension "`tensor`" as a consequence of the parameter `remove_tensor_name=True` not being passed to the dataloader in `join_kwargs`. This also relates to point 1 above, as the samples metadata of target and prediction cannot be directly compared.

Further to this, and again using the example of the electron density expanded on a basis, the `different_keys="union"` parameter of `metatensor.join` is a useful argument to have control over. Often, the basis set definition is not consistent between systems in a minibatch if the atom types present in the systems differ. In this case, in order to compile a minibatch from such systems, a union of the keys (assumed to be, for instance, `["o3_lambda", "o3_sigma", "center_type"]`) is required.
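To illustrate why both options matter, here is a toy model of joining along samples, written in plain Python. It does not call `metatensor.join`. The function `join_samples` is a hypothetical stand-in that only mimics the two behaviours discussed above, each "tensor" is a dict from a key tuple `(o3_lambda, o3_sigma, center_type)` to a list of `(system, atom)` samples, and the position of the extra tensor index is an assumption.

```python
def join_samples(tensors, different_keys="error", remove_tensor_name=False):
    """Toy stand-in for joining tensors along the samples axis.

    Each tensor is a dict mapping a key tuple to a list of (system, atom)
    samples. Only the "error" and "union" modes are sketched here.
    """
    if different_keys not in ("error", "union"):
        raise ValueError(f"mode not covered by this sketch: {different_keys}")
    all_keys = sorted(set().union(*(t.keys() for t in tensors)))
    if different_keys == "error" and any(set(t) != set(all_keys) for t in tensors):
        raise ValueError("tensors have different keys")
    joined = {}
    for key in all_keys:
        rows = []
        for i_tensor, tensor in enumerate(tensors):
            # A tensor lacking this key simply contributes no samples.
            for sample in tensor.get(key, []):
                # Without remove_tensor_name, an extra "tensor" index is kept.
                rows.append(sample if remove_tensor_name else (i_tensor, *sample))
        joined[key] = rows
    return joined


# Water (system 0): O is atom 0, H are atoms 1 and 2.
water = {(0, 1, 1): [(0, 1), (0, 2)], (0, 1, 8): [(0, 0)]}
# Ammonia (system 1): N is atom 0, H are atoms 1 to 3.
ammonia = {(0, 1, 1): [(1, 1), (1, 2), (1, 3)], (0, 1, 7): [(1, 0)]}

batch = join_samples([water, ammonia], different_keys="union", remove_tensor_name=True)
batch_with_tensor = join_samples([water, ammonia], different_keys="union")
```

In this sketch the default `"error"` mode refuses to build the minibatch because the two systems contain different atom types, while `"union"` keeps a block for every key. Dropping the tensor index leaves samples that can be compared directly against `(system, atom)` samples from elsewhere.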