Exported nanopet models from output/ directory yield inconsistent predictions. #495

Open
bananenpampe opened this issue Feb 19, 2025 · 2 comments
Labels
Bug Something isn't working NanoPET Nanopet model experimental architecture

Comments

@bananenpampe
Collaborator

Metatrain version: 0.1.dev321+g028efa5

I have trained a nanopet model following the example in the mtt examples folder:

# architecture used to train the model
architecture:
  name: experimental.nanopet
  training:
    num_epochs: 50 # a very short training run
    log_interval: 1
    checkpoint_interval: 10


# Mandatory section defining the parameters for system and target data of the
# training set
training_set:
  systems: "qm9_reduced_100.xyz" # file where the positions are stored
  targets:
    energy:
      key: "U0" # name of the target value
      unit: "eV" # unit of the target value

test_set: 0.1 # 10 % of the training_set is randomly split off as the test set
validation_set: 0.1 # 10 % of the training_set is randomly split off for validation

After training completes, the model is exported to model.pt and model.ckpt.
When calling mtt eval model.pt eval.yaml, I obtain the following evaluation errors:

[2025-02-19 13:06:02][INFO] - Package directory: /home/kellner/packages/metatrain/src/metatrain
[2025-02-19 13:06:02][INFO] - Working directory: /home/kellner/example
[2025-02-19 13:06:02][INFO] - Metatrain version: 0.1.dev321+g028efa5
[2025-02-19 13:06:02][INFO] - Executed command: mtt eval model.pt eval.yaml
[2025-02-19 13:06:02][INFO] - Setting up evaluation set.
[2025-02-19 13:06:02][INFO] - Evaluating dataset
[2025-02-19 13:06:03][WARNING] - No forces found in section 'U0'.
[2025-02-19 13:06:03][WARNING] - No stress found in section 'U0'.
[2025-02-19 13:06:03][INFO] - Running on device cuda with dtype torch.float64
[2025-02-19 13:06:07][INFO] - energy RMSE (per atom): 1.6784 meV, energy MAE (per atom): 1.3701 meV
[2025-02-19 13:06:07][INFO] - evaluation time: 0.47 s [0.5323 ± 0.2316 ms per atom]

When manually exporting one of the training checkpoints via:
mtt export ./outputs/2025-02-19/13-05-18/model_40.ckpt -o from_output_dir.pt
And then evaluating the model:
mtt eval from_output_dir.pt eval.yaml

I obtain these evaluation errors:

[2025-02-19 13:06:59][INFO] - Package directory: /home/kellner/packages/metatrain/src/metatrain
[2025-02-19 13:06:59][INFO] - Working directory: /home/kellner/example
[2025-02-19 13:06:59][INFO] - Metatrain version: 0.1.dev321+g028efa5
[2025-02-19 13:06:59][INFO] - Executed command: mtt eval from_output_dir.pt eval.yaml
[2025-02-19 13:07:00][INFO] - Setting up evaluation set.
[2025-02-19 13:07:00][INFO] - Evaluating dataset
[2025-02-19 13:07:00][WARNING] - No forces found in section 'U0'.
[2025-02-19 13:07:00][WARNING] - No stress found in section 'U0'.
[2025-02-19 13:07:00][INFO] - Running on device cuda with dtype torch.float64
[2025-02-19 13:07:04][INFO] - energy RMSE (per atom): 22589.3 meV, energy MAE (per atom): 21523.5 meV
[2025-02-19 13:07:04][INFO] - evaluation time: 0.47 s [0.5331 ± 0.2332 ms per atom]
@bananenpampe bananenpampe added the Bug Something isn't working label Feb 19, 2025
@bananenpampe
Collaborator Author

The same thing happens when I export the final model.ckpt manually:
mtt export model.ckpt -o from_output_dir.pt

[2025-02-19 13:13:52][INFO] - Package directory: /home/kellner/packages/metatrain/src/metatrain
[2025-02-19 13:13:52][INFO] - Working directory: /home/kellner/example
[2025-02-19 13:13:52][INFO] - Metatrain version: 0.1.dev321+g028efa5
[2025-02-19 13:13:52][INFO] - Executed command: mtt eval from_output_dir.pt eval.yaml
[2025-02-19 13:13:52][INFO] - Setting up evaluation set.
[2025-02-19 13:13:52][INFO] - Evaluating dataset
[2025-02-19 13:13:52][WARNING] - No forces found in section 'U0'.
[2025-02-19 13:13:52][WARNING] - No stress found in section 'U0'.
[2025-02-19 13:13:52][INFO] - Running on device cuda with dtype torch.float64
[2025-02-19 13:13:56][INFO] - energy RMSE (per atom): 22590.3 meV, energy MAE (per atom): 21524.5 meV
[2025-02-19 13:13:56][INFO] - evaluation time: 0.47 s [0.5307 ± 0.2303 ms per atom]

@bananenpampe
Collaborator Author

I suspect the problem is somewhere in the composition models: for scalar atom-wise properties without composition models, this is not an issue.
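
To illustrate why a composition model is a plausible suspect, here is a minimal NumPy sketch. It is not metatrain code and does not diagnose the bug. It assumes a composition model is a linear least-squares fit of total energy to per-species atom counts, with the network learning only the residual. The data, the per-species energies, and the helper names `fit_composition` and `per_atom_rmse` are all hypothetical.

```python
import numpy as np


def fit_composition(counts, energies):
    """Least-squares per-species energies so that energies ~= counts @ weights."""
    weights, *_ = np.linalg.lstsq(counts, energies, rcond=None)
    return weights


def per_atom_rmse(pred, target, n_atoms):
    """RMSE of the per-atom energy error."""
    return float(np.sqrt(np.mean(((pred - target) / n_atoms) ** 2)))


rng = np.random.default_rng(0)

# Hypothetical toy data: 50 structures, 3 species, made-up energies in eV.
true_weights = np.array([-13.6, -1030.0, -2040.0])
counts = rng.integers(1, 6, size=(50, 3)).astype(float)
n_atoms = counts.sum(axis=1)
energies = counts @ true_weights + rng.normal(scale=0.01, size=50)

# Composition baseline fitted on the toy data.
weights = fit_composition(counts, energies)
baseline = counts @ weights

# Idealized assumption: the network reproduces the residual exactly.
network_output = energies - baseline

# Baseline applied at prediction time vs. baseline dropped.
rmse_with = per_atom_rmse(network_output + baseline, energies, n_atoms)
rmse_without = per_atom_rmse(network_output, energies, n_atoms)

print(f"with baseline:    {rmse_with:.3e} eV/atom")
print(f"without baseline: {rmse_without:.3e} eV/atom")
```

In this toy setup, dropping the baseline turns a negligible per-atom error into one of eV scale or larger. That is the same qualitative jump as between the two evaluations above, although the real cause could equally be a partially restored or mismatched baseline.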

@Luthaf Luthaf added the NanoPET Nanopet model experimental architecture label Feb 19, 2025