Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot assign partial charges to molecules with 1024+ atoms #164

Closed
mattwthompson opened this issue Dec 9, 2024 · 2 comments · Fixed by #165
Closed

Cannot assign partial charges to molecules with 1024+ atoms #164

mattwthompson opened this issue Dec 9, 2024 · 2 comments · Fixed by #165

Comments

@mattwthompson
Copy link
Member

mattwthompson commented Dec 9, 2024

Expected behavior

I'm not sure exactly what I expected for a maximum molecule size, but I would guess the network can handle molecules much larger than 1000 molecules.

Actual behavior

As a non-expert of this codebase, it looks like this error stems from checking a lookup table which is keyed by InChI. The toolkit cannot create InChI of molecules more than 1034 atoms long. There might be other errors later in the process, I have yet to dig into this.

openforcefield/openff-toolkit#1977

Code to reproduce the behavior

import time

import numpy

import openff.nagl
import openff.nagl_models
from openff.toolkit import Molecule
from openff.toolkit.utils.nagl_wrapper import NAGLToolkitWrapper
from openff.toolkit.utils.exceptions import EmptyInChiError

print(f"{openff.nagl.__version__=}")
print(f"{openff.nagl_models.__version__=}")


molecule = Molecule.from_smiles(341 * "C")

molecule.assign_partial_charges(
    partial_charge_method="openff-gnn-am1bcc-0.1.0-rc.3.pt",
    toolkit_registry=NAGLToolkitWrapper(),
)

Holy traceback, batman!

openff.nagl.__version__='0.5.0'
openff.nagl_models.__version__='0.3.0'
Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)
---------------------------------------------------------------------------
EmptyInChiError                           Traceback (most recent call last)
Cell In[1], line 17
     12 print(f"{openff.nagl_models.__version__=}")
     15 molecule = Molecule.from_smiles(341 * "C")
---> 17 molecule.assign_partial_charges(
     18     partial_charge_method="openff-gnn-am1bcc-0.1.0-rc.3.pt",
     19     toolkit_registry=NAGLToolkitWrapper(),
     20 )

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/topology/molecule.py:2687, in FrozenMolecule.assign_partial_charges(self, partial_charge_method, strict_n_conformers, use_conformers, toolkit_registry, normalize_partial_charges)
   2685 elif isinstance(toolkit_registry, ToolkitWrapper):
   2686     toolkit_wrapper: ToolkitWrapper = toolkit_registry
-> 2687     toolkit_wrapper.assign_partial_charges(  # type: ignore[attr-defined]
   2688         self,
   2689         partial_charge_method=partial_charge_method,
   2690         use_conformers=use_conformers,
   2691         strict_n_conformers=strict_n_conformers,
   2692         normalize_partial_charges=normalize_partial_charges,
   2693         _cls=self.__class__,
   2694     )
   2695 else:
   2696     raise InvalidToolkitRegistryError(
   2697         f"Invalid toolkit_registry passed to assign_partial_charges."
   2698         f"Expected ToolkitRegistry or ToolkitWrapper. Got  {type(toolkit_registry)}"
   2699     )

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/nagl_wrapper.py:142, in NAGLToolkitWrapper.assign_partial_charges(self, molecule, partial_charge_method, use_conformers, strict_n_conformers, normalize_partial_charges, _cls)
    136     raise ChargeMethodUnavailableError(
    137         f"Charge model {partial_charge_method} not supported by "
    138         f"{self.__class__.__name__}."
    139     ) from error
    141 model = GNNModel.load(model_path, eval_mode=True)
--> 142 charges = model.compute_property(
    143     molecule,
    144     as_numpy=True,
    145     readout_name="am1bcc_charges",
    146     check_domains=True,
    147     error_if_unsupported=True,
    148 )
    150 molecule.partial_charges = Quantity(
    151     charges.astype(float),
    152     unit.elementary_charge,
    153 )
    155 if normalize_partial_charges:

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/nagl/nn/_models.py:402, in GNNModel.compute_property(self, molecule, readout_name, as_numpy, check_domains, error_if_unsupported, check_lookup_table)
    362 def compute_property(
    363     self,
    364     molecule: "Molecule",
   (...)
    369     check_lookup_table: bool = True
    370 ):
    371     """
    372     Compute the trained property for a molecule.
    373
   (...)
    400     result: torch.Tensor or numpy.ndarray
    401     """
--> 402     properties = self.compute_properties(
    403         molecule=molecule,
    404         as_numpy=as_numpy,
    405         check_domains=check_domains,
    406         error_if_unsupported=error_if_unsupported,
    407         check_lookup_table=check_lookup_table
    408     )
    409     if readout_name is None:
    410         if len(properties) == 1:

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/nagl/nn/_models.py:201, in GNNModel.compute_properties(self, molecule, as_numpy, check_domains, error_if_unsupported, check_lookup_table)
    197 fragments, all_indices = split_up_molecule(molecule)
    198 # TODO: this assumes atom-wise properties
    199 # we should add support for bond-wise/more general properties
--> 201 results = [
    202     self._compute_properties(
    203         fragment,
    204         as_numpy=as_numpy,
    205         check_domains=check_domains,
    206         error_if_unsupported=error_if_unsupported,
    207         check_lookup_table=check_lookup_table,
    208     )
    209     for fragment in fragments
    210 ]
    212 # combine the results
    213 combined_results = {}

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/nagl/nn/_models.py:202, in <listcomp>(.0)
    197 fragments, all_indices = split_up_molecule(molecule)
    198 # TODO: this assumes atom-wise properties
    199 # we should add support for bond-wise/more general properties
    201 results = [
--> 202     self._compute_properties(
    203         fragment,
    204         as_numpy=as_numpy,
    205         check_domains=check_domains,
    206         error_if_unsupported=error_if_unsupported,
    207         check_lookup_table=check_lookup_table,
    208     )
    209     for fragment in fragments
    210 ]
    212 # combine the results
    213 combined_results = {}

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/nagl/nn/_models.py:289, in GNNModel._compute_properties(self, molecule, as_numpy, check_domains, error_if_unsupported, check_lookup_table)
    287 for property_name in expected_value_keys:
    288     try:
--> 289         value = self._check_property_lookup_table(
    290             molecule=molecule,
    291             readout_name=property_name,
    292         )
    293     except KeyError as e:
    294         logger.info(
    295             f"Could not find property in lookup table: {e}"
    296         )

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/nagl/nn/_models.py:359, in GNNModel._check_property_lookup_table(self, molecule, readout_name)
    336 """
    337 Check if the molecule is in the property lookup table.
    338
   (...)
    355     if the molecule is not in the property lookup table
    356 """
    358 table = self.lookup_tables[readout_name]
--> 359 return table.lookup(molecule)

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/nagl/lookups.py:141, in AtomPropertiesLookupTable.lookup(self, molecule)
    121 """
    122 Look up the property value for a molecule
    123
   (...)
    137     If the property value cannot be found for this molecule
    138 """
    139 from openff.toolkit.topology import Molecule
--> 141 inchi_key = molecule.to_inchi(fixed_hydrogens=True)
    142 try:
    143     entry = self.properties[inchi_key]

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/topology/molecule.py:1760, in FrozenMolecule.to_inchi(self, fixed_hydrogens, toolkit_registry)
   1732 """
   1733 Create an InChI string for the molecule using the requested toolkit backend.
   1734 InChI is a standardised representation that does not capture tautomers unless specified using the fixed
   (...)
   1756      If an invalid object is passed as the toolkit_registry parameter
   1757 """
   1759 if isinstance(toolkit_registry, ToolkitRegistry):
-> 1760     inchi = toolkit_registry.call(
   1761         "to_inchi", self, fixed_hydrogens=fixed_hydrogens
   1762     )
   1763 elif isinstance(toolkit_registry, ToolkitWrapper):
   1764     toolkit = toolkit_registry

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/toolkit_registry.py:266, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    264             for exception_type in raise_exception_types:
    265                 if isinstance(e, exception_type):
--> 266                     raise e
    267             errors.append((toolkit, e))
    269 # No toolkit was found to provide the requested capability
    270 # TODO: Can we help developers by providing a check for typos in expected method names?

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/toolkit_registry.py:262, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    260 method = getattr(toolkit, method_name)
    261 try:
--> 262     return method(*args, **kwargs)
    263 except Exception as e:
    264     for exception_type in raise_exception_types:

File ~/micromamba/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/openeye_wrapper.py:1942, in OpenEyeToolkitWrapper.to_inchi(self, molecule, fixed_hydrogens)
   1939     inchi = oechem.OEMolToSTDInChI(oemol)
   1941 if len(inchi) == 0:
-> 1942     raise EmptyInChiError(
   1943         "OEChem failed to generate an InChI for the molecule."
   1944     )
   1946 return inchi

EmptyInChiError: OEChem failed to generate an InChI for the molecule.

Current environment

  • Which version are you using? (run python -c "import openff.nagl; print(openff.nagl.__version__)") 0.5.0
  • Which version of Python (python -V)? 3.11
  • Which operating system? macOS 15.1.1
  • What is the output of pip list?
  • If you use conda, what is the output of conda list? See linked toolkit issue
@mattwthompson mattwthompson changed the title InChI reliance caps the size of molecules that can be charged Cannot assign partial charges to molecules with 1024+ atoms Dec 9, 2024
@lilyminium
Copy link
Collaborator

Huh, I didn't realize this would be an issue. Thank you -- will put in a workaround for now. However InChi might have just been the wrong choice of key if this can't generalize.

@mattwthompson
Copy link
Member Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants