SMILES tokenizer ids #2

elemets · 2020-09-23T10:46:35Z

Hi,

Thanks for this project, it looks like it could be really helpful. Sorry if this is a stupid question, but I was wondering: once I've tokenized a set of SMILES using the pre-trained SMILES model, how would I get the token IDs?

Thanks
A

lianghsun · 2021-01-12T03:21:17Z

Here is one way to get the ID of each token in a tokenized SMILES string:

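# spe is assumed to be the tokenizer object loaded with the pre-trained codes
test_spe_word = spe.tokenize(...)  # returns the tokens as one space-separated string
for word in test_spe_word.split(' '):
    # bpe_codes_reverse maps a merged token back to its pair,
    # and bpe_codes maps that pair to its index
    # note: a token that was never produced by a merge (e.g. a single atom)
    # may be missing from bpe_codes_reverse and would then raise KeyError
    print(spe.bpe_codes[spe.bpe_codes_reverse[word]]) # output ID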
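
If the goal is a dense set of integer IDs to feed into a model, another option is to build your own vocabulary from the tokenized output. The sketch below is only an illustration of that idea and does not use the library's own ID scheme: `build_vocab`, `encode`, the special tokens, and the example tokenizations are all made up for this post, and the input is assumed to be space-separated token strings like the ones returned by the tokenizer.

```python
def build_vocab(tokenized_smiles, specials=("<pad>", "<unk>")):
    """Assign an integer ID to every distinct token, in first-seen order.

    tokenized_smiles: iterable of space-separated token strings.
    specials: hypothetical special tokens that get the lowest IDs.
    """
    vocab = {tok: i for i, tok in enumerate(specials)}
    for line in tokenized_smiles:
        for tok in line.split():
            # setdefault only assigns a new ID the first time a token is seen
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokenized, vocab, unk="<unk>"):
    """Map one space-separated token string to a list of IDs.

    Tokens that are not in the vocabulary fall back to the <unk> ID.
    """
    return [vocab.get(tok, vocab[unk]) for tok in tokenized.split()]


# Made-up tokenizations, standing in for real tokenizer output
corpus = ["CC (=O) O", "c1ccccc1 O"]
vocab = build_vocab(corpus)
ids = encode("CC O N", vocab)  # "N" is not in the corpus, so it maps to <unk>
print(ids)
```

The IDs produced this way depend on the corpus you build the vocabulary from, so the same `vocab` dict has to be saved and reused at inference time.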
