Using Wav2Vec2ProcessorWithLM with kenlm resulting in mismatch 70 != 72 vocab size #4

daniel8an · 2022-10-20T18:13:12Z

Hi,

I was trying to integrate my trained kenlm model with wav2vec2-large-xlsr-53-th and ran into a vocab size mismatch: 70 != 72.
The mismatched tokens are the start and end tokens `<s>` and `</s>`.
I did not run into this issue when loading other pre-trained models.
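A minimal sketch for seeing which tokens cause the gap. It assumes the model emits 70 logit columns, while the tokenizer vocabulary (for example what `processor.tokenizer.get_vocab()` returns) has 72 entries, with the added `<s>` and `</s>` at ids 70 and 71. The helper `split_vocab` is hypothetical, and `num_logits` stands for the last dimension of the model's logits.

```python
from typing import Dict, List, Tuple


def split_vocab(vocab: Dict[str, int], num_logits: int) -> Tuple[List[str], List[str]]:
    """Split a token -> id mapping into (tokens with a logit column, extra tokens).

    Tokens are ordered by id, because a CTC decoder assumes that label i
    scores logit column i. Any token whose id is >= num_logits has no
    matching column in the model output (hypothetical helper, not a library API).
    """
    ordered = sorted(vocab.items(), key=lambda kv: kv[1])
    covered = [tok for tok, idx in ordered if idx < num_logits]
    extra = [tok for tok, idx in ordered if idx >= num_logits]
    return covered, extra
```

If the assumption above holds, `split_vocab(processor.tokenizer.get_vocab(), 70)` would return the 70 regular labels plus `['<s>', '</s>']` as the extra tokens.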

I built the decoder with `build_ctcdecoder` from pyctcdecode.
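If the decoder alphabet keeps all 72 labels (Wav2Vec2ProcessorWithLM checks that the tokenizer's tokens appear in the decoder alphabet, so simply dropping `<s>` and `</s>` from the labels may not be accepted), one workaround, not taken from this thread, is to pad the model's 70-column logits with two very low-scoring columns before decoding. This is a sketch assuming NumPy logits of shape (time, vocab) and that the extra labels sit at the end of the label list; `pad_logits` and the fill value `-1e4` are hypothetical choices.

```python
import numpy as np


def pad_logits(logits: np.ndarray, alphabet_size: int, fill: float = -1e4) -> np.ndarray:
    """Append low-scoring columns so a (time, V) matrix matches a larger alphabet.

    Assumes the alphabet entries without a model column are the LAST ones
    (ids V .. alphabet_size - 1), as added special tokens usually are.
    `fill` is an assumed "never pick this" score, not a library constant.
    """
    if logits.ndim != 2:
        raise ValueError("expected logits of shape (time, vocab)")
    missing = alphabet_size - logits.shape[1]
    if missing < 0:
        raise ValueError("logits are wider than the decoder alphabet")
    # One constant-valued column per missing label, same dtype as the input.
    pad = np.full((logits.shape[0], missing), fill, dtype=logits.dtype)
    return np.concatenate([logits, pad], axis=1)
```

Usage would look like `decoder.decode(pad_logits(logits, 72))`, where `decoder` comes from `build_ctcdecoder` and `logits` is one utterance's (time, 70) array. Because the padded columns score far below the real ones, beam search should effectively never emit `<s>` or `</s>`.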

Any leads? Thank you very much!
