how can corenlp handle non-ascii string? #31

wxbks · 2015-09-08T18:15:15Z

I passed the word 'Víctor', which contains a non-ASCII character, to corenlp.parse because I would like to get its lemma. But calling corenlp.parse('Víctor') gives this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128).

How can I change the corenlp settings so that it can handle non-ASCII strings?

cuzzo · 2016-02-21T04:13:19Z

Hey @sicongkuang,

You could try running your input string through something like unidecode first. At least, I ran into a similar error and that fixed the problem.
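A minimal sketch of that idea, assuming the third-party Unidecode package is installed (pip install Unidecode). The helper name to_ascii is made up for illustration, and the standard-library fallback is my own addition: it only strips accents and silently drops characters that have no ASCII decomposition, so it is a rougher substitute than unidecode.

```python
import unicodedata

try:
    # Third-party package; assumed to be installed via `pip install Unidecode`.
    from unidecode import unidecode
except ImportError:
    unidecode = None


def to_ascii(text):
    """Return an ASCII-only approximation of `text` (hypothetical helper)."""
    # Accept UTF-8 encoded bytes as well as text.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    if unidecode is not None:
        return unidecode(text)
    # Fallback: split accented letters into base letter + combining mark,
    # then drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


ascii_word = to_ascii(u"Víctor")
print(ascii_word)  # Victor
# The cleaned string can then be passed on, e.g. corenlp.parse(ascii_word),
# where `corenlp` is the wrapper object from the question.
```

Note that this changes the text itself, so the lemma you get back is for 'Victor', not 'Víctor'.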

Hope it works for you.

Cheers,

wxbks · 2016-02-22T09:05:49Z

Hey @cuzzo,
Thank you so much! Yes, that solved my problem. Thank you!
