Related to #167
lt-proc -we analysis currently gives up after a certain number of states. But if we're doing case-insensitive matching, could it not fall back to trying a match on the lowercased word when MAX_COMBINATIONS is exceeded? That seems like it might catch the 90% case:
$ echo HJERTERYTMEOVERVÅKNING |lt-proc -we nob.automorf.bin
Warning: matching case-sensitively since processor state size >= 65536
Warning: compoundAnalysis's MAX_COMBINATIONS exceeded for 'HJERTERYTMEOVERVÅKNING'
gave up at char 15 'V'.
^HJERTERYTMEOVERVÅKNING/*HJERTERYTMEOVERVÅKNING$
$ echo hjerterytmeovervåkning |lt-proc -we nob.automorf.bin
^hjerterytmeovervåkning/hjerterytmeovervåkning<n><m><sg><ind>/hjerterytmeovervåkning<n><f><sg><ind>$
$ echo HJERTEOVERVÅKNING |lt-proc -we nob.automorf.bin
Warning: matching case-sensitively since processor state size >= 65536
Warning: compoundAnalysis's MAX_COMBINATIONS exceeded for 'HJERTEOVERVÅKNING'
gave up at char 15 'N'.
^HJERTEOVERVÅKNING/*HJERTEOVERVÅKNING$
$ echo hjerteovervåkning |lt-proc -we nob.automorf.bin
^hjerteovervåkning/hjerte<n><nt><sg><ind><cmp>+overvåkning<n><m><sg><ind>/hjerte<n><nt><sg><ind><cmp>+overvåkning<n><f><sg><ind>$
It might lead to incomplete analyses: if e.g. «HJERTE» were in the dictionary as an <np>, we would miss out on /HJERTE<np><cmp>+overvåkning<n><m><sg><ind>. But I don't think it would lead to (otherwise) wrong analyses.
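To make the proposal concrete, here is a rough sketch of the control flow in Python. It is not lttoolbox code (which is C++): `CombinationsExceeded`, `analyse_with_fallback`, `toy_analyse` and the tiny lookup table are all hypothetical stand-ins, and the toy analyser simply pretends that any word containing uppercase letters blows the limit.

```python
class CombinationsExceeded(Exception):
    """Hypothetical signal that the compound analyser hit its state limit."""


# Toy stand-in for the compiled dictionary; the analyses are copied from the
# lt-proc output quoted above.
TOY_ANALYSES = {
    "hjerteovervåkning": [
        "hjerte<n><nt><sg><ind><cmp>+overvåkning<n><m><sg><ind>",
        "hjerte<n><nt><sg><ind><cmp>+overvåkning<n><f><sg><ind>",
    ],
}


def toy_analyse(word):
    """Pretend analyser: gives up on any word that is not already lowercase."""
    if word != word.lower():
        raise CombinationsExceeded(word)
    return TOY_ANALYSES.get(word, [])


def analyse_with_fallback(word, analyse):
    """Try the word as given; if the limit is hit, retry on the lowercased form.

    Returns a list of analyses, or an empty list if the word stays unknown.
    """
    try:
        return analyse(word)
    except CombinationsExceeded:
        lowered = word.lower()
        if lowered == word:
            # Nothing new to try; the word was already lowercase.
            return []
        try:
            return analyse(lowered)
        except CombinationsExceeded:
            return []


if __name__ == "__main__":
    for analysis in analyse_with_fallback("HJERTEOVERVÅKNING", toy_analyse):
        print(analysis)
```

As noted above, a retry like this only ever returns analyses of the lowercased form, so a reading that depends on the original casing (such as an `<np>` first part) would be lost whenever the fallback kicks in.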
I could see perhaps an issue getting compounds of proper names that we wouldn't want. But it might be a good idea to implement it, apply it to a corpus and see what happens.