Inspired by a SO post, I propose to make `textstat_editdist()`, whose output will be similar to:
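```r
library(quanteda)

# Tokenize the inaugural corpus and compute the edit distance
# from "the" to every unique token type
toks <- tokens(data_corpus_inaugural)
dat <- data.frame(feature = types(toks),
                  dist = stringdist::stringdist("the", types(toks)))

# Show the ten closest types
head(dat[order(dat$dist), ], 10)
```

```
     feature dist
3        the    0
188     they    1
343       he    1
347      The    1
387     them    1
548     then    1
4223     she    1
4261     tie    1
7087     She    1
7683    thee    1
```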
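A minimal sketch of how the snippet above could be wrapped into a function. The name `textstat_editdist_sketch()` and its arguments (`types`, `pattern`, `method`) are hypothetical and only illustrate the idea; this is not an existing quanteda function or a settled design. It assumes the stringdist package is installed and takes a plain character vector (for example `types(toks)`), so it can be tried without quanteda.

```r
# Hypothetical sketch of the proposed function; not part of quanteda.
# `types`   : character vector of token types, e.g. types(toks)
# `pattern` : single string to compare every type against
# `method`  : passed through to stringdist::stringdist() ("osa" is its default)
textstat_editdist_sketch <- function(types, pattern, method = "osa") {
  stopifnot(is.character(types),
            is.character(pattern),
            length(pattern) == 1L)
  dat <- data.frame(
    feature = types,
    dist = stringdist::stringdist(pattern, types, method = method),
    stringsAsFactors = FALSE
  )
  # Sort by distance, closest types first; row names keep the original index
  dat[order(dat$dist), , drop = FALSE]
}

# Usage with a small, made-up vector of types
res <- textstat_editdist_sketch(c("world", "they", "the", "tie"), "the")
print(res)
```

Taking a character vector rather than a tokens object keeps the sketch independent of quanteda internals; an actual implementation would presumably accept tokens or dfm objects directly.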
`stringdist` seems fairly fast:
```r
microbenchmark::microbenchmark(
    stringdist::stringdist("the", head(types(toks), 1000)),
    stringdist::stringdist("the", head(types(toks), 10000))
)
```

```
Unit: microseconds
                                                    expr   min     lq   mean median     uq    max neval
  stringdist::stringdist("the", head(types(toks), 1000)) 241.5 258.95 306.87 271.85 315.95  946.1   100
 stringdist::stringdist("the", head(types(toks), 10000)) 564.1 621.00 702.29 651.65 686.40 1708.9   100
```

Seems to work with non-ASCII characters:

```r
stringdist::stringdist("世界人権宣言", "世界平和宣言")
```

```
[1] 2
```
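One way to check that the result above is counted in characters rather than bytes is to compare it with a byte-wise run. The sketch below uses the `useBytes` argument of `stringdist::stringdist()`; it assumes the stringdist package is installed and that the strings are UTF-8 encoded in the session. The variable names are for illustration only.

```r
a <- "世界人権宣言"
b <- "世界平和宣言"

# Default: comparison is done on characters, so the two differing
# characters give a distance of 2 (as in the example above)
d_chars <- stringdist::stringdist(a, b)

# Byte-wise comparison: each differing multi-byte character can
# contribute more than one edit, so the distance cannot be smaller
d_bytes <- stringdist::stringdist(a, b, useBytes = TRUE)

print(c(chars = d_chars, bytes = d_bytes))
```

For a `textstat_*` function the character-based default is presumably what users would expect, since token types are compared as text rather than as raw bytes.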
koheiw