Fast keyword extraction using the Aho–Corasick algorithm and tries.
Flash is a Golang reimplementation of Flashtext.
This is meant to be used when you have a large number of words that you want to:
- extract from text
- search and replace
Flash is meant as a replacement for Regex, which in such cases can be extremely slow.
import (
	"fmt"

	"github.com/dav009/flash"
)
words := flash.NewKeywords()
words.Add("New York")
words.Add("Hello")
words.Add("Tokyo")
foundKeywords := words.Extract("New York and Tokyo are Cities")
fmt.Println(foundKeywords)
// [New York, Tokyo]
As a reference, running go-flash with 10K keywords over a 1000-sentence text took a few milliseconds, while regexes took 1 min 37 s.
Sentences | Keywords | strings.Contains | Regex | Go-Flash |
---|---|---|---|---|
1000 | 10K | 1.0035s | 1min 37s | 2.72ms |
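For context, the two baseline columns can be approximated with the standard library alone. The sketch below is an assumption about how such baselines might look, not the benchmark that produced the numbers above: the helper names (`containsAll`, `compile`, `regexAll`), the synthetic keywords, and the input sizes are all made up for illustration, and timings will vary by machine.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// containsAll returns the keywords found in text, one substring check per keyword.
func containsAll(text string, keywords []string) []string {
	var found []string
	for _, k := range keywords {
		if strings.Contains(text, k) {
			found = append(found, k)
		}
	}
	return found
}

// compile builds one whole-word regex per keyword.
func compile(keywords []string) []*regexp.Regexp {
	res := make([]*regexp.Regexp, len(keywords))
	for i, k := range keywords {
		res[i] = regexp.MustCompile(`\b` + regexp.QuoteMeta(k) + `\b`)
	}
	return res
}

// regexAll returns the keywords whose regex matches text.
func regexAll(text string, res []*regexp.Regexp, keywords []string) []string {
	var found []string
	for i, re := range res {
		if re.MatchString(text) {
			found = append(found, keywords[i])
		}
	}
	return found
}

func main() {
	// Synthetic data; sizes are placeholders, smaller than in the table.
	keywords := make([]string, 1000)
	for i := range keywords {
		keywords[i] = fmt.Sprintf("kw%d", i)
	}
	text := strings.Repeat("some text mentioning kw7 and kw42 here. ", 100)

	start := time.Now()
	a := containsAll(text, keywords)
	fmt.Println("strings.Contains:", len(a), "hits in", time.Since(start))

	res := compile(keywords) // compilation is excluded from the timing below
	start = time.Now()
	b := regexAll(text, res, keywords)
	fmt.Println("regexp:", len(b), "hits in", time.Since(start))
}
```

Both baselines do work proportional to the number of keywords for every text. The substring check also ignores word boundaries, so it reports `kw4` as a hit inside `kw42`, while the whole-word regex does not.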
This is a toy project for me to get more familiar with Golang. Please be aware of potential issues.