Golang keyword extraction/replacement data structure using tries instead of regexes

dav009/flash


Flash

Fast keyword extraction using the Aho–Corasick algorithm and tries.
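To illustrate the trie idea, here is a minimal, self-contained sketch of word-level keyword extraction in the spirit of Flashtext. It is not Flash's actual implementation. The names `trieNode`, `add` and `extract` are hypothetical. Text is tokenized on whitespace only, so punctuation attached to a word prevents a match. There are also no Aho–Corasick failure links: at each token the sketch follows the trie as far as it can and keeps the longest keyword found.

```go
package main

import (
	"fmt"
	"strings"
)

// trieNode is one node of a word-level trie: each edge is a whole token.
type trieNode struct {
	children map[string]*trieNode
	keyword  string // non-empty if a keyword ends at this node
}

func newTrieNode() *trieNode {
	return &trieNode{children: make(map[string]*trieNode)}
}

// add inserts a keyword, splitting it into whitespace-separated tokens.
func (t *trieNode) add(keyword string) {
	node := t
	for _, tok := range strings.Fields(keyword) {
		child, ok := node.children[tok]
		if !ok {
			child = newTrieNode()
			node.children[tok] = child
		}
		node = child
	}
	node.keyword = keyword
}

// extract scans the tokens left to right. At each position it follows the
// trie as far as possible and keeps the longest keyword that ends on the way.
func (t *trieNode) extract(text string) []string {
	tokens := strings.Fields(text)
	var found []string
	for i := 0; i < len(tokens); {
		node := t
		longest, end := "", i
		for j := i; j < len(tokens); j++ {
			next, ok := node.children[tokens[j]]
			if !ok {
				break
			}
			node = next
			if node.keyword != "" {
				longest, end = node.keyword, j+1
			}
		}
		if longest != "" {
			found = append(found, longest)
			i = end // continue after the matched keyword
		} else {
			i++
		}
	}
	return found
}

func main() {
	root := newTrieNode()
	for _, k := range []string{"New York", "Hello", "Tokyo"} {
		root.add(k)
	}
	fmt.Println(root.extract("New York and Tokyo are Cities"))
	// prints [New York Tokyo]
}
```

Each lookup walks the trie at most as deep as the longest keyword (in tokens). The cost therefore depends roughly on the text length and keyword length rather than on how many keywords are loaded, which is what makes a trie attractive for large keyword lists.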

Flash is a Golang reimplementation of Flashtext.

It is meant to be used when you have a large number of words that you want to:

  • extract from text
  • search and replace
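Search-and-replace can reuse the same trie structure by storing a replacement at the node where each keyword ends. The following is a hypothetical sketch, and the names `node`, `addReplacement` and `replace` are not Flash's API. It assumes whitespace tokenization, so runs of whitespace in the output collapse to single spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a word-level trie node. replacement is set where a keyword ends.
type node struct {
	children    map[string]*node
	replacement string
	terminal    bool // true if a keyword ends at this node
}

func newNode() *node { return &node{children: map[string]*node{}} }

// addReplacement registers keyword -> replacement.
func (n *node) addReplacement(keyword, replacement string) {
	cur := n
	for _, tok := range strings.Fields(keyword) {
		next, ok := cur.children[tok]
		if !ok {
			next = newNode()
			cur.children[tok] = next
		}
		cur = next
	}
	cur.terminal = true
	cur.replacement = replacement
}

// replace rewrites text in a single left-to-right pass. It substitutes the
// longest keyword starting at each token and copies other tokens unchanged.
func (n *node) replace(text string) string {
	tokens := strings.Fields(text)
	out := make([]string, 0, len(tokens))
	for i := 0; i < len(tokens); {
		cur, matchEnd, repl := n, -1, ""
		for j := i; j < len(tokens); j++ {
			next, ok := cur.children[tokens[j]]
			if !ok {
				break
			}
			cur = next
			if cur.terminal {
				matchEnd, repl = j+1, cur.replacement
			}
		}
		if matchEnd > 0 {
			if repl != "" { // an empty replacement deletes the keyword
				out = append(out, repl)
			}
			i = matchEnd
		} else {
			out = append(out, tokens[i])
			i++
		}
	}
	return strings.Join(out, " ")
}

func main() {
	root := newNode()
	root.addReplacement("New York", "NYC")
	root.addReplacement("Tokyo", "TYO")
	fmt.Println(root.replace("New York and Tokyo are Cities"))
	// prints NYC and TYO are Cities
}
```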

Flash is meant as a replacement for regular expressions, which can be extremely slow when matching against a large number of keywords.
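For comparison, a common regex-based approach compiles all keywords into one alternation. The sketch below uses only Go's standard `regexp` package. `buildKeywordRegex` is a hypothetical helper and is not code from this repository's benchmarks.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildKeywordRegex compiles a single alternation such as
// \b(?:New York|Hello|Tokyo)\b from a keyword list. QuoteMeta escapes
// any regex metacharacters that appear inside the keywords.
func buildKeywordRegex(keywords []string) *regexp.Regexp {
	quoted := make([]string, len(keywords))
	for i, k := range keywords {
		quoted[i] = regexp.QuoteMeta(k)
	}
	return regexp.MustCompile(`\b(?:` + strings.Join(quoted, "|") + `)\b`)
}

func main() {
	re := buildKeywordRegex([]string{"New York", "Hello", "Tokyo"})
	fmt.Println(re.FindAllString("New York and Tokyo are Cities", -1))
	// prints [New York Tokyo]
}
```

Go's `regexp` uses leftmost-first semantics for alternation. If one keyword is a prefix of another (for example "New" and "New York"), either call `re.Longest()` or list longer keywords first. The compiled pattern grows with every keyword added, which is one reason this approach can become slow with thousands of keywords.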

Usage

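package main

import (
	"fmt"
	"github.com/dav009/flash"
)

func main() {
	// Register the keywords to look for.
	words := flash.NewKeywords()
	words.Add("New York")
	words.Add("Hello")
	words.Add("Tokyo")
	// Extract the registered keywords that occur in the text.
	foundKeywords := words.Extract("New York and Tokyo are Cities")
	fmt.Println(foundKeywords)
	// Expected to find "New York" and "Tokyo"; "Hello" does not occur in the text.
}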

Benchmarks

As a reference, using go-flash with 10K keywords on a 1,000-sentence text took 7.3 ms, while using regexes took 1 min 37 s.

Sentences   Keywords   strings.Contains   Regex        Go-Flash
1,000       10K        1.0035 s           1 min 37 s   2.72 ms
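The strings.Contains baseline in the table is presumably something like the loop below, which checks every keyword against every sentence. This is an assumption for illustration only: `naiveExtract` is a hypothetical name, and this is not the repository's benchmark code. Its work grows with the number of sentences times the number of keywords. Note also that `strings.Contains` matches substrings, so "Tokyo" would also match inside "Tokyoite".

```go
package main

import (
	"fmt"
	"strings"
)

// naiveExtract checks every keyword against every sentence with
// strings.Contains, so the work grows with len(sentences) * len(keywords).
// It counts, per keyword, how many sentences contain it.
func naiveExtract(sentences, keywords []string) map[string]int {
	counts := make(map[string]int)
	for _, s := range sentences {
		for _, k := range keywords {
			if strings.Contains(s, k) {
				counts[k]++
			}
		}
	}
	return counts
}

func main() {
	sentences := []string{"New York and Tokyo are Cities", "Hello from Tokyo"}
	keywords := []string{"New York", "Hello", "Tokyo"}
	fmt.Println(naiveExtract(sentences, keywords))
	// prints map[Hello:1 New York:1 Tokyo:2]
}
```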

Warning

This is a toy project I wrote to get more familiar with Golang. Please be aware of potential issues.
