gse

Efficient text segmentation in Go, with support for English, Chinese, Japanese and other languages.

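The dictionary is implemented as a double-array trie. The segmenter uses a word-frequency-based shortest-path algorithm with dynamic programming, and also offers DAG and HMM segmentation.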
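The frequency-based shortest-path idea can be sketched as follows. This is a minimal illustration, not gse's implementation: `toyDict`, its frequencies, and `shortestPathSegment` are made up for the example, a plain map stands in for the double-array trie, and the cost of a word is assumed to be its negative log relative frequency.

```go
package main

import (
	"fmt"
	"math"
)

// toyDict maps words to made-up frequencies; it is not gse's dictionary.
var toyDict = map[string]float64{
	"你好": 50, "世界": 40, "你": 10, "好": 10, "世": 2, "界": 2,
}

// shortestPathSegment (hypothetical helper) picks the segmentation with the
// lowest total cost, where cost(word) = -log(freq/total).
func shortestPathSegment(text string, dict map[string]float64) []string {
	runes := []rune(text)
	n := len(runes)

	total := 0.0
	for _, f := range dict {
		total += f
	}
	// Unknown single characters cost more than any dictionary word.
	unknownCost := math.Log(total) + 1

	best := make([]float64, n+1) // best[i]: minimum cost to segment runes[:i]
	prev := make([]int, n+1)     // prev[i]: start of the last word ending at i
	for i := 1; i <= n; i++ {
		best[i] = math.Inf(1)
		for j := 0; j < i; j++ {
			cost, ok := 0.0, false
			if f, found := dict[string(runes[j:i])]; found {
				cost, ok = -math.Log(f/total), true
			} else if i-j == 1 {
				cost, ok = unknownCost, true
			}
			if ok && best[j]+cost < best[i] {
				best[i] = best[j] + cost
				prev[i] = j
			}
		}
	}

	// Walk the back-pointers from the end to recover the words.
	var words []string
	for i := n; i > 0; i = prev[i] {
		words = append([]string{string(runes[prev[i]:i])}, words...)
	}
	return words
}

func main() {
	fmt.Println(shortestPathSegment("你好世界", toyDict)) // prints [你好 世界]
}
```

With these toy frequencies the two-word path is cheaper than any path through single characters, so the dynamic program selects it.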

HMM segmentation is supported, using the Viterbi algorithm.
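To show how Viterbi decoding yields a segmentation, here is a small sketch over the common B/M/E/S (begin, middle, end, single) tagging scheme. Everything in it is an assumption made for illustration: the probabilities are invented, and `viterbi` and `cutByTags` are hypothetical helpers, not part of gse's API or its trained model.

```go
package main

import (
	"fmt"
	"math"
)

const (
	B = iota // first character of a word
	M        // middle character
	E        // last character
	S        // single-character word
	numStates
)

const stateNames = "BMES"
const unseen = 0.01 // toy emission probability for characters not listed

// Invented model parameters; a real HMM is trained on a corpus.
var start = [numStates]float64{B: 0.6, S: 0.4}
var trans = [numStates][numStates]float64{
	B: {M: 0.3, E: 0.7},
	M: {M: 0.4, E: 0.6},
	E: {B: 0.5, S: 0.5},
	S: {B: 0.5, S: 0.5},
}
var emit = [numStates]map[rune]float64{
	B: {'你': 0.4, '世': 0.4},
	M: {},
	E: {'好': 0.4, '界': 0.4},
	S: {'你': 0.1, '好': 0.1, '世': 0.1, '界': 0.1},
}

func emitProb(s int, r rune) float64 {
	if p, ok := emit[s][r]; ok {
		return p
	}
	return unseen
}

// viterbi returns the most likely tag string for text, working in log space.
func viterbi(text string) string {
	runes := []rune(text)
	n := len(runes)
	if n == 0 {
		return ""
	}
	score := make([][numStates]float64, n)
	back := make([][numStates]int, n)
	for s := 0; s < numStates; s++ {
		score[0][s] = math.Log(start[s]) + math.Log(emitProb(s, runes[0]))
	}
	for t := 1; t < n; t++ {
		for s := 0; s < numStates; s++ {
			best, arg := math.Inf(-1), 0
			for p := 0; p < numStates; p++ {
				if v := score[t-1][p] + math.Log(trans[p][s]); v > best {
					best, arg = v, p
				}
			}
			score[t][s] = best + math.Log(emitProb(s, runes[t]))
			back[t][s] = arg
		}
	}
	// A valid tagging must finish a word, so the last tag is E or S.
	last := E
	if score[n-1][S] > score[n-1][E] {
		last = S
	}
	path := make([]byte, n)
	for t := n - 1; t >= 0; t-- {
		path[t] = stateNames[last]
		last = back[t][last]
	}
	return string(path)
}

// cutByTags closes a word after every E or S tag.
func cutByTags(text, tags string) []string {
	var words []string
	var cur []rune
	for i, r := range []rune(text) {
		cur = append(cur, r)
		if tags[i] == 'E' || tags[i] == 'S' {
			words = append(words, string(cur))
			cur = cur[:0]
		}
	}
	return words
}

func main() {
	tags := viterbi("你好世界")
	fmt.Println(tags, cutByTags("你好世界", tags)) // prints BEBE [你好 世界]
}
```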

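Multiple segmentation modes are supported: normal, search-engine, full, precise and HMM. User dictionaries and part-of-speech tagging are also supported, and gse can run as a JSON RPC service.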
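As a rough illustration of what a search-engine mode adds on top of normal segmentation, the sketch below expands one long word into the shorter dictionary words it contains, so that a search index receives more keywords. This is one plausible reading only, not gse's actual algorithm; `toyDict` and `searchExpand` are hypothetical names.

```go
package main

import "fmt"

// toyDict is a made-up dictionary, not the one shipped with gse.
var toyDict = map[string]bool{"联邦": true, "政府": true, "联邦政府": true}

// searchExpand (hypothetical helper) returns every shorter dictionary word of
// at least two characters found inside word, in order of position, followed
// by word itself.
func searchExpand(word string, dict map[string]bool) []string {
	runes := []rune(word)
	var out []string
	for i := 0; i < len(runes); i++ {
		for j := i + 2; j <= len(runes); j++ {
			sub := string(runes[i:j])
			if sub != word && dict[sub] {
				out = append(out, sub)
			}
		}
	}
	return append(out, word)
}

func main() {
	fmt.Println(searchExpand("联邦政府", toyDict)) // prints [联邦 政府 联邦政府]
}
```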

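Segmentation speed is 9.2 MB/s single-threaded and 26.8 MB/s with concurrent goroutines. In HMM mode, single-threaded speed is 3.2 MB/s. (Measured on a dual-core, four-thread MacBook Pro.)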

QQ discussion group: 120563750 (for discussion only)

Binding:

gse-bind provides bindings for JavaScript and other languages.

Install / update

go get -u github.com/go-ego/gse

Build-tools

go get -u github.com/go-ego/re

re gse

Create a new gse application

$ re gse my-gse

re run

To run the application we just created, cd into the application folder and execute:

$ cd my-gse && re run

Usage

package main

import (
	"fmt"

	"github.com/go-ego/gse"
)

var seg gse.Segmenter

func cut() {
	text := "你好世界, Hello world."

	hmm := seg.Cut(text, true)
	fmt.Println("cut use hmm: ", hmm)

	hmm = seg.CutSearch(text, true)
	fmt.Println("cut search use hmm: ", hmm)

	hmm = seg.CutAll(text)
	fmt.Println("cut all: ", hmm)
}

func segCut() {
	// Text to segment
	tb := []byte("山达尔星联邦共和国联邦政府")

	// Process the segmentation result.
	// Two modes are supported, normal and search; see the comments on the ToString function in the code.
	// Search mode is mainly used to give a search engine as many keywords as possible.
	fmt.Println("segmentation result as a string, search mode: ", seg.String(tb, true))
	fmt.Println("segmentation result as a slice: ", seg.Slice(tb))

	segments := seg.Segment(tb)
	// Process the segmentation result
	fmt.Println(gse.ToString(segments))

	text1 := []byte("上海地标建筑, 东方明珠电视台塔上海中心大厦")
	segments1 := seg.Segment(text1)
	fmt.Println(gse.ToString(segments1, true))
}

func main() {
	// Load the default dictionary
	seg.LoadDict()
	// Or load a specific dictionary file
	// seg.LoadDict("your gopath"+"/src/github.com/go-ego/gse/data/dict/dictionary.txt")

	cut()

	segCut()
}

Segmentation with a custom dictionary

package main

import (
	"fmt"

	"github.com/go-ego/gse"
)

func main() {
	var seg gse.Segmenter
	seg.LoadDict("zh,testdata/test_dict.txt,testdata/test_dict1.txt")

	text1 := []byte("所以, 你好, 再见")
	fmt.Println(seg.String(text1, true))

	segments := seg.Segment(text1)
	fmt.Println(gse.ToString(segments))
}

Chinese segmentation example

Japanese segmentation example

Authors

The author is vz
Maintainers
Contributors

License

Gse is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0). Thanks to sego and jieba.
