GitHub - zhongbin1/bert_tokenization_for_java: This is a Java version of the Chinese tokenization described in BERT.

This is a Java version of the Chinese tokenization described in BERT, including basic tokenization and WordPiece tokenization.
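For readers unfamiliar with WordPiece, the following is a minimal sketch of the greedy longest-match-first lookup used by BERT's reference tokenizer. It is not the code in WordpieceTokenizer.java: the class name `WordpieceSketch`, the tiny vocabulary, and the omission of the reference implementation's maximum-token-length check are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not the repository's WordpieceTokenizer.java.
final class WordpieceSketch {
    static final String UNK = "[UNK]";

    // Splits one whitespace-free token into word pieces by greedy
    // longest-match-first lookup against the vocabulary.
    static List<String> tokenize(String token, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < token.length()) {
            int end = token.length();
            String match = null;
            // Shrink the candidate from the right until it is in the vocabulary.
            while (start < end) {
                String candidate = token.substring(start, end);
                if (start > 0) {
                    // Pieces that continue a word carry the "##" prefix.
                    candidate = "##" + candidate;
                }
                if (vocab.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) {
                // Nothing matches at this position: the whole token becomes [UNK].
                List<String> unknown = new ArrayList<>();
                unknown.add(UNK);
                return unknown;
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Hypothetical three-entry vocabulary, for demonstration only.
        Set<String> vocab = new HashSet<>(Arrays.asList("un", "##aff", "##able"));
        System.out.println(tokenize("unaffable", vocab)); // prints [un, ##aff, ##able]
        System.out.println(tokenize("xyz", vocab));       // prints [[UNK]]
    }
}
```

In BERT's pipeline, basic tokenization splits Chinese text into individual characters before this step, so most Chinese characters are looked up as single-character tokens.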

Motivation

In production, we usually deploy BERT-based models with TensorFlow Serving for high performance and flexibility. However, our application may not be developed in Python, so we have to rewrite the tokenization module.

Usage

Just run Demo.java to get the result. It now supports both single-sentence and sentence-pair input.
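As a rough illustration of what single-sentence and sentence-pair input mean for BERT, the sketch below assembles the standard `[CLS] A [SEP]` and `[CLS] A [SEP] B [SEP]` layouts with segment ids 0 and 1. The class `PairInputSketch` and its `build` method are hypothetical names chosen for this example; they are not the API of Demo.java or FullTokenizer.java, whose output format may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical,
// not the repository's API.
final class PairInputSketch {
    final List<String> tokens = new ArrayList<>();
    final List<Integer> segmentIds = new ArrayList<>();

    // Builds [CLS] A [SEP] for a single sentence, or
    // [CLS] A [SEP] B [SEP] when tokensB is non-null.
    static PairInputSketch build(List<String> tokensA, List<String> tokensB) {
        PairInputSketch out = new PairInputSketch();
        out.add("[CLS]", 0);
        for (String t : tokensA) {
            out.add(t, 0);
        }
        out.add("[SEP]", 0);
        if (tokensB != null) {
            // The second sentence and its closing [SEP] use segment id 1.
            for (String t : tokensB) {
                out.add(t, 1);
            }
            out.add("[SEP]", 1);
        }
        return out;
    }

    private void add(String token, int segmentId) {
        tokens.add(token);
        segmentIds.add(segmentId);
    }

    public static void main(String[] args) {
        PairInputSketch pair = build(Arrays.asList("hello", "there"), Arrays.asList("world"));
        System.out.println(pair.tokens);     // prints [[CLS], hello, there, [SEP], world, [SEP]]
        System.out.println(pair.segmentIds); // prints [0, 0, 0, 0, 1, 1]
    }
}
```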

Moreover, for Chinese natural language processing, we add full-width to half-width character conversion and uppercase-to-lowercase conversion.
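One common way to implement such normalization is sketched below, assuming the usual Unicode mapping: full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts, and the ideographic space U+3000 maps to an ordinary space. The class name `NormalizeSketch` is hypothetical, and the repository's own preprocessing may handle additional cases.

```java
import java.util.Locale;

// Illustrative sketch only; not the repository's preprocessing code.
final class NormalizeSketch {
    // Converts full-width characters to half-width, then lowercases the text.
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                // Ideographic (full-width) space becomes an ASCII space.
                sb.append(' ');
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                // Full-width ASCII variants are offset by 0xFEE0 from ASCII.
                sb.append((char) (c - 0xFEE0));
            } else {
                sb.append(c);
            }
        }
        // Locale.ROOT avoids locale-specific case rules.
        return sb.toString().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Input is full-width "BERT", an ideographic space, then "Model".
        System.out.println(normalize("\uFF22\uFF25\uFF32\uFF34\u3000Model")); // prints bert model
    }
}
```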

Reporting issues

Please let me know if you encounter any problems.

Folders and files

.gitignore
BasicTokenizer.java
Demo.java
FullTokenizer.java
LICENSE
Preprocess.java
README.md
WordpieceTokenizer.java
