Introduction

chardet: Go character encoding detector

Introduction

This is a Go port of the python's chardet library. Much respect and appreciation to the original authors for their excellent work.

chardet is a character encoding detector library written in Go. It helps you automatically detect the character encoding of text content.

Installation

To install chardet, use go get:

go get github.com/wlynxg/chardet

Supported Encodings & Languages

Support Encodings:

Expand the list of supported encodings

Ascii
UTF-8
UTF-8-SIG
UTF-16
UTF-16LE
UTF-16BE
UTF-32
UTF-32BE
UTF-32LE
GB2312
HZ-GB-2312
SHIFT_JIS
Big5
Johab
KOI8-R
TIS-620
MacCyrillic
MacRoman
EUC-TW
EUC-KR
EUC-JP
CP932
CP949
Windows-1250
Windows-1251
Windows-1252
Windows-1253
Windows-1254
Windows-1255
Windows-1256
Windows-1257
ISO-8859-1
ISO-8859-2
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
ISO-8859-13
ISO-2022-CN
ISO-2022-JP
ISO-2022-KR
X-ISO-10646-UCS-4-3412
X-ISO-10646-UCS-4-2143
IBM855
IBM866

Support Languages:

Expand the list of supported languages

- Chinese - Japanese - Korean - Hebrew - Russian - Greek - Bulgarian - Thai - Turkish

Usage

Basic Usage

The simplest way to use chardet is with the Detect function:

package main

import (
	"fmt"
	"github.com/wlynxg/chardet"
)

func main() {
	data := []byte("Your text data here...")
	result := chardet.Detect(data)
	fmt.Printf("Detected result: %+v\n", result) 
    //Output: Detected result: {Encoding:Ascii Confidence:1 Language:}
}

Advanced Usage

For handling large amounts of text, you can use the detector incrementally. This allows the detector to stop as soon as it reaches sufficient confidence in its result.

package main

import (
	"fmt"
	"github.com/wlynxg/chardet"
)

func main() {
	// Create a detector instance
	detector := chardet.NewUniversalDetector(0)
	// Process text in chunks
	chunk1 := []byte("First chunk of text...")
	chunk2 := []byte("Second chunk of text...")
	detector.Feed(chunk1)
	detector.Feed(chunk2)
	// Get the result
	result := detector.GetResult()
	fmt.Printf("Detected result: %+v\n", result)
	// Output: Detected result: {Encoding:Ascii Confidence:1 Language:}
}

Processing Multiple Files

You can reuse the same detector instance for multiple files by using the Reset() method:

package main

import (
	"fmt"
	"os"
	"github.com/wlynxg/chardet"
)

func main() {
	detector := chardet.NewUniversalDetector(0)
	files := []string{"file1.txt", "file2.txt"}
	for _, file := range files {
		detector.Reset()
		data, err := os.ReadFile(file)
		if err != nil {
			continue
		}
		detector.Feed(data)
		result := detector.GetResult()
		fmt.Printf("File %s encoding: %+v\n", file, result)
	}
}

License

chardet is licensed under the MIT License, 100% free and open-source, forever.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.github/workflows		.github/workflows
cda		cda
consts		consts
probe		probe
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
chardet.go		chardet.go
detector.go		detector.go
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

chardet: Go character encoding detector

Introduction

Installation

Supported Encodings & Languages

Usage

Basic Usage

Advanced Usage

Processing Multiple Files

License

About

Releases 1

Packages

Languages

License

wlynxg/chardet

Folders and files

Latest commit

History

Repository files navigation

chardet: Go character encoding detector

Introduction

Installation

Supported Encodings & Languages

Usage

Basic Usage

Advanced Usage

Processing Multiple Files

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages