Releases: cometkim/unicode-segmenter

[email protected]

06 Mar 19:56

Latest

Minor Changes

21cd789: Removed deprecated APIs
- searchGrapheme in unicode-segmenter/grapheme
- takeChar and takeCodePoint in unicode-segmenter/utils
These were used internally before, but never from outside.
483d258: Reduced bundle size, while keeping the best perf

Some details:
- Refactored to share the same internal code path wherever possible.
- Removed the pre-computed jump table; the optimization it provided was offset by other perf improvements.
- The previous array layout, meant to avoid accidental de-opt, turned out to be overkill. A regular tuple array is well optimized, so I fell back to good old plain binary search.
- Tried some experiments, such as a new encoding and an Eytzinger layout, for more aggressive improvements, but without success.
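
As an illustration of the "plain binary search over a tuple array" idea, here is a minimal sketch. The table contents, the category numbers, and the names `CodePointRange`, `SAMPLE_TABLE`, and `findCategory` are assumptions made for this example and are not taken from the library's source.

```typescript
// Hypothetical table: sorted, non-overlapping [start, end, category] tuples.
// The category numbers are arbitrary placeholders for this sketch.
type CodePointRange = [start: number, end: number, category: number];

const SAMPLE_TABLE: CodePointRange[] = [
  [0x000a, 0x000a, 1], // LINE FEED
  [0x000d, 0x000d, 2], // CARRIAGE RETURN
  [0x0300, 0x036f, 3], // combining diacritical marks
  [0x1f3fb, 0x1f3ff, 3], // emoji skin tone modifiers
];

// Returns the category of the range containing `cp`, or 0 when no range matches.
function findCategory(cp: number, table: CodePointRange[] = SAMPLE_TABLE): number {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const [start, end, category] = table[mid];
    if (cp < start) {
      hi = mid - 1; // target lies in the lower half
    } else if (cp > end) {
      lo = mid + 1; // target lies in the upper half
    } else {
      return category; // start <= cp <= end
    }
  }
  return 0;
}
```

A lookup such as `findCategory(0x0301)` walks at most about log2(n) tuples, which is why no extra jump table is strictly required for correctness.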

[email protected]

07 Dec 18:45

Patch Changes

a5f486f: Fix bloat in the NPM package.

package.tgz was mostly bloated by CommonJS interop and sourcemaps.

However, sourcemaps aren't necessary here because the package ships its sources as is,
and the CommonJS entries shouldn't be any different.

This is now fixed by using simpler transpilation for the CommonJS entries and removing the sourcemap files.
Inaccessible entries were also removed.

As a result, the unpacked total package size is down to 135 KB from 250 KB.

Note: Node.js v22 will stabilize require(ESM), which will allow CommonJS projects to use this package without having to maintain separate entries. I'm very excited about that, and looking forward to it becoming more "common". The first major release may consider ending support for CommonJS entries and TypeScript's "Node" resolution.

[email protected]

29 Nov 17:09

Patch Changes

94ed937: Improved perf and bundle size a bit

It seems using TypedArray isn't helpful,
and dereferencing many prototypes may cause deopt.

A plain Array is good enough as long as it is kept packed.
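
To illustrate what "packed" means here: V8 distinguishes arrays with no holes from "holey" ones, and an array built from a literal or grown with `push` never contains holes, while `new Array(n)` starts out as all holes. The sketch below assumes a flat list of numbers as input; `buildPackedTable` and `hasHoles` are hypothetical helpers, not part of the library.

```typescript
type RangeTuple = [number, number, number];

// Build the table by pushing tuples, so no index is ever skipped.
function buildPackedTable(flat: number[]): RangeTuple[] {
  const table: RangeTuple[] = [];
  for (let i = 0; i + 2 < flat.length; i += 3) {
    table.push([flat[i], flat[i + 1], flat[i + 2]]);
  }
  return table;
}

// Reports whether any index below `length` is missing.
// This only detects holes at the language level; it cannot observe
// the engine's internal representation of the array.
function hasHoles(arr: unknown[]): boolean {
  for (let i = 0; i < arr.length; i++) {
    if (!(i in arr)) return true;
  }
  return false;
}
```

Preallocating with `new Array(n)` and filling by index would produce the same values, but the array would begin life with holes, which is the situation the note above tries to avoid.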
de71269: Update Intl type definition

[email protected]

24 Nov 03:23

Patch Changes

9d688d8: grapheme: Rename countGrapheme() to countGraphemes(). The existing name remains as a deprecated alias.
be49399: grapheme: Add splitGraphemes() utility
5e86659: grapheme: Add more detail to API JSDoc

[email protected]

02 Nov 21:20

Minor Changes

ffb41fb: Code size is significantly reduced; the minified JS is now half the size

There are also some performance improvements.
They are modest, but reducing size without giving up performance is a huge win.
- Compressed Unicode data further using Base36
- Changed the internal representation to TypedArray to improve its access pattern.
- Shrank the grapheme lookup table size.
  This does not impact performance except for some edge cases like Hindi and Demonic, but it does reduce the bundle size.
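
To give a feel for how Base36 can shrink a table of sorted code point ranges, here is a minimal sketch. The format (comma-separated deltas, each written with `toString(36)` and read back with `parseInt(…, 36)`) and the names `encodeRanges` and `decodeRanges` are assumptions for illustration; the library's actual encoding may differ.

```typescript
type CodePointSpan = [start: number, end: number];

// Encode each range as two base-36 numbers:
// the gap from the previous range's end, and the range's own width.
function encodeRanges(ranges: CodePointSpan[]): string {
  const parts: string[] = [];
  let prevEnd = 0;
  for (const [start, end] of ranges) {
    parts.push((start - prevEnd).toString(36), (end - start).toString(36));
    prevEnd = end;
  }
  return parts.join(",");
}

// Decode by accumulating the deltas back into absolute code points.
function decodeRanges(data: string): CodePointSpan[] {
  const nums = data.split(",").map((s) => parseInt(s, 36));
  const ranges: CodePointSpan[] = [];
  let prevEnd = 0;
  for (let i = 0; i + 1 < nums.length; i += 2) {
    const start = prevEnd + nums[i];
    const end = start + nums[i + 1];
    ranges.push([start, end]);
    prevEnd = end;
  }
  return ranges;
}
```

Deltas between neighbouring ranges are usually small, and base 36 needs fewer characters per number than decimal, so the encoded string tends to be shorter than the same table written as a JSON array.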
9e0feca: Update to Unicode® 16.0.0

[email protected]

02 Sep 18:07

Patch Changes

3665cf7: Fix Hindi text segmentation

[email protected]

01 Sep 03:56

Minor Changes

73f5e6b: Significantly reduced bundle size by compressing the data table. The grapheme segmentation library now takes only 6.6kB (gzip) or 4.4kB (brotli)!

Patch Changes

b045320: Fix isSMP, and add more plane utils (isSIP, isTIP, isSSP)
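
For context, Unicode planes are blocks of 0x10000 code points: plane 0 is the BMP, 1 the SMP, 2 the SIP, 3 the TIP, and 14 the SSP. The sketch below shows one way such checks can be written; the function names are deliberately different from the library's exports, and this is not the library's implementation.

```typescript
// The plane index is the code point shifted right by 16 bits.
function planeOf(cp: number): number {
  return cp >>> 16;
}

// Hypothetical counterparts of the plane utilities named in the changelog.
const inBMP = (cp: number): boolean => planeOf(cp) === 0; // U+0000..U+FFFF
const inSMP = (cp: number): boolean => planeOf(cp) === 1; // U+10000..U+1FFFF
const inSIP = (cp: number): boolean => planeOf(cp) === 2; // U+20000..U+2FFFF
const inTIP = (cp: number): boolean => planeOf(cp) === 3; // U+30000..U+3FFFF
const inSSP = (cp: number): boolean => planeOf(cp) === 14; // U+E0000..U+EFFFF
```

For example, U+1F600 (an emoji) falls in the SMP, while U+FFFF is still in the BMP.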

[email protected]

05 Jul 05:54

Patch Changes

447b484: Fix polyfill so that it does not override an existing implementation and is assigned as non-enumerable
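
The behavior described here can be sketched with `Object.defineProperty`. The helper name `installPolyfill` and its parameters are hypothetical, and the example uses plain objects as stand-ins for a global namespace; it is not the library's code.

```typescript
// Installs `value` under `key` only when the target does not already have it.
// The property is defined as non-enumerable, so it does not show up in
// Object.keys() or for...in loops. Returns true when the polyfill was installed.
function installPolyfill(target: object, key: string, value: unknown): boolean {
  if (key in target) {
    return false; // keep the existing implementation untouched
  }
  Object.defineProperty(target, key, {
    value,
    enumerable: false,
    writable: true,
    configurable: true,
  });
  return true;
}
```

Keeping the property `writable` and `configurable` is a design choice in this sketch: it leaves room for a later native implementation or another polyfill to replace the value.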

[email protected]

14 Jun 02:26

Patch Changes

04fe2fc: Fix sourcemap reference error
- Include missing sourcemap files for transformed cjs entries
- Remove unnecessary transforms for esm entries and remove source map reference

[email protected]

13 Jun 19:29

Minor Changes

657e31a: semi-breaking: Removed _cat from grapheme cluster segments because it was useless

Instead, added _catBegin and _catEnd, the categories at the beginning and end of a segment, which may be useful for inferring which boundary rules were applied.
