Commit 667c741

types-grammar: tweaked note about Twitter and Unicode length counting
1 parent c1e01fc commit 667c741

2 files changed: +8 -6 lines changed

types-grammar/ch1.md

+1 -1
@@ -262,7 +262,7 @@ For example, the Unicode code point `127878` (hexadecimal `1F386`) is `🎆` (fi
 
 This has implications on the length of strings, because a single visible character like the `🎆` fireworks symbol, when in a JS string, is a counted as 2 characters for the purposes of the string length!
 
-We'll revisit Unicode characters in a bit, and then cover more accurately computing string length in Chapter 2.
+We'll revisit Unicode characters in a bit, and then cover the challenges of computing string length in Chapter 2.
 
 ### Escape Sequences
 

types-grammar/ch2.md

+7 -5
@@ -249,13 +249,13 @@ thumbsDown.length; // 4 -- oops!
 
 As you can see, these are two distinct code-points (not a surrogate pair) that, by virtue of their ordering and adjacency, cause the computer's Unicode rendering to draw the thumbs-down symbol but with a darker skin tone than its default. The computed string length is thus `2`.
 
-| WARNING: |
-| :--- |
-| As a Twitter user, you might expect to be able to put 280 thumbs-down emoji into a single tweet, since it looks like a single character. But Twitter counts each such emoji as two characters, so you only get 140. Surprisingly, twitter counts the `"👎"` (default thumbs-down), `"👎🏾"` (dark-skin tone thumbs-down), and even the `"👩‍👩‍👦‍👦"` (family emoji grapheme cluster) all as two characters each, even though their string lengths (from JS's perspective) are `2`, `4`, and `7`, respectively. Twitter must have some sort of custom Unicode handling implemented in the tools. |
-
 It would take replicating most of a platform's complex Unicode rendering logic to be able to recognize such clusters of code-points as a single "character" for length-counting sake. There are libraries that purport to do so, but they're not necessarily perfect, and they come at a hefty cost in terms of extra code.
 
-Counting the "length" of a string to match our human intuitions is a remarkably challenging task. We can get acceptable approximations in many cases, but there's plenty of other cases that confound our programs.
+| NOTE: |
+| :--- |
+| As a Twitter user, you might expect to be able to put 280 thumbs-down emoji into a single tweet, since it looks like a single character. Twitter counts the `"👎"` (default thumbs-down), the `"👎🏾"` (medium-dark-skintone thumbs-down), and even the `"👩‍👩‍👦‍👦"` (family emoji grapheme cluster) all as 2 characters each, even though their respective string lengths (from JS's perspective) are `2`, `4`, and `7`; thus, you can only fit half the number of emojis (140 instead of 280) in a tweet. In fact, Twitter implemented this change in 2018 to specifically level the counting of all Unicode characters, at 2 characters per symbol. [^TwitterUnicode] That was a welcomed change for Twitter users, especially those who want to use emoji characters that are most representative of intended gender, skintone, etc. Still, it *is* curious that the choice was made to count the symbols as 2 characters each, instead of the more intuitive 1 character each. |
+
+Counting the *length* of a string to match our human intuitions is a remarkably challenging task, perhaps more of an art than a science. We can get acceptable approximations in many cases, but there's plenty of other cases that may confound our programs.
 
 ### String Concatenation
 
@@ -332,3 +332,5 @@ The following string utility functions are proviced directly on the `String` obj
 ## Number Behaviors
 
 // TODO
+
+[^TwitterUnicode]: "New update to the Twitter-Text library: Emoji character count"; Andy Piper; Oct 2018; https://twittercommunity.com/t/new-update-to-the-twitter-text-library-emoji-character-count/114607 ; Accessed July 2022
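
For readers who want to check the length figures quoted in the `ch2.md` note, here is a minimal sketch of three ways a JS program might count the "length" of the three emoji strings. The names `samples`, `countCodeUnits`, `countCodePoints`, and `countGraphemes` are hypothetical helpers invented for this illustration and do not appear in the book's text. The grapheme count assumes a runtime that ships the standard `Intl.Segmenter` API (recent Node.js and modern browsers); it is one possible approximation, not how Twitter's own library counts.

```javascript
// Hypothetical sample strings (escapes used so the code points are explicit).
var samples = {
    // "👎" default thumbs-down: one code point outside the BMP
    thumbsDown: "\u{1F44E}",
    // "👎🏾" thumbs-down followed by a skin-tone modifier
    thumbsDownDark: "\u{1F44E}\u{1F3FE}",
    // "👩‍👩‍👦‍👦" family: four emoji joined by three zero-width joiners (U+200D)
    family: "\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}"
};

// What `.length` reports: UTF-16 code units.
function countCodeUnits(str) {
    return str.length;
}

// The string iterator walks code points, so spreading counts code points.
function countCodePoints(str) {
    return [...str].length;
}

// Assumption: Intl.Segmenter is available; it groups code points into
// user-perceived characters (grapheme clusters).
function countGraphemes(str) {
    var segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
    return [...segmenter.segment(str)].length;
}

for (let [name, str] of Object.entries(samples)) {
    console.log(
        name,
        countCodeUnits(str),
        countCodePoints(str),
        countGraphemes(str)
    );
}
```

Under these assumptions, the three strings have 2, 4, and 11 code units and 1, 2, and 7 code points respectively, so the `7` quoted for the family emoji corresponds to its code-point count rather than its `.length`. Each of the three is a single grapheme cluster, which is the "more intuitive 1 character each" that the note alludes to.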
