Commit fd72dda

upgrade to Unicode 8.0.0
1 parent 73dadc4 commit fd72dda

11 files changed

+25079
-22656
lines changed

Doc/library/stdtypes.rst

+1 -1
@@ -354,7 +354,7 @@ Notes:
 The numeric literals accepted include the digits ``0`` to ``9`` or any
 Unicode equivalent (code points with the ``Nd`` property).

-See http://www.unicode.org/Public/7.0.0/ucd/extracted/DerivedNumericType.txt
+See http://www.unicode.org/Public/8.0.0/ucd/extracted/DerivedNumericType.txt
 for a complete list of code points with the ``Nd`` property.

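
The note in this hunk says that numeric literals passed to the constructors may use any code point with the ``Nd`` property. A minimal sketch of that behaviour follows; the Arabic-Indic digits are chosen purely for illustration and are not taken from the commit.

```python
import unicodedata

# ARABIC-INDIC DIGIT THREE and ARABIC-INDIC DIGIT TWO (illustrative choice).
arabic_indic = "\u0663\u0662"

# Each code point carries the Nd (decimal digit) general category.
categories = [unicodedata.category(ch) for ch in arabic_indic]

# int() accepts any Nd digits, not only ASCII 0-9.
value = int(arabic_indic)

print(categories, value)
```

Which code points count as ``Nd`` depends on the UCD version compiled into the interpreter, which is why the linked data file is bumped together with the database.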

Doc/library/unicodedata.rst

+4 -4
@@ -15,8 +15,8 @@

 This module provides access to the Unicode Character Database (UCD) which
 defines character properties for all Unicode characters. The data contained in
-this database is compiled from the `UCD version 7.0.0
-<http://www.unicode.org/Public/7.0.0/ucd>`_.
+this database is compiled from the `UCD version 8.0.0
+<http://www.unicode.org/Public/8.0.0/ucd>`_.

 The module uses the same names and symbols as defined by Unicode
 Standard Annex #44, `"Unicode Character Database"
@@ -166,6 +166,6 @@ Examples:

 .. rubric:: Footnotes

-.. [#] http://www.unicode.org/Public/7.0.0/ucd/NameAliases.txt
+.. [#] http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt

-.. [#] http://www.unicode.org/Public/7.0.0/ucd/NamedSequences.txt
+.. [#] http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt
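
To check which UCD version a given interpreter was built with, the module exposes ``unicodedata.unidata_version``. The sketch below only reads that attribute; the ``major`` variable is a name introduced here for illustration.

```python
import unicodedata

# Version string of the UCD data compiled into this interpreter. A build that
# includes this commit would report "8.0.0"; later releases report newer data.
ucd_version = unicodedata.unidata_version

# Hypothetical convenience: the major version as an integer.
major = int(ucd_version.split(".")[0])

print(ucd_version, major)
```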

Doc/reference/lexical_analysis.rst

+2 -2
@@ -311,7 +311,7 @@ The Unicode category codes mentioned above stand for:
 * *Nd* - decimal numbers
 * *Pc* - connector punctuations
 * *Other_ID_Start* - explicit list of characters in `PropList.txt
-  <http://www.unicode.org/Public/7.0.0/ucd/PropList.txt>`_ to support backwards
+  <http://www.unicode.org/Public/8.0.0/ucd/PropList.txt>`_ to support backwards
   compatibility
 * *Other_ID_Continue* - likewise

@@ -727,4 +727,4 @@ occurrence outside string literals and comments is an unconditional error::

 .. rubric:: Footnotes

-.. [#] http://www.unicode.org/Public/7.0.0/ucd/NameAliases.txt
+.. [#] http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt

Doc/whatsnew/3.5.rst

+7
@@ -755,6 +755,13 @@ urllib
 control the encoding of query parts if needed. (Contributed by Samwyse and
 Arnon Yaari in :issue:`13866`.)

+unicodedata
+-----------
+
+* The :mod:`unicodedata` module now uses data from `Unicode 8.0.0
+  <http://unicode.org/versions/Unicode8.0.0/>`_.
+
+
 wsgiref
 -------


Lib/test/test_unicodedata.py

+2 -2
@@ -21,7 +21,7 @@
 class UnicodeMethodsTest(unittest.TestCase):

     # update this, if the database changes
-    expectedchecksum = '618e2c1a22ee79d2235319709f16c50f987ee21f'
+    expectedchecksum = '5971760872b2f98bb9c701e6c0db3273d756b3ec'

     def test_method_checksum(self):
         h = hashlib.sha1()
@@ -81,7 +81,7 @@ class UnicodeFunctionsTest(UnicodeDatabaseTest):

     # Update this if the database changes. Make sure to do a full rebuild
     # (e.g. 'make distclean && make') to get the correct checksum.
-    expectedchecksum = '585302895deead0c1c8478c51da9241d4efedca9'
+    expectedchecksum = '5e74827cd07f9e546a30f34b7bcf6cc2eac38c8c'
     def test_function_checksum(self):
         data = []
         h = hashlib.sha1()
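
The checksums above change because the tests hash database output with SHA-1 and compare the digest to a stored value. The following is a simplified sketch of that idea, not the actual test body: ``category_checksum`` and its default range are hypothetical, and it hashes only the general category rather than the full set of properties the real tests cover.

```python
import hashlib
import unicodedata


def category_checksum(start=0, stop=0x3000):
    """Hash the general category of every code point in [start, stop).

    Hypothetical helper for illustration; the real tests hash more
    properties over a larger range.
    """
    h = hashlib.sha1()
    for cp in range(start, stop):
        h.update(unicodedata.category(chr(cp)).encode("ascii"))
    return h.hexdigest()


digest = category_checksum()
print(digest)
```

Any digest of this kind is tied to one UCD version, so a database upgrade is expected to require a new expected value.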

Misc/NEWS

+2
@@ -10,6 +10,8 @@ Release date: 2015-07-05
 Core and Builtins
 -----------------

+- Upgrade to Unicode 8.0.0.
+
 - Issue #24345: Add Py_tp_finalize slot for the stable ABI.

 - Issue #24400: Introduce a distinct type for PEP 492 coroutines; add

Modules/unicodedata.c

+3 -2
@@ -921,10 +921,11 @@ is_unified_ideograph(Py_UCS4 code)
 {
     return
         (0x3400 <= code && code <= 0x4DB5) || /* CJK Ideograph Extension A */
-        (0x4E00 <= code && code <= 0x9FCC) || /* CJK Ideograph */
+        (0x4E00 <= code && code <= 0x9FD5) || /* CJK Ideograph */
         (0x20000 <= code && code <= 0x2A6D6) || /* CJK Ideograph Extension B */
         (0x2A700 <= code && code <= 0x2B734) || /* CJK Ideograph Extension C */
-        (0x2B740 <= code && code <= 0x2B81D); /* CJK Ideograph Extension D */
+        (0x2B740 <= code && code <= 0x2B81D) || /* CJK Ideograph Extension D */
+        (0x2B820 <= code && code <= 0x2CEA1); /* CJK Ideograph Extension E */
 }

 /* macros used to determine if the given code point is in the PUA range that
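
The range check in this hunk can be mirrored in Python to see exactly which code points the update adds. This is a sketch for illustration, not part of the commit: the ranges are copied from the ``+`` side of the diff, and ``ideograph_name`` is a hypothetical helper that assumes unified ideograph names are derived algorithmically from the code point.

```python
# Inclusive ranges copied from the updated C function in this diff.
UNIFIED_IDEOGRAPH_RANGES = (
    (0x3400, 0x4DB5),    # CJK Ideograph Extension A
    (0x4E00, 0x9FD5),    # CJK Ideograph (upper bound was 0x9FCC before)
    (0x20000, 0x2A6D6),  # CJK Ideograph Extension B
    (0x2A700, 0x2B734),  # CJK Ideograph Extension C
    (0x2B740, 0x2B81D),  # CJK Ideograph Extension D
    (0x2B820, 0x2CEA1),  # CJK Ideograph Extension E (new in this commit)
)


def is_unified_ideograph(code):
    """Return True if the code point falls in one of the ranges above."""
    return any(lo <= code <= hi for lo, hi in UNIFIED_IDEOGRAPH_RANGES)


def ideograph_name(code):
    """Hypothetical helper: build the algorithmic name, or None."""
    if not is_unified_ideograph(code):
        return None
    return "CJK UNIFIED IDEOGRAPH-%X" % code


print(is_unified_ideograph(0x9FD5), is_unified_ideograph(0x2B820))
print(ideograph_name(0x4E00))
```

Keeping the ranges in a table makes the two changes easy to spot: the main block grows from ``0x9FCC`` to ``0x9FD5``, and Extension E is added as a sixth range.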

Modules/unicodedata_db.h

+1,389 -1,311
Large diffs are not rendered by default.

Modules/unicodename_db.h

+22,024 -20,540
Large diffs are not rendered by default.
